| Exploring regularization during parameter search in alkane torsion fitting and CHO bond/angles | @Trevor Gokey | (TG presents slides) Slide 3 Slide 5 Slide 7
CBy – A few things: Slide 3: The problem that you seem to be talking about seems to be identical in form to a showstopping problem I had when I was coming up with RESP (in my 1993 RESP paper). It seems like you’re trying to mitigate the same issue as me, except instead of charges, it’s other parameters. My starting point had no restraint/regularization, and there was a showstopping instability. So I added a harmonic restraint. But what I found is that, for charges, there’s no good canonical, knowable reference value for the harmonic restraint. Nowadays I’d use AM1-BCC charges as the reference, but at the time I used a hyperbolic restraint instead. There, by squaring the (something), I avoided the need to know the restraint value beforehand… So I’m suggesting that you don’t do a type 1 regularization (harmonic restraint), but instead do a hyperbolic restraint – in other words, in this equation, don’t square the final term. Then you could set the reference for bonds to be 0, the reference for sp3 carbon angles to be 109.5, etc.
DM – Big point seems to be that “ideally we want to add a significant penalty if we move the parameter a little, but it doesn’t grow exponentially for larger displacements”
TG – LPW has implemented an L1 regularizer.
JW – I think what CBy is describing is an L1 regularizer.
CBy – It’s not exactly L1, since there’s a smoothing function applied around 0.
CBy – This would be an important thing to test before applying it to a larger parameterization.
CC – These types of regularization also correspond to different prior distributions for parameters. L2 norm (what Trevor is using) is a Gaussian prior. L1 norm (absolute value) is a Laplacian prior. Hyperbolic norm (Chris's suggestion and used in RESP) is Jeffreys prior.
CBy – When you change to the L1, you need to change your prefactor (omega_reg). You’ll need to do some experiments to “tune” this to a reasonable value. A good experiment may be trying to recover C-C or C#C bond force constants. I’d see how low/high I could set that restraint, even when the real value is far from the initial value. It’ll be hard to do this right, but you can make it “win” by carefully choosing the value.
Slide 11 CBy – Looking at the Y axis – how do I normalize this? For splits 5 and 7, are you normalizing for the number of 4-membered rings? TG – There’s no regularization or optimization in this case – this is just taking initial values (from modified Seminario?) CBy – Then what’s the conclusion? TG – The first three parameters are necessary to get Sage-level performance. Also, there were a bunch of splits on 4-membered rings even though my test set didn’t have any 4-membered rings; these splits still improved performance by REMOVING some values from the parameter training.
CBy – Re: Orange bar – Are we really better than that? What’s the justification here? TG – I think we’re better than the orange bar because that FF had a bunch of unphysical values. CBy – Could be good to include the unphysical values from the orange FF on this slide to illustrate this point.
(Summary slide) CBy – I like point 4. DM – Maybe could use components from bespokefit as initial guesses? CBy – (kinda for fun) – I’ve complained a lot about fitting the internal angle of a cyclopropyl (where any value will give you a 60-degree angle), so it’d be interesting to see whether this would suggest pruning it.
CC – Any penalty for total number of parameters? TG – Yes, we penalize for total number of parameters, parameter complexity (number of bits), and number of torsion periodicities.
TG – Question: Is it sane to want to restrain all torsion ks to 0? CBy – Yes. JW – An L1 regularizer will help identify “nonsensical” torsion phases by setting their ks to 0. This works because the penalty applies immediately when the k deviates from 0, whereas an L2 (quadratic) regularizer has a near-0 derivative around 0, so some ks will become nonzero due to noise.
TG – Should vdW well depths also be restrained to 0?
CC – Important to note that we have both a global regularization weight and a prior width for each type of parameter. So the regularization weight in general is separate from the specific prior widths. That’ll be good to keep in mind while determining these parameters.
JW – In a future meeting, I’d love to see the time interval between splits. You’d mentioned that these should become shorter and it’d be cool to see the details, both because it’s cool and for project planning.
|
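
To make the comparison of restraint shapes in the discussion concrete, here is a minimal sketch of the three penalty forms mentioned (harmonic/L2, L1, and a RESP-style hyperbolic restraint) together with their derivatives. All function names are hypothetical and this is not ForceBalance's or TG's implementation; `weight` stands in for the prefactor (omega_reg), and the tightness parameter `b` with its default of 0.1 is an assumption borrowed from the usual RESP form `weight * (sqrt(d^2 + b^2) - b)`.

```python
import math

# Hypothetical helpers for illustration only; d = x - x0 is the displacement
# of a parameter x from its reference value x0.

def harmonic(x, x0=0.0, weight=1.0):
    """L2 (harmonic) penalty: weight * d^2."""
    d = x - x0
    return weight * d * d

def harmonic_grad(x, x0=0.0, weight=1.0):
    """Derivative of the harmonic penalty; goes to 0 as x approaches x0."""
    return 2.0 * weight * (x - x0)

def l1(x, x0=0.0, weight=1.0):
    """L1 penalty: weight * |d|."""
    return weight * abs(x - x0)

def l1_grad(x, x0=0.0, weight=1.0):
    """Derivative of the L1 penalty; constant magnitude away from x0
    (0.0 is returned at the kink as a convention)."""
    d = x - x0
    return weight * math.copysign(1.0, d) if d != 0.0 else 0.0

def hyperbolic(x, x0=0.0, weight=1.0, b=0.1):
    """RESP-style hyperbolic penalty: weight * (sqrt(d^2 + b^2) - b).
    b (assumed value) controls how wide the smoothed region around x0 is."""
    d = x - x0
    return weight * (math.sqrt(d * d + b * b) - b)

def hyperbolic_grad(x, x0=0.0, weight=1.0, b=0.1):
    """Derivative of the hyperbolic penalty: smooth at x0, tends to
    +/- weight for |d| much larger than b."""
    d = x - x0
    return weight * d / math.sqrt(d * d + b * b)

if __name__ == "__main__":
    # Print penalty gradients at small, medium, and large displacements.
    for d in (0.001, 0.1, 10.0):
        print(f"d={d:<6} L2'={harmonic_grad(d):.4f} "
              f"L1'={l1_grad(d):.4f} hyp'={hyperbolic_grad(d):.4f}")
```

Under these assumptions, the hyperbolic form behaves like a quadratic for displacements much smaller than `b` and like L1 for displacements much larger than `b`, which matches the remark that it is "not exactly L1" because of the smoothing around 0. Changing the penalty form changes the scale of the penalty, so the weight would need re-tuning as CBy notes.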