General FF fits with smee discussion

Participants

  • @Jennifer Clark

  • @Finlay Clark

  • @Lily Wang

  • @Daniel Cole

Goals

  • Get up to speed on current status of smee projects

  • Plan initial experiments and end points

  • What needs to be done, if anything, so we can rapidly experiment + benchmark + share results?

  • Who will do them, time frames, which meeting to present results? US + Aus + UK is challenging…

Discussion topics

Current takeaways from Finlay’s work on smee + bespoke and Jen’s work

  • Training to SPICE with linearized bonds and angles, and with Sage torsions initialized from zero, gives performance close to Sage with a set of validated hyperparameters (learning rate and so on) in 48 GPU hours, or 6 hours with mini-batching

    • There’s much interdependence between linearized harmonics, scaling, learning rate, etc. A rigorous comparison has not been done.

  • Automatically generating torsion SMARTS improves performance from Sage, although it results in an order of magnitude more parameters

  • Automatically generating torsion + angle SMARTS improves performance again, but adds substantially more parameters (depending on the specificity and features included)

    • Including ring membership in angle SMARTS seemed to be helpful for improving performance, especially on angle ICRMSD

  • Bespoke work suggests the need for optimized geometries (which Jen’s work is also currently backing up)

    • It seems to be helpful to include conformations which have relatively low QM energies but have large MM LJ clashes, likely so that the valence terms (torsions) can correct for this.

  • Finlay questions

    • JC: how does minibatching work again?

      • FC: it’s like batching but you basically take more steps. You batch on data rather than types. You randomize batches each time.

      • JC: that makes sense, GPU can’t really calculate all gradients at once.
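
The mini-batching scheme FC describes above (batching on data rather than types, re-randomizing batches each epoch, one optimizer step per batch) can be sketched roughly as follows. This is a toy example with a single trainable parameter and a plain-numpy gradient step, not the actual smee/descent training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": target values for 1000 data points, one trainable parameter.
x = rng.normal(size=1000)
y = 3.0 * x  # ground-truth parameter is 3.0

theta = 0.0
lr = 0.1
batch_size = 100

for epoch in range(20):
    order = rng.permutation(len(x))          # re-randomize batches each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        grad = np.mean(2 * (theta * x[idx] - y[idx]) * x[idx])  # d/dtheta of MSE
        theta -= lr * grad                   # one step per mini-batch
```

The point JC raises holds here too: each step only needs gradients for one batch of data, so memory stays bounded while the parameter still converges toward the full-data optimum.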

  • Jen questions

    • DC: we saw a lot of issues with parameter drift over time: angle and improper parameters would drift even though the energy loss had largely plateaued

      • LW: were you regularising?

      • FC: Not in general. I played around with regularising impropers. We saw that the impropers are very tightly coupled to the angles, with the equilibrium values in particular drifting off

    • FC: also, the linearisation is supposed to balance between the force constant and equilibrium values. On the impropers, they are consistently improved no matter what we do, even if we train with Sage types.

    • LW: some of the equilibrium values are going to 0 in Jen’s fits which is worrying

      • FC: we fixed this with some of the linearised harmonics
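
One common way to linearize a harmonic term is to rewrite it as a sum of two harmonics with *fixed* reference values, so the energy becomes linear in the trainable force constants and the effective equilibrium value can no longer run away independently. I believe the smee/descent linearization is along these lines, but the exact parameterization may differ; a minimal numerical check of the equivalence:

```python
import numpy as np

def harmonic(theta, k, theta0):
    """Standard harmonic term."""
    return 0.5 * k * (theta - theta0) ** 2

def linearized(theta, k1, k2, b1, b2):
    """Sum of two harmonics with fixed reference values b1, b2.
    Only k1 and k2 are trained, so the energy is linear in the parameters."""
    return 0.5 * k1 * (theta - b1) ** 2 + 0.5 * k2 * (theta - b2) ** 2

# The linearized form is equivalent to a single harmonic (up to a constant):
k1, k2, b1, b2 = 60.0, 40.0, 1.8, 2.2
k_eff = k1 + k2
theta0_eff = (k1 * b1 + k2 * b2) / k_eff

theta = np.linspace(1.5, 2.5, 50)
diff = linearized(theta, k1, k2, b1, b2) - harmonic(theta, k_eff, theta0_eff)
# diff is constant in theta, confirming the two forms differ only by an offset
assert np.allclose(diff, diff[0])
```

Because k = k1 + k2 and theta0 = (k1·b1 + k2·b2)/(k1 + k2), training k1 and k2 implicitly balances the force constant against the equilibrium value, which is the balancing behaviour FC mentions.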

Aligning goals

  • OpenFF: Aim is to do the entire Sage 2.4.0 re-fit on smee

    • Ideally both vdW and valence parameters, but depends on pace of infrastructure development – may just refit valence parameters, so that should be the focus of this discussion

    • Requirements: non-negligible improvement on bulk benchmarks. Improvement in targeted areas.

  • Danny + Finlay?

    • DC: easy-to-use, flexible, fast general valence fit. At least scope to integrate non-standard functional forms down the line but not priority initially.

      • LW: do you need to update smee every time you experiment with a new functional form?

      • FC: depends on the function. UBs were easy, others needed updating in smee. But smee makes it easy to work with and Smirnoff-plugins were easy to use.

    • DC: worth mentioning Finlay also starting to look at protein FF with similar infrastructure, but we see that more as bespoke-fitting at the moment.

Open questions (not necessarily answered by this work)

  • Optimized geometry data – where do we get it?

    • ideally all our data is at the same level of theory (and ideally for evaluating workflows using current OpenFF benchmarks, it’s at the benchmark level of theory)

    • SPICE: 1M+ conformations will be expensive to recalculate

      • Industry benchmark: 77k

    • Could we use the OMol25 dataset? They claim to have a) recomputed the entirety of each community source, which includes SPICE, and b) optimized the GEOM dataset. Does that give us enough coverage of the space? Not sure what elements are in GEOM

    • Note – no Hessians in OMol25, though

    • Experiment: combine SPICE data in OMol25 + a selection of the optimized GEOM data and see where that gets us?

      • LW: had a look at GEOM data in OMol25, looks like it covers all the Sage elements

      • DC: note dataset filtering is important to remove unconverged QM / very high energy points.
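
The filtering DC mentions can be as simple as dropping unconverged records and conformers far above the minimum energy. A minimal sketch with illustrative thresholds and arrays (not an actual dataset schema):

```python
import numpy as np

# Hypothetical per-conformer QM energies (kcal/mol) and convergence flags;
# the names and the 30 kcal/mol cutoff are illustrative choices.
energies = np.array([0.0, 1.2, 3.5, 45.0, 2.1])
converged = np.array([True, True, False, True, True])

rel = energies - energies.min()       # energy relative to the lowest conformer
keep = converged & (rel < 30.0)       # drop unconverged or very high-energy points

filtered = energies[keep]
```

Here the unconverged conformer and the 45 kcal/mol outlier are both removed, leaving only well-behaved points for training.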

  • Can we extend the torsion specification approach to bonds and angles (and ideally cluster parameters afterwards?) (Jen’s project)

    • Is it practical to try to cluster the torsions?

    • (FC) It’s easy to extend to bonds + angles (and shouldn’t be too hard to extend to impropers). I’ve run some fits with fairly specific angles (and not very specific bonds) and torsions which improved performance on the industry benchmark again compared to just specific torsions.

      • Should we extend to impropers?

      • LW: IMO yes! Our impropers don’t do well (see Sage 2.0 paper, also someone gave a talk about this a couple years ago…)

  • How do we unify this approach with a workflow where we may want to keep some parameters frozen? i.e. particularly alkane and/or protein torsions

    • a) first, is this necessary to keep alkane torsion profiles correct? How does the existing smee-SPICE FF do?

      • Latest fits with more specific angle + torsion types generally did well on alkane torsions (FC)

    • b) Could we ensure we have fairly specific types for these cases then just exclude them from the relevant parameter config during training? We avoid training linear torsions currently (FC)

      • LW: works for proteins, may be challenging for alkane torsions

    • JC: if you defined alkane torsions as just [CX4][CX4][CX4][CX4] that’s quite generic, but Finlay’s recursive smirks also has specified hydrogens on them

      • LW: are the Hs in the central bond?

      • FC: the least specific was just the central bond; the most specific had everything neighbouring all atoms, so it would distinguish between a CX4H2 on atom 1 and a CX4H1 on atom 1. It does fine in the low-energy regions of the alkane data; the energy barriers are a bit off.

      • DC: ideally you wouldn’t need a wizard to come in and define what things need to be kept constant between FFs. You could just have protein and lipid targets.

  • Can we improve on the current hierarchical approach for generating specific SMARTS patterns?

    • Continuous typing is likely the future, though it seems sensible to have a shot with this approach given the relatively low effort with smee.

    • Other training schemes are possible, for example hierarchical training where we initially train on all data with the least specific SMARTS, then repeat training with more specific SMARTS added and some regularisation on less specific SMARTS etc. May be over-complicated.

      • LW: is this for regularisation, or to make SMIRKS that smartly become more specific?

      • FC: the issue we were having was less-specific SMARTS that didn't have a lot of data. I wondered if I could initially train everything with non-specific types and regularise at each stage to the stage before. This way we can avoid SMARTS with very little coverage.

      • JC: I had the same thought: if you go too specific to start off with, you're pigeonholing yourself into what is represented in the dataset, and you'll do poorly on something the fit hasn't seen before. On the "too many types" question, this may depend on specificity levels, so we could fit multiple specificity levels.

      • FC: we could try just having two levels, one very specific and one not, see how many parameters we end up there

    • How many types are too many types?

      • LW: good question… one specific concern with torsion types is ill-posing, e.g. torsion A and torsion B are always fit together when training so no unique solution, but can arise independently in test/real data.
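
The ill-posing LW describes shows up cleanly in a toy least-squares picture: if torsion types A and B always co-occur in the training data, only their sum is determined, and the individual force constants are free to wander. A hypothetical sketch:

```python
import numpy as np

# Columns: how often torsion types A and B appear in each training entry.
# They always co-occur here, so the columns are identical.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])  # consistent with k_A + k_B = 2

# Least squares has infinitely many solutions; numpy returns the minimum-norm one.
k, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
```

The design matrix is rank-deficient (rank 1 with 2 parameters): any split of the sum between k_A and k_B fits the training data equally well, but test molecules containing only A or only B will see whichever arbitrary split the optimizer landed on.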

  • Should we regularise?

    • Seems sensible for propers/impropers – important for Espaloma/ByteFF

    • If so, to what?

      • LW: did regularising to 0 yield good results?

        • FC: JH’s initial fits regularised to 0, possibly too strongly, and performance suffered. I regularised impropers but that harmed test performance. What we did with the bespoke FF was mildly regularise the proper and improper torsions to the Sage values.

        • DC: where Sage was poor it did move nonetheless.

      • LW: not sure I have many ideas outside of 0 or some amalgamation of Espaloma torsions. Could try regularising to previous FFs but worry about drift

    • Open PR to add this to descent

        • LW: can look into this
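
Regularising toward a previous force field, as discussed above, amounts to adding an L2 penalty that pulls parameters back to reference values. A toy sketch with made-up numbers and a hand-rolled gradient loop (not the descent API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: fit two force constants k to data, with a mild L2 penalty
# pulling them toward reference (e.g. Sage) values. All names illustrative.
k_ref = np.array([1.0, 2.0])          # "previous force field" values
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, 1.8])          # data prefers slightly different values

lam = 0.1                             # regularisation strength
k = k_ref.copy()
for _ in range(2000):
    grad_data = 2 * X.T @ (X @ k - y) / len(y)
    grad_reg = 2 * lam * (k - k_ref)  # pulls parameters back toward k_ref
    k -= 0.05 * (grad_data + grad_reg)
```

The fit lands close to the data optimum but is damped toward the reference, which matches the observation above that where Sage was poor the parameters still moved, just not arbitrarily far.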

  • Is altered functional form out of scope (I assume so)?

    • Mainly thinking about improving impropers to something harmonic

    • Currently improper force constants are badly behaved and strongly coupled to angles

    • LW: agree, Jessica Maat started looking at this before she left. Would say it’s out of scope for Sage 2.4 but looking into this would be great

  • Experimentation with other parts of the process?

    • Other optimizers? Simulated annealing / basin hopping?

    • FC: played a bit with the L-M optimizer; we got similar results to Adam, though the parameters look a bit different. This was with bespoke valence training.

  • Training to dimer data?

    • Has not yet resulted in a FF performing better than Sage

Experiments

  • Q1: does optimizing to optimized-geometry data and torsiondrive data improve fits? (LW has started this – taking over Jeff’s BTS project)

    • Using QCA data at the OpenFF level of theory; also asking whether just minima are required or whether trajectories help.

  • Q2: does optimizing to the SPICE data in OMol25 + the GEOM data in OMol25 improve fits? – JC to start

  • Q3: Restricting the specification approach to just bonds and angles?

    • FC volunteered to look at. What features do we use? Symmetry (e.g. carboxylates)? Hyperspecified types are more prone to that. We need to do some work on making sure we don’t mess up symmetries.

    • JC: the FF I’m going to send you won’t have this issue because bond types are generalized.

    • LW: NAGL solves this by averaging all resonance forms, could maybe use this to validate or diagnose symmetry issues

  • Q4: Using two layers of specificity – one general, one very specific, seeing if that improves outliers

  • Q5: The regularisation

  • Q6: Make sure everyone’s using the same mini-batching + learning rates

  • Q7: comparison of linearised harmonics to non-linearised harmonics, nailing down some of the hyperparameters

    • FC Workflow:

    • FC: I think we have a good solution with this now for bespoke fitting.

    • DC: from memory it was a game changer bringing this in.

    • LW: agree, would be useful to compare minibatch + normal vs minibatch + linearised and identify how much improvement came from there.

  • Meetings – keep in Newcastle for now

  • Work

Action items

JC: send FF to FC. Will specify all hyperparameters (learning rate, no mini batching, scaling, etc … )

(This is actually Sage 2.3 despite the name)

Decisions