Day 1 Notes
9:10 - Eastwood
JW – This shows me that the previous roadmap didn’t lay out sufficiently concrete goals to evaluate whether we achieved them
DM – The PTM stuff also was kinda interesting - I think we’d imagined having this as an FF development but really we can get this into people’s hands using technical developments.
MT – Would be good to grade ourselves on these things in a future session.
9:30 - Cole
MG – Is smee just doing steepest descent minimization?
DC – Adam(?) optimizer, stochastic optimizer. Steepest descent-ish.
MG – I think that the Adam optimizer picks batches of parameters to vary at random and searches more efficiently. But does this present a problem wrt determinism?
DC – I think it’s not a problem, since all params start at 0 and the landscape is pretty convex. But I haven’t done enough runs to say this confidently.
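A toy sketch of DC’s point (illustrative only — not smee’s actual optimizer, parameters, or objective): on a convex objective with all parameters starting at 0, an Adam-style update reaches essentially the same minimum regardless of the random batch order, so stochastic batching need not hurt determinism.

```python
import random

# Toy illustration (not smee's code): minimize a convex quadratic
# f(p) = sum_i w_i * (p_i - t_i)^2 with an Adam-style optimizer, updating
# one randomly chosen parameter per step and starting all parameters at 0.
# Hypothetical targets/weights for the demo:
TARGETS = [1.0, -2.0, 0.5, 3.0]
WEIGHTS = [1.0, 0.5, 2.0, 1.0]

def adam_minimize(seed, steps=4000, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    rng = random.Random(seed)
    p = [0.0] * len(TARGETS)      # all parameters start at 0
    m = [0.0] * len(p)            # first-moment estimates
    v = [0.0] * len(p)            # second-moment estimates
    for t in range(1, steps + 1):
        i = rng.randrange(len(p))             # random "batch" of size 1
        g = 2.0 * WEIGHTS[i] * (p[i] - TARGETS[i])
        m[i] = b1 * m[i] + (1 - b1) * g
        v[i] = b2 * v[i] + (1 - b2) * g * g
        mhat = m[i] / (1 - b1 ** t)           # bias correction (toy version)
        vhat = v[i] / (1 - b2 ** t)
        p[i] -= lr * mhat / (vhat ** 0.5 + eps)
    return p

# Two different random batch orders land on the same minimum.
a = adam_minimize(seed=1)
b = adam_minimize(seed=2)
```

With a non-convex landscape or nonzero random initialization this agreement is no longer guaranteed, which matches DC’s caveat about not having done enough runs to say it confidently.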
MG – What do you mean by SPICE level of theory?
DC – I use it to refer to the level of theory used for SPICE dataset, I think wb97-something or other
DM – Did you get clear results for using sage vs. non-sage periodicities?
DC – Data was kinda jumbled in transition when JH left, need to double check.
MG – What makes this faster than FB?
DC – GPU vs. CPU, and torsion scanning vs. using single points. So the real comparison would be versus FB using ab initio
LW – I think CC switching to ab initio reduced training time from weeks to days
DM – What would we lose moving to smee?
TG – Fitting to frequencies
DC – I don’t think there’s any fundamental reason this couldn’t be added. Should be easier than adding new things to FB.
Smee doesn’t do torsiondrive fitting (torsions need to be treated as ab initio targets), which avoids expensive torsiondrive fits
MS/TG/DM – Nudged elastic band style methods may allow getting around this computational bottleneck
MG – Purpose of doing minimizations isn’t to find HIGH-energy points, it’s to map out local minima
MS – But if you can get close enough to that, maybe you’ll get the minima right anyway
TG – Can limit the number of openmm steps in a minimization, and it turns out that you get most of the value in 200 steps, whereas the default limit will go for thousands.
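TG’s observation can be illustrated with a toy steepest-descent minimizer (illustrative only — not OpenMM, though OpenMM’s `LocalEnergyMinimizer` does accept a `maxIterations` cap): most of the achievable energy decrease comes in the first couple hundred steps.

```python
# Toy steepest descent on a made-up surface of independent quartic wells
# (minima at x = +/-1), comparing a 200-step cap to a long run.

def energy_and_grad(x):
    e = sum((xi * xi - 1.0) ** 2 for xi in x)
    g = [4.0 * xi * (xi * xi - 1.0) for xi in x]
    return e, g

def minimize(x0, max_steps, lr=0.01):
    x = list(x0)
    for _ in range(max_steps):
        _, g = energy_and_grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return energy_and_grad(x)[0]

x0 = [0.5 + 0.01 * i for i in range(50)]   # strained starting "geometry"
e_start = energy_and_grad(x0)[0]
e_200 = minimize(x0, 200)                  # capped at 200 steps
e_5000 = minimize(x0, 5000)                # "default"-style long run
# Fraction of the achievable energy decrease captured by the 200-step cap:
frac = (e_start - e_200) / (e_start - e_5000)
```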
MS+MG – Was nonbonded training done to condensed phase or gas phase dimers?
DC – Condensed phase.
DM – could be a topic for deep dive
LW – How many vdW parameters did you train? All C,N, O types?
DC – Started from Sage LJ types. Would need to double check whether all CNO vdW params were fit.
MS – Was water model being modified as well?
DC – Yes, otherwise TIP3P would have spoiled other aspects of optimization.
JE – From my perspective, the answer to “what do we need to see to justify switching to smee?” is “same performance, lighter maintenance burden”. I think some of the performance concerns are alleviated by your data, but we’d hoped that switching to smee would let us do larger datasets. Is that correct?
DC – Yes, I think the way that batching works will let us scale to much larger datasets.
JE – Speculation on maintenance?
JW – Could look at unique deps - if smee
MT – Also worth looking at contrast to ForceBalance - It has a PDB handler, its own molecule object, etc. So it’s worth appreciating where it’s helped us get to, but
MS – Does smee interface with evaluator for condensed phase stuff?
DC – I don’t think so, I think SB copied a bunch out of evaluator into smee.
TG – IIRC smee does condensed phase by hand directly with OpenMM
LW – I’d be keen to see how a smee-fit FF works on condensed phase calcs. And how well it scales up to datasets of more mixtures.
DC – Agree.
LW + MG – Would it be a useful consistency check to do a toy opt with smee and FB, and look for big differences? I think both of these tools are trying to do the same thing, so it’ll be interesting to see whether they get to the same place
DC – I think they looked pretty similar in my limited experience, but torsion scans make it more complex.
MS – Would be good to validate on torsion scans.
TG – could do a fit with smee, and then validate using sage as a single point… (see recording ~74 mins)
MT – Could be good to hammer out what kinds of differences we’ll look at, and how big they have to be to represent a “big” problem. Also worth appreciating that doing the same FB run 3 years apart will give slightly different numerical results.
MS – That’s true, but there’s a distinction between exact reproducibility and good-enough reproducibility.
MG – more important to catch particular anomalies
DC – when we try to fix pathologies in parsley or sage, do we introduce new pathologies?
10:20 Mike Gilson
JC – Is there anything we could use that’s less sensitive to folding transitions but still experimentally motivated? This seems really sensitive to sampling - is there something else that could be used? E.g., xtal structure sims to reproduce scattering data.
DM – We’d done some work on sims of xtal structures and it does seem to be relatively sensitive to FF. But it ends up being really expensive - it’s a matter of getting enough repeating units, which can make for big systems.
MG – My worry, if we do xtals, is that we miss something important about globular protein dynamics. It’s possible that matching to NMR is overkill, but it helps ensure that we’re not omitting states.
JC -- But native contacts is just one coordinate, maybe there’s another that could be good.
MG – From these data and the correct weighting of conformations, we’re measuring against NMR, so the choice of coordinate shouldn’t directly affect that.
…
MS – The native contacts fraction is just being used to enhance sampling, not to compute the NMR measurements themselves.
MG – could do unrestrained fitting… we did this initially but the protein quickly unfolded.
DC – Could you look at the number of SMIRKS types used in ff14SB and Rosemary for torsions to determine whether we’re doing something comparable?
MG – I think CC looked into this and the number of types is comparable.
11:00 - Wang
MG – (slide 7) Define “equal weights”
LW – Default weights mean that each training target contributes normally according to our standard training method. Equal means that protein opts and TDs are equally weighted in the objective.
NAGL transition slide
JC – How are we planning to handle NAGL being its own charge method (even though it’s closer to OE AM1BCCELF10 than AT) instead of being Yet Another AM1BCC Provider?
(General) – it’s basically a generation change, and deprecating the old behavior
MS + MG – Would be good to punctuate/justify this change with a paper highlighting the point that conformation dependence is bad.
MG (slide 16) – Did we find the root of the problem with the energy spikes?
LW – Yes, was a bug in OpenMM barostat.
MG (slide 17) – How significant are these differences relative to the reported uncertainties?
MS – Some are bigger.
LW – We’re getting these predictions from very low numbers of samples - 50ish - from 500 frames from 2ns sims
MG – 2ns is pretty short, that could explain large uncertainties
MG (slide 19) – Does this amount to longer production runs?
LW – Kinda - I set the sims to require more (500?) uncorrelated samples, so the sims on average ran longer. So two changes here: pre-equilibration was added, and sims could extend runs to reach the desired uncertainty (which roughly doubled simulation length on average, but brought the number of samples up to ~500 from ~50 before)
MS (slide 20) – Might be best to use exptl viscosity to decide which compounds to discard (without waiting for sims to discover the compounds are hard)
LW – We don’t have measured viscosity data for all our compounds
JC – Could we monitor correlation time from sims to decide when to toss compounds?
LW + DM – Not exactly straightforward to decide what property to monitor correlation time for
MS – could kinda manually reject compounds based on experience.
JC – It might be cheap-ish to have a sanity check inside evaluator to check for this kind of problem in equilibration, and terminate+exclude this data before going to production. We’ve done this for HFEs where we filter out compounds that have long correlation times for torsions where the replicates don’t agree.
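A hedged sketch of the kind of check JC describes (hypothetical code, not Evaluator’s implementation): estimate an observable’s integrated autocorrelation time during equilibration and flag a simulation whose effective sample count is too low to be worth taking to production.

```python
import random

# Hypothetical equilibration sanity check: estimate the integrated
# autocorrelation time tau of an observable's time series, then flag the
# run if the effective number of uncorrelated samples (N / tau) is low.

def integrated_act(series):
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 1.0
    tau = 1.0
    for lag in range(1, n // 2):
        c = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / ((n - lag) * var)
        if c <= 0.0:       # truncate the sum at the first non-positive term
            break
        tau += 2.0 * c
    return tau

def flag_slow_equilibration(series, min_effective_samples=50):
    tau = integrated_act(series)
    return len(series) / tau < min_effective_samples

rng = random.Random(0)
# A well-behaved observable: uncorrelated samples pass the check...
fast = [rng.gauss(0.0, 1.0) for _ in range(500)]
# ...while a slowly decorrelating one (AR(1), coefficient 0.99) is flagged.
slow = [0.0]
for _ in range(499):
    slow.append(0.99 * slow[-1] + rng.gauss(0.0, 0.1))
```

As MS notes later in the session, automated decisions from this kind of statistic have many edge cases, so in practice it may be better used to warn than to terminate.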
AF – Re nagl charges for proteins - How does NAGL enforce correct integral charge?
LW – There’s an equilibration layer there to ensure that the final charges add up
AF – So if terminal groups on a nagl-charged protein change, that could slightly change charges throughout?
LW – Yes
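A minimal sketch of what such an equilibration/normalization layer could look like (an assumption about the approach, not NAGL’s actual code): spread the excess of the raw predicted charges evenly over all atoms so the total matches the formal charge exactly.

```python
# Assumed normalization scheme (not NAGL's code): distribute the excess of
# the raw predicted charges evenly across all atoms so that they sum to
# the molecule's total formal charge exactly.

def normalize_charges(raw_charges, total_formal_charge):
    excess = sum(raw_charges) - total_formal_charge
    shift = excess / len(raw_charges)
    return [q - shift for q in raw_charges]

# Made-up raw charges that sum to -0.001 instead of the formal charge 0:
q = normalize_charges([0.331, -0.672, 0.170, 0.170], total_formal_charge=0.0)
```

Because the correction is spread over every atom, changing the atoms at one terminus changes the excess and therefore nudges charges throughout the molecule, consistent with the exchange above.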
MT – What is the chain of blockers to a NAGL release?
LW – I need to refine the evaluator protocols, then get those into a release, then do a refit. There’s also a SMIRNOFF spec change.
BM – For evaluator, if we had it discard viscous molecules, how would users who WANT to do viscous mols use evaluator for that?
JC – Could allow this in evaluator settings - something like a “compute budget”
MS – Could also document around this - Warn that viscous mols can’t necessarily be handled
MT – As evaluator owner - I understand our objective to be “get to a FF release” - so I’m in favor of the shortest path there, so if we simply document “this isn’t suitable for viscous mols” and toss those out of the training set, then we can move forward.
MG + MS – Agree
JC – JC’s solution (of adding an equilibration pre-screening step) could be better for other users, and could generalize and help us in other work.
MS – My experience with auto-judging data based on how things are statistically converging adds tons of edge cases. So I’d be in favor of offering complete information to the user about that’s happening, but not take any automatic action on it, instead make them manually confirm process changes.
MG – Might be good to look at two things that can be released - FF and software. This lets us release the FF, and not be blocked by the software.
LW – If we do manual screening, we’ll want to document and record how it was done in the publication. So we could publish the script/some background that told us to remove it, but the dataset itself for the final fit wouldn’t include it.
JW – Agree with MS - I’m in favor of loudly telling the user that a component is behaving badly, but NOT automatically remove it.
BM – Agree.
11:40 - Mobley
JW – on “same protonation state” plots - did OpenFE take the protonation state from “single protomer” Ross systems?
DM – Yes
MG: what do you mean by “doing something” with the charges?
DM: I think FEP+ does some charge correction and OpenFE doesn’t
MG: I think there’s a couple of reasons charge changes are difficult. One is the overall charge change, but it would also affect conformations, among other things
DM + JC – Agree
(“overall ddG bias” slide) JC: not clear if larger DDG bias is due to poor repeatability or just larger error
AF – On method - Looks like FEP+ is just doing one replicate - How are they calculating error bars?
DM – I think they’re doing statistical error whereas OpenFE is looking at variation across replicates
JC – Schrodinger doing a MBAR/cycle closure thing.
DM – OpenFE error would be expected to be larger since they’re looking at variation across replicates
JC – “You can’t spell sulfonamide without F, U, and N!”
JW – Is there a strong takeaway from this data?
DM – I don’t think so. DHahn would probably say that a lot of the current issues are in system prep - sidechain protonation, other prep details. I think the big takeaway is something like DHahn’s JCIM paper where he looks at significance-tested
JC – Ariana Clerkin’s effort to gather more systems, and running at scale on F@H, should let us do this type of analysis more regularly
DM – Agree - Doing this as a regular benchmark will
DC – What’s the threshold for getting a FF benchmarked on this?
DM – Technical blockers?
…
JH (chat) – Can pass OFFXMLs to alchemiscale
(General) – We’re not exactly sure what the process is to start doing regular alchemiscale runs. We’ll need to come up with a process for doing alchemiscale submissions and managing compute.
MG – How do we plan to get over some of the known barriers to FE accuracy WRT prep? That seems like it’ll be the bottleneck to making FF accuracy be the limiting thing in these calcs.
JC – …
MG – Right
JC – And this will help us reconsider whether to revisit host-guest systems. That could help get HFEs back on the table.
LW – Do you know what else OpenFE is planning to add to benchmarks?
DM – I said the previous stuff was in progress largely because it was just raw data/not analyzed deeply.
JE – OpenFE plans to run schrodinger’s “stress test” set.
JC – Folks should join Ariana Clerkin’s PL dataset meetings if they’re interested. Forthcoming update to LiveCOMS best practices paper
DM – Also interested in coming up with a process/principled way to update PL benchmarking sets more broadly from the community - ex “here’s why we recommend changing these inputs, based on reproducible calculated results”.
PB – Is the OpenFE PLBenchmark repo doing this right now?
JC – Yeah, you can raise stuff like this in the issue tracker.
DM – Right, but OpenFE team doesn’t want to own responsibility for human curation.
JC – A lot of this can be automated.
2:00 pm
Discussion
JC: still many items left over from last year’s roadmap, as well as new exciting things. There’s a tendency to want to do everything but we should prioritise getting more done with less. Should work out where to focus to simplify everyone’s lives.
DM: I know FB is one of the big headaches of the infra team. Should try to get other people onto these tasks as science capacity is diminished
MS: CU students can also contribute on the science side
JW: also thinking of adding JMitchell time
DM: if we could relieve infra burden by going to NAGL and smee, that could free up some infra time to help with science maybe
JE: can pick up strategy discussion tomorrow. We could consider, for example, the benefits between spending a year overhauling infra to free up science advances after that, vs the risks of losing funding from doing so.
Water models – Barbara
MG (slide 4): clarifying whether wall is permeable
BM: yes
MG: what is the 1000 mols system size based on?
BM: based on looking for reasonable uncertainties and balancing with compute expense
MS: we did come to this conclusion before noticing the system-size bias. Interestingly 2k mols is only a 40% increase in compute on 1k mols.
MG: the trend is occurring only for mixtures?
MG – Occurring for isopropanol as well as water? Looks like only the mixture has the trend. The pure substances aren’t so problematic. I’ve seen octanol boxes forming structure, so might be box size effects.
BM – I can look into it.
MS – Can be funny equipartition issues when using rigid water with nonrigid other components.
JC – Could use MPS to do multiple replicates on GPUs? Not to be confused with MIG which is only on high end cards ex A100, but is similar in principle and is recommended to have multiple sims on a single GPU.
MS – Using A40s, doing OpenMM sims via evaluator.
MS – Wondering if we can get more GPU compute as well.
JC: Joung and Cheatham looked at HFEs – have you considered their approach?
BM – We’re trying to use osmotic coefficients since there’s so much exptl info. Compared to HFEs: 1) the exptl data isn’t that reliable and sims are expensive, and 2) ion-water RDFs will be straightforward and there is exptl data for those.
MS – Merz has optimized a lot of ions to OPC, someone else to TIP3P-FB. So we could do validation on that. Also could look at coordination numbers. Thankfully it seems like mismatching ion-water FFs leads to clearly bad results so we’ll know if we’re doing it right.
JCl: one pitfall with HFEs and RDFs might be that FFs don’t get the correct chaotropic behaviour for particular transport properties. I tested one particular TIP4P-FB FF that was re-fit and it was not correct. Steve Ricks ( ? ) has a paper on scaled charges
MS: scaled charges are not generalizable
JCl: I think the chaotropic behaviour would be better treated by osmotic coefficients
BM – Yes, and also osmotic coefficients have a lot of exptl data, but since we’re developing ion FFs to work with mixtures (not just water), there isn’t data on HFEs for ion-organic mol mixtures, but there is osmotic data.
MS: e.g. there’s osmotic coefficients of ions in alcohol water mixtures
JCl: are there any other trade-offs between the flat-bottomed vs harmonic approaches?
BM – 1) we need to check whether it works with polyatomic salts, and 2) we could decide based on which is easiest to add to evaluator. Also will depend on how long it takes to validate.
MS: analysis of harmonic is quite finicky and requires manual tweaking somewhat often. I would worry about that in an optimization loop.
AF – Do you see benefit to a continuous distribution of concentration in harmonic potentials vs. flat-bottom potentials for FF optimization?
BM: depends on what user is looking for.
MS: if you want it to match the experimental curve, you need a single simulation.
BM – If it works, can test multiple concs in a single run using harmonic.
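The two restraint shapes under discussion, as simple functional forms (illustrative only, not Evaluator’s implementation): a flat-bottom wall is zero inside radius r0, so the enclosed solution equilibrates at a single concentration, while a purely harmonic restraint lets density — and hence effective concentration — vary continuously with distance from the center.

```python
# Wall-restraint energies as functions of distance r from the center,
# with force constant k (illustrative functional forms only).

def harmonic(r, k, r0=0.0):
    """Harmonic restraint: always pulls toward r0, so the density (and the
    effective concentration) varies continuously with r."""
    return 0.5 * k * (r - r0) ** 2

def flat_bottom(r, k, r0):
    """Flat-bottom restraint: free inside r0, harmonic wall outside, so the
    enclosed solution sits at one concentration."""
    return 0.0 if r <= r0 else 0.5 * k * (r - r0) ** 2
```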
Polarisability – Willa
JW: does AM1-BCC-dPol use different BCCs?
WW: yes
JW: and the dPols are taken by parameter-type?
WW: yes, per element
MS: did you look at multiple temperatures?
WW: in the final model yes, but not in the optimization
JC: if you reduced number of polarisable sites just to O, is it faster?
WW: no
JC: is that an implementation issue?
WW: it doesn’t skip the computation
JC: could be good to reach out to OpenMM developer to implement it in OpenMM itself and just skip computation if polarisability is 0
MS: or could just jump to 4-site in the future
DM + MS (slide 12) – The reviewer on your paper will give you trouble about not matching the RDF vs. expt extremely closely. In particular the first peak.
WW: TIP4P have good 2nd and 3rd peaks, but first peak is very high. (TIP3P performs poorly).
WW: literature search indicates rigid 12-6 water models have to sacrifice the first peak to get the second and third right.
MS – OPC3 might not be so good, but all the 4-point water models do a good job of getting the LOCATION of the peak right, but they still struggle with getting the HEIGHTS right.
MG/MS: publishing strategy discussion
MS: IMO you have to show you’re better than 4-point models, as they’re only 10% slower
JW – Slide 14 - Can you arbitrarily change the Y axis to overlap lines better?
Auto typing – Trevor
Slide 5
TG: initialized values with Sage
TG: hoping dimer interaction energies gets me progress without expensive condensed phase simulations
MG: my impression is SAPT is not intended to give FF-suitable decomposition between electrostatics and vdW. Looks like a great proof of principle that the splitting works, but it may come down to what you fit to
JC – Motivation of SAPT is to try and match FF deconstruction
MG – That may be, but my experience with SAPT is that the contributions don’t look very much like our functional forms/what I expect from different class 1 FF components
JC: was the original idea from CBayly to look at gradients for parameter splitting
TG: issue with parameter search is that each FF candidate has its own fit, so need something that works quickly per iteration
JC: I was talking about gradients from condensed phase fits for clustering / splitting
TG: this is similar to chemper approach where you find the SMARTS to describe a split. This inverts it. It’s brute force-y but a feasible timeline vs possibly not getting a solution at all with chemper
MG – Another thing to fit to could be the DES set with fixed AM1BCC charges at different dimer distances/orientations, then subtract out the electrostatics - the remainder must be the LJ.
TG: that was suggested, but you can get really large errors in electrostatic energy, resulting in poor fits. Could maybe prune training dataset to accommodate.
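A sketch of the decomposition MG proposes, with made-up charges, geometry, and total interaction energy (hypothetical numbers throughout): compute the fixed-point-charge Coulomb energy of a dimer and attribute the remainder of the interaction energy to LJ.

```python
# Subtract fixed-charge Coulomb electrostatics from a dimer interaction
# energy; the residual is what an LJ fit would target. All numbers made up.

COULOMB_CONST = 332.0637  # kcal/mol * Angstrom / e^2

def coulomb_energy(charges_a, coords_a, charges_b, coords_b):
    """Intermolecular point-charge electrostatics between two monomers."""
    e = 0.0
    for qa, (xa, ya, za) in zip(charges_a, coords_a):
        for qb, (xb, yb, zb) in zip(charges_b, coords_b):
            r = ((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2) ** 0.5
            e += COULOMB_CONST * qa * qb / r
    return e

# Toy dimer: two collinear "dipoles" (hypothetical charges and geometry).
q_a, xyz_a = [0.4, -0.4], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
q_b, xyz_b = [0.4, -0.4], [(4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

total_interaction = -1.8  # pretend QM dimer interaction energy (made up)
elec = coulomb_energy(q_a, xyz_a, q_b, xyz_b)
lj_residual = total_interaction - elec  # what the LJ fit would target
```

As TG points out, any error in `elec` (e.g. from the fixed AM1BCC charges) lands directly in `lj_residual`, which is why pruning bad points from the training set may be needed.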
MG: could you do the typing with QM data and optimize the parameter values against condensed phase?
TG – That could be done.
JW: do you think besmarts is at the point where we can ask for thousands of torsions and get Schrodinger-like quality?
TG: it may be slow. That many parameters would take a while to split.
JW: what’s the complexity of a fit?
TG: a simple search for a torsion generates ~50 candidates. If you have 1000 candidates, that’s 50k FFs to fit. If you have 1000 molecules to fit data to, that’s expensive. Custom FFs may be more tractable
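TG’s estimate as back-of-envelope arithmetic (the counts come from the discussion; the multiplicative cost model is a simplifying assumption):

```python
# Cost of a brute-force typing search: every candidate split proposes
# many SMARTS patterns, and every candidate FF needs its own fit over
# the full training set.

candidates_per_parameter = 50     # candidate SMARTS splits per torsion
n_candidate_parameters = 1000     # parameters considered for splitting
molecules_per_fit = 1000          # training molecules per candidate FF fit

n_fits = n_candidate_parameters * candidates_per_parameter
fit_evaluations = n_fits * molecules_per_fit
# 50,000 candidate FFs, each fit against 1,000 molecules.
```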
JW: would it be easier to start from scratch or start from Sage?
TG: it’s complicated. Splitting periodicities is difficult; I brute force it.
Interoperability – Matt
Outline:
MS: is there a way to make developing custom valence forms easier? e.g. torsion-torsion
MT – My experience with supporting Josh H’s, Adam’s, and other folks' custom nonbonded interactions was that things were way more complicated than I originally anticipated. My best idea for how to improve is better documentation of specific things.
JC – The SMIRNOFF spec in theory accommodates algebraic functions for valence terms. Could Interchange accept OFFXMLs with this in them?
PB – Interchange uses OpenMM which can handle arbitrary algebraic functions.
JW: I think we don’t currently support custom functions in the FF definition in the expression tag
MT: Yes, I think our infrastructure would currently fail on that.
DM: PB has some use cases for this
JW: I wonder if we could just not validate but hand directly to OpenMM. Additional custom attributes could be difficult
MG – Value to parsers for common file formats? Like MOE outputs or other things?
MT – Very limited experience with this. But it’ll be a cost/benefit tradeoff with our partners
JCl – I’ve never used OpenMM, but does it accept tabular potentials/could Interchange handle this?
MT – My experience with tabulated potentials is that they’re not well standardized/they are very engine-specific. But maybe they’re not that high priority.
MS – What will be the highest-value things we can add to Interchange to expand our applicability?
JW – I think the from_openmm and combine stuff is HUGELY powerful for interop
(discussion on the optimal point between letting people do stupid/unvalidated things versus being more thorough/in-house about it)
DM – for example we don’t want to invest a ton of energy into letting people use GAFF
MS – It’s possible that importing Amber-generated GAFF mols will be important for letting people do cool sims/benchmark.
MT: we’ve done a lot of work trying to balance a purist approach that no one can follow, vs ingesting everything and guessing how it’s meant to work.
MT – Also another thing - I don’t exactly care if interchange as a software tool eats the world (even though with a lot of work it could) - but the adoption that we really care about is adoption of our force fields, NOT our infrastructure. So I think we should keep this in mind as we do planning tomorrow.
JE – This is a great setup for tomorrow’s + Wednesday’s sessions
PDB loader – Josh
Slides: https://docs.google.com/presentation/d/1SxftFOIz5NxSxaPfj5JZn5OK6SVU6K_b9qMDSWUlOUc/edit?usp=sharing
JE: the context behind this is that we were going to increase the amount of the PDB we can handle, but this also clears the way for our PTM workflow
JM: we are referring to PDBs, not mmCIFs – for our users, we need to support PDB. mmCIFs are on our roadmap.
DM: pharma users expect to be able to use PDB.
JW – PDBx/mmCIF support is concretely on the roadmap for phase 2
MS: would be interested in trying this when it’s released.
MT: speed-up is great (saving 5x time). I have concerns about scaling, especially if a lot of it is the RDKit conversion. Could we skip that?
JM – Current thinking on RDKit stuff is that we could just build it up as an RDKit molecule instead of converting from openff, which would save time. Also, hard to know whether the reason it’s slow is because of my current implementation or because of something inherently slow. Possible it’s inherently slow
JW: this is our first try, and JM already has a refactor in mind. I would expect nx gain in time.
MT – Re slide about adding custom residue - Is this what was referred to as “trivial to extend”?
JM – This is easy compared to existing implementation.
MS – Re mapping on connectivity - Will this use the previous substructure spec/work? TBernat would probably be very helpful with knowledge here.
JW - Yes, this will still support the existing spec, and looping in TBernat is a great idea.
General discussion
JW: what’s the best way to share constructive thoughts for tomorrow?
JE – Start dropping items on the day 2 notes
MT – Has LJPME been deprioritized?
DM – My group had done a little testing with LJPME and we got really bad performance on FE calcs with GROMACS…
MT – My question was about doing a complete refit to LJPME.
DM – Thinking about this - if we did a full refit and said that people should use LJPME for everything, but then binding FE calcs got slower/worse, then it would be back
MS: seeing 20-30% cost
MT – That’s the cost, do we have a sense of the benefit?
JC – Would it make sense to do that at the next big generation update, when we’re refitting to condensed phase?
MS – Also, if we fit to get condensed phase props right, then we SHOULD get long range more accurate too.
…
DM – Dispersion corrections work worse for binding FE calcs…
MG – To summarize, if you use disp corr during fitting, then you should get better binding FEs using LJPME.
…
MS – So I’m saying we SHOULDN’T fit using LJPME until we have things like area per lipid involved in the optimization. It’s intended to make long range stuff better but we don’t have long-range interactions in our fitting yet.
JC:
1) you can get a per-heavy atom cost that makes a difference to free-energy calcs
2) also membranes
MS: if you have a vacuum interface the effect is substantial. Also membranes; hence the recommendation to use at least a 12 Å cutoff
Update summary
https://docs.google.com/presentation/d/14WOK9178CMy8Z_DutvBHsqEYMYcLfxSLU1daCa-cobs/edit?usp=sharing