2022-03-24 Protein FF meeting note

Participants

  • @David Mobley

  • @Michael Gilson

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Simon Boothroyd

  • @Chapin Cavender

  • @Diego Nolasco

Goals

  • Update on project management

    • Formal decision-making process

    • Add Michael Shirts as an approver

  • Plans for protein FF benchmark

    • Small molecules

    • Proteins

Discussion topics

Item

Presenter

Notes

Project management

 

Review of fitting strategy

@Chapin Cavender

  • Charges

    • ELF10

    • Library charges for water (TIP3P) and 26 canonical amino acids (AM1BCC-ELF10); see the charge-generation sketch after this list

  • Lennard-Jones

    • Copy from Sage

  • Valence and torsions

    • Fit Sage types (null model) or new protein-specific torsions

    • Target Sage QC training dataset and new QC datasets

      • Optimization dataset for capped 1-mers

      • TorsionDrives on (phi, psi) and (chi1, chi2) for capped 1-mers

    • Fit torsions/valence simultaneously or sequentially
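
As a rough illustration of the charge step above, a minimal sketch of generating AM1BCC-ELF10 charges with the OpenFF Toolkit (assuming the OpenEye backend is installed and licensed); the capped-alanine SMILES is an illustrative placeholder, not an actual training input:

  # Hypothetical example: ELF10 charges for a capped alanine (Ace-Ala-Nme).
  # Requires the OpenFF Toolkit with the OpenEye toolkits available.
  from openff.toolkit.topology import Molecule
  from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper

  mol = Molecule.from_smiles("CC(=O)N[C@@H](C)C(=O)NC")

  # "am1bccelf10" selects a diverse low-energy subset of conformers (ELF10)
  # before averaging AM1-BCC charges; conformers are generated internally.
  mol.assign_partial_charges(
      partial_charge_method="am1bccelf10",
      toolkit_registry=OpenEyeToolkitWrapper(),
  )

  print(mol.partial_charges)  # per-atom charges that would be stored as library charges

In the force field itself, the resulting per-atom charges would be stored as library charges for the amino acid residues rather than recomputed at parameterization time.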

Protein FF benchmark

@Chapin Cavender

  • Validation dataset

    • Used to choose between models (null vs protein-specific torsions)

    • Beauchamp/Pande dataset

      • 32 small peptides (2 to 5 residues)

      • Chemical shifts and scalar couplings (see the coupling-calculation sketch after this discussion)

      • 500 ns trajectories

    • MG – This sounds like a good plan. If there’s no accuracy difference between null and protein-specific torsions, what do we do?

      • CC – Could bring in folded proteins from other datasets to gather more data.

      • DM – Agree, there’s a big scientific question here. If it looks like we DO need protein-specific torsions, then we don’t need to prove anything new. But if we DON’T need protein-specific torsions, I’d need more evidence to believe the results.

      • MG – It would be nice if we didn’t need protein-specific torsions. That would be a simpler problem to solve overall

      • SB – “protein specific torsions” = just backbone? just sidechain?

      • CC – I was thinking “backbones and sidechains”, but we could split this into more models. Like “backbone and just the first torsion in the sidechain”.

      • SB – From what I recall, you had talked about doing “just backbone” and then maybe experimenting with adding a bit more specificity.

      • SB – Do you want to gather full feedback on how many torsions to add from the coauthors on the paper? They may have good feedback.

      • CC – I think it’s very unlikely that I’d get fewer answers than coauthors by asking that question.

      • DM – You could ask it, and then just summarize their responses and pick one or a few answers (which may just be the original plan)

      • MG – This would head off problems down the line.
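
As a concrete example of how the peptide trajectories could be scored against the Beauchamp/Pande observables mentioned above, a minimal sketch of computing backbone 3J(HN-HA) scalar couplings with MDTraj (file names are placeholders; chemical shifts would need a separate predictor and are not shown):

  import mdtraj as md

  # Placeholder file names; one trajectory per peptide in the validation set.
  traj = md.load("peptide_traj.xtc", top="peptide.pdb")

  # Karplus-model couplings computed from the backbone phi angles.
  # Returns the atom-index quadruplets and a (n_frames, n_couplings) array in Hz.
  indices, j_ha = md.compute_J3_HN_HA(traj, model="Bax2007")

  # Trajectory averages are what get compared to the experimental couplings.
  print(j_ha.mean(axis=0))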

  • Sage benchmarks

    • CC – If we’re not planning to refit the LJ terms, then there’s no need to rerun these.

    • MG – If we’re not changing the fitting method, and we wouldn’t be …

    • CC – Agree

    • SB – For the null model, I’m imagining some mix of data including torsiondrives, and some weighting of the fitting targets, and it’s possible that the target weights would lead to a substantially different fit. So I’d recommend doing at least the QC benchmark, and optimally also the physical property benchmark.

      • CC – Agree

    • DM (chat) – agree, yeah, the physical properties shouldn’t change significantly (e.g. we’ve looked before at how valence parameter changes affect solvation free energies, and it’s not a statistically significant amount).

      • DM (chat) – (Because the molecules are usually small and not very flexible so the valence parameters CAN’T impact them much)

      • MG – My expectation is that it won’t matter too much.

      • SB – For Sage, when we did the benchmarks, we ran them after refitting LJ but not torsions. OMadin is running the benchmarks again after refitting both, so we’ll soon know how big an effect we see.

      • CC – I’d expect the change to be relatively small, since all we’re doing is adding a comparatively small number of peptide datasets.

    • CC – So, consensus seems to be that the QC dataset is necessary to benchmark against, and that the physical property benchmarks are NOT required (unless OMadin’s study shows a significant difference as a result of refitting torsions)

    • QC test dataset

    • Physical property test dataset - solvation and transfer free energies

  • Test dataset

    • Used to demonstrate model performance

    • Continuous trajectories in triplicate

    • Prioritize diversity of protein systems or diversity of force fields tested?

      • MG – There’s again a question of “what’s the role of these studies if we’re not releasing this FF?”. We can make some assumption that we’re releasing something LIKE this FF, but we can’t guarantee its similarity to a full release.

      • CC – I kinda view this as: achieving the “NEED” goals on this project is sufficient for this proof-of-concept study (to demonstrate the accuracy of the protein parameters), whereas the “WANT” goals are the requirements for the release candidate (like vsites, if that’s decided).

      • MG – I would use different words than “WANT” and “NEED”, since those will confuse people in the future.

      • JW – We could make the results of this study available in a conda package, we’ll just need to be careful about what we call it (so, we can’t call it openff-3.0.0), but I’d be happy for us to host it in a user-loadable format.

      • SB – Agree. And we can tune how much we advertise this.

      • (General) – Depending on how the proof-of-concept FF looks after this project (and on progress with vsites/other scientific developments), we can decide to release Rosemary either with or without vsites. This will need to be evaluated in the future.

      • MG – Will PTMs be handled by Rosemary?

        • JW – That depends on whether graph charges are ready when Rosemary is released. I don’t plan to make an alternative.

        • SB – Graph charges are in progress. I’m preparing a comprehensive training and test set which should be ready soon. That will unblock work but I can't estimate how long it will take after I leave.

        • MG – One thing I’ll need to see before I believe the GNN is that symmetric atoms are assigned identical charges. For example, the Os on a carboxylate should get identical charges.

        • SB – Agree. This is a huge pain – I’ve been dealing with the symmetry problem with RESP, and it’s a really hard problem. So I’m working on continuous feature definitions that account for resonance, and I have carboxylates/sulfonamides exactly in mind. But large/extended resonant systems are really difficult, and I’m trying to ensure they’re in the training set. But let’s also be realistic – AM1BCC implementations can miss symmetry too, or assign different BCCs based on the particular input resonance form.

        • MG – That’s right, and really, the benchmarks should show if we’ve broken something or done an unacceptably poor job.

        • SB – Agree. The benchmarks will show us what’s going on.

        • JW – There are BCC assignment differences that I see between AmberTools and OE, so issues like that already exist.

      • CC – WRT PTMs, we should probably have PTMs as a specific benchmark if we are going to advertise them.

        • SB – Will the LiveCOMS review cover PTMs?

        • CC – No. But LChong and VVoelz have been working on datasets with peptide mimetics that we could test against. So these will give us “small peptide” targets to test against, but I’d really like to also have pre- and post-phosphorylation proteins to show accuracy on a larger scale.

        • (Decision) – PTM fitting targets/benchmarks will NOT be included in the initial phase of this project. Once the infrastructure to assign charges to PTMs is available, this decision may be revisited.

          • (MG and CC agree, CC will summarize this on the project page)

    • Force fields and water models (an example setup for the ff14SB/TIP3P baseline is sketched after this list)

      • Need

        • OpenFF/TIP3P

        • Amber ff14SB/TIP3P

      • Want

        • Amber ff19SB/OPC

        • CHARMM36m/TIP3P

        • a99SB-disp/TIP4P-D

        • OpenFF/OPC
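
For the non-OpenFF baselines, a minimal sketch of building one benchmark system with OpenMM’s bundled Amber ff14SB/TIP3P files (the input file name is a placeholder; solvation, equilibration, and the triplicate production runs are omitted):

  from openmm import LangevinMiddleIntegrator, unit
  from openmm.app import PDBFile, ForceField, Simulation, PME, HBonds

  # Placeholder input: a pre-solvated protein system.
  pdb = PDBFile("protein_solvated.pdb")

  # amber14-all.xml bundles the ff14SB protein parameters; tip3p.xml is the matching water model.
  forcefield = ForceField("amber14-all.xml", "amber14/tip3p.xml")
  system = forcefield.createSystem(
      pdb.topology, nonbondedMethod=PME,
      nonbondedCutoff=0.9 * unit.nanometer, constraints=HBonds,
  )

  integrator = LangevinMiddleIntegrator(
      300 * unit.kelvin, 1.0 / unit.picosecond, 0.002 * unit.picoseconds
  )
  simulation = Simulation(pdb.topology, system, integrator)
  simulation.context.setPositions(pdb.positions)
  simulation.step(1000)  # the actual benchmark trajectories run for microseconds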

    • Need: Robustelli/Shaw a99SB-disp dataset

      • 4 folded proteins (largest 120 residues)

        • Scalar couplings

        • 10 μs trajectories

      • 10 disordered proteins (largest 140 residues)

        • Chemical shifts and scalar couplings

        • 30 μs trajectories

    • Want: Mao/Montelione dataset

      • 41 folded proteins

      • Chemical shifts and NOEs

      • 10 μs trajectories

    • Aggregate sampling (rough arithmetic sketched at the end of these notes)

      • Need: 700 μs

      • Need + protein want: 1.5 ms

      • Need + FF want: 2 ms

      • Need + protein want + FF want: 4.5 ms

  • PB – Compute requirements?

    • (General) – We were invited by JC to run this on F@H. There are concerns about whether we can really get the access/submission priority we need and get CONTINUOUS trajectories on F@H, or whether we would need to run short trajectories seeded by enhanced sampling and stitch them together. If we DON’T do continuous trajectories (and instead do a large number of short trajectories), there will be another burden of proving that many short trajectories are equivalent to one long trajectory. It is claimed that some groups already on F@H have software available for doing enhanced sampling.

    • (General) – We don’t know whether lots-of-short-runs or a-few-long-runs are more accurate (either in terms of what WE think, or what REVIEWERS think)

    • (General) – We need to figure out whether F@H will give us submission priority and GPU runners sufficient to do long trajectories as soon as possible, so that we can line up alternatives if needed.

    • PB (in chat) – If it doesn't work with F@H, an alternative would be making your benchmark scripts more plug-and-play; that would help in recruiting people to run subsets of the larger set on their HPC clusters, like Lilac, TSCC, Boulder, or some on Vytas' or Bert's clusters.
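
The aggregate-sampling estimates above can be reproduced from the dataset sizes and trajectory lengths listed in these notes; a rough back-of-the-envelope check (assuming one continuous trajectory per protein per force field, with the need/want groupings as listed):

  # Rough arithmetic behind the aggregate-sampling estimates (values in microseconds).
  robustelli_shaw = 4 * 10 + 10 * 30   # need proteins: 4 folded x 10 us + 10 disordered x 30 us = 340 us
  mao_montelione = 41 * 10             # want proteins: 41 folded x 10 us = 410 us

  need_ffs = 2                         # OpenFF/TIP3P, Amber ff14SB/TIP3P
  want_ffs = 4                         # ff19SB/OPC, CHARMM36m/TIP3P, a99SB-disp/TIP4P-D, OpenFF/OPC

  print(robustelli_shaw * need_ffs)                                  # ~700 us (need)
  print((robustelli_shaw + mao_montelione) * need_ffs)               # 1500 us = 1.5 ms (need + protein want)
  print(robustelli_shaw * (need_ffs + want_ffs))                     # ~2000 us = 2 ms (need + FF want)
  print((robustelli_shaw + mao_montelione) * (need_ffs + want_ffs))  # 4500 us = 4.5 ms (all)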

Action items

Decisions