Objective | Derive force field parameters for proteins consistent with the OpenFF small molecule force field. | ||||||
Primary Driver | |||||||
Approvers | |||||||
Supporting Drivers | |||||||
Stakeholders (these people will be tagged in project update notifications on Slack) | |||||||
Project Manager | |||||||
Page Owner (only this person can edit this page) | |||||||
Decision authority | Unanimity of Primary Driver and all Approvers (absences count as vetoes), only in “Biopolymer FF call” meetings. Veto authority: Primary Driver, any Approver | ||||||
Discussion/notification venue | Fortnightly “Biopolymer FF call” meetings (decision forum), NOT “FF release call” meetings. The #ff-biopolymers channel on the OpenFF Slack and “FF release call” meetings are for notification and discussion only; no major decisions are allowed there. Meeting attendees are not assumed to have read the Slack discussions, so these must be summarized during meetings to be considered in decisions. | ||||||
Meeting notes | |||||||
Due date | 2022-01-01 | ||||||
Key outcomes |
| ||||||
Status |
|
SMIRNOFF format
We need SMARTS strings that can specify protein-specific terms for general amino acids. To summarize the discussion in the 2020-04-01 AMBER FF porting meeting notes (https://openforcefield.atlassian.net/wiki/pages/createpage.action?spaceKey=MEET&title=2020-04-01%20AMBER%20FF%20porting%20meeting%20notes):
Amber ff14SB was ported to SMIRNOFF format by using SMARTS strings that capture an entire amino acid, differentiating between main chain and terminal residues and between protonation/tautomeric states
Previous approach is not extensible for modified or synthetic amino acids
Need general SMARTS strings for backbone and side chain torsions in polypeptide chains
[#6X3](=O)-[#7X3:1]-[#6X4:2]-[#6X3:3](=O)-[#7X3H1:4]-[#6X4]
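will tag ψ for all residues except those immediately followed by proline, because atom :4 (the backbone nitrogen of the next residue) must carry exactly one hydrogen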
...
Needs to be determined. In particular, did Parsley train on dipeptides or tripeptides for any of the 20 canonical amino acids?
Dataset name | Dataset type | QC method | Molecules | QCA submission |
---|---|---|---|---|
Dipeptides | ||||
Tripeptides |
Protein-specific datasets
The Cerutti tetrapeptides are a set of 185 tetrapeptides with sidechains X-Y-X, with X in [Ala, Gly, Ser, Val] and Y in [Ala, Arg, Ash, Asn, Asp, Cys, Glh, Gln, Glu, Gly, Hid, Hie], excluding (X == Ser && Y == Glu). David Cerutti selected multiple conformers for each tetrapeptide.
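The selection rule above can be written as a short enumeration. The following is a minimal Python sketch, not David Cerutti's actual procedure: the function and variable names are hypothetical, and the residue lists are copied exactly as printed on this page. As printed, they yield 47 X-Y-X patterns rather than 185, so the 185 figure evidently rests on information not reproduced here; the sketch does not attempt to fill that in.

```python
from itertools import product

# Residue lists exactly as printed on this page (assumption: they may be incomplete).
X_RESIDUES = ["Ala", "Gly", "Ser", "Val"]
Y_RESIDUES = [
    "Ala", "Arg", "Ash", "Asn", "Asp", "Cys",
    "Glh", "Gln", "Glu", "Gly", "Hid", "Hie",
]


def is_excluded(x, y):
    """Exclusion rule stated on the page: X == Ser && Y == Glu."""
    return x == "Ser" and y == "Glu"


def enumerate_sidechain_patterns(x_residues, y_residues):
    """Return every allowed X-Y-X sidechain pattern as a string."""
    return [
        f"{x}-{y}-{x}"
        for x, y in product(x_residues, y_residues)
        if not is_excluded(x, y)
    ]


patterns = enumerate_sidechain_patterns(X_RESIDUES, Y_RESIDUES)
print(len(patterns))  # 4 * 12 - 1 = 47 with the lists as printed
print(patterns[:3])
```

Keeping the exclusion in its own function makes it easy to add further exclusions later without touching the enumeration itself.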
Dataset name | Dataset type | QC method | Molecules | QCA submission | Status |
---|---|---|---|---|---|
OpenFF Protein Fragments Initial | Optimization | B3LYP-D3BJ/def2-TVPP | 16 tetrapeptides with sidechains X-Ala-X and X in [Ala, Gly, Ser, Val] | | |
 | Optimization | B3LYP-D3BJ/DZVP | Cerutti tetrapeptides with constraints to avoid hydrogen bonds | | |
OpenFF Protein Peptide Fragments constrained v1.0 | Optimization | B3LYP-D3BJ/DZVP | Cerutti tetrapeptides with constraints to avoid hydrogen bonds | | |
OpenFF Protein Peptide Fragments unconstrained v1.0 | Optimization | B3LYP-D3BJ/DZVP | Cerutti tetrapeptides with no constraints | | |
OpenFF Protein Fragments TorsionDrives v1.0 | TorsionDrives on ϕ, ψ, ω, χ1, and χ2 | B3LYP-D3BJ/DZVP | Cerutti tetrapeptides | (22 / 845 errored) | |
OpenFF PEPCONF OptimizationDataset v1.0 | Optimization | B3LYP-D3BJ/DZVP | | (6000 / 7560 errored) | |
OpenFF Benchmark Ligands | TorsionDrive | B3LYP-D3BJ/DZVP | | | |
Model
We envision several tiers of models, presented below in order of increasing anticipated effort. We will generate and benchmark lower-effort models first and use the results of the benchmarks to inform decisions about higher-effort models.
...
Copy library charges from an existing protein force field: Amber ff14SB (RESP, unchanged from Amber ff99) or Amber ff15ipq (IPolQ)
Pro: easy to implement
Pro: we know these work pretty well in the Amber context
Con: we are no longer in the Amber context
Generate library charges for the 20 canonical amino acids (main chain and terminal) using AM1-BCC (RESP2)
Pro: consistent with Parsley; for example, we want the parameters of a serine side chain to look a lot like those of ethanol, since we have reason to believe these parameters play well with the other parameters in the FF
Pro: Lily Wang has evidence that AM1-BCC charges of fragments are similar (< 0.1 e) to the charges from a larger polymer (see https://zenodo.org/record/4977401#.YNuCk34pCpp )
Con: more effort to generate
Generate charges on-the-fly using graph convolutional networks (see the 2020-04-01 AMBER FF porting meeting notes: https://openforcefield.atlassian.net/wiki/pages/createpage.action?spaceKey=MEET&title=2020-04-01%20AMBER%20FF%20porting%20meeting%20notes)
Maybe don’t take this approach unless/until it is also being used for the small molecules
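For the AM1-BCC library-charge option, the fragment-versus-polymer agreement cited above (< 0.1 e) is the kind of check that could be scripted once both charge sets are available. The following is a minimal sketch of one way to do that, not the analysis in the Zenodo record: the function name is hypothetical, and the charge values are made-up placeholders rather than real AM1-BCC output. Only the 0.1 e tolerance comes from this page.

```python
TOLERANCE_E = 0.1  # agreement threshold quoted on this page, in elementary charges


def max_charge_deviation(fragment_charges, polymer_charges):
    """Return (atom_name, abs_difference) for the worst-matching shared atom.

    Both arguments map atom names to partial charges in units of e.
    Atoms present in only one of the two mappings are ignored.
    """
    shared = sorted(fragment_charges.keys() & polymer_charges.keys())
    if not shared:
        raise ValueError("the two charge sets have no atoms in common")
    worst = max(
        shared,
        key=lambda name: abs(fragment_charges[name] - polymer_charges[name]),
    )
    return worst, abs(fragment_charges[worst] - polymer_charges[worst])


# Placeholder values for illustration only; NOT real AM1-BCC charges.
fragment = {"N": -0.42, "CA": 0.03, "C": 0.60, "O": -0.57}
polymer = {"N": -0.45, "CA": 0.05, "C": 0.61, "O": -0.58}

atom, deviation = max_charge_deviation(fragment, polymer)
within_tolerance = deviation < TOLERANCE_E
print(atom, round(deviation, 3), within_tolerance)
```

Reporting the single worst atom rather than an average keeps the check aligned with a per-atom tolerance: one badly mismatched atom cannot hide behind many good ones.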
...
A major decision for this model is which sidechains should have unique torsions that override the general peptide backbone torsion. We envision using the Protein Fragments Optimization and TorsionDrive datasets as the primary training data. Other datasets, such as PEPCONF, can then be used as validation data to make decisions about the model. Alternatively, automated chemical perception (see Chemical perception) may be used to identify dihedrals that are not described well in the Parsley training set. The resulting model will likely be the candidate for the first protein force field release.
...
Experimental datasets are being curated to evaluate protein force fields. These datasets will be published as a LiveCoMS review, described here: /wiki/spaces/COMMS/pages/1927413777. It will be useful to identify a small number of key benchmarks that can interrogate distinct physical properties of proteins and that can be completed relatively quickly (~1 month). These key benchmarks will be used to evaluate force field models and make decisions about more complex models.
Milestones and deadlines
Milestone | Owner | Deadline | Status | Notes |
---|---|---|---|---|
Generate Null Model with Amber library charges | | 2021-07-09 | | |
Generate Null Model with AM1-BCC library charges | | 2021-08-01 | | Waiting on infrastructure for getting polymer charges from fragments |
Choose key benchmarks to quickly evaluate force field models | Biopolymer FF group | 2021-08-01 | | In parallel with LiveCoMS review |
Run key benchmarks for Null Models with library charges | Maybe others | 2021-10-01 | | |
Decide on QC data for PST Model | Biopolymer FF group | 2021-08-01 | | Started by Dave Cerutti |
Run QC calculations for PST Model | | 2021-10-01 | | Started by David Dotson and Trevor Gokey |
Fit PST Model with one general term for all sidechains | | 2021-11-01 | | |
Decide on sidechain-specific terms for PST Model | Biopolymer FF group | 2021-11-01 | | |
Fit PST Model with sidechain-specific terms | | 2022-01-01 | | |
Run key benchmarks for PST Models | Maybe others | 2022-03-01 | | |
Generate charges using graph convolutional networks | | 2022 | | Need update on feasibility from Chodera group |
Fit CMAP model, if necessary | | 2022 | | Waiting on CMAP infrastructure |
Fit PSTLJ model, if necessary | | 2022 | | Manual or automated LJ typing |