2020-04-01 AMBER FF porting meeting notes

Date

Apr 1, 2020

Participants

  • @Jeffrey Wagner

  • @John Chodera

  • @David Mobley

  • @David Cerutti (Deactivated)

  • @Christopher Bayly

Goals

Discussion topics

Item

Notes

Design ideas

  • DLM – JDC has the idea of writing a SMARTS for each whole amino acid. CIB points out that this won’t give us a generic backbone onto which we can graft other sidechains. The first goal is probably a faithful reproduction of AMBER by whatever means we can, then look into extensibility.

Bayly Philosophy / Proposition for protein FF starting point

 

CIB – At the last OpenFF meeting, I talked about doing transferable backbone parameters.

The image above is an instance of the overall idea. The first line identifies a general amide bond. The second shows the whole backbone without any substituents. The third is the same thing, starting at a different point. The fourth is an example of selecting only psi.

Philosophy: At every step of the way, try to take advantage of the small molecule FF, and ensure that its parameters are as portable as possible into the biopolymer FF. Envision the process as turning the small molecule FF INTO a biopolymer FF. Start with the backbone, then start fitting chi1s. With chi1, there’s one central bond, but two torsions (where the bond branches UPSTREAM into phi, and DOWNSTREAM into psi). These 4 parameters may do most of the work of getting biopolymer parameters into our FF. chi2 and higher often seem overfit, and should be covered by small molecule/transferable parameters.
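To make the "one central bond, two torsions" point concrete, here is a minimal pure-Python sketch that enumerates the heavy-atom proper torsions about the CA-CB bond. The leucine-like atom names and the bond list are illustrative assumptions, not taken from any force field file, and `torsions_about` is a hypothetical helper.

```python
from itertools import product

# Heavy-atom bonds of a leucine-like residue (illustrative atom names only).
BONDS = [
    ("N", "CA"), ("CA", "C"), ("C", "O"),
    ("CA", "CB"), ("CB", "CG"), ("CG", "CD1"), ("CG", "CD2"),
]


def build_adjacency(bonds):
    """Map each atom name to the set of atoms it is bonded to."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def torsions_about(bonds, j, k):
    """Return every proper torsion i-j-k-l around the central bond j-k."""
    adjacency = build_adjacency(bonds)
    if k not in adjacency.get(j, set()):
        raise ValueError(f"{j} and {k} are not bonded")
    left = sorted(adjacency[j] - {k})   # substituents on the j side
    right = sorted(adjacency[k] - {j})  # substituents on the k side
    return [(i, j, k, l) for i, l in product(left, right) if i != l]


# chi1 rotates about CA-CB; the backbone N and C each define one torsion.
chi1_torsions = torsions_about(BONDS, "CA", "CB")
for torsion in chi1_torsions:
    print("-".join(torsion))
```

In this toy graph the two torsions are N-CA-CB-CG and C-CA-CB-CG, which is the pair of chi1 terms the notes refer to.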

Bottom line is a carefully-crafted SMARTS that should select protein backbones very specifically. The H1 on the last tagged atom is important for excluding backbone substitutions – peptoids and prolines are deliberately excluded by that H1.

Summary: Build the protein FF step by step, having it contribute the peptide-specific torsions and the two torsions that contribute to chi1, and chi2+ only if needed.

DC –

Chodera – roadmaps

Short term = Amber ff14SB

  • Here it’s best to write entire-residue SMARTS, tagging the appropriate atoms for each parameter. The simplest way to get numeric values is to write SMARTS patterns for each parameter and automate the matching of those from an AMBER parameterization pipeline, facilitated by ParmEd. In this case, the AMBER FF will be appended to the Parsley FF, and thus any complete residue matches will automatically be applied during the parameterization process.
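Appending works because SMIRNOFF resolves overlapping matches by letting the last matching parameter in the list win. The sketch below mimics only that precedence rule. The matcher functions are hypothetical stand-ins for SMARTS matching, and the parameter ids are invented for illustration.

```python
def assign(parameters, bond):
    """Return the id of the LAST parameter whose matcher accepts the bond."""
    assigned = None
    for param_id, matches in parameters:
        if matches(bond):
            assigned = param_id  # later entries override earlier ones
    return assigned


def is_c_n(bond):
    """Hypothetical stand-in for a generic C-N bond SMARTS."""
    return {bond["a"], bond["b"]} == {"C", "N"}


def is_ala_c_n(bond):
    """Hypothetical stand-in for a whole-residue SMARTS tagging the ALA C-N bond."""
    return bond["residue"] == "ALA" and is_c_n(bond)


# Generic small-molecule parameters first, residue-specific ones appended last.
generic = [("b_generic_C_N", is_c_n)]
protein = [("b_ALA_C_N", is_ala_c_n)]
combined = generic + protein

small_molecule_bond = {"a": "C", "b": "N", "residue": None}
alanine_bond = {"a": "N", "b": "C", "residue": "ALA"}

print(assign(combined, small_molecule_bond))  # only the generic term matches
print(assign(combined, alanine_bond))         # the appended residue term wins
```

The design point is ordering: because the residue-specific entries come last, they override the generic ones only where the whole-residue pattern matches, and small molecules are unaffected.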

Medium term = use SMIRNOFF types for everything, combine everything into main FF line

  • Get charges from graph-based methods, or fragmentation and charge generation. Make targeted improvements per CIB’s suggestion above, based on QM, as well as other physical data we can integrate into off-evaluator (formerly “propertyestimator”). Also automated type(/SMARTS) refinement.

DLM – Does medium term roadmap include adding specific peptide backbone terms?

Discussion

DC – How broad do we need to make the dataset to determine these parameters? It seems that backbone charges are transferable between neutral AAs, but positively and negatively charged AAs need different ones.

JDC – In the short-term, we’ll use library charges. But in the long term, we’ll calculate charges on the fly using an efficient and accurate method (details TBD, possibly graph or fragmentation-based). This will also unlock unnatural AAs.

CIB – Graph conv. charges are OK by me. But when Wendy did RESP for AMBER, the charges were only applicable to the same AA for which they were calculated. This means that we screwed with the peptide bond a bit, since electrons can be shared over the peptide bond. To capture this in library charges, we’d get a combinatorial explosion from trying to include upstream and downstream AAs. But I looked into angle-based corrections, which may handle this gracefully.

JDC – A graph conv model would be able to model charge flow over bonds. Charge increments can also provide some flexibility on this front.

CIB – JDC’s plan is good operationally, since it will get us a SMIRNOFF AMBER FF soon. Including carefully-crafted backbone SMARTS will keep us from getting in a “land war”, with protein parameters promiscuously being applied to non-proteins.

JDC – We can start working on gathering QM and physical data for proteins.

DLM – We covered this last week. There, we agreed that we’d start with John’s first step. We also decided to talk with Carlos and Lee-Ping to discuss the level of theory. We’ll likely need to regenerate data so that it lives in QCA.

JDC – Maybe we should kick off the QC calculations first, since they’ll take a while.

DLM + JW – Agreed. There’s lots of computing power available for use on QCArchive right now.

JDC – Could also use Josh Horton’s QCSubmit package, which is approaching readiness. Ligand Expo might also be a good source, with some analysis to find polymeric chunks.

DC – ForceBalance performance is somewhat concerning to me – fitting to 10 torsiondrives took a long time.

DLM – We’re planning to phase out FB in favor of a more modern tool. May need to be parallelized.

CIB – SMARTS matching to proteins is slow. Could divide protein into subgraphs based on residue name, and then validate residues.
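One cheap form of this could look like the sketch below: group atoms by residue label, then check each group's heavy-atom composition against a template before any expensive matching. The atom record layout, the `TEMPLATES` table, and both helper functions are assumptions made for illustration. Composition is only a coarse check and does not verify connectivity.

```python
from collections import Counter, defaultdict

# Heavy-atom element counts for internal (non-terminal) residues.
TEMPLATES = {
    "GLY": Counter({"C": 2, "N": 1, "O": 1}),
    "ALA": Counter({"C": 3, "N": 1, "O": 1}),
}


def make_atoms(chain, resseq, resname, elements):
    """Build simple atom records (hypothetical layout) for one residue."""
    return [
        {"chain": chain, "resseq": resseq, "resname": resname, "element": e}
        for e in elements
    ]


def group_by_residue(atoms):
    """Split the atom list into subgraph candidates keyed by residue label."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[(atom["chain"], atom["resseq"], atom["resname"])].append(atom)
    return dict(groups)


def validate(groups, templates):
    """Flag each residue whose heavy-atom composition matches its template."""
    report = {}
    for key, members in groups.items():
        expected = templates.get(key[2])
        observed = Counter(atom["element"] for atom in members)
        report[key] = expected is not None and observed == expected
    return report


atoms = (
    make_atoms("A", 1, "GLY", ["N", "C", "C", "O"])
    + make_atoms("A", 2, "ALA", ["N", "C", "C", "O"])  # CB deliberately missing
)
report = validate(group_by_residue(atoms), TEMPLATES)
for key, ok in report.items():
    print(key, "ok" if ok else "needs full matching")
```

Residues that fail the check (or have no template) would fall back to full SMARTS matching, so the shortcut only saves time and never changes the assignment.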

JW – Perceiving residues by splitting at peptide bonds might also provide performance improvement. This would also be toolkit-independent.
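A toolkit-independent version of this idea can be sketched on a plain element/bond graph: find the amide C-N bonds, cut them, and take connected components as residues. The peptide-bond test used here (a carbonyl carbon bonded to a nitrogen that has another carbon neighbor) is a simplifying assumption. It would also cut non-backbone secondary or tertiary amides, so a real implementation would need a stricter check.

```python
def find_peptide_bonds(elements, bonds):
    """Return (C, N) index pairs that look like backbone amide bonds."""
    adjacency = {i: {} for i in elements}
    for i, j, order in bonds:
        adjacency[i][j] = order
        adjacency[j][i] = order
    peptide_bonds = []
    for i, j, order in bonds:
        if order != 1:
            continue
        for c, n in ((i, j), (j, i)):
            if elements[c] != "C" or elements[n] != "N":
                continue
            has_carbonyl = any(
                elements[x] == "O" and o == 2 for x, o in adjacency[c].items()
            )
            c_has_carbon = any(elements[x] == "C" for x in adjacency[c])
            n_has_other_carbon = any(
                elements[x] == "C" and x != c for x in adjacency[n]
            )
            if has_carbonyl and c_has_carbon and n_has_other_carbon:
                peptide_bonds.append((c, n))
    return peptide_bonds


def split_residues(elements, bonds):
    """Cut peptide bonds and return connected components as sorted index lists."""
    cut = {frozenset(pair) for pair in find_peptide_bonds(elements, bonds)}
    adjacency = {i: set() for i in elements}
    for i, j, _ in bonds:
        if frozenset((i, j)) not in cut:
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen, residues = set(), []
    for start in sorted(elements):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one residue
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        residues.append(sorted(component))
    return residues


# Heavy atoms of a Gly-Gly dipeptide: N-CA-C(=O)-N-CA-C(=O)-O
elements = {0: "N", 1: "C", 2: "C", 3: "O", 4: "N", 5: "C", 6: "C", 7: "O", 8: "O"}
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1),
         (4, 5, 1), (5, 6, 1), (6, 7, 2), (6, 8, 1)]
residues = split_residues(elements, bonds)
print(residues)
```

Because the nitrogen must carry a second carbon, primary amides such as the Asn and Gln sidechains are left intact by this test.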

CIB – First step involves converting AMBER .lib files into

DC – Writing the SMARTS should be pretty straightforward. Was thinking about SMARTS matching in mdgx. Is there some sense or arbitrariness in how SMARTS are written?

CIB – Yes, you can choose to express the same SMARTS in multiple ways. It’s possible to canonicalize SMILES (and therefore SMARTS), but each toolkit does it slightly differently.

DC – Concerned about standardizing SMARTS strings, it would be nice if we had an automated way to generate standard representation.

JDC – All SMILES are also SMARTS, so one process to get SMARTS for each residue would be to generate it with caps (using e.g. tleap), delete the caps, and get the SMILES of the remaining AA. There’s some question about how to handle the needed “wildcard” at the edges of the AA, but the central part should be straightforward. This automation would be reusable for modified AAs and peptoids in the future.

CIB – It’s important to add explicit hydrogens, to keep the matching specific.

CIB – We all agree about the initial ff14SB porting. But we can make preparations now to make the second step easier (merging parameters into the Parsley FF). So let’s decide which parameters we’ll add directly from the AMBER FF, and which parameters we want to have remain general (current Parsley types).

DC – ff14SB has 900 parameters. How do we want to transfer those?

JDC – Could automate both processes. First, generate a script to generate all capped peptides using tleap. Then, load into OETK, chop off caps, and acquire the SMILES, keeping atom index information. Then, match parameters by comparison between systems. (algorithm suggestion, may not be correctly transcribed)

DLM – In other words, this process can be entirely automated. Tagged SMILES can be generated by above method in OETK. Then, extract parameters from parameterized AMBER system, and find which parameters correspond to which atom indices. (algorithm suggestion, may not be correctly transcribed)
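The index bookkeeping in this step can be sketched without any toolkit. The example assumes the parameterized capped system has already been reduced to a dictionary of atom-index tuples, and that chopping off the caps produced a map from system atom index to SMILES map tag. Both data structures, the helper name, and the placeholder values are hypothetical. No real ff14SB numbers are used.

```python
def tag_parameters(terms, atom_to_tag):
    """Re-key valence terms from system atom indices to SMILES map tags.

    Terms that touch an atom without a tag (i.e. a removed cap atom) are
    skipped. Intended for bonds, angles, and proper torsions, which read the
    same in both directions, so each key is stored in one fixed orientation.
    """
    tagged = {}
    for atom_indices, value in terms.items():
        if not all(i in atom_to_tag for i in atom_indices):
            continue  # term extends into a cap
        tags = tuple(atom_to_tag[i] for i in atom_indices)
        key = min(tags, tuple(reversed(tags)))
        tagged[key] = value
    return tagged


# System indices 0-1 belong to a cap; 2, 3, 4 are the residue's N, CA, C.
atom_to_tag = {2: 1, 3: 2, 4: 3}

# Placeholder labels stand in for the numeric values read from the system.
bond_terms = {
    (1, 2): "cap_C-N",    # crosses into the cap, should be dropped
    (2, 3): "N-CA",
    (4, 3): "CA-C",       # stored reversed in the source system
}
angle_terms = {
    (2, 3, 4): "N-CA-C",
    (1, 2, 3): "cap_C-N-CA",
}

tagged_bonds = tag_parameters(bond_terms, atom_to_tag)
tagged_angles = tag_parameters(angle_terms, atom_to_tag)
print(tagged_bonds)
print(tagged_angles)
```

Terms that cross into a cap are dropped here for simplicity. How to represent them is tied to the open "wildcard" question about the edges of each residue SMARTS.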

CIB – (algorithm suggestion, will post useful examples in #smirnoff channel)

Action items

@David Cerutti (Deactivated) will push to organize meeting to plan peptide enumeration and submission to QCArchive

Decisions