2022-05-19 Protein FF meeting note

Participants

@Chapin Cavender
@Diego Nolasco
@Michael Gilson
@Michael Shirts
@Daniel Cole
@Pavan Behara
@Jeffrey Wagner

Goals

Nucleic acid support
Protein QC datasets

Discussion topics

Item	Presenter	Notes
Nucleic acid support	@Chapin Cavender	The Janssen RNA team, which works on modified antisense oligonucleotides, reached out about OpenFF nucleic acid support.
Chapin is meeting with the Janssen team on Monday, May 23.
OpenFF is prioritizing support for proteins, following direction from the governing and advisory boards.
An OpenFF force field release supporting proteins will not happen before 2023.
OpenFF is committed to supporting canonical nucleic acids, but the timeline for this is not clear.
MS – Agree that proteins should come first. I’m seeing more funding opportunities for NAs, so this could be the right time to start thinking about things like building a community consensus on benchmarking. I imagine that learnings/techniques from the protein FF project will extend to NAs.
CC – Agree. I imagine that for proteins the sidechains are the trickier part, whereas for NAs the backbone is trickier.
MS – Re “canonical NAs”: we should say that we will support those first, but will also provide tools to handle noncanonical residues. So the goal (without a timeline) is that noncanonical NAs will be easy to handle.
Avenues for nucleic acid support:
Parametrization of nucleic acids with Sage/Rosemary plus graph charges, which is unlikely to be competitive with contemporary RNA force fields like Amber OL3
Nucleic acid-specific torsions for canonical nucleic acids
Bespoke torsions (or infrastructure to support fitting bespoke torsions) for covalent nucleic acid modifications
MS – There may be an intermediate level: once we have generalizable canonical NA torsions, noncanonical NAs should be trivially covered by combining the NA FF with the small molecule FF.
MS – So the to-do for the next few months is to figure out the data and process for validating an NA FF. This would be a good follow-up to the protein FF paper, and could bring in Tom Cheatham as a coauthor.
CC – So when I talk with Janssen on Monday, I’ll say that we don’t support this yet and they should use AMBER.
MS – Could also say that, with additional support, we could do this faster.
CC – I’ll mention that with a donation of GPU compute we can likely accelerate the protein FF timeline.
MS – I may have some spare PhD student time that could contribute to biopolymer work here. It could be available in the coming Fall/Winter.
Protein QC datasets	@Chapin Cavender	Status of protein QC datasets:
2-D TorsionDrives on (phi, psi) for capped 1-mers: 25 / 26 complete
2-D TorsionDrives on (chi1, chi2) for 23 capped 1-mers with the backbone constrained to alpha helix or beta sheet: 44 / 46 complete
Three TorsionDrives (CYX backbone, GLU sidechain beta, LYS sidechain alpha) are failing on a small number of grid points (<1%) because the geometry optimizations do not converge. Should these still be used in parameter training?
CC – We do have data for MOST grid points on the scans, but since the TorsionDrive hasn’t completed, we don’t know that they’re the “minimum energy” confs for the grid points.
MG – I’m in favor of moving ahead.
JW – Same here. They seem to be really strained/clashy confs.
MS – Do we need these data points?
CC – Basically, there are some TorsionDrives that would be missing a few grid points.
MS – What did Lee-Ping Wang say?
CC – He said that the failures seemed valid: the force gradients were really high.
MG – Chapin showed me some Ramachandran plots that showed the regions where the incomplete grid points are.
MS – Are they points that SHOULD (Ramachandran-wise) be high energy?
CC – Yes.
MG – A good course of action could be, basically, “move forward and document what we saw.”
MS + CC + JW – Agree.
Decision: We’ll move ahead assuming these TorsionDrives are complete and in an appropriate state to begin fitting. CC will also ask LPW to weigh in on this decision, and we can go back if LPW says that it’s necessary to push the TorsionDrives to become more complete.
Additional QC datasets:
Optimization dataset for 26 capped 1-mers
TorsionDrives on (phi, psi) for 52 capped 3-mers:
26 Ace-Ala-X-Ala-Nme
26 Ace-Val-X-Val-Nme
Optimization dataset for 52 capped 3-mers
Protein FF WBS		OpenFF_WBSes
MS – We want to make sure that our benchmarks meet or exceed the level of scrutiny that other published FFs have met.
MG – It’d be helpful to have a split between “benchmarking against other protein FFs”, but there could be a “want”…? Basically, we should agree on a subset of the systems+measurements from the LiveCOMS paper.
CC – Our current planned set of systems+measurements is the amber99sb-disp set of proteins by Paul Robustelli.
JW – The yellow branch (xtal) here could be run in parallel with the “NMR observables” branch. So if MS has a PhD student interested in biopolymers, they could work on that.
MS – I’ll have that conversation with my student and we can follow up on this in the future.
MS – Paper timeline?
CC – Depends heavily on when I can wrangle coauthors to get their feedback in.
MG – I’m a bit stressed about it, since doing a good job of smoothing it out and getting it into shape is a huge amount of work. Also, the Google Docs are really overwhelming.
CC – I’ll get it into Overleaf after the current round of edits.
MG – I think it’s fair to ask for a timeline. CC and I will talk about it at an upcoming meeting and we’ll report back.
NA paper coauthors		Tom Cheatham, Giovanni Bussi, Jiri Sponer, Pavel Hobza
