Janssen RNA team working on modified antisense oligonucleotides reached out about OpenFF nucleic acid support
Chapin is meeting with the Janssen team on Monday, May 23
OpenFF is prioritizing support for proteins due to direction from governing and advisory boards
An OpenFF force field release supporting proteins will not happen before 2023
OpenFF is committed to supporting canonical nucleic acids, but the timeline for this is not clear
MS – Agree that proteins should come first. Seeing more funding opportunities for NAs, so this could be the right time to start thinking about things like building a community consensus on benchmarking. I imagine that lessons and techniques from the protein FF project will extend to NAs.
CC – Agree. I imagine that protein sidechains are trickier, whereas NA backbones are trickier.
MS – Re “canonical NAs” - We should say that we will support those first, but will also provide tools to handle noncanonical residues. So, the goal (but without a timeline) is that noncanonical NAs will be easy to handle.
Avenues for nucleic acid support
Parametrization of nucleic acids with Sage/Rosemary plus graph charges, which is unlikely to be competitive with contemporary RNA force fields like Amber OL3
Nucleic acid specific torsions for canonical nucleic acids
Bespoke torsions (or infrastructure to support fitting bespoke torsions) for covalent nucleic acid modifications
MS – There may be an intermediate level, where once we have generalizable canonical NA torsions, then noncanonical NAs should be trivially covered by FF combination with the small molecule FF.
MS – So, the to-do in the next few months is to figure out how to come up with the data and process for validating an NA FF. This would be a good follow-up to the protein FF paper and could bring in Tom Cheatham as a coauthor.
CC – So, when I talk with Janssen on Monday, I’ll say that we don’t support this yet and they should use AMBER.
MS – Could also say that, with additional support, we could do this faster.
CC – I’ll mention that with a donation of GPU compute we can likely accelerate the protein FF timeline.
MS – I may have some spare PhD student-time that could contribute here to biopolymer work. Could be available in the coming Fall/Winter.
Protein QC datasets
@Chapin Cavender
Status of protein QC datasets
2-D TorsionDrives on (phi, psi) for capped 1-mers
25 / 26 complete
2-D TorsionDrives on (chi1, chi2) for 23 capped 1-mers with backbone constrained to alpha helix or beta sheet
44 / 46 complete
Three TorsionDrives (CYX backbone, GLU sidechain beta, LYS sidechain alpha) are failing on a small number of grid points (<1%) due to geometry convergence failures. Should these still be used in parameter training?
CC – We do have data for MOST grid points on the scans, but since the torsiondrive hasn’t completed we don’t know that they’re the “minimum energy” confs for the grid points
MG – I’m in favor of moving ahead
JW – Same here. They seem to be really strained/clashy confs
MS – Do we need these data points?
CC – Basically there are some torsiondrives that would be missing a few grid points.
MS – What did Lee Ping Wang say?
CC – He said that the failures seemed valid. The force gradients were really high.
MG – Chapin showed me some Ramachandran plots that showed the regions where the incomplete grid points are
MS – Are they points that SHOULD (Ramachandran-wise) be high energy?
CC – Yes
MG – A good course of action could be, basically, “move forward and document what we saw”
MS + CC + JW – Agree
Decision: We’ll move ahead assuming these torsiondrives are complete and are in an appropriate state to begin fitting. CC will also ask LPW to weigh in on this decision, and we can go back if LPW says that it’s necessary to push the torsiondrives to become more complete.
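A minimal sketch of the "document what we saw" step: given the grid points that did finish in a 2-D scan, list the missing ones and the fraction complete. The 15-degree spacing, the -180 to 165 labeling, and the function names are assumptions for illustration, not the actual dataset settings or the TorsionDrive API.

```python
from itertools import product


def grid_points(spacing=15, lo=-180, hi=180):
    """All (angle1, angle2) pairs on a periodic 2-D grid.

    `hi` is excluded because -180 and 180 are the same angle.
    The spacing is an assumed value, not taken from the datasets.
    """
    angles = range(lo, hi, spacing)
    return set(product(angles, angles))


def completeness(finished, spacing=15):
    """Return (fraction_complete, sorted list of missing grid points)."""
    expected = grid_points(spacing)
    missing = sorted(expected - set(finished))
    return 1 - len(missing) / len(expected), missing


# Hypothetical scan that is missing three grid points.
dropped = {(-180, -180), (0, 90), (60, -60)}
finished = grid_points() - dropped
fraction, missing = completeness(finished)
print(f"{fraction:.2%} complete; missing grid points: {missing}")
```

Recording the missing (angle1, angle2) pairs alongside each scan makes it easy to check later that they fall in the strained, high-energy regions discussed above.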
MS – We want to make sure that our benchmarks meet or exceed the level of scrutiny that other published FFs have met.
MG – It’d be helpful to have a split between “benchmarking against other protein FFs”, but there could be a “want”…? Basically, we should agree on a subset of the systems+measurements from the LiveCoMS paper
CC – Our current planned set of systems+measurements is the amber99sb-disp set of proteins by Paul Robustelli.
JW – The yellow branch (xtal) here could be run in parallel with the “NMR observables” branch. So if MS has a PhD student interested in biopolymers then they could work on that.
MS – I’ll have that conversation with my student and we can follow up on this in the future
MS – Paper timeline?
CC – Depends heavily on when I can wrangle coauthors to get their feedback in.
MG – I’m a bit stressed about it, since smoothing it out and getting it into shape properly is a huge amount of work. Also the Google Docs are really overwhelming.
CC – I’ll get it into Overleaf after the current round of edits.
MG – I think it’s fair to ask for a timeline. CC and I will talk about it in our future meeting and we’ll report back.