Nucleic acid datasets and parameterization (KT) | Slide 6:
CC: what metric did you use for clustering?
LW: what nonbonded terms?
KT: OpenFF 2.0 vdW, fitted my own charge model
KT: challenges of validating an RNA FF: 10 µs simulations look like they have the right behaviour, but if you keep running longer, you start seeing weird conformations. Not sure this is the current best approach
JC: I think you’re providing a good starting point
CC: I’d use the same kind of observables as in benchmarking proteins and NMR, so similar to what you’re doing now. There’s not as much crystal data for RNA as for proteins. Would also look at bigger motifs, more than single-stranded, e.g. tetraloops or 10-mers. Have you looked at those yet?
KT: not yet, since initial tests were failing
KT: I compared J-coupling observables to experiment and they looked good, but other metrics didn’t, so J-couplings alone may not be enough
Slide 32: water models with AMBER
LW: where did you get your conformers?
KT: from database. Experimental datasets have to be stable so they’re stacked, which biases them towards anti
JC: could we easily generate syn conformers ourselves to fill in the gaps (e.g. unfavourable areas)? e.g. with MD simulations?
KT: that was one of the motivations for creating the nucleoside dataset with torsion scans, but the hydroxyl-sugar interactions need to be handled carefully. I don’t know the right way to handle this at the moment.
CC: my impression is a lot of these might come from NB interactions that aren’t calibrated properly, e.g. a too-favourable interaction between an H-donor and the PO4 group
JC: could we bring in additional data that gives insight into the balance between these interactions? Can it help regularise LJ? Xtal-phase data could help with the balance between interactions
KT: is OpenFF going to work on nucleosides in the future?
LW: is there consensus on adding implicit solvent?
CC: not really a consensus; a sizable proportion include solvent considerations (implicit or explicit)
CC: can also start with gas-phase calculations and re-tune against NMR
KT: what I’m doing now is starting from espaloma 0.3, which is fit to QM data (including RNA), and re-tuning to 3J couplings
JC: who is currently generating datasets?
CC: nominally me
JC: how does deposition of new datasets work now? Could MWieder help?
JW: ML datasets live on a different server as they’re too big for OpenFF. We have a Tuesday submission meeting
KT: can anyone attend the Tuesday meetings?
JW: depends on dataset size
KT: is qca-dataset-submission still active?
JW: yes
JC: so things have now fragmented into multiple MolSSI QCFractal instances, where ML and OpenFF are separate and aren’t necessarily usable by each other
JW: OpenMM is using the ML instance for the SPICE 2.0 dataset. Last time we tried to do a large dataset, MolSSI asked for some funding.
JC: have things changed now that MolSSI’s circumstances have changed?
JW: large datasets would need to be discussed by the lead team
JC: is OpenFF going to stick with small datasets generated at the current level of theory?
LW: can we still use datasets contributed to other instances like the ML instance?
JC: what would be helpful? Individual people owning their own datasets, or a dedicated extra person?
|