Section on non-Bragg diffraction is incomplete and will be drafted by Michael Wall’s postdoc (and former Mobley lab member) David Wych
Plan for presenting experimental datasets in review article
For crystal datasets, is a PDB ID sufficient?
DM – Some data are re-refined, so some data may need to be curated
CC – Clone the data in the PDB to keep versions consistent
For NMR datasets, what metadata is desired?
CC – This will be trickier; these are less standardized. One question is "what METADATA do we need for NMR datasets?". We definitely need observables + molecules. But how do we handle cases where chemical shifts are only available for a subset of residues?
CC – I’d mentioned this at the last meeting with the NMR subgroup. Conclusion was that different datasets are standardized to different extents.
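To make the metadata question concrete, here is a minimal sketch of what a record for an NMR dataset might look like. The field names are assumptions for illustration, not a settled schema; the subgroup would need to agree on the actual fields.

```python
from dataclasses import dataclass, field

@dataclass
class NMRDatasetRecord:
    """Hypothetical minimal metadata record for an NMR dataset.

    Field names are illustrative assumptions, not a settled schema.
    """
    dataset_id: str
    molecules: list                # e.g. peptide sequences or SMILES strings
    observables: list              # e.g. ["chemical_shift", "J_coupling"]
    # Tracks which residues actually have assigned shifts, since coverage
    # is often partial (residue_id -> bool).
    shift_coverage: dict = field(default_factory=dict)
    source_doi: str = ""
```

A record like this makes the partial-coverage problem explicit instead of leaving it implicit in the raw files.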
JW: We did this before with NIST data, would be a good reference
SB – This model would be good, but I wouldn’t copy it too closely (it was a bit rushed). What I’ve found is that, instead of just scripts, it’s better to build a Python library that is USED by scripts to do the filtering. This can be a really high-value thing to get right the first time. Let’s gather JW + SB + CC to discuss in detail.
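The "library used by scripts" pattern might look like the following sketch. The function names and the entry format are hypothetical; the point is that filtering predicates live in a reusable, documented library, and that the reasons entries are dropped are recorded (which also helps with documenting rationale).

```python
def has_complete_shifts(entry, required_fraction=1.0):
    """Keep entries whose residues all (or mostly) have assigned chemical shifts.

    `entry` format is a hypothetical example: {"residues": [{"shift": ...}, ...]}.
    """
    assigned = sum(1 for r in entry["residues"] if r.get("shift") is not None)
    return assigned / len(entry["residues"]) >= required_fraction

def filter_dataset(entries, predicates):
    """Apply every predicate to every entry, recording which predicates failed.

    Returning the failure reasons alongside dropped entries keeps the
    curation rationale auditable.
    """
    kept, dropped = [], []
    for e in entries:
        failed = [p.__name__ for p in predicates if not p(e)]
        (dropped if failed else kept).append((e, failed))
    return kept, dropped
```

Scripts then stay thin: they load a dataset, call `filter_dataset` with the agreed predicates, and write out the kept entries plus a log of why the rest were excluded.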
SB – It’s easy to lose track of rationale, so important to document clearly.
Infrastructure needs - implement forward model in Evaluator or prepare input for external software?
JW – Alternatives here?
CC – There are existing tools of varying quality for predicting, e.g., chemical shifts
SB – The first step to implementing this in Evaluator is “figuring out the workflow” – What exactly is the input? Then “what steps are needed to predict the observable?”. Once we know this, we’ll have a better idea of what the implementation will look like. After that we can wrap arbitrary software to accomplish the task. But it’s not unprecedented – a good example would be the host-guest targets in Evaluator. Whether this happens inside or outside Evaluator isn’t that important – it can be really flexible with plugins.
DM – Totally agree that wrapping existing stuff is the way to go. To begin with we don’t want to be inventing stuff, just doing whatever is normal/standard/best practices or “good practices”. Later we can push the envelope.
DM – And totally agree that crystal simulations are super expensive. These are things where one might simulate hundreds of thousands to millions of atoms for hundreds of nanoseconds to microseconds. Definitely not for validation sets.
CC – Two things:
For the validation sets I’m thinking of, the systems will be solvated 2-4 AA peptides. So the workflow will be “take an ensemble of PDB structures and get a trajectory, then predict the chemical shift”. So we won’t need to do docking or anything like that, and the simulations should be relatively short (we just need to sample the conformational landscape of these small peptides)
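One piece of that workflow that is easy to sketch independently of the shift predictor chosen (ShiftX2, a neural-network model, etc.) is the ensemble averaging over trajectory frames. The sketch below assumes each frame's prediction arrives as a simple residue-to-shift mapping; residues missing from some frames (the partial-coverage case raised earlier) are averaged over only the frames where they appear.

```python
def average_shifts(per_frame_shifts):
    """Ensemble-average predicted chemical shifts over trajectory frames.

    per_frame_shifts: list of {residue_id: shift_ppm} dicts, one per frame
    (a hypothetical intermediate format; the real predictor output would
    need converting into this shape).
    Residues missing from a given frame's prediction are simply skipped
    for that frame rather than treated as zero.
    """
    totals, counts = {}, {}
    for frame in per_frame_shifts:
        for res, shift in frame.items():
            totals[res] = totals.get(res, 0.0) + shift
            counts[res] = counts.get(res, 0) + 1
    return {res: totals[res] / counts[res] for res in totals}
```

The averaged values are then what would be compared against the experimental shifts.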
SB – Yeah, it’d be good to have a sort of sketch of exactly which tools we’d use and how the data flow would look.
We’ll need to do similar types of things for more expensive test datasets. Those would be bigger proteins + ligands. So I wonder if there’s some way we can be forward-thinking about the design so that we could reuse components
JW – I think there are enough unknowns that we shouldn’t aim to reuse designs/components off the bat. We’ll be hunting down edge cases for months. Also validation studies will likely be needed
SB – Agree
SB – Also keep in mind that, if we do try a bunch of different analysis methods, that we may want to design a way to reuse the trajectory info so we don’t have to rerun expensive sims.
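A simple way to get that reuse is to cache trajectories on disk, keyed by a hash of the system specification, so that trying a new analysis method never triggers a re-run of an expensive simulation. The sketch below is illustrative; `run_simulation` is a hypothetical callable standing in for whatever simulation driver is actually used, and pickle is just a placeholder for a real trajectory format.

```python
import hashlib
import pathlib
import pickle

def cached_trajectory(system_spec, run_simulation, cache_dir="traj_cache"):
    """Run the (expensive) simulation only if no cached result exists.

    system_spec: dict describing the system; hashed to form the cache key.
    run_simulation: hypothetical callable that takes the spec and returns
    a trajectory object (assumed picklable here for illustration).
    """
    key = hashlib.sha256(repr(sorted(system_spec.items())).encode()).hexdigest()[:16]
    path = pathlib.Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        # Cache hit: reuse the stored trajectory instead of re-simulating.
        return pickle.loads(path.read_bytes())
    traj = run_simulation(system_spec)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(traj))
    return traj
```

With this in place, each new analysis method just calls `cached_trajectory` and pays the simulation cost at most once per system.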
CC – Andrew White published a NN method for determining chemical shifts. I don’t know how this will stack up to other options like ShiftX.
JW – I’d be cautious about the precision/accuracy of NMR validation – It’s possible that none of the tools are accurate enough to improve the FF in the regime we care about.
CC – Not clear that these are designed for trajectory analysis.
SB – What NMR observables/analyses do AMBER folks use? We should at least meet that as a standard.
CC – Agree. There are parameter sets for the NMR models that other folks have used, so we can be sure to use those as well to ensure our work is comparable.
Infrastructure needs - calculation of pairwise radial distribution functions
CC – In theory you need to use a grand canonical ensemble. In practice you can do a big simulation box and analyze only a subset of the box. Then you can just make an RDF that you pass on to subsequent analysis steps.
JW – This should be doable.
CC – There may be implementations already available in MDAnalysis/MDTraj, so it shouldn’t be a huge lift.
LW – I can confirm that RDFs are available in MDA.
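For reference, the underlying calculation is small enough to sketch directly. The brute-force NumPy version below computes a pairwise RDF in a cubic periodic box and normalizes each shell against the ideal-gas expectation; in practice one would use the library implementations mentioned above (e.g. MDAnalysis's RDF analysis) rather than this illustration.

```python
import numpy as np

def radial_distribution(positions, box_length, r_max, n_bins=100):
    """Brute-force pairwise RDF for a cubic periodic box (illustration only).

    positions: (N, 3) array of coordinates; r_max must be < box_length / 2
    for the minimum-image convention used here to be valid.
    Returns (bin_centers, g_of_r).
    """
    n = len(positions)
    # Minimum-image pairwise displacement vectors.
    deltas = positions[:, None, :] - positions[None, :, :]
    deltas -= box_length * np.round(deltas / box_length)
    dists = np.sqrt((deltas ** 2).sum(axis=-1))
    # Unique pairs only (upper triangle, excluding the diagonal).
    iu = np.triu_indices(n, k=1)
    counts, edges = np.histogram(dists[iu], bins=n_bins, range=(0.0, r_max))
    # Normalize each shell by the ideal-gas expectation for pair counts.
    shell_vol = (4.0 / 3.0) * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box_length ** 3
    ideal_pairs = shell_vol * density * n / 2.0
    return 0.5 * (edges[1:] + edges[:-1]), counts / ideal_pairs
```

For uniformly random (ideal-gas-like) points, g(r) should fluctuate around 1, which makes a convenient sanity check for any implementation we adopt.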
Action items
Decisions