2021-03-24 Industry Evaluation Meeting notes

Date

24 Mar 2021

Participants

Jeffrey Wagner
Gary Tresadern
David Dotson
Thomas Fox
Alberto Gobbi
Bai Xue
Bill Swope
Christina Schindler
David Hahn
Ian Craig
Joshua Horton
Lorenzo D'Amore
Pieter in ’t Veld

Discussion topics

Item	Notes
Questions on submission process?	IC – You mentioned that there were problems with iodine-containing molecules. How should we take care of that?
	DD – We considered adding a step that removed iodine-containing molecules, but haven’t built that yet. If you could identify iodine-containing molecules in your README, then that would let us filter the datasets.
	(General) – It will save everyone’s time to add a workflow step to remove iodine-containing molecules.
	AG – I don’t see much difficulty in doing a grep to remove iodines.
	DD – We’ll have the developers huddle after this and decide whether to supply a simple `grep | rm` command, or to make a more general substructure filter.
	AG – I need to go to my lawyers about sharing this data, once I have the final set of data to share. This is regardless of other steps.
Upcoming: Schrodinger command tree demo	AG – Which host is used for FFBuilder jobs?
	DH – Default is localhost, but you can use a workstation queue.
	CS – Are these defined in the SCHRODINGER_HOSTS file?
	DH – Yes.
	AG – Which version of Schrodinger?
	DH – 2020.4, though LD has tested with 2020.2.
	CS – Would 2021 be OK?
	DH – Yes, I have slides on using the 2021 version with OPLS4.
	CS – This is just a wrapper, so if I’ve pre-computed my FFBuilder jobs, I could just take from those?
	DH – Yes.
	AG – Has someone discussed publishing these results with Schrodinger?
	DH – This is on the slides; each partner will need to discuss with Schrodinger before sharing.
	AG – I’d recommend that we not share these results, neither publicly nor with Schrodinger, until we’ve discussed as a group what to do.
Upcoming: Torsiondrive one-shot command
Discussion: extension of analysis features	AG – Could we look for the global minimum in a QM calculation, and then ask how often there is a low-energy MM conformation within a given RMSD of that minimum? The dE and RMSD cutoffs could vary, and the result could be shown in 2 dimensions.
	GT – Was something like this done in the original Lim+Hahn work?
	DH – This sounds similar to the original match-minima analysis, with a custom cutoff in the RMSD.
	AG – The goal would be to identify whether there’s an MM local minimum in the neighborhood of a QM minimum.
	AG – Could either start all optimizations from unoptimized generated conformers, or start MM minimizations at QM minima.
	BS – Possibility of doing dipole moments? Look for differences in classical vs. quantum dipole moments. This was discussed last fall. I think the information is in the psi4 output, but we’d also need to figure out how to get the dipole moments from MM.
	AG – How would we account for aligning vectors? Maybe just consider magnitude initially. Could be a 2D analysis with RMSD and magnitude.
	GT – Wonder if we could extract outliers in the torsional fingerprint. Would be useful internally to identify the worst-offending torsions. Then we could share those substructures with OpenFF without sharing the whole molecule.
	DD + DH – We’re not familiar enough with the internals of TFD to know whether we could extract the “most offending” torsions.
	XL – Could look at which SMIRKS correspond to the worst energies compared to QM.
	DH – That’s tentatively planned, as point 5.
	AG – If we shared some form of this data, it would be valuable to submit it for a subsequent round of FF development.
Season 2 discussion	If we ran a season 2, what kinds of questions would you want to answer?
	XL – More focused analysis on torsions.
	CS – Agree.
	BS – Would torsional analysis be able to give barrier heights between stable conformations?
	(General) – Do other degrees of freedom relax?
	AG – The torsion of interest is held fixed, but all other degrees of freedom are allowed to change. I wonder if we should look more closely at the high-energy/barrier-defining conformations.
	GT – Interested to know how OpenFF can work for covalent interactions. Aromatic-aromatic, dimeric fragments, etc.
	JW – Bulk property fitting? This is planned for Sage.
	GT – Not exactly bulk properties we’re looking at. If you look at what DFT people are doing, they’re looking at noncovalent interactions for dimers, aromatic rings, etc.
	CS – INTRAmolecular interactions, like H-bonds.
	DH – This hasn’t been our interest so far, since we haven’t reparameterized.
	AG – Begdb.org has lots of QM datasets for things
	LD – IOCHem-bd.org
	Coordinated season or rolling development?
	CS + TF + XL – I like deadlines and discrete seasons.
	AG – Solvation free energies?
	BS – Would be good to do polar and nonpolar.
	AG – We don’t have
	AG – Conformation generation for macrocycles.
	XL + CS – Agree.
	JW – Conformer generation is outside our scope at the moment, but we could couple with an existing method and do the ranking better.
	CS – Could run high-temperature MD.
	Parameters for season 2?
	DD – For example: number of molecules, optimizations, torsiondrives, which FFs, ML potentials?
	BS – Some measure of the stiffness of molecules at low-energy conformations. So some analysis around entropy, vibrational frequencies, Hessians, etc. If molecules are floppier, they may bind better, but if they’re stiff they may not.
	CS – In terms of dataset size, I liked having a range of dataset sizes (100-1000 molecules). Also ML potentials.
	XL – Agree with ML potentials. ANI would be great.
	CS – Would like to test against CGenFF as well.
	DD – Dataset size?
	AG – We’ve run larger sets. I can report on the differences in distribution later.
	CS – I ran my 1000 and it took longer than expected, so that’s a good limit.
	IC – Agree.
	XL – We could go bigger, but we’re also OK with 1000.
	DH – If we include torsiondrives, things will get a lot more expensive.
	CS – Could counteract this by having more constraints on dataset composition (like molecule size/rotatable bonds, to make torsiondrives manageable).
	TF – Could we have a tool to select diverse torsions from a larger set? I narrowed down my set using random selection, but a diversity filter would be good.
	CS – Same. Agree.
	XL – Could pick molecules that use a maximum diversity of FF parameters.
	CS – Agree.
	TF – Random selection of 1000 from a million should be quite diverse.
	AG – Random selection will probably miss sampling torsions involving things like S and F.
	TF – Having a tool for selection of diverse torsions would be great.
	DD – Could couple this to the coverage report step.
	JW – Is it a concern that we’re using the same QM method for training and testing the FF?
	GT – I’d like to see a comparison of the method we’re using to “ground truth”.
	CS + XL – This is a good point. This is somewhat concerning now.
	DD – Would it be fair to select a different commonly used QM method for comparison?
	TF – Could have each partner run a small internal benchmark of different QM methods.
	XL – Could take molecules from the public set and run them with a more detailed method.
	AG – The DLPNO method is recommended and only has 3x the computational cost.
	JW – Would running on the public set with a more detailed level of theory as well as our default level of theory be appropriate to test this, or would folks also want to do internal tests?
	(General) – This would be appropriate.
Remaining roadmap for season 1	Desired additional features:
	SMARTS filter: JH will draft PR, JW will review. Not important where it goes (maybe `utils`). Should be exposed as a new step, though we don’t know where it will finally live, so we shouldn’t hard-code any paths or default paths (so, for example, not using `5-export-qm`).
	Schrodinger commands:
	DH – Providing this functionality is quite important.
	(General) – Could just have people upload a new tarball if/when they perform Schrodinger calculations.
	DD – In the guidance for the data export, we could have them include a date string, and let them update their set by submitting a new tarball of results.
Personnel for season 2	JW – Personnel assignments are partly up to PIs (for DD and JW), partly up to Janssen (DH and LD).
	DH – Will be participating as a partner in season 2; can advise but won’t be doing direct implementation.
	LD – Will talk to DH and GT; somewhat cautious about doing production coding.
	DD + JW – You wouldn’t be alone. We’re planning on training you either way.
	JH – As a Cole lab member, I’m interested in including QUBE in benchmarking, so I’m interested in its health overall.
	DD – This has been enjoyable, and the work has significant impact. I’ll keep thinking about this. In general we should continue doing this kind of benchmarking. So maybe we come back to it after a break? E.g., season 2 in the Fall, with nicely implemented torsiondrives and other new features. This would also put less pressure on personnel allocation.
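
On the iodine question from the submission-process discussion: the "simple grep" option could look roughly like the sketch below. It assumes each partner's molecules are available as SMILES strings (the notes do not say which file format the workflow step would act on), and the function names are made up for illustration. It is plain text matching, not the more general substructure filter that was also discussed.

```python
import re

# Bracket atoms such as [I-], [125I], [Ir], [In].
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# Inside a bracket: optional isotope digits, then "I" that is not the start
# of the two-letter symbols In (indium) or Ir (iridium).
BRACKET_IODINE = re.compile(r"\d*I(?![nr])")


def contains_iodine(smiles: str) -> bool:
    """Return True if the SMILES string contains an iodine atom.

    Assumption: text matching on SMILES is good enough for a first filter.
    This does not parse the molecule or check that the SMILES is valid.
    """
    for content in BRACKET_ATOM.findall(smiles):
        if BRACKET_IODINE.match(content):
            return True
    # Outside brackets, a capital "I" can only be iodine.
    return "I" in BRACKET_ATOM.sub("", smiles)


def split_by_iodine(smiles_list):
    """Split SMILES into (kept, removed); iodine-containing ones are removed."""
    kept, removed = [], []
    for smi in smiles_list:
        (removed if contains_iodine(smi) else kept).append(smi)
    return kept, removed


if __name__ == "__main__":
    # Made-up example molecules, not taken from any partner dataset.
    example = ["CCO", "Ic1ccccc1", "C[Ir](C)C", "CC(=O)[I-]"]
    kept, removed = split_by_iodine(example)
    print("kept:", kept)
    print("removed:", removed)
```

Returning the removed list alongside the kept one would let a partner name the filtered molecules in their README, as DD suggested.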

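For AG's proposed analysis (how often an MM minimum lies within a given RMSD and dE of the QM global minimum, scanned in two dimensions), a minimal sketch follows. The input layout, the function names, and the numbers are hypothetical. It assumes that the RMSD to the QM global minimum has already been computed for each MM conformer, and it takes dE to mean the MM energy relative to the lowest MM conformer of the same molecule, which is one possible reading of the notes.

```python
def fraction_matched(per_molecule, rmsd_cutoff, de_cutoff):
    """Fraction of molecules with at least one MM conformer that is within
    rmsd_cutoff of the QM global minimum and has relative MM energy <= de_cutoff.

    per_molecule maps a molecule id to a list of (rmsd, de) tuples,
    one tuple per MM conformer (hypothetical layout).
    """
    if not per_molecule:
        return 0.0
    hits = sum(
        any(rmsd <= rmsd_cutoff and de <= de_cutoff for rmsd, de in confs)
        for confs in per_molecule.values()
    )
    return hits / len(per_molecule)


def cutoff_grid(per_molecule, rmsd_cutoffs, de_cutoffs):
    """2D table of matched fractions: rows are RMSD cutoffs, columns are dE cutoffs."""
    return [
        [fraction_matched(per_molecule, r, e) for e in de_cutoffs]
        for r in rmsd_cutoffs
    ]


if __name__ == "__main__":
    # Made-up values: RMSD in Angstrom, dE in kcal/mol.
    data = {
        "mol-a": [(0.2, 0.0), (1.5, 0.8)],
        "mol-b": [(0.9, 0.3), (0.4, 2.5)],
        "mol-c": [(1.8, 0.0)],
    }
    rmsd_cutoffs = [0.5, 1.0, 2.0]
    de_cutoffs = [1.0, 3.0]
    for r, row in zip(rmsd_cutoffs, cutoff_grid(data, rmsd_cutoffs, de_cutoffs)):
        print(r, [round(x, 2) for x in row])
```

The same grid could take the dipole-magnitude difference in place of dE for the RMSD-versus-magnitude analysis that AG and BS discussed.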