2021-03-24 Industry Evaluation Meeting notes

Date

Mar 24, 2021

Participants

  • @Jeffrey Wagner

  • @Gary Tresadern

  • @David Dotson

  • Thomas Fox

  • Alberto Gobbi

  • Bai Xue

  • Bill Swope

  • Christina Schindler

  • @David Hahn

  • Ian Craig

  • @Joshua Horton

  • @Lorenzo D'Amore

  • Pieter in’t Veld

Discussion topics

Item

Notes

Questions on submission process?

  • IC – You mentioned that there were problems with iodine-containing molecules. How should we take care of that?

    • DD – We considered adding a step that removed iodine-containing molecules, but haven’t built that yet. If you could identify iodine-containing molecules in your README, then that would let us filter the datasets.

    • (General) – It will save everyone’s time to add a workflow step to remove iodine-containing molecules

    • AG – I don’t see much difficulty in doing a grep to remove iodines.

    • DD – We’ll have the developers huddle after this and decide whether to supply a simple grep | rm command, or to make a more general substructure filter.

  • AG – Once I have the final set of data to share, I need to clear sharing it with my lawyers. This is regardless of other steps.
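A minimal sketch of the simple text-based iodine filter AG describes, assuming the molecules are available as SMILES strings (the notes do not say which file format the datasets use). The function names are hypothetical and this is not the filter step the developers went on to write. A more general substructure filter would need a cheminformatics toolkit rather than a regular expression.

```python
import re

# Matches an iodine atom symbol in a SMILES string. A capital "I" followed by
# "n" or "r" is indium (In) or iridium (Ir), so those are excluded.
IODINE_PATTERN = re.compile(r"I(?![nr])")


def contains_iodine(smiles):
    """Return True if the SMILES string appears to contain an iodine atom."""
    return IODINE_PATTERN.search(smiles) is not None


def drop_iodine_molecules(smiles_list):
    """Split a list of SMILES into (kept, removed) based on iodine content."""
    kept = [s for s in smiles_list if not contains_iodine(s)]
    removed = [s for s in smiles_list if contains_iodine(s)]
    return kept, removed


if __name__ == "__main__":
    kept, removed = drop_iodine_molecules(["CCO", "CCI", "c1ccccc1I", "[Ir]"])
    print("kept:", kept)
    print("removed:", removed)
```

Returning the removed molecules as well as the kept ones makes it easy to list them in a README, as DD suggests.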

Upcoming: Schrodinger command tree demo

  • AG – Which host is used for FFBuilder jobs?

    • DH – Default is localhost, but you can use a workstation queue

  • CS – Are these defined in the SCHRODINGER_HOSTS file?

    • DH – yes

  • AG – Which version of Schrodinger?

    • DH – 2020.4, though LD has tested with 2020.2

    • CS – would 2021 be OK?

      • DH – Yes, I have slides on using the 2021 version with OPLS4

  • CS – This is just a wrapper, so if I’ve pre-computed my ff builder jobs, I could just take from those?

    • DH – Yes

  • AG – Has someone discussed with Schrodinger about publishing these results?

    • DH – This is on the slides. Each partner will need to discuss with Schrodinger before sharing.

    • AG – I’d recommend that we not share these results, either publicly or with Schrodinger, until we’ve discussed as a group what to do.

Upcoming: Torsiondrive one-shot command

Discussion: extension of analysis features

  • AG – Could we look for the global minimum in a QM calculation, and then ask how often there is a low-energy MM conformation within a given RMSD of that minimum? The dE and RMSD cutoffs could both vary, so the result would be two-dimensional.

    • GT – Was something like this done in the original Lim+Hahn work?

    • DH – This sounds similar to the original match-minima analysis, with a custom cutoff in the RMSD.

    • AG – The goal would be to identify whether there’s a MM local minimum in the neighborhood of a QM minimum.

    • AG – Could either start all optimizations from unoptimized generated conformers, or start MM minimizations at QM minima.
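The two-cutoff analysis AG proposes can be sketched as a small grid calculation. This is an illustration under stated assumptions, not the existing match-minima code: it assumes that for each molecule we already have, for every MM minimum, its RMSD to the QM global minimum and its MM relative energy. The function names, data layout, and units are hypothetical.

```python
def has_mm_match(mm_conformers, rmsd_cutoff, de_cutoff):
    """True if any MM minimum lies within both cutoffs.

    mm_conformers: list of (rmsd_to_qm_global_min, mm_relative_energy) pairs.
    """
    return any(
        rmsd <= rmsd_cutoff and de <= de_cutoff for rmsd, de in mm_conformers
    )


def match_fraction_grid(molecules, rmsd_cutoffs, de_cutoffs):
    """Fraction of molecules with a match for every (RMSD, dE) cutoff pair.

    molecules: dict mapping a molecule ID to its list of MM conformer pairs.
    Returns a dict keyed by (rmsd_cutoff, de_cutoff).
    """
    if not molecules:
        raise ValueError("need at least one molecule")
    grid = {}
    for r in rmsd_cutoffs:
        for e in de_cutoffs:
            hits = sum(has_mm_match(c, r, e) for c in molecules.values())
            grid[(r, e)] = hits / len(molecules)
    return grid


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    example = {
        "mol-a": [(0.2, 0.0), (1.5, 0.8)],
        "mol-b": [(0.9, 2.5), (0.4, 3.1)],
    }
    for key, frac in match_fraction_grid(example, [0.5, 1.0], [1.0, 3.0]).items():
        print(key, frac)
```

The resulting grid can be plotted as a heat map, with one axis per cutoff.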

  • BS – Possibility of doing dipole moments? Look for difference in classical vs. quantum dipole moments. This was discussed last fall. I think the information is in the psi4 output, but we’d also need to figure out how to get the dipole moments from MM.

    • AG – How would we account for aligning vectors? Maybe just consider magnitude initially. This could be a 2D analysis with RMSD and magnitude.
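For the MM side of the dipole comparison, one option is the point-charge dipole, the sum over atoms of charge times position. The sketch below compares magnitudes only, as AG suggests, which avoids the alignment question. It assumes charges in units of e, coordinates in Angstrom, a neutral molecule (so the result does not depend on the origin), and a QM dipole already converted to Debye. The function names are hypothetical.

```python
import math

# 1 e*Angstrom expressed in Debye (approximate conversion factor).
E_ANGSTROM_TO_DEBYE = 4.80320


def point_charge_dipole(charges, coords):
    """Dipole vector in e*Angstrom from point charges: sum_i q_i * r_i."""
    return tuple(
        sum(q * r[axis] for q, r in zip(charges, coords)) for axis in range(3)
    )


def magnitude(vec):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in vec))


def dipole_magnitude_difference(charges, coords, qm_dipole_debye):
    """MM minus QM dipole magnitude, in Debye."""
    mm_debye = magnitude(point_charge_dipole(charges, coords)) * E_ANGSTROM_TO_DEBYE
    return mm_debye - magnitude(qm_dipole_debye)


if __name__ == "__main__":
    # Toy two-site example with made-up values.
    diff = dipole_magnitude_difference(
        [0.5, -0.5], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], (0.0, 2.0, 0.0)
    )
    print(f"MM - QM dipole magnitude: {diff:.4f} D")
```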

  • GT – Wonder if we could extract outliers in torsional fingerprint. Would be useful internally to identify the worst-offending torsions. Then we could share those substructures with OpenFF without sharing the whole molecule.

    • DD + DH – We’re not familiar enough with the internals of TFD to know whether we could extract the “most offending” torsions.

  • XL – Could look at which SMIRKS correspond to the worst energies compared to QM.

    • DH – That’s tentatively planned, as point 5.

    • AG – If we shared some form of this data, it would be valuable to submit it for subsequent round of FF development.

Discussion: Season 2

  • If we ran a season 2, what kinds of questions would you want to answer?

    • XL – More focused analysis on torsions.

      • CS – Agree

      • BS – Would torsional analysis be able to give barrier heights between stable conformations?

      • (General) – Do other degrees of freedom relax?

        • AG – The torsion of interest is held fixed, but all other degrees of freedom are allowed to change. I wonder if we should look more closely at the high-energy, barrier-defining conformations.

    • GT – Interested to know how OpenFF can work for noncovalent interactions. Aromatic-aromatic, dimeric fragments, etc.

    • JW – Bulk property fitting? This is planned for Sage

    • GT – Not exactly bulk properties we’re looking at. If you look at what DFT people are doing, they’re looking at noncovalent interactions for dimers, aromatic rings, etc.

  • CS – INTRAmolecular interactions, like hbonds.

    • DH – This hasn’t been our interest so far, since we haven’t reparameterized

    • AG – Begdb.org has lots of QM datasets for things

    • LD – IOCHem-bd.org

Coordinated season or rolling development?

  • CS + TF + XL – I like deadlines and discrete seasons.

  • AG – Solvation free energies?

    • BS – Would be good to do polar and nonpolar

    • AG – We don’t have data to compare to

  • AG – Conformation generation for macrocycles

    • XL + CS – Agree

    • JW – Conformer generation is outside our scope at the moment, but we could couple with an existing method and do ranking better

    • CS – Could run high temperature MD

Parameters for season 2?

  • DD – For example: number of molecules? Optimizations? Torsion drives? Which FFs? ML potentials?

  • BS – Some measure of stiffness of molecules at low energy confs. So some analysis around entropy, vibrational frequencies, hessians, etc. If molecules are floppier, they may bind better, but if they’re stiff they may not.

  • CS – In terms of dataset size, I liked having a range of dataset size (100-1000 molecules). Also ML potentials.

    • XL – Agree with ML potentials. ANI would be great.

    • CS – Would like to test against CGenFF as well.

  • DD – Dataset size?

    • AG – We’ve run larger sets. I can report on the differences in distribution later.

    • CS – I ran my 1000 and it took longer than expected, so that’s a good limit

      • IC – Agree

    • XL – We could go bigger, but we’re also OK with 1000

    • DH – If we include torsiondrives, things will get a lot more expensive

      • CS – Could counteract this by having more constraints on dataset composition (like molecule size/rotatable bonds to make torsiondrives manageable)

      • TF – Could we have a tool to select diverse torsions from a larger set? I narrowed down my set using random selection but a diversity filter would be good.

        • CS – Same. Agree

        • XL – Could pick molecules that use a maximum diversity of FF parameters

          • CS – Agree

        • TF – Random selection of 1000 from a million should be quite diverse.

        • AG – Random selection will probably miss sampling torsions involving things like S and F.

        • TF – Having a tool for selection of diverse torsions would be great.

        • DD – Could couple this to coverage report step.
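XL's suggestion of picking molecules that exercise a maximum diversity of FF parameters can be sketched as a greedy coverage selection. This assumes a mapping from each molecule to the set of parameter IDs it uses, for instance from a coverage report as DD suggests. The IDs below are made-up labels and the function name is hypothetical. Greedy selection is one simple heuristic, not necessarily what the group would adopt.

```python
def select_diverse(molecule_params, n_select):
    """Greedily pick molecules that add the most not-yet-covered parameters.

    molecule_params: dict mapping molecule ID -> set of parameter IDs it uses.
    Returns (selected molecule IDs in pick order, set of covered parameters).
    """
    covered = set()
    selected = []
    remaining = dict(molecule_params)
    while remaining and len(selected) < n_select:
        # Iterate over sorted IDs so that ties are broken deterministically.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] - covered))
        if not remaining[best] - covered:
            break  # no remaining molecule adds a new parameter
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered


if __name__ == "__main__":
    params = {
        "mol-a": {"t1", "t2"},
        "mol-b": {"t2", "t3", "t4"},
        "mol-c": {"t1"},
        "mol-d": {"t5"},
    }
    picked, covered = select_diverse(params, 3)
    print(picked, sorted(covered))
```

Unlike random selection, this picks up rarely used parameters (AG's concern about torsions involving S and F) as long as at least one molecule in the pool uses them.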

  • JW – Is it a concern that we’re using the same QM method for training and testing the FF?

    • GT – I’d like to see a comparison of the method we’re using to “ground truth”

    • CS + XL – This is a good point. This is somewhat concerning now.

    • DD – Would it be fair to select a different commonly used QM method for comparison?

    • TF – Could have each partner run a small internal benchmark of different QM methods.

    • XL – Could take molecules from the public set and run them with a more detailed method

    • AG – The DLPNO method is recommended and only has 3x the computational cost.

    • JW – Would running on the public set with a more detailed level of theory as well as our default level of theory be appropriate to test this, or would folks also want to do internal tests?

      • (General) – This would be appropriate.

Post: Remaining roadmap for season 1

  • Desired additional features

    • SMARTS filter

      • JH will draft PR, JW will review

      • Not important where it goes (maybe utils)

      • Should be exposed as a new step. We don’t know where it will finally live, so we shouldn’t hard-code any paths or default paths (for example, not using 5-export-qm).

    • Schrodinger commands

      • DH – Providing this functionality is quite important.

      • (General) – Could just have people upload a new tarball if/when they perform Schrodinger calculations.

      • DD – In the guidance for the data export, we could have them include a date string, and let them update their set by submitting a new tarball of results.
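A small sketch of how a date string could make resubmissions unambiguous, assuming an ISO date (YYYY-MM-DD) embedded in the tarball name. The naming scheme and function names are hypothetical, not the guidance that was eventually written.

```python
import datetime
import re

# ISO-style date, e.g. 2021-03-24, anywhere in the file name.
DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})")


def dated_archive_name(partner, day):
    """Build a tarball name that carries the submission date."""
    return f"{partner}-results-{day.isoformat()}.tar.gz"


def latest_archive(names):
    """Return the archive name with the most recent embedded date."""

    def archive_date(name):
        match = DATE_PATTERN.search(name)
        if match is None:
            raise ValueError(f"no date string found in {name!r}")
        return datetime.date.fromisoformat(match.group(1))

    return max(names, key=archive_date)


if __name__ == "__main__":
    first = dated_archive_name("partner-x", datetime.date(2021, 3, 24))
    second = dated_archive_name("partner-x", datetime.date(2021, 4, 2))
    print(latest_archive([first, second]))
```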

Post: Personnel for season 2

  • JW – Personnel assignments are partly up to PIs (for DD and JW), partly up to Janssen (DH and LD).

  • DH – Will be participating as a partner in season 2, can advise but won’t be doing direct implementation.

  • LD – Will talk to DH and GT, somewhat cautious about doing production coding.

    • DD + JW – Wouldn’t be alone. We’re planning on training you either way.

  • JH – As a Cole lab member, I’m interested in including QUBE in benchmarking, so I’m interested in its health overall.

  • DD – This has been enjoyable, and the work has significant impact. I’ll keep thinking about this. In general we should continue doing this kind of benchmarking. So maybe we come back to it after a break? E.g., season 2 in the Fall, with nicely-implemented torsiondrives and other new features. This would also put less pressure on personnel allocation.

Action items

@Joshua Horton will draft a PR for a SMARTS-filter component. It will be used as a new step in the Season 1 protocol, but should be written as a general-purpose component that could be used in one or more places in a given protocol. @Jeffrey Wagner will review.
@David Dotson will ensure date string included in guidance for data export, allowing partners to submit as many times as desired

Decisions