2020-09-16 Bespoke Check In Meeting notes

Date

16 Sep 2020

Participants

Discussion topics

Item	Notes
Pharma partner/Roche benchmarking	DM – Xavier Lucas (Roche) wants to run local ANI jobs.
JH – Should be possible. Will meet with DD and XL soon.
JW – JH, are you interested in being involved, to the tune of up to 5 hours/week, with the industry benchmarking paper?
JH – Yes. It seems to overlap with my goals, and I could push the bespoke workflow as well.
HX – What is the scope of the benchmarking?
DM – Similar to the preprint on ChemRxiv. Different pharma partners would do the same study on internal datasets. https://chemrxiv.org/articles/preprint/Benchmark_Assessment_of_Molecular_Geometries_and_Energies_from_Small_Molecule_Force_Fields/12551867
HX – Will read. What’s the scope of the test set?
DM – 2000ish molecules, 20,000ish geometries.
HX – We’re doing something similar with CNO compounds. The chemistry is a bit more constrained than the paper set.
DM – We could use more simple molecules for our fitting, if you’re willing to share.
HX – Our dataset is focused on small fragments, which allows simpler enumeration of hierarchical torsions. Would love to check out your molecule set. Our set has fewer molecules.
OpenFF molecule set:
HX – When doing torsion drives, how are structures propagated along the scan?
JH – TorsionDrive method/wavefront propagation. https://chemistry.ucdavis.edu/news/driving-torsion-scans-wavefront-propagation
HX – Is there an initial MM minimization step before running QM?
JH – No, we take starting structures directly from RDKit/OpenEye.
HX – Is that done for each scan point?
JH – No, only the first scan point uses a cheminformatics-derived starting structure. After that, each scan point is started from the QM-minimized structure of a neighbor.
JW – For generating starting structures, we’re making “OpenFF conformer generation” the first tool in our CLI.
JA – Could we do a semiempirical initial minimization to get a fast, good starting point for more detailed QM?
DM – Somewhat concerned that we’d be led into local minima.
JH – Unsure whether this would be a systematic problem.
General Check in	JH – Polishing up the bespoke workflow. Can currently submit a molecule from the command line, which triggers ANI scans; unsure about the output spec. Need to optimize FB settings, since some outputs seem wonky. Fragmentation/graph matching still has some bugs. Results depend strongly on the initial conformer method (OE vs. RDK). How to control for this?
DM – Ask Relay how they handle this? They’re big RDK users.
JH – Would like to check whether full QM shows the same problem as ANI.
JW – Start recording these cases in an issue on bespokefit so that we can squarely attack this problem later.
DM – Ready for beta testers? Lots of pharma folks would be interested.
JW – Should we meet about the spec for the minimum viable product?
JH – Yes.
(General) – JH should be involved in the spec discussion for the atom map refactor/implementation.
JW – Atom map spec?
JW – System object development will require more extensive atom mapping.
JH – We don’t really store the atom map in the molecule; we store it nearby in a separate dictionary. We currently lose track of the atom mapping during fragmentation. It would be preferable to keep track of this.
(General) – Broad adoption of the bespoke workflow will require removing the fragmenter’s OE dependency. But in the short term, fragmentation isn’t required. For the fragmenter backend replacement, the JACS benchmark set has good diversity.
Galileo meeting prep	Give example of bespokefit? No easily-deployable conda package built, so let’s skip this.
QCSubmit demo	PB’s current understanding:
Submissions begin as a QCA dataset submission PR.
DD has a workflow that prepares and submits those jobs.
JH – The problem that QCSubmit solves:
It helps build datasets for submission.
QCArchive is new technology; there were no input format converters, and the standards were ambiguous.
QM representations of molecules lose track of the graph. This led to ambiguous interpretation of output molecules when we needed to recover their graph.
QCSubmit objects are heavily validated Python objects.
This includes logic for avoiding torsion scans on, e.g., linear torsions.
Torsions are the most important thing to get right when looking at a molecule’s energy surface.
Users have inputs in a variety of formats. If they have SMILES, then we need to generate “good” input coordinates. If they have 3D formats, the conversion pathway to QCSpec input is different.
Install instructions and source code for QCSubmit
Initial tasks for Pavan:
Reproduce previous submission – OpenFF biaryl set
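
To illustrate the linear-torsion check mentioned in the QCSubmit demo, here is a minimal geometric sketch. It is not QCSubmit's implementation, which these notes do not describe. The function names `bond_angle_deg` and `is_linear_torsion`, the 5-degree tolerance, and the coordinates are all assumptions made for illustration. The idea is that a dihedral i-j-k-l is ill-defined when either the i-j-k or the j-k-l angle is close to 180 degrees, so such torsions would be skipped rather than scanned.

```python
import math


def bond_angle_deg(a, b, c):
    """Return the angle a-b-c in degrees for three (x, y, z) points."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


def is_linear_torsion(coords, torsion, tol_deg=5.0):
    """True if either bond angle inside the torsion is within tol_deg of 180.

    coords: list of (x, y, z) tuples; torsion: four atom indices (i, j, k, l).
    The tolerance is an illustrative assumption, not a QCSubmit setting.
    """
    i, j, k, l = torsion
    angle_ijk = bond_angle_deg(coords[i], coords[j], coords[k])
    angle_jkl = bond_angle_deg(coords[j], coords[k], coords[l])
    return (180.0 - angle_ijk) < tol_deg or (180.0 - angle_jkl) < tol_deg


# Hypothetical coordinates: atoms 1-2-3 are collinear in the first geometry.
linear_geometry = [(-1.2, 0.5, 0.0), (0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.4, 0.0, 0.0)]
bent_geometry = [(-0.5, 0.9, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 0.9, 0.5)]

skip_linear = is_linear_torsion(linear_geometry, (0, 1, 2, 3))
skip_bent = is_linear_torsion(bent_geometry, (0, 1, 2, 3))
```

A production check would more likely work on the molecular graph (for example, by matching triple bonds) than on one set of coordinates, since a single conformer can be misleading.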

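The scan propagation JH describes under the Roche benchmarking item (only the first scan point starts from a cheminformatics structure; every other point starts from an optimized neighbor) can be sketched with a toy model. This is not the TorsionDrive code. `toy_optimize` stands in for a constrained QM optimization, and the `well` field is a hypothetical label for which local minimum the rest of the molecule sits in. The sketch also re-seeds neighbors whenever a grid point finds a lower energy, which is our understanding of the general wavefront idea.

```python
import math
from collections import deque


def toy_optimize(angle_deg, start):
    """Stand-in for a constrained optimization at a fixed torsion angle.

    The toy optimizer never leaves the well of its starting structure,
    mimicking a local minimizer. The energy expression is made up.
    """
    well = start["well"]
    rad = math.radians(angle_deg)
    energy = 1.0 + math.cos(3 * rad) + 0.3 * well * math.cos(rad)
    return {"angle": angle_deg, "well": well}, energy


def wavefront_scan(grid, seed_angle, seed_structure, optimize):
    """Fill a periodic 1D torsion grid outward from one seeded grid point."""
    best = {}          # angle -> (structure, energy) with the lowest energy so far
    seeded_from = {}   # angle -> grid angle whose structure was the starting point
    queue = deque([(seed_angle, seed_structure, None)])
    n = len(grid)
    while queue:
        angle, start, parent = queue.popleft()
        structure, energy = optimize(angle, start)
        # Keep a result only if it is new or lower in energy than the stored one.
        if angle in best and energy >= best[angle][1] - 1e-9:
            continue
        best[angle] = (structure, energy)
        seeded_from[angle] = parent
        # An improved point seeds both of its neighbors (grid wraps at +/-180).
        idx = grid.index(angle)
        for step in (-1, 1):
            queue.append((grid[(idx + step) % n], structure, angle))
    return best, seeded_from


grid = list(range(-165, 181, 15))      # 24 grid points, 15-degree spacing
seed = {"angle": 0, "well": 1}         # hypothetical RDKit/OpenEye-like start
best, seeded_from = wavefront_scan(grid, 0, seed, toy_optimize)
```

In this toy run every grid point inherits the seed's `well`, which is the kind of starting-structure dependence DM and JH discuss: a local optimizer seeded from a neighbor stays in that neighbor's basin unless another seed reaches the point with a lower energy.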

Action items

Decisions