2021-01-13 Benchmarking for Industry Partners - Development Meeting notes

Date

Jan 13, 2021

Participants

  • @David Dotson

  • @David Hahn

  • @Joshua Horton

  • @Jeffrey Wagner

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Identify incomplete areas of the protocol

  • Identify and address project risks

Discussion topics

Item

Presenter

Notes

Updates

  • JW: big goal is to add dataset expansion option

    • refactored first two steps so all reading/writing happens in CLI layer

    • fixed a problem where RDKit generated very poor conformers, so poor that double bonds were nowhere near planar

      • when the toolkit tries to load these, it gets confused perceiving the stereochemistry of those bonds

      • fixed by writing the molecule out to a temporary file and reading it back

    • refactored tests to use click.testing

    • switched conformer generation to a different alignment and RMS method; now using GBRMS

    • Planning to update Josh’s conformer generation branch with these changes and get parallelized generation merged

    • Dataset appending next

  • DH: worked on analysis tools under report subcommand; extended a bit using different input

    • demo of compare-forcefields; working on tests next

      • JW: how well does it handle missing data?

      • DD: should only need to handle cases of missing MM results where QM conformer present

    • demo of match-minima: CSV outputs indicating the MM conformer closest to the given QM conformer

    • demo of plots: individual conformer and aggregate plots

      • DD: for plot generation, we specify the reference method; can we use that information to leave the reference method’s values out of the plots?

        • DH: can do

      • DH: implementing tail parameters analysis next

    • Ready for review?

  • JH:

    • index dropping work; almost resolved

    • the smirnoff99Frosst issue is harder to solve; it is not a problem in QCSubmit, so it could be in QCEngine or openmmforcefields; it doesn’t show up on Mac (probably because the filesystem is case-insensitive); current suspicion is the capital F in Frosst

    • analysis env:

      • DH: possible that seaborn has a negative interaction with click

  • DD:

    • Adding feature richness to optimize

      • Can pull up progress part

      • Can extract results for specific molecules

      • Added debug-from-server – Attempts to rerun locally. In the future I’d like to make this return the successful result to the server.

    • DD – Currently about a 15% error rate

    • JW – What is target error rate?

    • DD – 5%. I think we can get there using local runs when the server fails (debug-from-server) and better geomeTRIC settings.

    • DH – Getting a lot of miscellaneous errors when I run compute

    • DD – The errors you’re seeing are probably related to file system access errors. Should contact admins to see if they can increase access limits.
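
The match-minima output discussed in the updates above (for each QM conformer, the closest MM conformer, written to CSV) could be sketched roughly as follows. This is only an illustration, not the project's implementation: the function names, the dictionary layout, and the CSV column names are all assumptions, and the RMSD is computed without any alignment or symmetry handling, which a real comparison would likely need. A QM conformer with no MM results is reported with empty fields rather than silently dropped, in line with the missing-data case raised in the discussion.

```python
import csv
import io
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally sized lists of (x, y, z) tuples.

    Assumption: no alignment or atom reordering is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def match_minima(qm_conformers, mm_conformers):
    """Return one (qm_id, closest_mm_id, rmsd) row per QM conformer.

    Both arguments are hypothetical dicts mapping a conformer id to its
    coordinates. If there are no MM conformers, the row carries None values.
    """
    rows = []
    for qm_id, qm_coords in sorted(qm_conformers.items()):
        if not mm_conformers:
            rows.append((qm_id, None, None))
            continue
        candidates = (
            (candidate_id, rmsd(qm_coords, candidate_coords))
            for candidate_id, candidate_coords in mm_conformers.items()
        )
        best_id, best_rmsd = min(candidates, key=lambda pair: pair[1])
        rows.append((qm_id, best_id, best_rmsd))
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text; missing values become empty fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["qm_conformer", "closest_mm_conformer", "rmsd"])
    for qm_id, mm_id, value in rows:
        writer.writerow([
            qm_id,
            "" if mm_id is None else mm_id,
            "" if value is None else f"{value:.4f}",
        ])
    return buffer.getvalue()


# Toy two-atom "conformers", purely for illustration.
qm = {"q0": [(0, 0, 0), (1, 0, 0)], "q1": [(0, 0, 0), (0, 2, 0)]}
mm = {"m0": [(0, 0, 0), (1.1, 0, 0)], "m1": [(0, 0, 0), (0, 1.9, 0)]}
rows = match_minima(qm, mm)
print(to_csv(rows))
```

Keeping the matching step separate from the CSV step means the same rows could also feed the plotting code without re-reading the file.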

High-value effort investments?



  • More widespread testing

  • GeomeTRIC/QCEngine errors

  • smirnoff99Frosst loading errors on Linux

  • Testing party for protocol

    • 1/14, same time as this call

  • optimize submit produces worrisome output; squelch RDKit log messages

    • JH: the only warning QCSubmit emits is when the molecule is charged

    • JW: protons added/removed, etc. - might be RDKit

    • JH: is that RDKit trying to rationalize the input?

    • JW: appears so

  • optimize submit: when an error happens with a molecule, would be helpful to know which; when doing molecule iterations, always indicate which is being worked on
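
The last point, always indicating which molecule is being worked on and which one failed, could look something like the sketch below. It is not the actual optimize submit code: `process_molecules`, `fake_submit`, and the identifiers are hypothetical names used only to show the pattern of logging before each item and catching errors per molecule.

```python
def process_molecules(molecules, process, log=print):
    """Apply `process` to each (identifier, molecule) pair.

    Logs which molecule is being handled before each call, and records
    failures per identifier instead of aborting the whole loop.
    """
    results = {}
    failures = {}
    total = len(molecules)
    for index, (identifier, molecule) in enumerate(molecules, start=1):
        log(f"[{index}/{total}] processing {identifier}")
        try:
            results[identifier] = process(molecule)
        except Exception as error:  # broad on purpose: report, then continue
            failures[identifier] = error
            log(f"[{index}/{total}] FAILED {identifier}: {error}")
    return results, failures


def fake_submit(smiles):
    """Stand-in for a real submission step (hypothetical)."""
    if not smiles:
        raise ValueError("empty SMILES")
    return len(smiles)


messages = []
results, failures = process_molecules(
    [("mol-0", "CCO"), ("mol-1", ""), ("mol-2", "c1ccccc1")],
    fake_submit,
    log=messages.append,
)
print("\n".join(messages))
```

Passing the logger in as a parameter keeps the loop testable and would let a CLI layer decide how the messages are displayed.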

 

 

 

Action items

@Jeffrey Wagner will add in dataset appending approach for validation, conformer generation
@Jeffrey Wagner will add conformer generation changes to parallel PR, merge
@David Hahn will implement tail parameters analysis
@David Hahn will refine plots to leave out reference method (always 0)
@David Hahn will ensure analysis can handle cases where QM conformers present, but not corresponding MM conformers
@Joshua Horton will track down smirnoff99Frosst validation issue on Linux; ensure we can use it for calculations
@Joshua Horton will add index dropping fix to QCSubmit
@David Dotson will add consistency between optimize execute and optimize execute-from-server commands
@David Dotson will add server completion to optimize execute-from-server
@David Dotson will add RDKit squelching to optimize submit-molecules; will add output indicating molecule being processed to submit-molecules
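
For the smirnoff99Frosst item, one quick way to test the case-sensitivity hypothesis is to compare the requested file name against what is actually on disk, ignoring case. The sketch below is an assumption-laden illustration: the helper names and the `.offxml` file names are hypothetical, and nothing here is taken from QCEngine or openmmforcefields. Because it compares against the names returned by `os.listdir`, it flags a mismatch even on a case-insensitive filesystem where opening the file would succeed.

```python
import os
import tempfile


def find_case_variants(directory, requested_name):
    """Return directory entries equal to `requested_name` when case is ignored."""
    wanted = requested_name.casefold()
    return sorted(
        entry for entry in os.listdir(directory) if entry.casefold() == wanted
    )


def diagnose(directory, requested_name):
    """Classify a lookup as an exact match, a case mismatch, or not found."""
    variants = find_case_variants(directory, requested_name)
    if requested_name in variants:
        return "exact match"
    if variants:
        return f"case mismatch: found {variants[0]}"
    return "not found"


# Hypothetical file names, used only to demonstrate the check.
with tempfile.TemporaryDirectory() as directory:
    open(os.path.join(directory, "smirnoff99frosst.offxml"), "w").close()
    report = diagnose(directory, "smirnoff99Frosst.offxml")
print(report)
```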

Decisions