2020-11-06 QCA Submission Meeting notes

Date

Nov 6, 2020

Participants

  • @David Dotson

  • @Trevor Gokey

  • @Pavan Behara

  • @Joshua Horton

Goals

  • New advancements

    • AmberTools pinning #149

    • QCSubmit PCM validation

  • New submissions

  • Upcoming infrastructure improvements

    • STANDARDS-based versioning #137

    • Dataset index on qca-dataset submission #147

    • Local Optimization executor

  • Upcoming science support

    • Selection of TorsionDrives from a dataset by SMARTS matching (prototyped; need PR into QCSubmit)

    • PCM-based implicit solvent pathway

    • ESPs and wavefunction storage

      • Use of AO density?

  • Larger advances

    • Automated FF coverage gap identification, torsion prioritization, submission generation

    • Benchmarking (dashboard, etc.)

Discussion topics

Item

Presenter

Notes

PCM support

Josh

  • PCM support now in QCSubmit, validation included, release cut

  • release includes updates to address Hyesu’s basis set issues

  • TG: Hyesu’s submission is no longer blocked; will give it a final review and submit it

  • the release is version 0.1.0 and includes Basis Set Exchange as a dependency

    • will get in touch with Jeff for an opinion on using the main omnia channel

BCC refit study

Trevor

  • Need to resolve question on getting back density from wavefunction components

  • TG: Daniel’s snippet takes all the occupied orbitals and multiplies them together to get the densities; reconstructing the entire wavefunction may be out of scope, but if we just want to use Simon’s recharge, we should have everything we need

  • TG: once you have dA, you can use Daniel’s other script; make sure the calculation is not using symmetry

  • JH: will send the notebook I was using; we will work on this together
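A minimal sketch of the density step TG describes, assuming the MO coefficient matrix C (AO rows × MO columns, columns ordered by orbital energy) has already been extracted from the stored wavefunction. In a C1 (no-symmetry) calculation C is a single block, which is why symmetry should be off. The spin density is D_{μν} = Σ_{i ∈ occ} C_{μi} C_{νi}. The function names here are illustrative, not taken from Daniel’s scripts.

```python
import numpy as np


def density_from_orbitals(C: np.ndarray, n_occ: int) -> np.ndarray:
    """Return the spin density matrix D = C_occ @ C_occ.T in the AO basis.

    C is the (n_ao x n_mo) MO coefficient matrix; the first n_occ columns
    are assumed to be the occupied orbitals.
    """
    if not 0 <= n_occ <= C.shape[1]:
        raise ValueError("n_occ must be between 0 and the number of MOs")
    C_occ = C[:, :n_occ]      # keep only the occupied orbitals
    return C_occ @ C_occ.T    # D_{mu nu} = sum_i C_{mu i} C_{nu i}


def total_density_closed_shell(C: np.ndarray, n_occ: int) -> np.ndarray:
    """For a closed-shell (RHF) reference the total density is 2 * D_alpha."""
    return 2.0 * density_from_orbitals(C, n_occ)
```

As a sanity check, with a non-orthogonal AO basis, tr(D S) should equal n_occ, where S is the AO overlap matrix. With orthonormal C (S = identity) this reduces to tr(D) = n_occ, and D is idempotent.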

ANI1x, ANI1ccx

Josh

  • Ready for review!

Optimization local executor

David+Trevor

  • Trevor’s Optimization local executor in QCSubmit?

  • QCEngine cli?

  • Trevor will make a PR with the script against QCSubmit; this will start a discussion on CLI components and where the script should live

  • TG: Does it make sense for this debugging functionality (e.g. pulling an Optimization from a live QCA instance to run it locally) to live in QCSubmit?

  • BP: first instinct is that it belongs in a separate package

  • TG: the script pulls IDs and runs geomeTRIC

    • doesn’t touch the generated task, so no auth required

    • if we just pulled the task instead, it would be a much smaller script

  • JH: would it make sense to generate the task locally as part of this script?

  • BP:

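        import qcengine
        from qcportal import FractalClient

        client = FractalClient()  # public QCArchive server by default

        def run_proc_id(proc_id):
            # query_tasks returns a list of TaskRecords, so take the first match
            task = client.query_tasks(base_result=proc_id)[0]
            res = qcengine.compute_procedure(*task.spec.args)  # re-run the stored spec locally
            return res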

  • TG: related: Lee-Ping would like a way to get geomeTRIC inputs from the corresponding QCArchive task specs

Enamine REAL

Trevor

  • 2% of molecules couldn’t be built

  • some prototypes exist for the coverage analysis

  • TG: one validation that might become important is a chirality check; we’re seeing some downstream failures when we pull molecules later

  • JH: that’s strange; recent dataset?

  • TG: we get a bunch on older datasets, where chirality is undefined

    • in one case, generating the CMILES tags puts chirality on one center, but when the molecule is loaded with OpenEye it is reported as non-chiral

  • JH: we no longer require nitrogen stereochemistry; are these nitrogens?

    • TG: I think they’re SO2

  • JH: next step is to allow users to provide the CMILES and the molecule separately

  • JH: we could quickly load the CMILES and try to generate conformers from it

  • TG: if we need to iterate over molecules, use [m for m in dataset.molecules] (rather than Molecule.from_qcschema() from the toolkit, which contacts the server directly)

Protomer/tautomer enumeration

Josh

  • John suggested this is important; Pavan will spearhead preparing this with help from Josh

  • One potential dataset for this is the Genentech one

  • Bayer set used for fitting

    • Optimizations and TorsionDrives from this set were used in fitting; if it is part of the benchmark set Bayer wants to use, we don’t want to use it for benchmarking

  • Josh will drop a question into benchmarks-partners for clarification on the Bayer and Genentech datasets; the former had parts used for fitting, the latter is being prepared for submission (is it the same set they want to use as their public contribution for benchmarking?)

STANDARDSv3

Trevor

  • On track to have a PR up in a week or so

Action items

@Trevor Gokey will copy/paste the optimization pull/run script into QCSubmit
@Trevor Gokey will review the ANI1x compute addition on benchmark ligands
@Joshua Horton will check in with Jeff for opinion on releasing QCSubmit on main omnia channel
@Joshua Horton will send Trevor notebook used for psi4 wavefunction reconstitution; work with @Trevor Gokey to converge on solution, communicate to @Simon Boothroyd
@Pavan Behara will prepare protomer/tautomer state enumeration dataset, with assistance from Josh
@Trevor Gokey will aim for STANDARDSv3 PR up in next couple weeks(!)
@David Dotson will add MM compute specs to sandbox CHO dataset

Decisions