2020-07-31 QCA Submission Meeting notes

Date

Jul 31, 2020

Participants

  • @David Dotson

  • @Trevor Gokey

  • @Joshua Horton

  • Ben Pritchard

Goals

  • Where do CMILES go?

    • Are they attached to molecule records, or somewhat less coupled?

    • Are molecules in a new dataset submission accepted, and are the CMILES retained, if the molecules already exist in QCA?

  • Roadmap refresh: Quantum Chemistry Infrastructure / QC*

  • Trawl automation issues for additional functionality

  • Review previous tasks, clear where possible.

  • What gaps in our current pipeline do we want to address next?

    • Submission is getting easier, but what about end-user use for:

      • fitting

      • benchmarking

      • scientific investigations

  • New datasets

    • First direct submission from David Cerutti

  • Datasets being processed

    • Rowley Biaryl appears stuck

  • ESP WFn storage

    • There is currently no way to specify the wavefunction protocol in basic datasets. We can do this manually, or should we get this into Fractal?

Discussion topics

Item

Presenter

Notes

CMILES - where do they go?

Josh

  • Trevor: does it make sense to write a service for QCF that assigns CMILES based on the structural conformation? If the CMILES change, this generates a new record

    • perhaps use the openforcefield toolkit as a service to generate CMILES on the fly?

  • Josh: CMILES allow us to calculate MM energies reliably; does nothing for the QM

    • this is a problem for RDKit, because e.g. QCSchema doesn’t say anything about where the partial charge is

  • David: one way forward: Josh is adding CMILES to the Molecule (not just the entries) in QCSubmit, so all future submissions will have CMILES on the molecule.

    • We’ll proceed with this for the next several datasets, and we’ll need to assess that this gives us what we need as users for doing MM benchmarking work

    • Once happy, we can then work out the pathway for injecting CMILES into previous datasets that don’t currently have them

  • Trevor: Need to look at the service layer to see what can be implemented for CMILES handling

ESP Wfn storage

Josh

  • Josh: there isn’t a way to specify the wavefunction protocol directly without adding it after the fact (_add_compute)

  • Ben: danger here is that the wavefunction will get stored for every intermediate

    • adding this to the dataset class is probably a straightforward PR

  • Josh: can do! Will address
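A minimal sketch of the behaviour discussed above: keep the wavefunction for the final configuration only, not for every intermediate. The helper `wavefunction_protocols` is hypothetical and is not part of QCFractal or QCSubmit; the protocol values "none" and "orbitals_and_eigenvalues" are assumed to follow QCElemental's wavefunction protocol options.

```python
from typing import Dict, List


def wavefunction_protocols(
    n_steps: int, keep: str = "orbitals_and_eigenvalues"
) -> List[Dict[str, str]]:
    """Return one protocols dict per step of a procedure.

    Hypothetical helper: only the last step (the final configuration)
    requests wavefunction storage; every intermediate requests "none".
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return [
        {"wavefunction": keep if step == n_steps - 1 else "none"}
        for step in range(n_steps)
    ]


if __name__ == "__main__":
    # Three optimization steps: only the final one stores a wavefunction.
    for step, protocols in enumerate(wavefunction_protocols(3)):
        print(step, protocols)
```

How such a setting would actually be exposed on the Dataset class is left to the PR mentioned above.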

Roadmap refresh

David

Reviewed roadmap refreshed by David and Jeff Wagner; changes and clarifications welcome at any time

David Cerutti submission

David

David: I will push for creating the PR; DC (David Cerutti) is creating an SDF writer so that we can run the fragments through the toolkit to generate e.g. CMILES

  • there may be other ways to get to CMILES, but this is a path to getting there

Rowley Biaryl INCOMPLETE

David

David: we have 1 INCOMPLETE Optimization that appears to be stuck; looks visually like cases we’ve seen on older datasets, but shouldn’t be due to a schema change

  • will address in a few weeks together with Ben, as this appears to be due to some non-atomic transaction handling in the database, resulting in rare cases of task completion but INCOMPLETE status remaining on the procedure.

  • Parking Rowley Biaryl for now in “Requires Scientific Review”

A solution to this is adding task regeneration for any INCOMPLETE procedures with no attached task when attempting a client.modify_tasks restart.
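The regeneration idea can be sketched as follows. This is an illustration under stated assumptions, not QCFractal's implementation: procedures and tasks are modelled as plain dictionaries, and `regenerate_orphaned_tasks`, the field names, and the "WAITING" status label are all hypothetical.

```python
from typing import Dict, List, Optional


def regenerate_orphaned_tasks(
    procedures: List[Dict[str, Optional[str]]],
    tasks: Dict[str, Dict[str, str]],
) -> List[str]:
    """Create a new task for every INCOMPLETE procedure with no attached task.

    Hypothetical model: each procedure is a dict with "id", "status" and
    "task_id"; `tasks` maps task id -> task record. Returns the ids of the
    procedures that received a regenerated task.
    """
    regenerated = []
    for proc in procedures:
        if proc["status"] != "INCOMPLETE":
            continue  # only INCOMPLETE procedures can be stuck this way
        task_id = proc.get("task_id")
        if task_id is not None and task_id in tasks:
            continue  # a live task exists, so a normal restart applies
        new_task_id = f"task-{proc['id']}"
        tasks[new_task_id] = {"procedure_id": proc["id"], "status": "WAITING"}
        proc["task_id"] = new_task_id
        regenerated.append(proc["id"])
    return regenerated


if __name__ == "__main__":
    procedures = [
        {"id": "1", "status": "INCOMPLETE", "task_id": None},  # stuck
        {"id": "2", "status": "INCOMPLETE", "task_id": "t2"},  # still queued
        {"id": "3", "status": "COMPLETE", "task_id": None},
    ]
    tasks = {"t2": {"procedure_id": "2", "status": "WAITING"}}
    print(regenerate_orphaned_tasks(procedures, tasks))
```

In the real database the check and the task creation would need to happen in one transaction, since the underlying issue appears to be non-atomic handling.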

Philosophical discussion

Ben

Ben: Two pillars at play in QCArchive:

  • Immutability of calculations already run

  • Deduplication of inputs

Together, these two pillars create a problem: when you really want to re-run a calculation on the same molecule, you can’t.

  • there are a lot of hidden variables in QM calculations that aren’t necessarily captured in the deduplication approach

Considering changing these pillars, e.g. making deduplication opt-in instead of automatic and forced

Immutability: calculations should probably remain immutable; whether other things (e.g. extras) should be immutable is questionable

Ben plans to start soliciting feedback from users on what their pain points are so these can be included in any major refactor
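To make the tension between the two pillars concrete, here is a toy sketch of input deduplication with an opt-out flag. The `ResultStore` class, its `submit` method, and the `deduplicate` parameter are hypothetical and do not reflect the QCArchive API; hashing the canonical JSON of the input is one assumed way to deduplicate.

```python
import hashlib
import json
from typing import Dict


class ResultStore:
    """Toy store: records are never modified, inputs are deduplicated by hash."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}  # record id -> frozen JSON input
        self._by_hash: Dict[str, str] = {}  # input hash -> first record id

    def submit(self, spec: dict, deduplicate: bool = True) -> str:
        """Return a record id for `spec`, reusing an existing one when allowed."""
        canonical = json.dumps(spec, sort_keys=True)
        key = hashlib.sha256(canonical.encode()).hexdigest()
        if deduplicate and key in self._by_hash:
            # Forced deduplication: the same input always maps to the old record.
            return self._by_hash[key]
        record_id = str(len(self._records) + 1)
        self._records[record_id] = canonical  # stored as a string, never edited
        self._by_hash.setdefault(key, record_id)
        return record_id


if __name__ == "__main__":
    store = ResultStore()
    spec = {"molecule": "water", "method": "b3lyp", "basis": "def2-svp"}
    first = store.submit(spec)
    same = store.submit(spec)  # deduplicated
    rerun = store.submit(spec, deduplicate=False)  # explicit re-run
    print(first, same, rerun)
```

With `deduplicate=False` the existing record stays untouched and a second record is created, which is one way a re-run could coexist with immutability. Hidden variables that the hash does not capture are exactly the case where such an opt-out would matter.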

Action items

@Joshua Horton will create a way to add the wavefunction protocol to the Dataset class so that only the wavefunction for the final configuration is stored
@Joshua Horton will add CMILES to the Molecule objects we are submitting to QCArchive in addition to the entries
Ben Pritchard will gather pain points and needs from current QCArchive users to fold into refactor discussions
@David Dotson will reach out to David Cerutti to get PR up for his first direct submission

Decisions