2020-05-14 QCFractal Meeting notes

Date

May 14, 2020

Participants

  • @Jeffrey Wagner

  • Ben Pritchard

  • @David Dotson

  • @Jessica Maat

  • @Joshua Horton

  • @Chaya Stern

  • @Matt Thompson

  • @Trevor Gokey

Goals

Discussion topics

Item

Notes

Updates from MolSSI

  • None

Manager check-in

  • Increased number of services – Should have seen a bump in utilization

  • Both PRP and UCI HPC3 managers are running

    • HPC3 has some problems with jobs going stale – Trevor restarts every few days

    • BP – Do these always happen at the same time of day?

    • TG – Doesn’t seem like it

    • TG – Increasing number of retries may fix this

User questions

  • CS – My previous dataset has torsion atoms mislabeled. Need to fix and submit again

    • TG – Can we rename existing dataset?

    • BP – Not clear that we can

    • CS – Can we change the first attempt to “fragmenter paper first try”?

    • TG – How about “fragmenter paper v2”?

    • CS – “OpenFF fragmenter validation 1.0”

    • CS – Can we not list the collection?

    • BP – I saw “invalid atom index” errors when I logged into the server to restart it

    • TG + CS – Adding a PDF of torsions being driven to each submission would be good

    • Could be required for submission:

      • CS made simple RDKit script to do this. But it’s also important to check the tagged indices in the job.

      • Could do MCSS match to verify the same atoms are tagged

    • JW – Should each dataset be versioned? (“fragmenter paper 1.0”, “1.1”?)

      • BP – Want to be able to add version to dataset, and reference dataset by name

    • BP will delete the previous “fragmenter paper” dataset

      • But parent molecules should remain (which seems to be current behavior)

    • TG – In the PR, we should list the molecules that we’re removing

    • DD – For the molecules/torsion scans that we DON’T want, will the hashes for those match with the resubmissions?

      • BP – They shouldn’t since the indices will be different, so they’ll hash to a different value

    • TG – Could have a latest tag to indicate what is the most “recent” version of a collection/dataset

  • Storing molecule connectivity/SMILES

    • Can SMILES/connectivity be stored in molecules? Datasets? Procedures? Do we want to update the behavior moving forward?

    • JH – Made it so that MM opts using QCEngine will pass through CMILES

  • Dataset naming

    • JM – Can we group together all of the datasets used for fitting openff-1.2.0?

    • We could make a new dataset, which contains exactly the same inputs as were used for the fitting

    • JH – Could store parent_dataset in each molecule’s metadata

    • TG – Once 1.2.0 is released, we can collect molecules used for fitting from HJ, and make an “aggregate dataset”. This can also include a “parent dataset” field

    • JM – What process should we follow for generating and labeling this dataset?

    • JW – Once HJ is done with the fitting, we can meet to go over the QC data and identify which molecules should be included in the new dataset

  • BP – Input for a dataset interface refactor?

    • BP – I think the optimization datasets have the best interface so far.

    • BP – I don’t like that some dataset types, like Hessians, don’t offer a lot of fine-grained control over submission. You can basically just call “compute”, and you don’t have a lot of control over what happens for each individual job

    • JH – It’s tricky that I can’t get the gradient from a geometry optimization without renaming the dataset

    • BP – Don’t like vagueness around “get data”, “get record”, etc…

    • TG – In my code, I collect a ton of IDs, and I don’t know whether they’re molecules or records. Would be good to have more clarity on that

    • JH – Agree that it would be good for normal datasets to look like optimizations.

    • TG – Having a special dataset name or set of docs for hessiandataset would be helpful

    • JH – Would be nice to have a link between optimizations and hessians

    • BP – When people do a property calculation, they want it linked to an optimized geometry. So this is kinda a general problem I should tackle

    • BP will open an Issue for more feedback on this, will share link on #qcfractal

  • CS – What time is QCSubmit meeting tomorrow?

    • 9 AM Pacific / noon Eastern
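
Illustration of the hashing point from the dataset resubmission discussion (DD's question and BP's answer above): a minimal sketch of why a torsion scan with corrected atom indices would hash to a different value than the mislabeled original. This is not QCFractal's actual hashing scheme. The `spec_hash` helper, the field names, and the use of SHA-1 over canonical JSON are all assumptions made for illustration only.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """Hash a job specification via its canonical JSON form.

    Hypothetical helper: the real server may hash different fields
    or use a different algorithm.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical torsion-scan inputs: same molecule, different tagged dihedral.
mislabeled = {"molecule": "mol-1", "dihedral": [0, 1, 2, 3], "grid_spacing": 15}
corrected = {"molecule": "mol-1", "dihedral": [1, 2, 3, 4], "grid_spacing": 15}

# Changing the tagged indices changes the hashed content, so the hashes differ.
hashes_differ = spec_hash(mislabeled) != spec_hash(corrected)

# An identical specification always maps to the same hash.
same_input_same_hash = spec_hash(corrected) == spec_hash(dict(corrected))
```

Under these assumptions, the resubmitted scans would not collide with the unwanted ones, which is consistent with BP's expectation that the old and new entries hash differently.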

 

 

 

Action items

Decisions