2020-05-29 QC Submission Meeting notes

Date

29 May 2020

Notes

What should molecule names be?
- These will be limited to 100 characters and are all lowercased
- The 100-character limit will fail for molecules with 30-40 heavy atoms
- The lowercasing will mangle SMILES, because lower/upper case indicates aromaticity. We could reconstruct the aromaticity by doing explicit H SMILES, but this would hit the character limit and require some weird interpretation.
- For now, the molecule name will be the canonical isomeric SMILES
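A minimal sketch of the two naming problems discussed above, assuming the lowercasing is a plain `str.lower()` and the limit is a simple length check. The helper names (`as_stored_name`, `fits_limit`) are hypothetical and are not taken from the submission code.

```python
NAME_LIMIT = 100  # character limit on molecule names, per the notes

# Benzene (aromatic, lowercase atoms) vs. cyclohexane (aliphatic, uppercase atoms)
benzene = "c1ccccc1"
cyclohexane = "C1CCCCC1"


def as_stored_name(smiles: str) -> str:
    """Mimic the lowercasing applied to names (assumed to be str.lower())."""
    return smiles.lower()


def fits_limit(name: str, limit: int = NAME_LIMIT) -> bool:
    """Check a candidate name against the character limit."""
    return len(name) <= limit


# After lowercasing, two different molecules end up with the same name.
collision = as_stored_name(benzene) == as_stored_name(cyclohexane)

# Explicit-H SMILES for a 30-carbon linear alkane: far over the limit.
explicit_h = "[H]C([H])([H])" + "C([H])([H])" * 28 + "C([H])([H])[H]"

print(collision)               # True
print(len(explicit_h))         # 336
print(fits_limit(explicit_h))  # False
```

This only illustrates why lowercased names cannot be treated as unique identifiers and why explicit-H SMILES does not fit; it does not propose a replacement naming scheme.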
We’re making a “polishing” script and will commit it to a branch.
Should we include Hessians?
- JW – Let’s not, for now, since this is a bit experimental
There are lots of options in Josh’s JSON user optimization_procedure. Should we include all of them?
- Yes
Why are IDs being appended with characters (like a, b, c)? See increment_mapping and related functions in the polishing scripts.
- We’d prefer for these increments to be integers (like, when the same molecule appears many times), but there’s a comment here that indicates that this is a SECOND layer of mapping, on top of integers. So we’re going to leave this as-is.
Notes for Josh
- BP – program, method, basis, driver keywords should move into QC spec.
  - TG – Should be able to support multiple specs in one submission.
- Where should metadata be injected? (it’s probably obvious, we just haven’t dealt with it)
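One way the QC spec points above could look, as a rough sketch only: the class names `QCSpec` and `Submission`, the dataset name, and the example values are all hypothetical placeholders, not the actual schema. The four fields mirror the keywords BP listed, and the list of specs reflects TG's point about supporting several specs in one submission. Where metadata belongs is left open here, as in the notes.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QCSpec:
    """Hypothetical container for the keywords listed in the notes."""
    program: str
    method: str
    basis: str
    driver: str


@dataclass
class Submission:
    """Hypothetical submission holding one or more QC specs."""
    name: str
    qc_specs: List[QCSpec] = field(default_factory=list)


# Placeholder values, chosen only to show two specs side by side.
submission = Submission(
    name="example-dataset",
    qc_specs=[
        QCSpec(program="psi4", method="b3lyp-d3bj", basis="dzvp", driver="gradient"),
        QCSpec(program="psi4", method="hf", basis="sto-3g", driver="gradient"),
    ],
)

# asdict() recurses into the nested dataclasses, giving a JSON-ready dict.
payload = asdict(submission)
print(len(payload["qc_specs"]))
```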

Notes from David + Trevor:

Trevor:
- We want to aim for COMPLETE on all datasets
- For cases where COMPLETE is not possible for a molecule, drop in a new version of the dataset
- A major version (e.g. 1.0) would give us an indication that a dataset is COMPLETE