
Discussion topics

Item

Notes

  • What should molecule names be?

    • Names are limited to 100 characters and are lowercased.

    • SMILES for molecules with roughly 30-40 heavy atoms will exceed the 100-character limit.

    • Lowercasing will mangle SMILES, because the case of an atom symbol distinguishes aromatic from aliphatic atoms. We could reconstruct aromaticity from explicit-hydrogen SMILES, but those strings would hit the character limit sooner and require some awkward interpretation.

    • For now, the molecule name will be the canonical isomeric SMILES.

  • We’re writing a “polishing” script and will commit it to the branch.

  • Should we include hessians?

    • JW – Let’s not for now, since this is a bit experimental.

  • There are lots of options in the optimization_procedure section of Josh’s user JSON. Should we include all of them?

    • Yes

  • Why are IDs being suffixed with letters (e.g. a, b, c)? See increment_mapping and related functions in the polishing script.

    • We’d prefer these increments to be integers (e.g. when the same molecule appears many times), but a comment there indicates that the letters are a SECOND layer of mapping on top of the integers. So we’re going to leave this as-is.

  • Notes for Josh

    • BP – The program, method, basis, and driver keywords should move into the QC spec.

      • TG – We should be able to support multiple QC specs in one submission.

    • Where should metadata be injected? (The answer is probably obvious; we just haven’t dealt with it yet.)
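
The naming constraints above (a 100-character limit and forced lowercasing) can be checked before submission. This is a minimal sketch. The function name name_problems and its messages are hypothetical, and it only inspects the string, with no cheminformatics toolkit involved.

```python
from typing import List

MAX_NAME_LENGTH = 100  # limit on molecule names discussed above


def name_problems(smiles: str, max_length: int = MAX_NAME_LENGTH) -> List[str]:
    """List the ways a SMILES string would be damaged if used directly as a molecule name.

    Hypothetical helper: it checks only the raw string, not chemical validity.
    """
    problems = []
    if len(smiles) > max_length:
        problems.append(f"too long ({len(smiles)} > {max_length} characters)")
    # Lowercasing is lossy for SMILES: 'C' (aliphatic) and 'c' (aromatic) are
    # different atoms, and two-letter symbols like 'Cl' become 'cl'. Any
    # uppercase character means the stored name would differ from the input.
    if smiles.lower() != smiles:
        problems.append("changed by lowercasing")
    return problems


if __name__ == "__main__":
    print(name_problems("CCO"))        # ethanol: aliphatic atoms are uppercase
    print(name_problems("c1ccccc1"))   # benzene written in aromatic form
    print(name_problems("C" * 101))    # exceeds the 100-character limit
```

Note that almost every real SMILES contains an uppercase atom symbol, so the lowercasing check will fire for nearly all molecules. That is consistent with the concern above that lowercasing mangles SMILES in general, not just in edge cases.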
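
One way to picture BP’s and TG’s suggestions together is a submission that groups program, method, basis, and driver into named QC specs and can hold several of them. This is a sketch under assumptions. QCSpec, Submission, add_spec, and tasks are hypothetical names, not the real QCSubmit/QCFractal API, and the method/basis strings are only illustrative values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class QCSpec:
    """Hypothetical QC spec holding the keywords BP suggested moving out of the top level."""
    name: str
    program: str
    method: str
    basis: Optional[str]
    driver: str = "gradient"


@dataclass
class Submission:
    """Hypothetical submission that can carry several QC specs, per TG's request."""
    molecules: List[str]
    qc_specs: Dict[str, QCSpec] = field(default_factory=dict)

    def add_spec(self, spec: QCSpec) -> None:
        # Spec names act as keys, so reject duplicates instead of silently overwriting.
        if spec.name in self.qc_specs:
            raise ValueError(f"duplicate QC spec name: {spec.name}")
        self.qc_specs[spec.name] = spec

    def tasks(self) -> List[Tuple[str, str]]:
        """Every (molecule, spec name) pair this submission would compute."""
        return [(mol, spec_name) for mol in self.molecules for spec_name in self.qc_specs]


if __name__ == "__main__":
    sub = Submission(molecules=["CCO", "c1ccccc1"])
    sub.add_spec(QCSpec("default", "psi4", "b3lyp-d3bj", "dzvp"))
    sub.add_spec(QCSpec("hf", "psi4", "hf", "6-31g*"))
    print(sub.tasks())
```

Keeping the specs in a dict keyed by name means each molecule is computed once per spec. It also keeps the per-spec keywords out of the molecule entries.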

Notes from David + Trevor:

  • Trevor:

    • We want to aim for COMPLETE status on all datasets.

    • Where COMPLETE is not possible for a molecule, release a new version of the dataset.

    • A major version (e.g. 1.0) would indicate that the dataset is complete.
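
Trevor’s versioning convention can be read as a simple rule. The reading assumed here is that a dataset stays at 0.x while any record is not COMPLETE, that a new minor version is released when records are dropped or fixed, and that it moves to 1.0 once every record is COMPLETE. This is one possible interpretation, not a settled policy. The function names and the status strings other than COMPLETE are hypothetical.

```python
from typing import Dict, List, Tuple

COMPLETE = "COMPLETE"  # status label assumed to match the server's record status


def split_by_status(statuses: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (complete, not_complete) molecule names, each sorted."""
    complete = sorted(name for name, status in statuses.items() if status == COMPLETE)
    not_complete = sorted(name for name, status in statuses.items() if status != COMPLETE)
    return complete, not_complete


def next_version(version: str, statuses: Dict[str, str]) -> str:
    """Hypothetical rule: 0.x becomes 1.0 once every record is COMPLETE; otherwise bump the minor version."""
    major, minor = (int(part) for part in version.split("."))
    _, not_complete = split_by_status(statuses)
    if not not_complete and major == 0:
        return "1.0"
    return f"{major}.{minor + 1}"


if __name__ == "__main__":
    statuses = {"CCO": "COMPLETE", "c1ccccc1": "ERROR"}
    print(split_by_status(statuses))                  # which molecules a new version would drop
    print(next_version("0.1", statuses))              # still incomplete: minor bump
    print(next_version("0.2", {"CCO": "COMPLETE"}))   # all COMPLETE: major version
```

Under this reading, the not-complete list is the set of molecules a follow-up version would omit. The major version then signals that every remaining record reached COMPLETE.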


Action items

  •  

Decisions