What should molecule names be?
These will be limited to 100 characters and are all lowercased. The 100-character limit will fail for molecules with 30-40 heavy atoms. The lowercasing will mangle SMILES, because lower/upper case indicates aromaticity. We could reconstruct the aromaticity by using explicit-H SMILES, but this would hit the character limit and require some weird interpretation.
For now, the molecule name will be the canonical isomeric SMILES.
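A minimal sketch of the two failure modes discussed here (the length limit and the lowercasing), using plain string operations only. The function name `name_problems` and the constant `MAX_NAME_LENGTH` are hypothetical and are not part of any existing script; the 100-character limit is the one quoted in these notes.

```python
MAX_NAME_LENGTH = 100  # character limit quoted in the notes (assumed to be inclusive)


def name_problems(smiles: str, max_length: int = MAX_NAME_LENGTH) -> list[str]:
    """Return the ways a SMILES string would be damaged if stored as a name."""
    problems = []
    if len(smiles) > max_length:
        problems.append(f"too long: {len(smiles)} > {max_length} characters")
    if smiles.lower() != smiles:
        # Uppercase atoms are aliphatic and lowercase atoms are aromatic,
        # so lowercasing changes what the string means.
        problems.append("lowercasing changes the string")
    return problems


cyclohexane = "C1CCCCC1"
benzene = "c1ccccc1"

# Lowercasing turns the cyclohexane string into the benzene string.
print(cyclohexane.lower() == benzene)  # True
print(name_problems(benzene))          # []
print(name_problems(cyclohexane))      # ['lowercasing changes the string']
```

A check like this could be run over a dataset before submission to see how many names would be affected.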
We’re making a “polishing” script and will commit it into a branch.
Should we include hessians?
There are lots of options in Josh’s JSON user optimization_procedure. Should we include all of them?
Why are IDs being appended with characters (like a, b, c)? See the polishing scripts’ increment_mapping and such. We’d prefer for these increments to be integers (e.g. when the same molecule appears many times), but there’s a comment here that indicates that this is a SECOND layer of mapping, on top of integers. So we’re going to leave this as-is.
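For reference, a rough sketch of what integer increments for repeated IDs could look like. This is not the `increment_mapping` from the polishing scripts, which we have not reproduced here; the function name `add_integer_suffixes` and the `-` separator are assumptions made for illustration only.

```python
from collections import Counter


def add_integer_suffixes(ids: list[str]) -> list[str]:
    """Append -0, -1, ... to each ID, counting repeats in order of appearance."""
    seen = Counter()
    labelled = []
    for mol_id in ids:
        # seen[mol_id] is 0 the first time an ID is encountered.
        labelled.append(f"{mol_id}-{seen[mol_id]}")
        seen[mol_id] += 1
    return labelled


print(add_integer_suffixes(["cco", "cco", "c1ccccc1", "cco"]))
# ['cco-0', 'cco-1', 'c1ccccc1-0', 'cco-2']
```

Since the existing lettering is a second layer on top of integers, this only illustrates the preference stated above, not a proposed change.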
Notes for Josh BP – program, method, basis, driver keywords should move into QC spec. Where should metadata be injected? (It’s probably obvious; we just haven’t dealt with it.)
Notes from David + Trevor:
Trevor: we want to aim for COMPLETE on all datasets. For cases where COMPLETE is not possible for a molecule, drop in a new version of the dataset. A major version (e.g. 1.0) would give us an indication of complete.