DD: will set up a survey of the queue types and compute resources available at the partners, and will send it out soon.
XL: how hard is it to install psi4? We need it on our cluster.
DD: we have a lot of experience with this from our public archive runs; many production environments are available, so setup should be very easy for everyone. DD showed a live example of the setup.
7 parties interested in running calculations on public QCArchive
1300-1400 compounds each, selected according to individual criteria (as discussed)
structures generally patented
GT: should it be a 3D SDF with single conformation, or multiple?
DH: 3D SDFs better because SMILES don’t have chirality information
XL: may choose things similar to what’s in PubChem and elsewhere
JH: if we don’t specify everything, we’ll be relying on RDKit to generate conformations, etc.
DH: we’re planning for conformer generation in the workflow
JH: could fill in the gap if not provided
GT: default is folks provide a single conformation at least, we generate/fill in up to 10.
GT: what about charged compounds?
going to need to know the bonding for charged molecules in particular
JH: if it is charged, definitely want charge specified; all the initial fits weren’t done on charged molecules
GT: think you will be able to handle them; question remains on the basis set
JH: if the charges are defined in the file, that would help
GT: need to determine which fields in the SD file we use
just id, charge?
GT: a thousand neutral molecules in 3D with hydrogens, then as part of the workflow charge them with RDKit
DD: the public QCArchive submission can be used as a test approach with high visibility; can decide today on a reasonable, perhaps minimal input spec and see where problems arise
GT: neutral 3D input; take a week to look for a reasonable open-source ionization predictor
Conclusion: neutral 3D input, with hydrogens specified; deferred decision on protomer enumeration
we will pursue an open-source solution for protomer enumeration; if none is found, it is up to each partner whether they want to do it and which tools they use
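A minimal sketch of how a partner might pre-check an SD file against the agreed input spec (neutral, 3D, hydrogens specified). This is not part of the agreed tooling: the function names `split_sdf` and `check_molblock` are hypothetical, the parser assumes V2000 mol blocks, and the 3D and hydrogen checks are simple heuristics (a non-zero z coordinate, at least one H atom). "Neutral" is read here as net formal charge of zero, which is also an assumption.

```python
# Charge codes used in the atom block of a V2000 molfile (code -> formal charge).
ATOM_BLOCK_CHARGE = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def split_sdf(sdf_text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def check_molblock(molblock):
    """Return a list of problems for one V2000 mol block (empty list if none)."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])          # counts line: atom count in columns 1-3
    atom_lines = lines[4:4 + n_atoms]

    symbols = [line[31:34].strip() for line in atom_lines]
    z_coords = [float(line[20:30]) for line in atom_lines]

    # "M  CHG" property lines, when present, supersede atom-block charges.
    chg_lines = [line.split() for line in lines if line.startswith("M  CHG")]
    if chg_lines:
        # Tokens: M, CHG, count, then (atom index, charge) pairs.
        net_charge = sum(int(v) for tokens in chg_lines for v in tokens[4::2])
    else:
        net_charge = sum(
            ATOM_BLOCK_CHARGE[int(line[36:39].strip() or 0)] for line in atom_lines
        )

    problems = []
    if "H" not in symbols:
        # Heuristic: does not verify that every hydrogen is present.
        problems.append("no explicit hydrogens")
    if all(z == 0.0 for z in z_coords):
        # Heuristic: a genuinely planar 3D structure in the xy plane also trips this.
        problems.append("all z coordinates are zero (possibly 2D)")
    if net_charge != 0:
        problems.append(f"net formal charge is {net_charge:+d}")
    return problems
```

Running `check_molblock` over each record returned by `split_sdf` gives a per-molecule list of deviations from the spec; an empty list means the record passed these basic checks.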
6-31+G** of interest, running a test set over next few weeks
GT: sticking to the same basis set is definitely important
DD: should we ask Hyesu to present basis set conclusions at next call?
sounds good, will ask if she’s willing and ready
Action items
Joshua Horton will do a research cycle on existing open-source protomer enumeration software options
David Dotson will reach out to Hyesu to schedule a presentation on basis set performance findings in ~3 weeks
David Dotson will prepare MM compute spec for PhAlkEthOH dataset for our benchmarking tooling evaluation and testing
David Dotson will schedule a weekly implementation coordination call, at the same time as this call, through December
Gary Tresadern will communicate conclusion on input format: neutral 3D SDF with hydrogens specified; individual choices on selection criteria for compounds
David Dotson will complete software approach proposal; split out component work into issues on openff-benchmark to coordinate development