2023-08-08 OpenFF QCA Working Group Meeting Notes

Participants

Agenda

DD : increase frequency of meeting to once weekly?
- General agreement
- Aug 29: MolSSI will take over
PE : remaining questions on submission, execution in the context of SPICE
- SPICE v1.x was submitted and managed via openforcefield/qca-dataset-submission; how should future iterations of SPICE be submitted/managed?
- input for each dataset is defined by an HDF5 file; where does the code exist to parse this?
- how do we specify spin multiplicity for each molecule in Psi4? This information is not included in the input HDF5 file we used
- what is the preferred route for re-running known bad calculations?
  - previously it was to create a new version of the same dataset with entries translated by >1 bohr for each bad record; is this still recommended?
- is there a way to set constraints to restrict Psi4 version to execute with?
- what must I do to set up a worker on a cluster for running calculations? Documentation for this (https://molssi.github.io/QCFractal/admin_guide/managers/setup.html) is currently empty
- the code reviewed in the PortalClient tutorial is not in the main branch of the repository; is it actually ready to use?
MolSSI QCArchive user group
- user questions / issues / feature requests
  - QCSubmit constraints question
- server instance statuses
  - QCArchive Legacy
  - QCArchive OpenFF
    - currently retains everything from Legacy
    - will lose ML datasets at some point
  - QCArchive ML
    - mostly only contains ML datasets
  - QCArchive Validation
    - MolSSI internal projects
  - QCArchive Demo
    - test instance
- compute resources statuses
- call for new users
- trainings
  - upcoming PortalClient trainings
  - upcoming compute manager trainings
- deployed stack versions:
  - QCArchive Legacy
    - 0.15.8.1
  - QCArchive OpenFF
    - 0.50.0b13
  - QCArchive ML
    - 0.50.0b13
  - QCArchive Validation
    - 0.50.0b13
New datasets
- SPICE 2.0
- OpenFF Optimization Diverse Fragments with Iodine (w/ ESPs)
- OpenFF Optimization Diverse Fragments with Bromine (w/ ESPs)
- OpenFF Optimization Hypervalent Sulfurs (w/ ESPs)
- OpenFF DNA
Updates from stakeholders
- OpenFF
- Genentech
- Chodera Lab / ASAP Discovery
- MolSSI
QCFractal development
- QCFractal v0.50.0 - imminent
- v0.70.0 milestone:
Additional business
- MolSSI QCArchive Working Group start date: 8/29

Discussion topics

Notes
PE : questions on submission, execution in the context of SPICE
- BP – `openff-qcsubmit`
- DD – SPICE 1 was submitted and managed using the qca-dataset-submission repo. OpenFF was willing to take on the dataset composition and submission for that, since we thought it would be simple. It ended up being more challenging than anticipated. Since SPICE isn’t an OpenFF project, OpenFF declined to do much more past SPICE 1.x.
- JW – recommended way to proceed depends on what you need to keep track of
  - if you’re starting from graph molecules, `openff-qcsubmit` makes sense to start from
  - however, `openff-qcsubmit` is currently broken (does not support new `PortalClient` API)
- PE – current recommendation is to use `QCPortal` directly?
  - JW – at the moment, yes; assuming you don’t have graph molecules
- PE – what’s a graph mol?
  - JW – Kekulé structure
- PE – input from HDF5 files was parsed using what code?
  - DD – that implementation lives in `openff-qcsubmit`
  - PE – so can’t be used at the moment?
  - DD – not at this moment; potentially in a few weeks once `openff-qcsubmit` is compatible with latest QCFractal
  - BP – Contents of HDF5?
  - PE – Coordinates and mapped SMILES
  - BP – That could probably be brought into its own script.
- PE – how do we specify spin multiplicity for each molecule in Psi4? This information is not included in the input HDF5 file we used
  - BP – I thought these were all closed-shell
  - PE – I think spin multiplicity still needs to be specified.
  - PB – I think we chose the default one if neutral, if charged… 2*S+1
  - PE – I don’t know of a simple rule that would give a default multiplicity
  - BP – It assumes closed-shell singlet. That should work unless there’s an odd number of electrons.
- PE – Different ways of rerunning the bad calcs when there was a bad version of psi4
  - DD – What we would have done is create a 1.(N+1) version, where the entries that we want to recalculate are translated by 1 bohr, leaving the others alone. So the new dataset would have all the molecules, but only the mols that we want recalculated would be rerun. Is this still the way we’d want to do it today, BP?
  - BP – In the new version, records can be deleted. But if you want to keep the old version around, I’m planning to implement a feature that will let you submit duplicates. Until that happens you can still do the shift-by-one-bohr trick.
- PE – How can I pin to use a specific version of psi4?
  - BP – There’s no built-in way to pin a specific version of psi4, but you can get the same effect via the tag system. A manager will only pull down records with tags that it accepts. So you can run managers with a specific version of psi4, and then have them only accept jobs with the same tag.
  - PE – So I can tag all my jobs with a psi4-1.6.1 tag, and only the appropriate managers would pull down those jobs.
  - DD – Yes, but the tags are all human-set, so the person who starts up the managers would need to ensure the right psi4 is installed and the tag is set.
- PE – Which version to use? I’ve tested 1.6.1 but there are newer ones.
  - DD + BP + PE – …
  - DD – So we should rerun these tests on newer versions (1.7, 1.8) before we submit the whole dataset.
- PE – Manager setup instructions?
  - BP – Working on updating those. The old setup docs may work to some extent but there are some differences. Also there are enough different possible cluster setups that a one-stop-shop compute manager guide will never be complete.
- PE – New QCF code - Is it ready for use, or should we hold off until it’s finalized?
  - BP – Ready to use. Planning on no breaking changes for a while. Will post new docs and install instructions.
    - New docs at https://molssi.github.io/QCFractal/index.html
    - New packages at `qcarchive` anaconda label https://anaconda.org/qcarchive/
  - DD – Could we move the changes in `next` to the main branch, and unlabel the conda packages?
  - BP – Not sure, very busy this month, not before September.
- JW – forwarded contact from quantum computing group at IBM
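BP's rule of thumb for spin multiplicity (assume a closed-shell singlet, which only fails outright when the electron count is odd) can be sketched in a few lines. This is an illustration only, not code from Psi4, QCFractal, or `openff-qcsubmit`; the function name `default_multiplicity` and its inputs (a list of atomic numbers plus the total charge) are assumptions made for the example.

```python
def default_multiplicity(atomic_numbers, total_charge=0):
    """Guess a spin multiplicity (2S+1) from the electron count.

    Hypothetical helper: singlet (1) for an even number of electrons,
    doublet (2) for an odd number. It does not detect open-shell
    ground states with an even electron count.
    """
    n_electrons = sum(atomic_numbers) - total_charge
    if n_electrons < 0:
        raise ValueError("total_charge exceeds the number of electrons")
    return 1 if n_electrons % 2 == 0 else 2


# Water: 10 electrons, closed-shell singlet.
water = default_multiplicity([8, 1, 1])
# Hydroxyl radical: 9 electrons, so a singlet is impossible.
hydroxyl = default_multiplicity([8, 1])
# Ammonium cation: 11 protons, charge +1, 10 electrons.
ammonium = default_multiplicity([7, 1, 1, 1, 1], total_charge=1)
```

The parity check only catches cases where a singlet cannot exist. Species such as triplet O2 have an even electron count and would still be labelled singlets, so any molecule expected to be open-shell needs its multiplicity set by hand.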
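The tag-based version pinning BP and DD describe can be pictured with a toy model: each task carries a tag, and a manager only claims tasks whose tag it accepts. The classes below (`Task`, `Manager`, `claim`) are hypothetical and are not the QCFractal API; they only illustrate the matching logic. As DD notes, nothing checks that a manager advertising a tag really has that psi4 version installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    record_id: int
    tag: str  # human-set label, e.g. "psi4-1.6.1"


@dataclass(frozen=True)
class Manager:
    name: str
    accepted_tags: frozenset

    def claim(self, queue):
        """Return the tasks in the queue whose tag this manager accepts."""
        return [task for task in queue if task.tag in self.accepted_tags]


queue = [
    Task(1, "psi4-1.6.1"),
    Task(2, "psi4-1.8"),
    Task(3, "psi4-1.6.1"),
]

# A manager started in an environment with psi4 1.6.1 installed,
# configured (by a person) to accept only the matching tag.
pinned = Manager("cluster-a", frozenset({"psi4-1.6.1"}))
claimed = pinned.claim(queue)
claimed_ids = [task.record_id for task in claimed]
```

In this model the task tagged `psi4-1.8` stays in the queue until a manager that accepts that tag comes online.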
In progress datasets
- DD – None
Compute resources status
- DD – None, but working on getting Lilac running.
- DD – New people who want to run compute managers?
  - CI – Looking at some active learning stuff and incorporating into our code.
  - NF – I’m interested in joining with SPICE 2.0 from Genentech’s side.
  - DD – PE, could be good to collect interested parties with SPICE2
- DD – Training session videos are up and linked on the #qcfractal channel.
- DD – Server software status/versions?
  - BP –
  - CI – With the new branch that’s being developed, are the old things languishing? If we submit issues, will things be done?
  - BP – To the `next` branch, yes. To the `master` branch, no. I know there was some trouble with the pydantic version bump.
  - JW – MThompson and JRGuerra retroactively pinned old QCPortal packages, but any future packages will need their recipe pinned.
  - CI – Should I be building against new branches? If so, how?
  - BP – Yes. I’ll send docs.
  - JW – Yes, build against new versions.
    - New docs at https://molssi.github.io/QCFractal/index.html
    - New packages at `qcarchive` anaconda label https://anaconda.org/qcarchive/
- DD – SPICE dataset discussion?
  - PE – SPICE is intended to grow with time. So SPICE 2.0 will explore more chemical space, things like protein-ligand interactions, ligand-water, and include coverage of boron and silicon.
  - DD – There was a talk at SciPy about other folks who used SPICE in their own work.
  - NF – Myself and other scientists were planning a QC dataset for training potentials, and when we found Peter’s paper it was exactly what we needed. So we’re interested in joining forces. So what we talked about in the SPICE2 meeting was flagging the interesting datasets and providing compute.
  - DD – One of the challenges was that SPICE2 doesn’t have a home - there are resource contributions but no core driver yet.
  - PE – Kinda. We have resources at Stanford that we can provide for doing this. We should have more nodes this time around than last time. Everything is managed through the SPICE dataset GitHub repo.
  - DD – What’s between us and submitting SPICE2?
  - PE – Manager documentation should be the big one, after that I can get it running.
- LW – OpenFF datasets - These are hung up on QCSubmit but the three listed in the agenda are top priority. Would like to have them submitted in the next month, and have results in the next month or two. Highest priority dataset is Hypervalent Sulfurs.
  - JW – I think we can do that
  - PB – Update on bromine and iodine?
  - LW – Sulfurs are highest priority since we don’t have any of those, I think bromine and iodine would come after that.
- JW – `openff-qcsubmit`: three major issues blocking update
  1. QCF API change
  2. Removal of ChemicalEnvironment from OFFTK
  3. Problematic molecules in existing datasets (like industry benchmark set)
  - I’ll fix 2 and 3 first, then will work on 1.

Action items

David Dotson will create 1.3 versions for SPICE datasets to correct for wrong results as part of Chodera Lab activities
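The shift-by-one-bohr approach discussed in the notes for building a corrected dataset version can be sketched as follows: entries flagged as bad get every atom translated by a fixed offset, and all other entries are copied unchanged. This is a minimal sketch under assumptions, not the script used for SPICE; the names `translate` and `build_next_version`, the dictionary layout (entry name mapped to a list of xyz coordinates in bohr), and the choice of the x axis are all hypothetical.

```python
BOHR_SHIFT = 1.0  # translation distance in bohr, per the notes


def translate(geometry, shift=(BOHR_SHIFT, 0.0, 0.0)):
    """Rigidly translate every atom; internal distances are unchanged."""
    dx, dy, dz = shift
    return [[x + dx, y + dy, z + dz] for x, y, z in geometry]


def build_next_version(entries, bad_names, shift=(BOHR_SHIFT, 0.0, 0.0)):
    """Return a new entry mapping for the next dataset version.

    Entries named in bad_names are translated so they differ from the
    originals; all other entries are copied as-is.
    """
    return {
        name: translate(geom, shift) if name in bad_names
        else [list(xyz) for xyz in geom]
        for name, geom in entries.items()
    }


# Hypothetical two-entry dataset, coordinates in bohr.
entries = {
    "mol-a": [[0.0, 0.0, 0.0], [0.0, 1.5, 1.0]],
    "mol-b": [[2.0, 0.0, 0.0]],
}
next_version = build_next_version(entries, bad_names={"mol-a"})
```

Because the translation is rigid, the recalculated entries describe the same molecule in the same conformation; only their absolute coordinates differ from the bad records.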

psi4 version 1.6.1
