2021-12-10 QC meeting notes

Participants

@Jeffrey Wagner
Ben Pritchard
@Pavan Behara
@Joshua Horton
@David Dotson
@Chapin Cavender

Goals

Updates from MolSSI
- deploying psi4 1.5, incompatibilities of qcelemental 0.24, qcengine 0.21 with qcfractal 0.15.7
Compute
- QM workers on Lilac
- XTB workers on Newcastle
- QM workers on TSCC
- QM, ANI, XTB workers on PRP
New submissions
- submission issues with OpenMM datasets - Gateway Timeouts
- dipeptide dataset
  - brand_raw errors on PRP
- ML datasets for OpenMM
  - multi-molecule issues with QCElemental - psi4 bug fixed with QCSubmit
User questions/issues
Science support needs
Infrastructure needs / advances
- psi4 on conda-forge

Discussion topics

Item	Presenter	Notes

Updates from MolSSI	Ben
- BP – Updated `nginx` on gateway, we were running
- BP – Thinking about resolutions for storage issue. May need OpenFF’s opinion on some things.
- DD – Could chat right after this call.
- BP – Great. Let’s do that.
- BP – Refactoring is moving along, things are getting easier.
- DD – I’m excited for in-server error cycling and being able to slice requests to just get specific result().
- BP – You currently can’t pass the protocols (like orbitals and eigenvalues) for a single-point optimization, right? Because they’re not available in QCSchema. This is because qcengine depends on qcschema. There would be some other options, but updating this should help.
- DD – Would this enable us to do something like saving the wavefunction only on the LAST frame of an optimization?
- BP – That would be a new protocol, but this would be possible.
- DD – Yesterday I was doing some research for the protein-ligand automation and was looking into pragmatic practices for RESTful APIs, may be able to offer guidance.
Compute
- TSCC
  - CC – 2 managers, 99.19% and 100% success.
  - DD – Will spin up QM workers with this compute tag on PRP, watch for errors; haven’t observed errors locally on this set.
- `brand_raw` issue
  - DD – All of the `brand_raw` issues that I’ve seen have been from PRP.
  - Can check `py-cpuinfo` in old Docker image prior to current one on PRP; may be a candidate for pinning.
  - Previous py-cpuinfo issue breadcrumb:
- DD – Newcastle workers are doing great, not failing with memory issues. Thanks, JH.
- JH – For gfn0 there’s an environment variable that needs to be set. It’s something like `share/xtb` that needs to be pointed to. But it’s no problem for me to keep running these at Newcastle.
  - `export XTBPATH=<path_to_miniconda>/miniconda3/envs/qcarchive-worker-openff-xtb/share/xtb`
- DD – Had ANI and XTB workers on PRP, they weren’t getting work so I shut them down.
- DD – Are any queues/datasets not progressing fast enough?
New datasets
- Dipeptide 2-D TorsionDrives
  - CC – ~2000 optimizations per day.
  - DD – I suspect that our previous issues were from a really flaky manager that would starve out the stable managers. So I’ve also added my local computer and seen high success here. Now I’ll try getting managers going on PRP. CC, do you think that this would be explained by a flaky node on TSCC?
  - CC – This would be consistent with a flaky node at TSCC.
- PubChem sets 1-6
- SPICE set
- This is a revision of the earlier set, needed wcombine patch.
- DD – Trying to submit the OpenMM datasets above, we’re getting gateway timeouts. Tried running locally and still got gateway timeouts.
- BP – Is it just too big?
- DD – I think so, the collection object gets pretty massive. I haven’t been able to submit any of them. Not sure what options we’d have to resolve this on the MolSSI side.
- BP – It does seem likely that it’s too big. I could confirm if we tried to submit it while I watched the server.
- DD – Could we try that after this session?
- BP – Yes.
- DD – Multicomponent molecule issues?
- PB – JH added a function to QCSubmit to split a multi-component OpenFF molecule, and this seems to be working. PR 257
- PB – When we first tried this, it didn’t go through. Do we need to bump the version on this?
- DD – Let me look into this (queries server)… The 1.0 version does exist and has some records, so let’s call this resubmission 1.1.
User questions or issues
- DD – Worked with Willa last week on running her own QCFractal instance on their resources. So Willa’s using this locally to iterate on RESP datasets.
Science support needs?
Infrastructure needs
- DD – psi4 conda-forge port?
- JW – I haven’t heard anything new about this, probably still blocked on libint2 issues.
Other topics
- JW – We have a lot of multicomponent things in the toolkit, and are considering how to make these single component (currently a `Molecule` object can have multiple unconnected sets of atoms).
- JH – We do take advantage of this, actually; we use this as a way to maintain an order to the atoms of the whole system.
  - Did have to work out a way to split up the system, which we did with `networkx`, but yeah, we currently rely on this behavior in the toolkit.
- JW – If you have a fragment, it has to be contiguous?
- JH – The only place this would cause issues is part of the workflow; would need to switch to a list of Molecules perhaps.
  - If RDKit can do this by itself, give it a list of Molecule objects, let RDKit handle it.
  - Would get us a hash for the whole system.
- JW – Topologies are going to be hashed, too, soon.
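
For reference, the idea of splitting a multi-component system while keeping the whole-system atom order can be sketched as below. This is a minimal illustration only, not the QCSubmit implementation: the function name `split_components` and the toy bond list are hypothetical, and a plain standard-library graph search stands in for the connected-components search that JH did with `networkx`.

```python
from collections import defaultdict


def split_components(n_atoms, bonds):
    """Group atom indices into connected components.

    Hypothetical helper: `bonds` is a list of (i, j) atom-index pairs.
    Each component is returned as a sorted list of the ORIGINAL atom
    indices, so the ordering of the whole system can be recovered.
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    # Visit atoms in index order so components come out ordered by
    # their lowest atom index.
    for start in range(n_atoms):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            atom = stack.pop()
            members.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(members))
    return components


# Toy system (assumed for illustration): a 3-atom molecule (0, 1, 2),
# a lone ion (3), and a diatomic (4, 5).
bonds = [(0, 1), (0, 2), (4, 5)]
components = split_components(6, bonds)
print(components)
```

Because each component carries its original indices, a list of per-component molecules could still be mapped back to the order of the combined system, which is the behavior JH describes relying on.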

Action items

@David Dotson will compare versions of `py-cpuinfo` between previous PRP Docker image and current (they are the same version, 8.0.0)

@David Dotson will set env variable from @Joshua Horton on PRP XTB deployment

@David Dotson will scale up openff-tscc compute tag execution on PRP

@David Dotson will troubleshoot submission issues with Ben; deploy workaround or solution

@Pavan Behara will set the version of qca-dataset-submission#257 to v1.1
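
For the XTB environment variable action item, a worker startup script could set `XTBPATH` from the active conda environment rather than hard-coding the path. The sketch below is an assumption-laden illustration: the helper names `xtb_share_dir` and `set_xtbpath` are hypothetical, and it only assumes, per JH's note in the Compute discussion, that `XTBPATH` should point at the `share/xtb` directory inside the worker environment.

```python
import os
import sys
from pathlib import Path


def xtb_share_dir(env_prefix=None):
    """Return <env prefix>/share/xtb (hypothetical helper).

    Falls back to the prefix of the running interpreter, which is the
    environment root when Python is launched from inside a conda env.
    """
    prefix = Path(env_prefix) if env_prefix else Path(sys.prefix)
    return prefix / "share" / "xtb"


def set_xtbpath(env_prefix=None, environ=os.environ):
    """Set XTBPATH in `environ` and return the path that was used."""
    path = xtb_share_dir(env_prefix)
    environ["XTBPATH"] = str(path)
    return path


# Example with an assumed environment location; a plain dict is passed
# so the real process environment is left untouched.
demo_env = {}
used = set_xtbpath("/opt/conda/envs/qcarchive-worker-openff-xtb", demo_env)
print(demo_env["XTBPATH"])
```

Calling `set_xtbpath()` with no arguments before launching the worker would apply the same setting to the current process environment.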
