2021-12-10 QC meeting notes

Participants

  • @Jeffrey Wagner

  • Ben Pritchard

  • @Pavan Behara

  • @Joshua Horton

  • @David Dotson

  • @Chapin Cavender

Goals

  • Updates from MolSSI

    • deploying psi4 1.5, incompatibilities of qcelemental 0.24, qcengine 0.21 with qcfractal 0.15.7

  • Compute

    • QM workers on Lilac

    • XTB workers on Newcastle

    • QM workers on TSCC

    • QM, ANI, XTB workers on PRP

  • New submissions

    • submission issues with OpenMM datasets - Gateway Timeouts

    • dipeptide dataset

      • brand_raw errors on PRP

    • ML datasets for OpenMM

      • multi-molecule issues with QCElemental - psi4 bug fixed with QCSubmit

  • User questions/issues

  • Science support needs

  • Infrastructure needs / advances

    • psi4 on conda-forge

Discussion topics

Item

Presenter

Notes

Updates from MolSSI

Ben

  • BP: updated nginx on gateway, we were running

  • BP: Thinking about resolutions for storage issue. May need OpenFF’s opinion on some things

    • DD: Could chat right after this call

    • BP: Great. Let’s do that

  • BP: refactoring is moving along, things are getting easier

  • DD – I’m excited for in-server error cycling and being able to slice requests to just get specific result().

  • BP – You currently can’t pass the protocols (like orbitals and eigenvalues) for a single-point optimization, right? Because they’re not available in QCSchema. This is because qcengine depends on qcschema. There would be some other options, but updating this should help

  • DD – Would this enable us to do something like saving the wavefunction only on the LAST frame of an optimization?

  • BP – That would be a new protocol, but this would be possible.

  • DD – Yesterday I was doing some research for the protein-ligand automation and was looking into pragmatic practices for RESTful APIs, may be able to offer guidance.
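Sketch for the "wavefunction only on the last frame" question above. This is not an existing protocol (BP noted it would be a new one); the function name and the policy names are hypothetical, modelled on the all / initial_and_final / final / none choices that optimization trajectory protocols are assumed to offer.

```python
from typing import List

# Hypothetical policy names; applying them to wavefunction storage is an
# assumption for illustration, not an existing QCSchema protocol.
POLICIES = ("all", "initial_and_final", "final", "none")


def frames_keeping_wavefunction(n_frames: int, policy: str) -> List[int]:
    """Return the indices of optimization frames whose wavefunction is kept."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if n_frames <= 0 or policy == "none":
        return []
    if policy == "all":
        return list(range(n_frames))
    if policy == "final":
        return [n_frames - 1]
    # "initial_and_final": the set collapses to one index for a 1-frame run.
    return sorted({0, n_frames - 1})


if __name__ == "__main__":
    # A 12-step optimization that stores only the last wavefunction.
    print(frames_keeping_wavefunction(12, "final"))
```

With a policy like "final", storage grows by one wavefunction per optimization instead of one per step, which is the saving DD was asking about.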

Compute



  • TSCC

    • CC – 2 managers, 99.19 % and 100 % success.

    • DD – will spin up QM workers with this compute tag on PRP, watch for errors; haven’t observed errors locally on this set

  • brand_raw issue

    • DD – All of the brand_raw issues that I’ve seen have been from PRP

      • can check py-cpuinfo in old Docker image prior to current one on PRP; may be a candidate for pinning

      • Previous py-cpuinfo issue breadcrumb:

  • DD – Newcastle workers are doing great, not failing with memory issues. Thanks, JH.

    • JH – For gfn0 there’s an environment variable that needs to be set: XTBPATH has to point to the environment’s share/xtb directory. But it’s no problem for me to keep running these at Newcastle.

      • export XTBPATH=<path_to_miniconda>/miniconda3/envs/qcarchive-worker-openff-xtb/share/xtb

  • DD – Had ANI and XTB workers on PRP, they weren’t getting work so I shut them down.

  • DD – Are any queues/datasets not progressing fast enough?
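Sketch for the XTBPATH setting JH described above, for deployments where the worker is launched from Python rather than a shell. The share/xtb layout is taken from JH's export line; the helper names and the example prefix are hypothetical.

```python
import os


def xtb_share_dir(env_prefix: str) -> str:
    """Build the share/xtb path inside a conda environment prefix."""
    return os.path.join(env_prefix, "share", "xtb")


def configure_xtbpath(env_prefix: str, environ=os.environ) -> str:
    """Set XTBPATH in the given environment mapping and return the path.

    Assumption: the parameter files live under <env_prefix>/share/xtb,
    as in the export line above. The path is not checked for existence.
    """
    path = xtb_share_dir(env_prefix)
    environ["XTBPATH"] = path
    return path


if __name__ == "__main__":
    # Hypothetical prefix; substitute the real environment location.
    print(configure_xtbpath("/opt/miniconda3/envs/qcarchive-worker-openff-xtb"))
```

Processes started from the same Python session afterwards (for example via subprocess) inherit the variable, so this would need to run before the worker is spawned.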

New datasets

 

  • Dipeptide 2-D TorsionDrives

    • CC – ~2000 optimizations per day

    • DD – I suspect that our previous issues were from a really flaky manager that would starve out the stable managers. So I’ve also added my local computer and seen high success here. Now I’ll try getting managers going on PRP. CC, do you think that this would be explained by a flaky node on TSCC?

    • CC – This would be consistent with a flaky node at TSCC.

  • PubChem sets 1-6

  • SPICE set

    • This is a revision of the earlier set, needed wcombine patch

  • DD – Trying to submit the openmm datasets above, we’re getting gateway timeouts. Tried running locally and still got gateway timeouts.

    • BP – Is it just too big?

    • DD – I think so, the collection object gets pretty massive. I haven’t been able to submit any of them. Not sure what options we’d have to resolve this on the MolSSI side.

    • BP – It does seem likely that it’s too big. I could confirm if we tried to submit it while I watched the server.

    • DD – Could we try that after this session?

    • BP – Yes.

  • DD – Multicomponent molecule issues?

    • PB – JH added a function to QCSubmit to split a multicomponent OpenFF molecule, and this seems to be working.

  • PR 257

    • PB – When we first tried this, it didn’t go through. Do we need to bump the version on this?

    • DD – Let me look into this (queries server)… The 1.0 version does exist and has some records, so let’s call this resubmission 1.1.

User questions or issues

 

  • DD – Worked with Willa last week on running her own QCFractal instance on their resources. So Willa’s using this locally to iterate on RESP datasets.

Science support needs?

 

 

Infrastructure needs

 

  • DD – psi4 conda-forge port?

    • JW – I haven’t heard anything new about this, probably still blocked on libint2 issues

Other topics

 

  • JW – we have a lot of multicomponent things in the toolkit, and considering how to make these single component (currently a Molecule object can have multiple unconnected sets of atoms)

    • JH – we do take advantage of this, actually; we use this as a way to maintain an order to the atoms of the whole system

      • did have to work out a way to split up the system, which we did with networkx, but yeah we currently rely on this behavior in the toolkit

    • JW – if you have a fragment, it has to be contiguous?

    • JH – the only place this would cause issues is part of the workflow; would need to switch to a list of Molecules

      • perhaps if RDKit can do this by itself, give it a list of Molecule objects, let RDKit handle it

      • would get us a hash for the whole system

      • JW – topologies are going to be hashed, too, soon
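Sketch of the splitting step JH mentioned above: finding the unconnected sets of atoms in a multicomponent molecule. This is not the QCSubmit implementation (which JH said uses networkx); it is a standard-library breadth-first search over a bond list, and the function name and input format are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple


def split_components(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group atom indices into connected components of the bond graph.

    Components are ordered by their lowest atom index, and indices within
    each component are sorted, so the original atom ordering is recoverable.
    """
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Water (atoms 0-2) plus hydrogen chloride (atoms 3-4) in one object.
    print(split_components(5, [(0, 1), (0, 2), (3, 4)]))
```

Because each component keeps its original atom indices, the pieces can be mapped back onto the whole system, which matches the ordering use case JH described. Fragments do not need to be contiguous in index for this to work.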

Action items

@David Dotson will compare versions of py-cpuinfo between previous PRP Docker image and current (they are the same version, 8.0.0)
@David Dotson will set env variable from @Joshua Horton on PRP XTB deployment
@David Dotson will scale up openff-tscc compute tag execution on PRP
@David Dotson will troubleshoot submission issues with Ben; deploy workaround or solution
@Pavan Behara will set the version of qca-dataset-submission#257 to v1.1

Decisions
