2021-11-19 QCA Submission meeting notes

Participants

  • Ben Pritchard

  • @Chapin Cavender

  • @Pavan Behara

  • @Joshua Horton

  • @David Dotson

  • @Jeffrey Wagner

  • @Willa Wang

Goals

  • Updates from MolSSI

  • Compute

    • updated prod envs to use latest qcelemental, qcengine, qcfractal

    • QM workers on Lilac

    • QM workers on Newcastle

    • QM, ANI, XTB workers on PRP

  • New submissions

    • submission issues

    • dipeptide dataset

    • imposed electric field in QM - wavefunction not stored for optimizations

    • ML datasets for OpenMM

      • multi-molecule issues with QCElemental

  • User questions/issues

  • Science support needs

    • new openff-qcsubmit release

  • Infrastructure needs / advances

    • psi4 on conda-forge

Discussion topics

Item

Presenter

Notes

Updates from MolSSI

Ben

  • BP: been deep in refactor-land; excited about directions for things

    • working on single-point components, going better than I thought it would

    • definitely pleased to have the space to do this kind of work now through beginning of new year

    • JW – Rough ETA? We have a big API break coming in Feb

    • BP – Anticipating release in Feb, alpha in Jan

  • BP – There was a QCEl release last night, but this should be OK since I was careful with pinning.

    • DD – Sounds good. We just updated prod envs and pinned to 0.23

Compute update



  • DD – Updated the prod envs and bumped many QC* and openff-* versions. The only old package we’re still using is pytorch; we’re staying on the old version since torchani hasn’t had a release in the past 1.5 years.

    • JW – wanted to check if we should use <= pins since we know that OpenFF and MolSSI will have big API breaks in Feb

    • DD – The prod envs all use exact pins

  • DD – JH, PB, CC, did you update managers recently?

    • JH – Yes

    • PB – Yes

    • CC – I can do that this afternoon

  • DD – Where all do we have production workers running? I know we have Lilac, PRP, Newcastle

    • PB – We have public QCA workers on UCI HPC3 as well. Though they’re pre-emptible so they may not finish stuff. I’ll note this on the #qca-compute channel.

  • DD – Do we need more of any type of workers?

    • PB – The XTB jobs aren’t proceeding.

    • DD – I don’t understand what’s going on with these. I’ll investigate these moving forward. (PR #237)

    • JH – May be worth turning off error cycling, in case there are some jobs that will always fail and “clog” up the compute every time they get resubmitted.

    • DD – Sounds great. I’ve turned off error cycling on the XTB dataset for now, will turn it back on in a few days. I’ve also started more XTB workers on PRP.
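
JH's concern about error cycling can be illustrated with a small simulation. This is only a sketch: `Task`, `run`, `error_cycle`, and `max_attempts` are hypothetical names made up for illustration and are not part of QCFractal or the lifecycle automation. It shows how a cap on attempts keeps an always-failing job from being resubmitted forever.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a compute task (not a QCFractal class)."""
    name: str
    always_fails: bool = False
    status: str = "waiting"
    attempts: int = 0


def run(task):
    """Pretend to execute a task; always-failing tasks end in 'error'."""
    task.attempts += 1
    task.status = "error" if task.always_fails else "complete"


def error_cycle(tasks, max_attempts):
    """Reset errored tasks to 'waiting' unless they have used up their attempts."""
    restarted = []
    for task in tasks:
        if task.status == "error" and task.attempts < max_attempts:
            task.status = "waiting"
            restarted.append(task.name)
    return restarted


def simulate(tasks, cycles, max_attempts):
    """Alternate between running waiting tasks and error cycling."""
    for _ in range(cycles):
        for task in tasks:
            if task.status == "waiting":
                run(task)
        error_cycle(tasks, max_attempts)
    return tasks


tasks = simulate(
    [Task("ok"), Task("bad", always_fails=True)],
    cycles=10,
    max_attempts=3,
)
for task in tasks:
    print(task.name, task.status, task.attempts)
```

Without the cap (or with error cycling left on indefinitely), the `bad` task would consume a worker slot in every one of the ten cycles instead of three.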

New submissions

 

  • submission issues

    • OpenMM datasets

      • PB – We have issues with datasets being very large. The PubChem set has 100k+ molecules for single-points. So I’ve chopped this up in different ways but even the smaller datasets aren’t submitting.

      • DD – Let me try manually accessing the dataset from #243 on my computer. I’d previously run ds.compute locally to get it submitted. But there’s some issue with batching where the GHA isn’t connecting/able to access it.

      • PB – Do you think there’s a problem with the GitHub Actions user API limit that’s causing this?

      • DD – I don’t think so. Previously we’d had an issue where we hit the GH API too often using the lifecycle management, and I’ve fixed that. In this case, though, we’re calling compute on a dataset. On GH actions, I consistently get a gateway timeout. On my local box there’s no problem. This just started happening recently on some datasets.

      • JW – Are these datasets abnormally large?

      • PB – No

      • DD – Are any datasets succeeding?

      • PB – Yeah, the recent dimer/DES/torsionnet submissions went through just fine.

      • DD – (the #243 dataset records were retrieved locally) I’ve accessed these and they all have result records. Maybe they’ve just never been prioritized in the queue because they’re medium priority?

      • (General) – The #243 dataset shouldn’t have had its basis and functional split. BP made a fix in the last Fractal release that makes sure the first steps run BEFORE the d3 step is called; if there is no other step, the d3 step is run immediately. However, the dataset was submitted with an old version of QCF that still did the split/submission ordering.

      • DD – Ideas for next steps?

      • BP – If you go to recompute that dataset with the new version of QCPortal, then the problem should solve itself. This will be kinda awkward because two different versions of the software will be talking to each other. I’d thought about having the server ask about the client version and restrict access for some versions.

      • (General) – We’ll resubmit the dataset using a new version of QCF/QCPortal, and the new dataset will also go up on QCA. Then, any client with the new versions of everything will “see” the dataset correctly, though a client with an “old” version of QCPortal will not be able to access it correctly. But this would be the case even if we made a new dataset: the problem is rooted in how the client “sees” the dataset, not the dataset itself.
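
PB's approach of chopping a very large single-point set into smaller submissions can be sketched as follows. This is an illustration only: `batched`, `submit_in_batches`, and the `submit` callable are hypothetical names, and `submit` stands in for whatever call actually sends entries to the server. Per the discussion above, smaller datasets were still failing on GitHub Actions, so batching by itself was not the fix here.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def submit_in_batches(entries, batch_size, submit):
    """Call the (hypothetical) submit callable once per batch.

    Returns the total number of entries handed to submit.
    """
    submitted = 0
    for batch in batched(entries, batch_size):
        submit(batch)
        submitted += len(batch)
    return submitted


# Usage: record batch sizes instead of talking to a real server.
sizes = []
total = submit_in_batches(range(10), 4, lambda batch: sizes.append(len(batch)))
print(total, sizes)
```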

  • dipeptide dataset

    • CC – one holdup was how to avoid deduplication on rotamers for torsiondrives; had to make the dataset manually, but not sure if there’s a better solution?

    • JH – Are you thinking of just the initial deduplication step?

    • CC – Yes. There’s a flag in another step called assume_unique_molecules, and if we exposed that in the dataset factory that would help.

    • JH – One hard thing is that there’s a part where inputs are hashed by their fixed-hydrogen InChI key. So if they weren’t deduplicated, there are places where they’d overwrite/clash. So I don’t think this would be totally straightforward.

    • CC – I’ll open an issue to discuss this.
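
The clash JH describes can be shown with plain dictionaries. This is a minimal sketch, not how openff-qcsubmit is implemented: the entries, the placeholder key string, and both helper functions are hypothetical. Rotamers of the same molecule share one key, so an index keyed only on that value keeps just the last rotamer; the second helper shows one possible workaround (appending a counter), offered purely as an assumption for discussion.

```python
def index_by_key(entries, key):
    """Index entries by key; later entries overwrite earlier ones."""
    index = {}
    for entry in entries:
        index[key(entry)] = entry
    return index


def index_by_key_with_suffix(entries, key):
    """Index entries by key plus a running counter so duplicates survive."""
    index = {}
    counts = {}
    for entry in entries:
        base = key(entry)
        n = counts.get(base, 0)
        counts[base] = n + 1
        index[f"{base}-{n}"] = entry
    return index


# Three rotamers of one molecule: same (placeholder) key, different geometry.
rotamers = [{"key": "PLACEHOLDER-KEY", "rotamer": i} for i in range(3)]


def get_key(entry):
    return entry["key"]


clashing = index_by_key(rotamers, get_key)
distinct = index_by_key_with_suffix(rotamers, get_key)
print(len(clashing), len(distinct))
```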

  • imposed electric field in QM - wavefunction not stored for optimizations

    • DD – I met with WWang to get QCF running on UCSD resources. Some additional questions that I haven’t answered yet - Can discuss now.

    • WW – I have things running both on TSCC and locally. Not sure how to access result records.

    • DD – You’ll want to connect to QC server using the client… (I’ll write this up later). If you try to transfer the data by copying the database file you’ll have a bad time. You’ll want to access the results using the API, same as if you were accessing the public QCA. BP, is there a way to store the state of the database on file?

    • BP – The database has a backup function. Alternatively, postgres in general has a command, pg_basebackup, that does things safely.

    • WW – TSCC really likes rebooting things, so I’m really interested in ways to make safe and transferable backups.

    • DD – I could work with you on this. Alternatively, PB does a lot of work with this, and he treats his databases as disposable, and just pulls results off of it all the time.

    • WW – I’d like to make a single point dataset using starting geometries and get it running on TSCC. Could we meet after the Thanksgiving holiday?

      • DD – Yes, that works for me!

  • ML datasets for OpenMM

    • multi-molecule issues with QCElemental

      • JH – PE was canonicalizing

      • JW – would these be solved by e.g. Topology.to_qcschema?

      • JH – yes, that would make sense, some way to get the mapped smiles for entire system; canonicalize individual molecules separately, instead of all together

        • Topology.to_qcschema(), Topology.canonical_order_atoms, and Topology.to_smiles(mapped=True) methods would be nice for this

      • JW – are the multi-molecule datasets recoverable?

      • JH – they are all failing in compute, so they are unrecoverable; would need new submissions

        • can try out solutions locally, then can report back to Peter on what changes to make to get these through

 

 

  • User questions/issues

  • Science support needs

    • new openff-qcsubmit release

      • JH – New functionality will mostly be around checking inputs, trying to head off issues with multi-molecule inputs. Release ETA next week.

  • Infrastructure needs / advances

    • psi4 on conda-forge. MT is still scoping this, no firm commitment or timeline yet.

Action items

@David Dotson will monitor qca-dataset-submission#237 for XTB progress, keep XTB deployment on PRP in place
@Chapin Cavender will update prod env on UCSD cluster in use for QCA manager(s)
@David Dotson will re-compute pubchem_set1 using latest QCPortal
@Chapin Cavender will open an issue on openff-qcsubmit on a proposed flag for assuming unique molecules, not deduplicating rotamers on e.g. torsiondrives
@David Dotson will schedule a time post-Thanksgiving to meet with @Willa Wang to work out solutions for local QCFractal server backups, submission, and data retrieval for experimentation
@Joshua Horton will try out local solutions to multi-molecule canonicalization and communicate to Peter Eastman the changes to submissions needed to make these work

Decisions