2024-04-09 QCSubmit dataset submission meeting


Participants

  • @Jeffrey Wagner

  • @Alexandra McIsaac

  • @Lily Wang

  • @Brent Westbrook

  • @Chapin Cavender

  • @Matt Thompson

Discussion topics

Item

Presenter

Notes

Dataset status

@Jeffrey Wagner

  • Compute throughput has been low for the past few days because another lab on NRP has been submitting a lot of jobs

    • We’re running two kinds of workers on NRP now:

      • RNA: 50 workers x (8 cores, 40 GB RAM)

      • Regular (for XtalPi sets): 1000 workers x (4 cores, 6 GB RAM)

  • Apparently we set a compute record: https://twitter.com/MolSSI_NSF/status/1777785574318862703

  • RNA torsiondrives are progressing steadily; many TDs have finished

  • XtalPi torsiondrives are underway, but there is a long way to go

  • The XtalPi optimization dataset seems to be too big to handle; I’m unable to set its compute tag.

    • LW: Is it larger than other datasets? I thought it was comparable.

    • JW: I think it’s the largest dataset since the QCArchive update, so the problem could be related to that. Ben is looking into it, but it’s not the most urgent thing we’ve requested from him.

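For scale, the total resources requested by the two NRP worker pools can be tallied directly from the counts listed above. This is a minimal Python sketch; the pool labels are just names for illustration, and it assumes both pools are running at the listed replica counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """One class of NRP worker: replica count and per-worker resource request."""
    name: str
    replicas: int
    cores_per_worker: int
    ram_gb_per_worker: int

    @property
    def total_cores(self) -> int:
        return self.replicas * self.cores_per_worker

    @property
    def total_ram_gb(self) -> int:
        return self.replicas * self.ram_gb_per_worker


# Per-worker figures taken from the dataset status notes.
POOLS = [
    WorkerPool("RNA", replicas=50, cores_per_worker=8, ram_gb_per_worker=40),
    WorkerPool("regular (XtalPi)", replicas=1000, cores_per_worker=4, ram_gb_per_worker=6),
]

if __name__ == "__main__":
    for pool in POOLS:
        print(f"{pool.name}: {pool.total_cores} cores, {pool.total_ram_gb} GB RAM")
    total_cores = sum(p.total_cores for p in POOLS)
    total_ram = sum(p.total_ram_gb for p in POOLS)
    print(f"total: {total_cores} cores, {total_ram} GB RAM")
```

At the listed counts this comes to 400 cores / 2000 GB for RNA and 4000 cores / 6000 GB for the regular pool, or 4400 cores and 8000 GB of RAM in total.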

Nautilus onboarding

@Jeffrey Wagner

  • NRP onboarding (openforcefield/qca-dataset-submission Pull Request #353, by j-wags)

  • Recording here [requires OpenFF credentials]

  • Everyone has an infinite compute budget, but anything you run must actually use the resources it requests; otherwise the whole namespace will be penalized

    • The primary way to get in trouble is spinning up lots of workers with nothing to do, i.e. low % utilization

    • If you’re running jobs (especially with more than ~10 cores), please monitor the general chat on Matrix, since admins will tag you there before imposing penalties

  • manager.yaml has a queue_tags setting that can be used to manage multiple submissions with very different resource needs

  • Save the manager settings as a secret (see “Playbook” in the README) so that the QCA credentials can be shared with the workers. Each secret must have its own name, and you must create a new secret whenever you change the CPU/RAM settings

  • The deployment is controlled through Kubernetes; the manager is what connects to QCA

  • kubectl… get pods shows running pods

  • kubectl logs… gets logs of workers

  • kubectl delete… cancels jobs on a deployment

  • You can monitor usage here
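The idea behind queue_tags can be illustrated with a toy model: each manager only picks up tasks whose compute tag it lists, so large-memory RNA jobs and the many small XtalPi jobs each go to workers sized for them. This is a hypothetical Python model of that matching, not the actual QCFractal manager code; the tag names ("openff-rna", "openff-xtalpi") and the treatment of "*" as a match-anything wildcard are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Hypothetical model of one compute manager and the tags it serves."""
    name: str
    cores: int
    ram_gb: int
    queue_tags: list[str] = field(default_factory=list)

    def accepts(self, compute_tag: str) -> bool:
        # Assumption: "*" means "serve any tag"; otherwise require an exact match.
        return "*" in self.queue_tags or compute_tag in self.queue_tags


def route(compute_tag: str, managers: list[Manager]) -> list[str]:
    """Return the names of managers that would pick up a task with this tag."""
    return [m.name for m in managers if m.accepts(compute_tag)]


# Worker sizes from the dataset status notes; tag names are hypothetical.
MANAGERS = [
    Manager("rna-workers", cores=8, ram_gb=40, queue_tags=["openff-rna"]),
    Manager("xtalpi-workers", cores=4, ram_gb=6, queue_tags=["openff-xtalpi"]),
]

if __name__ == "__main__":
    print(route("openff-rna", MANAGERS))     # ['rna-workers']
    print(route("openff-xtalpi", MANAGERS))  # ['xtalpi-workers']
    print(route("openff-other", MANAGERS))   # []
```

In this model a task whose tag matches no manager is simply left unclaimed, which is why giving each submission its own tag keeps it off workers with the wrong CPU/RAM request.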

Action items

Decisions

 
