Dataset status | @Jeffrey Wagner |
- Low compute the past few days: another lab on NRP has been submitting a lot of jobs.
- Apparently we set a record for compute: https://twitter.com/MolSSI_NSF/status/1777785574318862703
- RNA torsiondrives are chugging along; many TDs have finished.
- XtalPi torsiondrives are underway; long way to go.
- XtalPi opts dataset seems to be too big to handle; I'm unable to set the compute tag.
- LW: Is it larger than other datasets? I thought it was comparable.
- JW: I think it's the largest since the QCArchive update; that could be related. Ben is looking into it, but it's not the most urgent thing we've requested from him.
Nautilus onboarding | @Jeffrey Wagner |
- NRP onboarding by j-wags · Pull Request #353 · openforcefield/qca-dataset-submission
- Recording here [requires OpenFF credentials]
- Everyone has an infinite compute budget, but when you run something, it has to actually use the resources you request, or else the whole namespace will be punished.
- The primary way to get in trouble is spinning up lots of workers with nothing to do, e.g. low % utilization.
- If you're running jobs (especially with > ~10 cores), please monitor the general chat on Matrix, as admins will tag you there before imposing punishment.
- manager.yaml has a queue_tags flag that can be used to manage multiple submissions with very different resource needs (see the sketch below).
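A minimal sketch of how queue_tags could scope one executor to one submission in manager.yaml, assuming the QCFractal compute-manager config schema; the cluster/executor names, resource values, and tag are illustrative placeholders, not the actual NRP settings:

```yaml
# Hypothetical manager.yaml fragment; names, resources, and tags are
# placeholders, not the real NRP configuration.
cluster: nrp-manager
server:
  fractal_uri: https://api.qcarchive.molssi.org

executors:
  xtalpi-td:
    type: local
    cores_per_worker: 8
    memory_per_worker: 32.0
    max_workers: 10
    # Only pull tasks carrying these compute tags, so one deployment
    # can be matched to one submission's resource needs.
    queue_tags:
      - xtalpi-torsiondrives
```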
- Save manager settings as a secret (see "Playbook" in the README) so that credentials can be shared with workers to access QCA.
- Each secret has to have its own name, and you must make a new secret whenever you change the CPU/RAM settings (the deployment is what Kubernetes accesses; the manager is what QCA accesses).
- kubectl get pods … shows running pods
- kubectl logs … gets logs from workers
- kubectl delete … cancels jobs on a deployment
- Can monitor usage here
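A sketch of the corresponding kubectl workflow; the secret, namespace, pod, and deployment names below are hypothetical, and the exact commands are in the Playbook section of the README:

```bash
# Store the manager config (including QCArchive credentials) as a
# secret that worker pods can read. Use a new name for each revision
# of the CPU/RAM settings.
kubectl create secret generic qca-manager-v2 --from-file=manager.yaml

# Show running pods in the namespace (namespace name is a placeholder)
kubectl get pods -n openforcefield

# Stream logs from one worker pod
kubectl logs -f <pod-name>

# Tear down a deployment, cancelling its workers
kubectl delete deployment <deployment-name>
```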