Item | Presenter | Notes |
---|---|---|
Updates from MolSSI | Ben | BP – We got hammered by a burst of requests a little while ago (around 100/second) for entire datasets, so I blocked the IP temporarily and worked with DD to find the person in OpenFF and help them access the data in a healthier way. BP – Storage utilization is creeping up, likely due to more wavefunction storage. We do need to plan a next-gen solution. JW – What kind of proactive steps can we take for the next 6 months to a year? BP – This factors into our options at ARC (VT supercomputer center). BP – I have some stopgap measures on storage I can take, but they are painful. JW – Where are the wavefunctions coming from? JW – Need to track down whether Peter+John have funding for their pubchem set, which will drive storage needs. JW – If this is just a matter of ordering 20TB of SSD storage, then we can go straight ahead with it. DD – It’s not just a short-term question. 20TB may get us through the short term, but we’ll want a more sustainable solution in the long term. Could look at options for continued supercomputer hosting, bare metal hosting in MolSSI office, or cloud hosting. BP – If each wavefunction is ~1MB, and we do a million of them, then that’s a terabyte. If there’s 1000 basis functions, then that’s a bit high…
|
Compute | | DD – QM workers on lilac weren’t given time to clean up, which led to weird job statuses. I’ve opened a PR on QCFractal to mitigate this. JH – We still have QM workers running on newcastle. They timed out today, so I’ve resubmitted them. DD – Great. We may want to switch to XTB, but let’s discuss that later. CC – TSCC is running right now: one job with 8 workers. I can spin up more if needed. DD – We have QM, ANI, and XTB workers on PRP. DD – With XTB, we have two datasets that are error cycling and seem to have memory issues? PB – I’m not sure whether it’s a memory issue. The error messages aren’t clear. DD – Memory issues are my first guess. I wonder if they’re getting killed by the queue scheduler for excessive memory usage. My PRP workers have 32GB of memory. PB – 32GB should be fine. DD – JH, do I recall that newcastle was having memory issues with XTB workers? JH – For us it was ANI workers having memory problems. I’ll switch these over to XTB. DD – Thanks. I’ll tag them as openff-xtb. Should be updated in a few hours.
JW – Is it possible that xtb is just ignoring our memory limits? DD – #223 had error cycling turned off for a few days to see if the same jobs were killing the workers repeatedly. I’ll turn error cycling back on.
|
New datasets | | JH will take over on the Folmsbee Hutchison test set: https://github.com/openforcefield/qca-dataset-submission/pull/255
Dipeptide 2-D TorsionDrives: Large number of errors (>4000) with return message None. Errors with brand field from PRP manager. Workers on TSCC have low error rate (<5%). CC will deploy additional managers on TSCC. DD will debug errors for openff-tscc compute tag on PRP.
JH resolved compute issues with OpenMM solvated amino acid dataset.
|
Psi4 update | | DD – Problem with basis sets deploying psi4 1.5, incompatibilities of qcelemental 0.24, qcengine 0.21 with qcfractal 0.15.7. JH – I think we… DD – So,… DD – Can we confirm that the second point is true? BP – The intercompatibility isn’t too bad, it may just work. JH + PB – We could run it with the old versions of everything, just need to set wcombine=False. JH – The keyword probably isn’t harmful, so it’d be safest to BOTH update the workers and ALSO submit a dataset with the new keyword. But in a pinch, just resubmitting with the new keyword is a good solution. PB – Agree.
PB will modify the prepared PRs (like pubchem set 2) to have wcombine=False, and then submit them to make sure that they don’t have the problem. If that works, we’ll make a new submission for the dipeptides which also has the updated keywords. PB – We’ll want to be careful with this. This is 100k records, so it’ll be a bit wasteful if it is still broken. DD – Is there any other reason that we should update to Psi4 1.5? DD – Do we know if there’s a fundamental incompatibility between Psi4 1.5 and the QC stack? DD will test the new versions against each other.
|
User questions? | | |
Science support? | | |
Infrastructure support | | |
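
For reference, BP's back-of-the-envelope wavefunction storage estimate from the MolSSI update can be written out as a short calculation. This is only a sketch under stated assumptions, not how QCArchive actually stores wavefunctions: it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and, for the 1000-basis-function case, a single dense, uncompressed float64 matrix of size n_basis x n_basis per record. The function names are hypothetical.

```python
# Rough storage estimate for stored wavefunctions.
# Assumptions (not from the meeting): decimal units, and one dense,
# uncompressed float64 n_basis x n_basis matrix per record.

BYTES_PER_FLOAT64 = 8


def dense_matrix_bytes(n_basis: int, n_matrices: int = 1) -> int:
    """Bytes needed for n_matrices dense n_basis x n_basis float64 matrices."""
    return n_matrices * n_basis * n_basis * BYTES_PER_FLOAT64


def total_storage_tb(bytes_per_record: float, n_records: int) -> float:
    """Total storage in terabytes (10^12 bytes) for n_records records."""
    return bytes_per_record * n_records / 1e12


if __name__ == "__main__":
    # ~1 MB per wavefunction, one million wavefunctions.
    print(total_storage_tb(1e6, 1_000_000), "TB")

    # 1000 basis functions, one dense matrix per record, one million records.
    per_record = dense_matrix_bytes(1000)
    print(per_record / 1e6, "MB per record")
    print(total_storage_tb(per_record, 1_000_000), "TB")
```

Under these assumptions, a 1000-basis-function record would already exceed the ~1MB figure with a single matrix, and storing several matrices per record scales the total linearly through `n_matrices`.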