| Item | Notes | Discussion |
|---|---|---|
| Update Dataset Tracking | Project Board; Slides<br>Complete PR 427: "OpenFF Cresset Additional Coverage TorsionDrives v4.0"<br>Complete PR 428: "SPICE Dipeptides Partial Relaxation Dataset v4.0"<br>Started PR 432: "OpenFF Protein PDB 4-mers v1.0"<br>Close to submitting a PR for metals; I think we can use QDS. | LW: Would like to check in on the protein PR since it's so close to completion! There are a couple of errors.<br>AF: Re the protein ones, I was expecting a nonzero number of errors. We can take this into consideration for dataset expansion.<br>JW: I propose end-of-lifeing the LipidMAPS Lipid benchmark. |
| QDS Issue | Retagging CI does not retag … when the "_mw" feature is not used? The only difference is that `ds.modify_records()` without "mw_" uses the `specification_names` keyword, and with "mw_" it does not. I traced it back and didn't find a difference in record retrieval.<br>Last time, with Cresset, we assumed it was a caching issue, which should only affect what I see locally … but something in the timeline of what I did didn't seem right. Even here, I ran the submission CI, waited a bit, and ran the reprioritize retag; not only did I not see the changes locally, but when I submitted the job, nothing was picked up with that tag. (A minimal sketch of this retagging call appears after the table.) | LW: Odd; I think the PR I quickly put through was retagged through CI. Maybe it needs to be manually run?<br>LW: To reiterate: manual retagging sporadically works, while CI retagging never works?<br>JW: Maybe tasks are unable to be retagged if the server thinks there's a worker working on them (e.g., if a worker claims the task and is then shut down ungracefully, the server might wait hours to ensure the worker is gone).<br>JC: I observed a lot of the weird behavior last Friday.<br>We will ask BP: whether there's anything that could cause a Python API call to retag a submission to return successfully without the task becoming available to workers, and whether any sort of backup or regular process runs on Fridays that might change server responsiveness.<br>We'll icebox this ticket. |
| Update GitHub Actions for QDS: avoid QCSubmit in lifecycle | The TM complexes won't run through QCSubmit because neither toolkit is reliable. We can easily bypass validation, and I think we can alter the GitHub Actions lifecycle to avoid QCSubmit. If we import the dataset.json into a QCFractal dataset instead of a QCSubmit one, and add a bz2 deserializer to QDS, the change should be straightforward. See notes. (A sketch of the bz2 deserialization step appears after the table.) | LW: Having had some time to think about this, I'm in favour of spending the time to edit the QCA-DS CI so that its error cycling and retagging work with QCFractal datasets via an alternative sqlite pathway, and I'm mostly ambivalent on whether to rewrite the validation function or turn it off.<br>It's worth thinking about why we want to use QCA-DS in the first place and what we want to include in it. With Jen's machinery, its error cycling and retagging aren't essential; instead, the repo (from my POV) (a) is convenient, (b) allows external users to track what's going on, and (c) holds a record of what was done, and at least the input dataset. For this last reason, I don't think a stub JSON dataset containing just the dataset name and type for error-cycling purposes is very useful; dumping the entire QCFractal object seems more useful to me.<br>Unless there's a good reason otherwise, treating QCFractal datasets as sqlite files and QCSubmit datasets as JSONs would be much clearer than having everything be a JSON. Alternatively, if people value how readable a JSON is, I'd favour a clear file pattern that makes it obvious whether something is QCSubmit or QCFractal, like qcf-dataset*.json*. If I understand the notes correctly, this also makes parsing the object slightly easier.<br>Finally, some validation does seem useful to me, if only checking metadata fields, elements, specifications, and so on, but it sounds like a lot more work/code to add that just for QCFractal dataset objects. Long-term, if this became a pathway for others to submit datasets while bypassing QCSubmit, we would want it.<br>After this option, IMO the next best would be to do everything off QCA-DS to save effort on the required code and maintenance. That seems suboptimal, though: (a) we would like a record of this dataset somewhere eventually (although it doesn't have to be QCA-DS), and (b) eventually Jen will have additional datasets to work through, and the QCA-DS interface is a nice one that lets others review, comment, and participate.<br>A lot to think about; we'll follow up at a later date. Meeting adjourned. |
| MolSSI Info / Align Priorities on MolSSI Asks | 2025-03-04 QCA User Meeting.<br>New from the last QCAUM meeting: dataset entry/spec/record copying! Copying doesn't actually duplicate records; it just links to the existing record in the new dataset. Also, records and specifications can't already exist in the destination dataset (they can't have the same name). (A hedged sketch of the copying call appears after the table.)<br>Cool QCBrowse demo! | |
| Update on clean force field releases | The recent QCFractal update should be great. Josh showed me the ropes with Docker images. Should we have a Docker image in each Zenodo repo, or make a single Docker image instance in Zenodo that is referenced and periodically updated? | |
| Old Issue of the Week | One-click QCArchive data (8/2019)<br>BONUS: Automating QCArchive dataset submission (9/2019)<br>BONUS: Add collection tags to lifecycle (8/2020) | |
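
A minimal sketch of the `ds.modify_records()` retagging call from the "QDS Issue" row, assuming a QCPortal `PortalClient` connection; the server URL, dataset name, specification name, and tag are placeholders, and `new_tag` is my best guess at the relevant keyword rather than something confirmed in the notes.

```python
# Sketch of the two retagging variants described in the "QDS Issue" row.
# Server URL, dataset name, spec name, and tag are all placeholders.
from qcportal import PortalClient

client = PortalClient("https://qcarchive.example.org")
ds = client.get_dataset("torsiondrive", "Example TorsionDrive Dataset")

# Variant used without "mw_": passes the specification_names keyword.
ds.modify_records(
    specification_names=["default"],  # hypothetical specification name
    new_tag="retag-test",             # hypothetical compute tag
)

# Variant used with "mw_": omits specification_names, so the retag
# applies to records across all specifications.
ds.modify_records(new_tag="retag-test")
```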
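For the "Update GitHub Actions for QDS" row, a minimal sketch of the bz2 deserializer mentioned in the notes, assuming compressed submissions are named `*.json.bz2`; the helper name `load_dataset_json` is hypothetical, not actual QDS code.

```python
# Hypothetical bz2-aware JSON loader for the QDS lifecycle; only the
# standard library is used, so no QCSubmit import is needed here.
import bz2
import json
import pathlib


def load_dataset_json(path: str) -> dict:
    """Load a dataset JSON file, transparently handling .bz2 compression."""
    p = pathlib.Path(path)
    if p.suffix == ".bz2":
        with bz2.open(p, "rt", encoding="utf-8") as f:
            return json.load(f)
    with open(p, "r", encoding="utf-8") as f:
        return json.load(f)


# e.g.: raw = load_dataset_json("dataset.json.bz2")
```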
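For the MolSSI row, a hedged illustration of what the new entry/spec/record copying might look like from QCPortal. The method name `copy_records_from` and its argument are assumptions on my part, not a confirmed signature; check the QCFractal release notes for the actual API.

```python
# ASSUMPTION: copy_records_from and its argument are guesses at the new
# copying API, not a confirmed signature; dataset names are placeholders.
from qcportal import PortalClient

client = PortalClient("https://qcarchive.example.org")
src = client.get_dataset("singlepoint", "Source Dataset")
dst = client.get_dataset("singlepoint", "Destination Dataset")

# Per the meeting notes, copying links to the existing records rather
# than duplicating them, and fails if an entry or specification with
# the same name already exists in the destination dataset.
dst.copy_records_from(src.id)
```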