Topic | Notes
---|---
Hanging INCOMPLETEs on OpenFF Sandbox CHO PhAlkEthOH v1.0 OptimizationDataset | BP – Could you send some IDs from the failures in this set? DD – [OptimizationRecord(id='32693660', status='INCOMPLETE'),
OptimizationRecord(id='32693661', status='INCOMPLETE'),
OptimizationRecord(id='32693662', status='INCOMPLETE'),
OptimizationRecord(id='32693878', status='INCOMPLETE'),
OptimizationRecord(id='32693879', status='INCOMPLETE'),
OptimizationRecord(id='32693880', status='INCOMPLETE'),
OptimizationRecord(id='32693891', status='INCOMPLETE'),
OptimizationRecord(id='32693892', status='INCOMPLETE'),
OptimizationRecord(id='32693896', status='INCOMPLETE'),
OptimizationRecord(id='32693897', status='INCOMPLETE'),
OptimizationRecord(id='32693898', status='INCOMPLETE'),
OptimizationRecord(id='32693906', status='INCOMPLETE'),
OptimizationRecord(id='32694805', status='INCOMPLETE'),
OptimizationRecord(id='32694806', status='INCOMPLETE'),
OptimizationRecord(id='32694807', status='INCOMPLETE'),
OptimizationRecord(id='32694932', status='INCOMPLETE'),
OptimizationRecord(id='32694933', status='INCOMPLETE'),
OptimizationRecord(id='32694934', status='INCOMPLETE'),
OptimizationRecord(id='32701335', status='INCOMPLETE'),
OptimizationRecord(id='32701336', status='INCOMPLETE'),
OptimizationRecord(id='32703639', status='INCOMPLETE'),
OptimizationRecord(id='32703640', status='INCOMPLETE'),
OptimizationRecord(id='32703641', status='INCOMPLETE')] BP – These look complete from my end. The root cause could be a known bug that I thought we had fixed, though these were created/submitted AFTER that fix went in. (General) – The status here shows INCOMPLETE, but these records have final molecules associated with them. DD – Could we manually flip these to COMPLETE for now? DD + BP will continue the discussion offline.
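A minimal sketch for spot-checking the mismatch described above (stored status vs. presence of a final molecule), assuming the legacy QCPortal client API and the public QCArchive server; the subset of IDs is taken from the list above:

```python
# Sketch only: query a few of the records listed above with the legacy QCPortal
# client and report whether a final molecule is attached despite the INCOMPLETE flag.
import qcportal as ptl

client = ptl.FractalClient()  # defaults to the public MolSSI QCArchive instance

ids = ["32693660", "32693661", "32693662"]  # subset of the records listed above
for rec in client.query_procedures(id=ids, status=None):  # status=None: no status filter
    # final_molecule is populated once the optimization trajectory has finished,
    # so a non-None value here is what "looks complete from my end" refers to.
    print(rec.id, rec.status, "final_molecule set:", rec.final_molecule is not None)
```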
|
Large datasets choke REST API | https://github.com/openforcefield/qca-dataset-submission/pull/208 DD – We recently tried to do a large expansion of a dataset. When we tried to add the MM compute spec, the metadata was too large for the upload. BP – You'd want to be able to have an endpoint that looks like /collections/#/entries . I may be able to increase that limit without increasing the manager upload size limit (which IS necessary). The current upload limit is 100 MB for this, and 500 MB for manager uploads. So I'll bump the 100 MB limit to 250 MB and we can see whether that resolves it.
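A rough client-side way to see the payload size involved (a sketch only, assuming the legacy QCPortal collections API; `ds.data.json()` only approximates what the REST layer actually transmits):

```python
# Sketch: estimate the serialized metadata size of a collection locally and compare
# it against the server-side upload limit before calling ds.save().
import qcportal as ptl

UPLOAD_LIMIT_MB = 100  # current limit; to be bumped to 250 per the discussion above

client = ptl.FractalClient()
ds = client.get_collection("OptimizationDataset", "OpenFF Sandbox CHO PhAlkEthOH v1.0")

payload_mb = len(ds.data.json().encode("utf-8")) / 1e6
print(f"Approximate collection payload: {payload_mb:.1f} MB (limit: {UPLOAD_LIMIT_MB} MB)")
```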
|
Torsiondrive of propane hangs with Snowflake | SB – Torsiondrive of propane just hangs, with no real way to push it forward. Ethane and butane work fine. DD – I'm hoping we can let you run this through geopt in the future. Should I prioritize that? SB – I haven't had much trouble using QCEngine directly, so this isn't urgent. Would it make sense to make a torsiondrive procedure in QCEngine instead of putting it at the geopt layer? BP – I've thought about this before. It wouldn't be too much of a problem in QCEngine, but it could get really complex if we try to mirror that in QCFractal. DD – I'd thought about this too. It would simplify things by putting more of the process parallelism under one roof. So the thought is that this could be a qcengine compute_procedure, but that looked a bit complex to me. SB – I can see how QCEngine may not want too much nested under it, but generally having a lighter-weight entry point for running torsiondrives, either in geopt or QCEngine, would be a big help. BP – Agree. I've had trouble telling new users how to just run a single torsiondrive. With respect to implementing this, I don't think it would be too hard to just call .procedure and have the torsiondrive down there. SB – Would it be helpful for me to open a PR to QCEngine for this functionality? DD + BP – This would be a compute.py → compute_procedure(<input_data with some input specific to torsiondrive>, "geometric") call. JW – I can see why we chose to build things out in openff-gopt, given that some decisions within QCEngine (e.g., how input structures should be specified) may take longer to resolve. BP – #264 is something I'd like to resolve in the medium term; it is relevant to QCFractal in particular. SB – We've hit those same issues of settings layered upon settings; it's hard to keep that from becoming a mess.
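For context, a minimal sketch of the existing single-optimization path referenced above (legacy QCEngine API); the proposed torsiondrive procedure would presumably take an analogous input object, but that part is the proposal under discussion, not existing API, and the method/basis shown is illustrative:

```python
# Sketch: run a single geomeTRIC optimization of propane through QCEngine directly.
import qcengine
from qcelemental.models import Molecule, OptimizationInput

mol = Molecule.from_data("pubchem:propane")  # fetches geometry from PubChem (needs network)

opt_input = OptimizationInput(
    initial_molecule=mol,
    input_specification={
        "driver": "gradient",
        "model": {"method": "b3lyp-d3bj", "basis": "dzvp"},  # illustrative spec
    },
    keywords={"program": "psi4"},  # gradient engine used by geomeTRIC
)

result = qcengine.compute_procedure(opt_input, "geometric", raise_error=True)
print(result.final_molecule.geometry)
```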
|
“Experimental” / “study” submissions (Trevor, Simon) | TG – There's a dataset used in the refit; there are open questions about how we name things and what intent the names carry. SB – My issue with the naming came from my uncertainty in trying to follow it. It seems like this almost needs two version numbers, but I do appreciate that you're driving standards at all, and I appreciate how complex this is. SB – Question for David and Ben: are there constraints on length and characters for spec names? BP – I don't believe so; it all just gets dumped into a JSON blob in the DB. SB – Should we generally do spec names as method/basis-<other-keyword-choices>? (see the naming sketch after this row) TG – Combinatorially, "what are all the settings that could make a compute spec unique?" is a really hard question. I think the root issue is that dataset names and spec names aren't validated, but we'd benefit from defining a convention within OFF. SB – I think capturing method and basis is probably a good starting point; then things like grid settings and PCM can also be included. TG – Agree, it does get complicated, but this is a good starting point. SB – We may be able to standardize on a grid naming scheme like what Gaussian uses (and I think Psi4 also copies some of it).
SB – Also interested to know the status of standards V3. DD – We've adopted it, but I've been the bottleneck on implementing support for it in our automation. One of our needs is that our dataset support+naming system have support in QCFractal. I'm hoping to do a retrospective on how our automated submission has been going over the first year of its existence, and to include support for standards V3 in the refactor that kicks off from that. SB – I had been copying and pasting from previous submissions whenever I make a new one, so having a clean template for submissions would help me avoid copy-and-paste errors. TG – We currently have an index of dataset names. Do we want a similar index for compute specs?
DD – One switch we can use for experimental datasets is setting their visibility to False on submission; that would make it unlikely for consumers to stumble upon them and start using them seriously. JW – My understanding is that we can always create a new collection based on something we found worked well, then point to that in a publication? TG – If we have an existing collection and plug in additional molecules and a different spec, will the old molecules be recomputed with the new spec? BP – Yes. (something complicated that made both TG and BP concerned) (General) – We should never reuse the name of an existing specification or dataset to mean something different.
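As an illustration of the method/basis-<other-keyword-choices> convention floated above (purely a sketch of the proposal, not an adopted OpenFF standard; the helper name and keyword handling are my own):

```python
# Sketch: build a compute-spec name from method, basis, and any distinguishing keywords.
def spec_name(method: str, basis: str, **extra) -> str:
    """Build a spec name like 'b3lyp-d3bj/dzvp-pcm-water'."""
    parts = [f"{method}/{basis}"]
    parts += [f"{key}-{value}" for key, value in sorted(extra.items())]
    return "-".join(parts).lower()

print(spec_name("B3LYP-D3BJ", "DZVP"))               # b3lyp-d3bj/dzvp
print(spec_name("B3LYP-D3BJ", "DZVP", pcm="water"))  # b3lyp-d3bj/dzvp-pcm-water
```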
|
| |
Submission execution status | DD – We’d noticed that some of SB’s datasets (aniline 2D and additional QM specs) weren’t making progress. This was because they were all high priority, but the industry benchmark set was also high priority and was submitted first, so it was taking all available compute.
|
Science support needs | |
Infrastructure needs / advances | |