Holding off on submission until items pointed out by TG are addressed
Submission standards
TG: Let’s reopen the standards and guidelines issue #111. I want to add more checks to it, such as torsion connectivity.
TG: I will make a Python module in qca-dataset-submission with functions that can be executed on submission data to validate it meets our submission standards. Some of this tooling can make its way into QCSubmit later.
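As a rough illustration of the kind of validation function TG describes, the sketch below checks torsion connectivity: the four atom indices of a torsion should form a bonded chain i-j-k-l. The function names and the plain bond-list input are hypothetical. They are not part of qca-dataset-submission or QCSubmit, and the real module may represent molecules differently.

```python
from typing import Iterable, List, Sequence, Tuple


def torsion_is_connected(
    torsion: Sequence[int], bonds: Iterable[Tuple[int, int]]
) -> bool:
    """Return True if the four atom indices form a bonded chain i-j-k-l.

    `bonds` is an iterable of (atom_a, atom_b) index pairs; order within a
    pair does not matter. (Hypothetical helper, for illustration only.)
    """
    # A torsion needs exactly four distinct atoms.
    if len(torsion) != 4 or len(set(torsion)) != 4:
        return False
    bonded = {frozenset(pair) for pair in bonds}
    # Each consecutive pair (i-j, j-k, k-l) must be bonded.
    return all(
        frozenset((a, b)) in bonded for a, b in zip(torsion, torsion[1:])
    )


def find_disconnected_torsions(
    torsions: Iterable[Sequence[int]], bonds: Iterable[Tuple[int, int]]
) -> List[Tuple[int, ...]]:
    """Return the torsions that fail the connectivity check."""
    bond_list = list(bonds)  # allow `bonds` to be a one-shot iterator
    return [
        tuple(t) for t in torsions if not torsion_is_connected(t, bond_list)
    ]


# Example: a four-atom chain 0-1-2-3 (e.g. the carbons of butane).
chain_bonds = [(0, 1), (1, 2), (2, 3)]
bad = find_disconnected_torsions([(0, 1, 2, 3), (0, 2, 1, 3)], chain_bonds)
```

A submission check could run a function like this over every torsion in a dataset and refuse the submission if the returned list is non-empty.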
Lifecycle of a submission
DD will write up a LIFECYCLE.md document giving the operational approach we take for datasets after submission. This operational approach will need to be consistent with the STANDARDS.md doc.
INCOMPLETE Optimizations in many datasets
DD: We see long-lived INCOMPLETE Optimizations in many TorsionDrive datasets. I currently have no mechanisms for dealing with these, and it’s not clear what the root cause is.
Root cause possibility: Psi4 exits but exit code isn’t transmitted to parent process?
Can we add a way to force INCOMPLETE Optimizations into an ERROR state so they can be recomputed? Currently they are stuck in limbo indefinitely, though local tests suggest that killing the source manager may address the issue.
These cases appear to be happening overwhelmingly on Lilac; I have been trying to get John to verify that all manager processes are actually dead, but haven’t had luck getting his attention yet. I will draft an email and CC everyone.
BP: It is not clear at which layer this is happening (QCEngine<->Psi4, QCF Server<->QCF Manager, QCF Manager<->adapter queue, etc.); there could be multiple contributing problems that together produce an event rare enough to be hard to trace but common enough to show up in many datasets.
DD: I will draft a PR to address the QCEngine issue; this problem is important for us to resolve, and this may be the contributor. I will send BP the INCOMPLETE Optimization specs so he can investigate the backend.
Will continue running local tests, watching for the behavior to emerge again.
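To make the suspected root cause concrete (a child process exits, but its exit status never reaches the parent), here is a minimal sketch of a parent that always turns a child's exit into an explicit state. It uses only the Python standard library. The function name and the COMPLETE/ERROR labels are illustrative assumptions; this is not how QCEngine or the QCFractal manager actually launches Psi4.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_classify(cmd: List[str], timeout: float = 60.0) -> Tuple[str, str]:
    """Run `cmd` and map its outcome to ("COMPLETE" | "ERROR", detail).

    Hypothetical helper: the point is that every exit path (non-zero exit
    code, death by signal, or timeout) yields an explicit ERROR rather than
    leaving the task without a final state.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "ERROR", f"timed out after {timeout} s"

    if proc.returncode != 0:
        # On POSIX a negative return code means the child was killed by
        # that signal number.
        return "ERROR", f"exit code {proc.returncode}"
    return "COMPLETE", proc.stdout


# Example: a child that exits with a non-zero status is reported as ERROR.
state, detail = run_and_classify([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With a wrapper of this shape, a task can only end as COMPLETE or ERROR from the parent's point of view, which is the property the INCOMPLETE Optimizations appear to lack.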
DD: Is it a good strategy to submit calculations with different input settings, then filter out the ones that worked?
TG: Can we submit arbitrary options to Psi4, to help with convergence issues?
BP: Should be able to by using the extras field
TG: Are there level shifting and Fermi smearing options in Psi4?
BP: (follow up 6/16): No, there are not. There are DIIS and damping options that may help.
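The strategy DD asks about (submit the same calculation with several settings, then keep the ones that worked) could be sketched as below. The helper names and the result dictionaries are hypothetical. The option names `damping_percentage` and `diis_max_vecs` are meant as examples of the DIIS and damping options BP mentions and should be checked against the Psi4 documentation before use. How the options reach Psi4 (for instance through the extras field) is left out.

```python
from itertools import product
from typing import Any, Dict, List, Sequence


def keyword_variants(
    base: Dict[str, Any], sweeps: Dict[str, Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Build one keyword dict per combination of the swept option values.

    `base` holds options shared by every variant; `sweeps` maps an option
    name to the values to try. (Hypothetical helper for illustration.)
    """
    names = sorted(sweeps)  # fixed order so the output is reproducible
    variants = []
    for values in product(*(sweeps[name] for name in names)):
        keywords = dict(base)
        keywords.update(zip(names, values))
        variants.append(keywords)
    return variants


def keep_successful(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter result records down to those flagged as successful."""
    return [r for r in results if r.get("success")]


# Example sweep over two assumed convergence-related options.
variants = keyword_variants(
    {"maxiter": 200},
    {"damping_percentage": [0, 20], "diis_max_vecs": [10, 20]},
)

# Placeholder result records standing in for real calculation output.
mock_results = [
    {"keywords": variants[0], "success": False},
    {"keywords": variants[1], "success": True},
]
worked = keep_successful(mock_results)
```

The cost of this approach grows with the product of the sweep sizes, so it is probably only practical for a small number of options and values.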
Local testing
TG: Can you (DD) put together a set of instructions for standing up a local QCA instance as you’ve done? Also, separately, how to run individual specs with QCEngine?
DD: Yes, will draft and add to qca-dataset-submission.
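Until those instructions exist, the following sketch shows roughly what running a single spec with QCEngine could look like. It assumes QCEngine and Psi4 are installed and that the input follows the QCSchema atomic-input layout; the helper names, the water geometry, and the method and basis are placeholders, not values taken from any dataset.

```python
from typing import Any, Dict, Optional, Sequence


def build_atomic_input(
    symbols: Sequence[str],
    geometry_bohr: Sequence[float],
    method: str,
    basis: str,
    driver: str = "energy",
    keywords: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a QCSchema-style atomic input as a plain dict.

    `geometry_bohr` is a flat list of x, y, z coordinates in Bohr, three
    values per atom. (Hypothetical helper for illustration.)
    """
    if len(geometry_bohr) != 3 * len(symbols):
        raise ValueError("geometry must contain 3 coordinates per atom")
    return {
        "molecule": {
            "symbols": list(symbols),
            "geometry": list(geometry_bohr),
        },
        "driver": driver,
        "model": {"method": method, "basis": basis},
        "keywords": dict(keywords or {}),
    }


def run_spec(atomic_input: Dict[str, Any], program: str = "psi4"):
    """Run one input through QCEngine (requires qcengine and the program)."""
    import qcengine  # imported lazily so the builder works without it

    return qcengine.compute(atomic_input, program)


# Placeholder water molecule; coordinates are illustrative only.
water_input = build_atomic_input(
    symbols=["O", "H", "H"],
    geometry_bohr=[0.0, 0.0, 0.0, 0.0, 1.43, 1.11, 0.0, -1.43, 1.11],
    method="b3lyp",
    basis="def2-svp",
)
```

Calling `run_spec(water_input)` on a machine with Psi4 available returns a result object whose `success` attribute indicates whether the calculation finished, which makes it convenient for reproducing a single failing spec outside the server.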
Action items
David Dotson will draft instructions for standing up a QCArchive test environment and for running individual specs with QCEngine.
David Dotson will write up a LIFECYCLE.md document for how datasets are managed post-submission (or really, the full lifecycle, where submission is one component event).
Trevor Gokey will put together a Python module with validation functions we will use as part of enforcing our STANDARDS.md.
Ben Pritchard will investigate the INCOMPLETE tasks on the backend; this should give clues as to the root cause as well as workarounds we can apply in the meantime.