DD – Working on standardizing how we handle dataset back-and-forth between submitter and MolSSI/database manager
Project board:
JW – Cool. Would it be possible to get an estimate of how many jobs ran successfully the first time?
DD – I can generate this stat
JM – Do you think the incomplete jobs are actually still running?
DD – We don’t know, and that’s what we’re investigating. There may be a bug in psi4 at the root of all this. We think that most of the incompletes aren’t really running.
JM – Have you looked at the structures?
DD – No. When we think it’s a problem with the structure, we’ll interface with the submitter.
JH – For things like SCF issues in psi4, do you tweak the convergence criteria before resubmitting?
DD – No. If something like that needs to be investigated, we’ll work with the submitter to identify what needs to be changed. I envision that, if only some molecules fail, we’ll modify and resubmit only those, and have the final dataset collect all the “good” molecules under one umbrella.
JH – With SCF convergence errors, it’s often possible to resolve them just by changing some psi4 keywords.
DD – I think that changing those keywords produces a different job on QCF, right BP?
BP – Right, it’s technically a different job.
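(For reference, a minimal sketch of the kind of psi4 keyword changes JH mentions. The molecule, method, and keyword values below are illustrative assumptions, not settings agreed on in the meeting.)

```python
import psi4

# Hypothetical molecule standing in for one of the failing entries.
mol = psi4.geometry("""
0 1
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
""")

# Illustrative keyword tweaks for stubborn SCF convergence; the specific
# values are assumptions chosen for the example.
psi4.set_options({
    "maxiter": 300,              # allow more SCF iterations
    "soscf": True,               # use second-order SCF steps near convergence
    "damping_percentage": 20.0,  # damp early density updates
    "guess": "sad",              # superposition-of-atomic-densities guess
})

energy = psi4.energy("b3lyp/def2-svp", molecule=mol)
print(energy)
```

(As BP notes, a compute record with different keywords is technically a different job on QCFractal, so such changes would go through a resubmission rather than modifying the existing records.)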
JW – Naming? Should a dataset, after many submissions, ever be tagged “final”, or should we have a version number that always increments? It may get complicated when a dataset is “technically” finished, but human error caused some molecules to be excluded from the initial submission.
DD – I’m in favor of incrementing version numbers. “Final” is a recipe for disaster.
(General) – Yes, dataset naming is really hard
DD – Do the gen2 v1 and v2 datasets overlap, or are they completely distinct?
JM – I think they’re completely distinct.
JH – The jobs that failed with NaNs – are those due to naming issues? (Those jobs are suffixed with letters.)
DD – All the NaN jobs appear to be suffixed with letters. Good observation.