2020-06-18 QCF User Group Meeting notes

Date

Jun 18, 2020

Participants

  • @David Dotson

  • @Jeffrey Wagner

  • @Jessica Maat

  • @Joshua Horton

  • Ben Pritchard

Discussion topics

Item

Notes

  • Incomplete jobs

    • DD – Working on standardizing how we handle dataset back-and-forth between submitter and MolSSI/database manager

    • Project board:

    • JW – Cool. Would it be possible to get an estimate of how many jobs ran successfully the first time?

    • DD – I can generate this stat

    • JM – Do you think the incomplete jobs are actually still running?

    • DD – We don’t know, and that’s what we’re investigating. There may be a bug in psi4 at the root of all this. We think that most of the incompletes aren’t really running.

    • JM – Have you looked at the structures?

    • DD – No. When we think it’s a problem with the structure, we’ll interface with the submitter.

    • JH – For things like scf issues in psi4, do you tweak the convergence criteria before resubmitting?

    • DD – No. If something like that needs to be investigated, we’ll work with the submitter to identify what needs to be changed. I envision that, if only some molecules fail, we’ll modify and resubmit only those, and have the final dataset put all the “good molecules” under the umbrella.

    • JH – It’s possible, with SCF convergence errors, you can just change some psi4 keywords to resolve them

    • DD – I think that changing those keywords produces a different job on QCF, right BP?

    • BP – Right, it’s technically a different job.

    • JW – Naming? Should a dataset, after many submissions, ever be tagged “final”, or should we have a version number that always increments? It may get complicated when a dataset is “technically” finished, but human error caused some molecules to be excluded from the initial submission.

    • DD – I’m in favor of incrementing version numbers. “Final” is a recipe for disaster.

    • (General) – Yes, dataset naming is really hard

    • DD – Do the gen2 v1 and v2 datasets overlap, or are they completely distinct?

    • JM – I think they’re completely distinct.

    • JH – The jobs that failed with NaNs – are those due to naming issues? (Those are suffixed with letters.)

    • DD – All the NaN jobs appear to be suffixed with letters. Good observation.

 

  • Updates from MolSSI?

    • BP – Possible bug in psi4, which causes things to hang. I’ve submitted a PR to get this fixed. Then we’ll need everyone to update their environments.

    • DD – Also investigating issues at the QCEngine layer. Lilac runs a single job per submission, so a pool is unnecessary there; it only adds another process boundary that complicates things.

  • Manager maintenance

    • All managers are off except PRP

  • User questions

    • None
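
Illustration of the version-numbering point raised under "Incomplete jobs": a minimal sketch of how an always-incrementing dataset version could be derived from a dataset name, instead of ever tagging a dataset "final". The "<name> v<N>" convention, the function name next_dataset_name, and the example dataset name are all assumptions made for illustration; none of them come from QCFractal or from the group's actual naming rules.

```python
import re

# Assumed (hypothetical) convention: "<base name> v<integer>", e.g. "Example Dataset v1".
_VERSION_RE = re.compile(r"^(?P<base>.+?)\s+v(?P<num>\d+)$")


def next_dataset_name(name: str) -> str:
    """Return the dataset name with its trailing version number incremented.

    A name without a version suffix gets " v1" appended (an assumption).
    """
    name = name.strip()
    match = _VERSION_RE.match(name)
    if match is None:
        # No recognizable version suffix: start numbering at v1.
        return f"{name} v1"
    # Bump the integer version; there is deliberately no "final" state.
    return f"{match.group('base')} v{int(match.group('num')) + 1}"


if __name__ == "__main__":
    print(next_dataset_name("Example Dataset v1"))
    print(next_dataset_name("Example Dataset"))
```

With a scheme like this, a resubmission that adds molecules missed through human error simply becomes the next version rather than a revision of something already labeled complete.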

Action items

Decisions