2021-03-19 QCFractal User Meeting notes

Date

Mar 19, 2021

Participants

  • @David Dotson

  • @Pavan Behara

  • @Heejune Park

  • @Trevor Gokey

  • Ben Pritchard

  • @Simon Boothroyd

  • @Joshua Horton

Discussion topics

Item

Presenter

Notes

QCArchive status

Ben

  • Updating the public production instance to the latest QCFractal server release failed

    • there are database inconsistencies that blocked this

      • duplicate molecules; molecules are supposed to be uniquely identified by a hash, but we actually have duplicates (same hash, same everything)

      • some knock-on effects: likely duplicate optimizations, duplicate torsiondrives

        • the molecule ID is built into the hash for the procedure, so duplicate molecules can produce distinct procedure hashes for the same calculation

      • second inconsistency: duplicate SingleResults (energies, gradients); null columns in MM cases cause deduplication components to not work reliably on these

  • Not so easy to fix, because duplicate cases often have foreign keys in JSON blobs

  • What are the possible solutions?

    • Migration of old database data via server into a new server instance

      • Molecules, collections, etc.

      • this approach could be designed to preserve internal IDs where possible

        • SB: I use IDs to refer to data, since that’s the only reliable way to reference exactly the same data over time

  • SB: one thing that’s missing is a data retention policy that demarcates expectations around data preservation, even through a migration

    • BP: this is something we’re currently thinking on

  • SB: is it possible to create a new dataset from specific record IDs, instead of relying on deduplication when the same data is resubmitted?

    • BP: not really possible at the moment; only possible to specify existing molecules and specs that go into a collection, not existing results

  • HP: how are molecules saved in the database? Is it just XYZ coords, or other information?
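
A minimal sketch of one way null columns can defeat deduplication, as raised for the duplicate SingleResults above. It assumes deduplication relies on a multi-column uniqueness constraint, which the meeting did not confirm. The table and column names are hypothetical and are not QCFractal's actual schema. In SQL, NULL values are treated as distinct from each other in a UNIQUE constraint, so two rows that differ only by id pass the check when one of the constrained columns is NULL (for example, no basis set in an MM calculation).

```python
import sqlite3

# Hypothetical table, NOT the real QCFractal schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        molecule INTEGER NOT NULL,
        method TEXT NOT NULL,
        basis TEXT,                          -- NULL for MM calculations
        UNIQUE (molecule, method, basis)
    )
    """
)


def try_insert(molecule, method, basis):
    """Return True if the row was accepted, False if the UNIQUE constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO result (molecule, method, basis) VALUES (?, ?, ?)",
            (molecule, method, basis),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# QM case: every constrained column is non-null, so the duplicate is rejected.
qm_first = try_insert(1, "b3lyp-d3bj", "dzvp")
qm_second = try_insert(1, "b3lyp-d3bj", "dzvp")

# MM case: basis is NULL, so the constraint does not see the rows as equal.
mm_first = try_insert(1, "openff-1.0.0", None)
mm_second = try_insert(1, "openff-1.0.0", None)

# GROUP BY does group NULLs together, so it can be used to find such duplicates.
duplicate_groups = conn.execute(
    """
    SELECT molecule, method, basis, COUNT(*)
    FROM result
    GROUP BY molecule, method, basis
    HAVING COUNT(*) > 1
    """
).fetchall()
```

Here `qm_second` is False while `mm_second` is True, and `duplicate_groups` lists only the MM rows.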

Dataset status

David

  • SB: on WBO TorsionDrive Dataset

    • don’t see all of the original molecules that I expected to see from the source set

    • at the very least the CMILES aren’t present as expected?

    • only tautomers and protomers are there, not the base forms

    • JH: QCSubmit is dropping the input for the enumeration stages

      • the toolkit won’t return the input molecule, and QCSubmit only takes what comes back from the toolkit

  • DD: public datasets prepared, putting finishing touches on READMEs, adding dataset PDFs

    • TG: perhaps consider waiting for new basis set if it’s coming soon; there are iodines in some of the datasets

    • SB: autoaux is the one we’re likely to choose; 15% performance decrease, but should give the same results for non-iodine

    • TG: there may be some new basis set names

    • SB: would like to get something fairly soon, so perhaps submit at least one of these upfront?

      • DD: can submit the Swope set with existing basis set

    • [decision] submit Swope set first, will give us starting data, help identify issues in the process
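
A rough sketch of the kind of fix JH's point about the enumeration stages suggests: always keep the input molecule alongside whatever the enumerator returns. This is not QCSubmit's API. `enumerate_keeping_input` and `fake_enumerator` are hypothetical names, and molecules are represented as plain strings, which only deduplicates correctly if the identifiers are already canonical.

```python
from typing import Callable, Iterable, List


def enumerate_keeping_input(
    input_smiles: str,
    enumerate_states: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return the input molecule first, followed by any new enumerated states.

    `enumerate_states` stands in for a toolkit call that may omit the input
    molecule from its output.
    """
    states = [input_smiles]
    seen = {input_smiles}
    for state in enumerate_states(input_smiles):
        if state not in seen:  # exact string match; assumes canonical identifiers
            seen.add(state)
            states.append(state)
    return states


def fake_enumerator(smiles: str) -> List[str]:
    """Stand-in enumerator that returns a tautomer but drops the input form."""
    return {"CC(=O)C": ["CC(O)=C"]}.get(smiles, [])


states = enumerate_keeping_input("CC(=O)C", fake_enumerator)
```

With this wrapper the base form stays in the result even when the enumerator only returns the alternative forms.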

User questions

 

  • HP: Seeing SCF failures in my own datasets; planning to increase scf_maxiter and change the basis set. Are there other changes I can try?

    • PB: try soscf in psi4; turning this on enables second-order steps during SCF convergence
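
A small sketch of the keyword changes discussed above, written as a plain dictionary of the kind passed as psi4 keywords in a calculation specification. The helper `scf_rescue_keywords` is hypothetical. The option names `maxiter`, `soscf`, and `soscf_max_iter` are psi4 SCF options as far as we know, but check them against the psi4 documentation for the installed version. The `scf_maxiter` name in HP's question may belong to a different layer and is not used here. The numeric values are illustrative, not recommendations.

```python
from typing import Any, Dict, Optional


def scf_rescue_keywords(
    base: Optional[Dict[str, Any]] = None,
    maxiter: int = 200,
    use_soscf: bool = True,
) -> Dict[str, Any]:
    """Return a copy of `base` with settings aimed at hard-to-converge SCF cases."""
    keywords = dict(base or {})
    keywords["maxiter"] = maxiter  # allow more SCF iterations
    if use_soscf:
        keywords["soscf"] = True  # switch on second-order SCF steps
        keywords["soscf_max_iter"] = 15  # micro-iterations per second-order step
    return keywords


keywords = scf_rescue_keywords({"scf_type": "df"})
```

The input dictionary is copied rather than modified, so the original specification can still be used for the calculations that already converge.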

Action items

@Joshua Horton will address QCSubmit dropping input tautomers/protomers in enumeration component
@David Dotson will finish Swope set for review; submit with existing basis set specification

Decisions