2021-03-19 QCFractal User Meeting notes

Date

Mar 19, 2021

Participants

@David Dotson
@Pavan Behara
@Heejune Park
@Trevor Gokey
Ben Pritchard
@Simon Boothroyd
@Joshua Horton

Discussion topics

Item	Presenter	Notes

QCArchive status	Ben	The update of the production (public) instance to the latest QCFractal server release failed.
		  Database inconsistencies blocked the update:
		    Duplicate molecules: molecules are supposed to be uniquely identified by a hash, but the database actually contains duplicates (same hash, same everything).
		      Some knock-on effects: likely duplicate optimizations and duplicate torsiondrives.
		      The ID of the molecule is built into the hash for the procedure.
		    Second inconsistency: duplicate `SingleResult`s (energies, gradients). Null columns in MM cases cause the deduplication components to not work reliably on these.
		  Not so easy to fix, because duplicate cases often have foreign keys in JSON blobs.
		What are the possible solutions?
		  Migration of old database data via the server into a new server instance (molecules, collections, etc.).
		    This approach could be designed to preserve internal IDs where possible.
		    SB: I use IDs to refer to data, since that’s the only reliable way to reference exactly the same data over time.
		    SB: One thing that’s missing is a data retention policy that sets expectations around data preservation, even through a migration.
		      BP: This is something we’re currently thinking about.
		SB: Is it possible to specify that I want to create a new dataset with specific record IDs, instead of relying on deduplication when resubmitting the same data?
		  BP: Not really possible at the moment; it is only possible to specify existing molecules and specs that go into a collection, not existing results.
		HP: How are molecules saved in the database? Is it just XYZ coordinates, or other information?
		  BP: The hash function used to determine identity includes several fields in the QCElemental Molecule: https://qcelemental.readthedocs.io/en/latest/model_molecule.html#molecular-hash
Dataset status	David	SB: On the WBO TorsionDrive Dataset, I don’t see all of the original molecules that I expected to see from the source set.
		  At the very least, the CMILES aren’t present as expected: only tautomers and protomers are there, not the base forms.
		  JH: QCSubmit is dropping the input for the enumeration stages.
		    The toolkit won’t return the input molecule, and QCSubmit only takes what comes back from the toolkit.
		DD: Public datasets are prepared; putting finishing touches on READMEs and adding dataset PDFs.
		  TG: Perhaps consider waiting for the new basis set if it’s coming soon; there are iodines in some of the datasets.
		    SB: autoaux is the one we’re likely to choose; 15% performance decrease, but it should give the same results for non-iodine.
		    TG: There may be some new basis set names.
		  SB: Would like to get something fairly soon, so perhaps submit at least one of these upfront?
		    DD: Can submit the Swope set with the existing basis set.
		  [decision] Submit the Swope set first; this will give us starting data and help identify issues in the process.
User questions		HP: Seeing SCF failures in my own datasets; planning to increase `scf_maxiter` and change the basis set. Are there other changes I can try?
		  PB: Try `soscf` in `psi4`; turning this on enables second-order steps during SCF convergence.
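
The duplicate-molecule problem raised under QCArchive status can be illustrated with a minimal sketch. It is not the QCFractal implementation: `toy_molecule_hash` is a hypothetical, simplified stand-in for the QCElemental molecular hash (which covers more fields, see the link above), and `find_duplicate_ids` is a hypothetical helper that groups record IDs by hash to show how several IDs can end up pointing at identical content.

```python
import hashlib
import json
from collections import defaultdict


def toy_molecule_hash(symbols, geometry, charge=0.0, multiplicity=1, decimals=8):
    """Hypothetical, simplified identity hash (NOT QCElemental's algorithm)."""
    payload = {
        "symbols": list(symbols),
        # Round coordinates so tiny floating-point noise does not change the hash.
        "geometry": [round(float(x), decimals) for x in geometry],
        "molecular_charge": float(charge),
        "molecular_multiplicity": int(multiplicity),
    }
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()


def find_duplicate_ids(rows):
    """Map each hash that appears more than once to the sorted record IDs using it."""
    by_hash = defaultdict(list)
    for record_id, mol_hash in rows:
        by_hash[mol_hash].append(record_id)
    return {h: sorted(ids) for h, ids in by_hash.items() if len(ids) > 1}


# Illustrative data only: IDs 1 and 3 store the same molecule.
water = (["O", "H", "H"], [0.0, 0.0, 0.0, 0.0, 0.0, 1.8, 1.7, 0.0, -0.5])
rows = [
    (1, toy_molecule_hash(*water)),
    (2, toy_molecule_hash(["He"], [0.0, 0.0, 0.0])),
    (3, toy_molecule_hash(*water)),
]
duplicates = find_duplicate_ids(rows)
print(duplicates)
```

Because downstream records refer to molecules by ID, merging IDs 1 and 3 in a case like this would also require rewriting every reference to the discarded ID, which is the difficulty noted above for foreign keys stored in JSON blobs.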
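
As a rough sketch of PB’s suggestion, the snippet below builds a Psi4-style keyword dictionary for a retry. It assumes that the `scf_maxiter` HP mentions corresponds to Psi4’s SCF `maxiter` option; the helper name `with_convergence_aids` and the value 200 are hypothetical and were not discussed in the meeting.

```python
def with_convergence_aids(keywords, maxiter=200, soscf=True):
    """Return a copy of a keyword dict with SCF convergence options overridden."""
    updated = dict(keywords)       # leave the caller's dict untouched
    updated["maxiter"] = maxiter   # assumed equivalent of `scf_maxiter`
    updated["soscf"] = soscf       # second-order SCF, as suggested by PB
    return updated


base = {"scf_type": "df"}          # illustrative starting keywords
retry = with_convergence_aids(base)
print(retry)
```

Keeping the original keyword set unchanged makes it easy to resubmit only the failed records with the modified options.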

Action items

@Joshua Horton will address QCSubmit dropping the input molecule in the tautomer/protomer enumeration components

@David Dotson will finish Swope set for review; submit with existing basis set specification
