
...

Item	Presenter	Notes
User issues	Hyesu	Problems loading existing dataset to combine with new molecules: when I tried to load `dataset.json.bz2`, got a validation error that all dataset entries are missing fixed hydrogen inchi
JH: recently added inchikey validation, but it means old datasets don’t have it
TG: could use older version?
[conclusion] Hyesu needs atom map functionality, and so not possible to use old `dataset.json` directly
HJ: is it expected to have tautomers enumerated in optimization datasets, or can we skip them?
TG: it’s up to you, if you want to have them or not
HJ: do we still have problems with order dependence between enumerating protomers and tautomers?
JH: only an issue for torsiondrive datasets, if you tag dihedrals first or not
JH: for the optimization dataset, can you send me the SMILES list you’re using?
HJ: yes, no problem
TG: if we add and remove things from the data structure, we’ll have issues like this one
JH: will always be able to go back to previous version of qcsubmit; otherwise really hard to support full backwards compatibility for automation, can get away with pulling dataset name and other things out of the data structure without passing it through e.g. `TorsiondriveDataset.__init__`
DD: what if we take the approach of treating QCSubmit as a strict structure that you must abide by if you want to use its workflow components; can always pull apart old `dataset.json`s directly as pure python objects?
TG: now have the problem of requiring expert knowledge of data structures for users
JH: would prefer to make the new fields optional so they don’t trigger pydantic validation; once it’s existed for some time and is very stable, can make it required
DD: a bit like a reverse deprecation; at some future release the validation will be required
JH: yes, I like that
TG: another hot take: possible to make the code use inchikey if there, not use it if it’s not?
JH: another good idea; if we overwrite the init, see if fixed hydrogen inchi is there. If not, generate and add it
[decision] make inchi keys optional from pydantic’s perspective
JH: need to consider how we phase in new functionality like this in the future; optional → required
Enamine REAL	Trevor	Avenues to do submission outside of `qca-dataset-submission`?
TG: please put together a procedure for how best to put together a submission that is not submitted via GHA
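
The "optional now, required later" decision recorded under User issues could look roughly like the sketch below. It is a standard-library stand-in, not the openff-qcsubmit pydantic model: the `Entry` class, its field names, and `placeholder_key` are all hypothetical, and a real implementation would compute the fixed hydrogen InChIKey with a cheminformatics toolkit. The key is filled in when absent and a `FutureWarning` flags that the field is expected to become required.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


def placeholder_key(smiles: str) -> str:
    """Hypothetical stand-in for a real fixed hydrogen InChIKey generator."""
    return f"PLACEHOLDER-{smiles}"


@dataclass
class Entry:
    """Hypothetical dataset entry; not the openff-qcsubmit model."""

    smiles: str
    # Optional for now so old datasets still load; planned to become required.
    inchi_key: Optional[str] = None

    def __post_init__(self) -> None:
        if self.inchi_key is None:
            # "Reverse deprecation": warn today, enforce in a later release.
            warnings.warn(
                "entry has no inchi_key; generating one. "
                "This field is planned to become required.",
                FutureWarning,
                stacklevel=2,
            )
            self.inchi_key = placeholder_key(self.smiles)


# An old-style record without the key and a new-style record with it.
old_record = {"smiles": "CCO"}
new_record = {"smiles": "CCO", "inchi_key": "EXISTING-KEY"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_entry = Entry(**old_record)
    new_entry = Entry(**new_record)
```

Here the old record loads with a generated key and one warning, while the new record keeps its stored key untouched.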

Action items

Hyesu Jang will send Joshua Horton the SMILES list she is using to assemble the OptimizationDataset, in which tautomer and protomer enumeration is very order-dependent
Joshua Horton will make inchi keys optional in openff-qcsubmit dataset entries for now, then phase in required usage and decide whether they are generated when not present on a newly loaded dataset
David Dotson will write up an approach to submitting a dataset outside of qca-dataset-submission automation; needed for datasets that will almost certainly take longer than GHA allows for submission
David Dotson will modify the lifecycle automation to ensure it doesn’t choke on old datasets missing inchikeys [confirmed that current implementation is unaffected; only uses pydantic model for submission]
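
For automation that only needs a few fields from an old dataset, one possible approach is to read `dataset.json.bz2` as plain Python objects and skip model validation entirely. The sketch below uses only the standard library; the keys `dataset_name` and `dataset` and the example file contents are assumptions for illustration, not a confirmed description of the qcsubmit file layout.

```python
import bz2
import json
import os
import tempfile


def read_raw_dataset(path: str) -> dict:
    """Load a bz2-compressed JSON file as plain Python objects, with no validation."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a tiny, made-up file; the keys are assumptions.
example = {"dataset_name": "Example Dataset", "dataset": {"mol-0": {"index": "mol-0"}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.json.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as handle:
        json.dump(example, handle)
    raw = read_raw_dataset(path)

# Pull out only what is needed; .get() tolerates fields missing from old files.
dataset_name = raw.get("dataset_name")
n_entries = len(raw.get("dataset", {}))
```

Because nothing is passed through a model constructor, entries missing newer fields such as inchikeys do not raise validation errors.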
