2021-05-07 QCA Submission meeting notes

Participants

@Trevor Gokey
@Pavan Behara
@Hyesu Jang
@David Dotson

Goals

User questions/issues, new submissions
Infrastructure needs / advances

Discussion topics

Item	Presenter	Notes

User issues	Hyesu	Problems loading an existing dataset to combine with new molecules: when I tried to load `dataset.json.bz2`, I got a validation error that all dataset entries are missing fixed hydrogen inchi
		JH: recently added inchikey validation, but it means old datasets don’t have it
		TG: could use older version?
		[conclusion] Hyesu needs atom map functionality, and so not possible to use old `dataset.json` directly
		HJ: is it expected to have tautomers enumerated in optimization datasets, or can we skip them?
		TG: it’s up to you, if you want to have them or not
		HJ: do we still have problems with order dependence between enumerating protomers and tautomers?
		JH: only an issue for torsiondrive datasets, if you tag dihedrals first or not
		JH: for the optimization dataset, can you send me the SMILES list you’re using?
		HJ: yes, no problem
		TG: if we add and remove things from the data structure, we’ll have issues like this one
		JH: will always be able to go back to previous version of qcsubmit; otherwise really hard to support full backwards compatibility for automation, can get away with pulling dataset name and other things out of the data structure without passing it through e.g. `TorsiondriveDataset.__init__`
		DD: what if we take the approach of treating QCSubmit as a strict structure that you must abide by if you want to use its workflow components; can always pull apart old `dataset.json`s directly as pure python objects?
		TG: now have the problem of requiring expert knowledge of data structures for users
		JH: would prefer to make the new fields optional so they don’t trigger pydantic validation; once it’s existed for some time and is very stable, can make it required
		DD: a bit like a reverse deprecation; at some future release the validation will be required
		JH: yes, I like that
		TG: another hot take: possible to make the code use inchikey if there, not use it if it’s not?
		JH: another good idea; if we overwrite the init, see if fixed hydrogen inchi is there. If not, generate and add it
		[decision] make inchi keys optional from pydantic’s perspective
		JH: need to consider how we phase in new functionality like this in the future; optional → required
Enamine REAL	Trevor	Avenues to do submission outside of `qca-dataset-submission`?
		TG: please put together a procedure for how best to put together a submission that is not submitted via GHA
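
For reference, the "pull apart old `dataset.json`s directly as pure python objects" idea from the discussion could look roughly like the sketch below. It uses only the standard library and never touches a QCSubmit model, so no validation runs. The key names (`dataset_name`, `dataset`, `attributes`, `fixed_hydrogen_inchi`) and the helper `load_raw_dataset` are assumptions for illustration, not the confirmed QCSubmit schema or API.

```python
import bz2
import json
import os
import tempfile


def load_raw_dataset(path):
    """Read a bz2-compressed JSON file into plain dicts/lists (no model validation)."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Write a tiny stand-in file so the sketch is self-contained.
# These keys are hypothetical and only mimic the shape of an old dataset file.
old_dataset = {
    "dataset_name": "Example Optimization Set",
    "dataset": {"mol-0": {"attributes": {"canonical_smiles": "CCO"}}},
}
path = os.path.join(tempfile.mkdtemp(), "dataset.json.bz2")
with bz2.open(path, "wt", encoding="utf-8") as handle:
    json.dump(old_dataset, handle)

# Pull out only what is needed, without constructing a dataset object.
raw = load_raw_dataset(path)
dataset_name = raw["dataset_name"]
entries_missing_inchi = [
    name
    for name, entry in raw["dataset"].items()
    if "fixed_hydrogen_inchi" not in entry["attributes"]
]
```

As TG noted, this route assumes the reader already knows the layout of the file, which is the trade-off against going through the validated model.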

Action items

@Hyesu Jang will send @Joshua Horton the SMILES list she is using for assembling the OptimizationDataset, in which tautomer and protomer enumeration is very order dependent

@Joshua Horton will make inchi keys optional in openff-qcsubmit dataset entries for now, phase in required usage later, and decide whether they are generated when missing from a newly loaded dataset

@David Dotson will write up approach to submitting a dataset outside of qca-dataset-submission automation; needed for datasets that will almost certainly take longer than GHA allows for submission

@David Dotson will modify the lifecycle automation to ensure it doesn’t choke on old datasets missing inchikeys [confirmed that the current implementation is unaffected; it only uses the pydantic model for submission]
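
A minimal sketch of the "optional → required" phase-in from the decision above, written with a standard-library dataclass as a stand-in for the pydantic model. The class `Entry`, the function `load_entry`, the `strict` flag, and the placeholder generator are all hypothetical names for illustration; a real implementation would compute the fixed hydrogen inchi with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    smiles: str
    # Optional for now, so old entries without the field still load.
    fixed_hydrogen_inchi: Optional[str] = None


def load_entry(
    raw: dict,
    generate_inchi: Optional[Callable[[str], str]] = None,
    strict: bool = False,
) -> Entry:
    """Build an Entry from a plain dict, backfilling the inchi when it is absent.

    strict=True mimics the later "required" phase: a missing field is an error.
    """
    entry = Entry(
        smiles=raw["smiles"],
        fixed_hydrogen_inchi=raw.get("fixed_hydrogen_inchi"),
    )
    if entry.fixed_hydrogen_inchi is None:
        if strict:
            raise ValueError("entry is missing fixed_hydrogen_inchi")
        if generate_inchi is not None:
            entry.fixed_hydrogen_inchi = generate_inchi(entry.smiles)
    return entry


def placeholder_inchi(smiles: str) -> str:
    # Placeholder only: NOT a real inchi, just marks where generation would happen.
    return f"placeholder-inchi-for-{smiles}"


old_entry = load_entry({"smiles": "CCO"})
backfilled_entry = load_entry({"smiles": "CCO"}, generate_inchi=placeholder_inchi)
```

Switching `strict` on at a later release would correspond to the "reverse deprecation" DD described.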
