Issues with loading old BasicResultCollection datasets | Alexandra McIsaac | AMI recapped the FF fitting pipeline: Hessians are used for the initial guess via the modified Seminario method, but not in the fitting afterwards. |
---|
link | https://github.com/openforcefield/openff-qcsubmit/issues/299 |
---|
|
Our pre-QCPortal 0.50 training datasets don’t have CMILES associated with the Hessian calculations, so they cannot be loaded directly as a BasicResultCollection. They can still be accessed using OptimizationResultCollection.to_basic_result_collection from the corresponding Optimization datasets.

Is it possible to add the CMILES now, without re-computing the dataset?
If it is not possible, do we want to re-compute the datasets?
If we don’t want to re-compute the datasets and can’t add CMILES, what do we want to happen if someone tries to load these datasets? What error message and/or workaround should we implement?

AMI: Could print a message that, for old datasets, you can try loading it as an optimization dataset.
BW: I added a warning for each entry missing CMILES. So one option is a message that prints if all entries are missing CMILES. Or we could even search for Opt datasets that contain this molecule as a final mol, and print out their names.
LW: Could print out a summary like “how many mols were missing CMILES out of how many mols in the dataset”. The printouts for each bad mol were quite overwhelming.
JW: So how about we target:
- An always-shown summary statement of how many entries were missing CMILES.
- A by-default not-shown warning for EACH entry missing CMILES (but enable-able with warning/logging level controls).
- If some/all CMILES are missing, always print a final warning that some legacy datasets are only loadable from an optimization set (usually with the same name as the singlepoint dataset).
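The three-part target JW describes could be sketched with Python's standard logging module roughly as follows. This is only an illustration, not the openff-qcsubmit implementation: the function name `report_missing_cmiles`, the logger name, and the `(entry_name, cmiles)` input shape are all hypothetical.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("cmiles_check")

LEGACY_HINT = (
    "Some legacy datasets are only loadable from an optimization dataset "
    "(usually with the same name as the singlepoint dataset)."
)


def report_missing_cmiles(entries):
    """Log CMILES coverage for `entries`, a list of (entry_name, cmiles_or_None).

    Returns (number_missing, number_total).
    """
    missing = [name for name, cmiles in entries if not cmiles]

    # Per-entry detail goes to DEBUG, so it is hidden unless the user
    # lowers the logging level.
    for name in missing:
        logger.debug("Entry %s is missing CMILES", name)

    # The summary uses WARNING, which Python shows by default.
    logger.warning(
        "%d of %d entries were missing CMILES", len(missing), len(entries)
    )

    # Final hint, only when at least one entry lacks CMILES.
    if missing:
        logger.warning(LEGACY_HINT)

    return len(missing), len(entries)
```

With this layout, a user who wants the per-entry messages would call something like `logging.getLogger("cmiles_check").setLevel(logging.DEBUG)` with a handler configured; everyone else sees only the summary and the legacy-dataset hint.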
|
Issue with coordinate precision in OptimizationResultCollection.to_basic_result_collection
|
---|
link | https://github.com/openforcefield/openff-qcsubmit/issues/297 |
---|
|
In our new S dataset, 600/900 conformers have geometries that differ from the Optimization dataset final geometry by more than 1e-9 Å (but less than 1e-8 Å), so they are not recognized as the same molecule by OptimizationResultCollection.to_basic_result_collection. How to proceed?
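To make the mismatch concrete, here is a minimal sketch of a tolerance-based geometry comparison. It is not how openff-qcsubmit currently matches geometries and not a decision from this meeting; the helper name `geometries_match`, the flat coordinate lists, and the 1e-8 default tolerance are assumptions for illustration.

```python
import math


def geometries_match(coords_a, coords_b, atol=1e-8):
    """Return True if two flat coordinate lists agree within `atol` per component.

    `atol` is an absolute tolerance in the same length unit as the coordinates.
    """
    if len(coords_a) != len(coords_b):
        return False
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=atol)
        for a, b in zip(coords_a, coords_b)
    )


# A coordinate shifted by 5e-9 passes a 1e-8 tolerance but fails a 1e-9 one,
# which mirrors the situation described for the 600 affected conformers.
reference = [0.0, 0.0, 1.0]
shifted = [0.0, 0.0, 1.0 + 5e-9]
loose = geometries_match(reference, shifted, atol=1e-8)
strict = geometries_match(reference, shifted, atol=1e-9)
```

Whether loosening the tolerance is acceptable, or whether the matching should rely on something other than coordinates, is the open question for this item.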
|