Updating the production (public) instance to the latest QCFractal server release failed
database inconsistencies blocked the update
duplicate molecules; molecules are uniquely identified by a hash, but we actually have duplicates (same hash, same everything)
some knock-on effects: likely duplicate optimizations, duplicate torsiondrives
the molecule ID is built into the procedure hash, so duplicate molecules lead to duplicate procedures
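A minimal sketch of why duplicate molecules propagate, assuming a hypothetical `procedure_hash` that hashes the molecule's database ID together with a spec dict (the real QCFractal hashing code and schema are not reproduced here). It also shows one possible cleanup step: group molecule IDs by hash and map each duplicate onto the lowest ID. "Keep the lowest ID" is an illustrative choice, not a decided policy.

```python
import hashlib
import json
from collections import defaultdict

# Hypothetical molecule rows: (molecule_id, molecule_hash).
# IDs 1 and 7 are duplicates: same hash, same content, different primary key.
molecules = [(1, "a1b2"), (2, "c3d4"), (7, "a1b2")]


def procedure_hash(molecule_id: int, spec: dict) -> str:
    """Hypothetical procedure hash that includes the molecule's database ID
    rather than the molecule's content hash."""
    payload = json.dumps({"molecule_id": molecule_id, "spec": spec}, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()


spec = {"program": "geometric", "method": "b3lyp", "basis": "dzvp"}

# Same physical molecule, two IDs -> two different procedure hashes,
# so procedure-level deduplication treats them as separate jobs.
print(procedure_hash(1, spec) == procedure_hash(7, spec))  # False

# Group molecule IDs by hash and map every duplicate onto the lowest ID.
by_hash = defaultdict(list)
for mol_id, mol_hash in molecules:
    by_hash[mol_hash].append(mol_id)

remap = {}
for ids in by_hash.values():
    canonical = min(ids)
    for mol_id in ids:
        if mol_id != canonical:
            remap[mol_id] = canonical

print(remap)  # {7: 1}
```

Once molecule 7 is remapped to 1, recomputing the procedure hash with the surviving ID makes the two procedures collide, which is what lets them be merged.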
second inconsistency: duplicate SingleResults (energies, gradients); in MM cases, null columns keep the deduplication logic from working reliably
Not easy to fix, because the duplicates are often referenced by foreign keys stored inside JSON blobs
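One common way null columns defeat deduplication: a SQL UNIQUE constraint treats NULLs as distinct from each other. This is also PostgreSQL's default unless a PostgreSQL 15+ constraint is declared `NULLS NOT DISTINCT`. As a result, two MM records with no basis set or keywords both pass the constraint. The sketch below uses an in-memory SQLite database and a hypothetical, simplified `result` table. It is not the QCFractal schema, and whether this is the exact failure mode on the production server is an assumption. `GROUP BY`, unlike UNIQUE, groups NULLs together, so it can still find the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified results table; the UNIQUE constraint is meant to
# stop the same computation from being stored twice.
conn.execute(
    """
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        molecule_id INTEGER NOT NULL,
        driver TEXT NOT NULL,
        method TEXT NOT NULL,
        basis TEXT,
        keywords_id INTEGER,
        UNIQUE (molecule_id, driver, method, basis, keywords_id)
    )
    """
)
insert = (
    "INSERT INTO result (molecule_id, driver, method, basis, keywords_id) "
    "VALUES (?, ?, ?, ?, ?)"
)

# MM-style record: no basis set and no keywords, so both columns are NULL.
mm_row = (1, "energy", "openff-1.0.0", None, None)
conn.execute(insert, mm_row)
conn.execute(insert, mm_row)  # accepted: UNIQUE treats the NULLs as distinct

# QM-style record with every column filled in: the duplicate is rejected.
qm_row = (1, "energy", "b3lyp", "def2-svp", 3)
conn.execute(insert, qm_row)
qm_rejected = False
try:
    conn.execute(insert, qm_row)
except sqlite3.IntegrityError:
    qm_rejected = True

count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(qm_rejected, count)  # True 3

# GROUP BY places NULLs in one group, so the MM duplicates are still found.
dupes = conn.execute(
    """
    SELECT molecule_id, method, COUNT(*) AS n, MIN(id) AS keep_id
    FROM result
    GROUP BY molecule_id, driver, method, basis, keywords_id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [(1, 'openff-1.0.0', 2, 1)]
```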
What are the possible solutions?
Migrate data from the old database, via the server, into a new server instance
Molecules, collections, etc.
this approach could be designed to preserve internal IDs where possible
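A rough sketch of the migration idea, assuming hypothetical in-memory tables. Molecules are copied into the new instance, keeping their old ID where possible. Each duplicate gets an entry in a remap table. ID references stored inside JSON blobs, which no database foreign-key constraint can update, are then rewritten with that table. The JSON keys `molecule_ref` and `molecule_refs` are placeholders, not QCFractal field names.

```python
import json

# Hypothetical old-instance data. Molecule 7 duplicates molecule 1, and the
# procedure refers to molecules only through IDs inside a JSON blob.
old_molecules = {1: {"hash": "a1b2"}, 2: {"hash": "c3d4"}, 7: {"hash": "a1b2"}}
old_procedures = {10: json.dumps({"molecule_ref": 7, "molecule_refs": [7, 2, 1]})}


def migrate_molecules(old):
    """Copy molecules, keeping the old ID for the lowest-ID copy of each hash
    and recording a remap entry for every duplicate."""
    new, remap, seen = {}, {}, {}
    for mol_id in sorted(old):
        mol_hash = old[mol_id]["hash"]
        if mol_hash in seen:
            remap[mol_id] = seen[mol_hash]  # duplicate -> surviving ID
        else:
            seen[mol_hash] = mol_id
            new[mol_id] = old[mol_id]  # original ID preserved
    return new, remap


def rewrite_refs(blob, remap, keys=("molecule_ref", "molecule_refs")):
    """Rewrite molecule IDs stored under the given JSON keys using remap."""
    data = json.loads(blob)
    for key in keys:
        value = data.get(key)
        if isinstance(value, list):
            data[key] = [remap.get(v, v) for v in value]
        elif isinstance(value, int):
            data[key] = remap.get(value, value)
    return json.dumps(data)


new_molecules, remap = migrate_molecules(old_molecules)
new_procedures = {pid: rewrite_refs(blob, remap) for pid, blob in old_procedures.items()}

print(sorted(new_molecules))  # [1, 2]
print(remap)  # {7: 1}
print(new_procedures[10])  # {"molecule_ref": 1, "molecule_refs": [1, 2, 1]}
```

Keeping the first copy's ID means external references to surviving molecules stay valid. References to dropped duplicates would only resolve if the remap table were published alongside the migration.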
SB: I use IDs to refer to data, since that’s the only reliable way to reference exactly the same data over time
SB: one thing that’s missing is a data retention policy that sets expectations for data preservation, including across a migration
BP: this is something we’re currently thinking on
SB: is it possible to create a new dataset from specific record IDs, instead of relying on deduplication when resubmitting the same data?
BP: not really possible at the moment; a collection can only be built from existing molecules and specs, not from existing results
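To make SB's request concrete, here is a toy contrast between the two models. It uses a hypothetical record table and entry type, and none of these names come from QCFractal. When duplicates exist, resolving an entry from a molecule and spec can match several records. An entry that pins a record ID refers to exactly one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record table rows: (record_id, molecule_id, spec_name).
# Records 501 and 733 are duplicates of the same computation.
RECORDS: List[Tuple[int, int, str]] = [
    (501, 1, "default"),
    (733, 1, "default"),
    (502, 2, "default"),
]


def resolve(molecule_id: int, spec_name: str) -> List[int]:
    """Current model as described: an entry names a molecule and a spec, and
    matching records are looked up. With duplicates this can return several."""
    return [rid for rid, mol, spec in RECORDS if mol == molecule_id and spec == spec_name]


@dataclass(frozen=True)
class PinnedEntry:
    """Requested model (hypothetical): the dataset entry stores the record ID."""
    name: str
    record_id: int


print(resolve(1, "default"))  # [501, 733] -> ambiguous
pinned = [PinnedEntry("ethane", 733), PinnedEntry("propane", 502)]
print([entry.record_id for entry in pinned])  # [733, 502]
```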
HP: how are molecules saved in the database? Is it just XYZ coords, or other information?