User questions | CS – My previous dataset has torsion atoms mislabeled. Need to fix and submit again.
TG – Can we rename the existing dataset?
BP – Not clear that we can.
CS – Can we change the first attempt to “fragmenter paper first try”?
TG – How about “fragmenter paper v2”?
CS – “OpenFF fragmenter validation 1.0”
CS – Can we not list the collection?
BP – I saw “invalid atom index” errors when I logged into the server to restart it.
TG + CS – Adding a PDF of the torsions being driven to each submission would be good.
Could be required for submission:
JW – Should each dataset be versioned? (“fragmenter paper 1.0”, “1.1”?)
BP will delete the previous “fragmenter paper” dataset.
TG – In the PR, we should list the molecules that we’re removing.
DD – For the molecules/torsion scans that we DON’T want, will the hashes for those match with the resubmissions?
TG – Could have a “latest” tag to indicate the most recent version of a collection/dataset.
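A minimal sketch of the versioning and “latest” tag ideas above, assuming dataset names end in a `major.minor` suffix as in “OpenFF fragmenter validation 1.0”. The helper names `parse_name` and `latest_versions` are hypothetical and this is plain Python, not a QCFractal/QCPortal API.

```python
import re

# Assumed naming convention: "<base name> <major>.<minor>", optionally "v<major>.<minor>".
VERSION_RE = re.compile(r"^(?P<base>.+?)\s+v?(?P<major>\d+)\.(?P<minor>\d+)$")


def parse_name(name):
    """Split a dataset name into (base, (major, minor)); version is None if absent."""
    match = VERSION_RE.match(name)
    if match is None:
        return name, None
    return match.group("base"), (int(match.group("major")), int(match.group("minor")))


def latest_versions(names):
    """Map each base name to the full name of its highest version (the 'latest' tag)."""
    latest = {}
    for name in names:
        base, version = parse_name(name)
        if version is None:
            continue  # unversioned names cannot be ranked, so they are skipped
        if base not in latest or version > latest[base][0]:
            latest[base] = (version, name)
    return {base: name for base, (_, name) in latest.items()}


names = [
    "fragmenter paper",
    "OpenFF fragmenter validation 1.0",
    "OpenFF fragmenter validation 1.1",
]
print(latest_versions(names))
```

Versions are compared as integer tuples rather than strings, so “1.10” would sort after “1.9”.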
Storing molecule connectivity/SMILES
Dataset naming
JM – Can we group together all of the datasets used for fitting openff-1.2.0? We could make a new dataset, which contains exactly the same inputs as were used for the fitting.
JH – Could store parent_dataset in each molecule’s metadata.
TG – Once 1.2.0 is released, we can collect the molecules used for fitting from HJ, and make an “aggregate dataset”. This can also include a “parent dataset” field.
JM – What process should we follow for generating and labeling this dataset?
JW – Once HJ is done with the fitting, we can meet to go over the QC data and identify which molecules should be included in the new dataset.
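A rough sketch of the “aggregate dataset” idea above, assuming each source dataset can be reduced to a list of SMILES strings. The function `build_aggregate`, the dataset names, and the choice to store `parent_dataset` as a list are illustrative assumptions, not an existing QCFractal interface.

```python
def build_aggregate(datasets):
    """Merge datasets into one mapping of SMILES -> metadata.

    `datasets` maps a dataset name to the SMILES strings it contains.
    Each molecule records every dataset it came from under "parent_dataset",
    since the same molecule may appear in more than one source dataset.
    """
    aggregate = {}
    for dataset_name, molecules in datasets.items():
        for smiles in molecules:
            entry = aggregate.setdefault(smiles, {"parent_dataset": []})
            if dataset_name not in entry["parent_dataset"]:
                entry["parent_dataset"].append(dataset_name)
    return aggregate


# Hypothetical dataset names and contents, for illustration only.
sources = {
    "example torsion set 1.0": ["CCO", "CC"],
    "example optimization set 1.0": ["CCO"],
}
aggregate = build_aggregate(sources)
print(aggregate["CCO"]["parent_dataset"])
```

Keeping the provenance on each molecule means the aggregate can be filtered back to any of its parent datasets later.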
BP – Input for a dataset interface refactor?
BP – I think the optimization datasets have the best interface so far.
BP – I don’t like that some dataset types, like Hessians, don’t offer a lot of fine-grained control over submission. You can basically just call “compute”, and you don’t have a lot of control over what happens for each individual job.
JH – It’s tricky that I can’t get the gradient from a geometry optimization without renaming the dataset.
BP – Don’t like the vagueness around “get data”, “get record”, etc.
TG – In my code, I collect a ton of IDs, and I don’t know whether they’re molecules or records. Would be good to have more clarity on that.
JH – Agree that it would be good for normal datasets to look like optimizations.
TG – Having a special dataset name or set of docs for hessiandataset would be helpful.
JH – Would be nice to have a link between optimizations and Hessians.
BP – When people do a property calculation, they want it linked to an optimized geometry. So this is kinda a general problem I should tackle.
BP will open an Issue for more feedback on this and will share the link on #qcfractal.
CS – What time is QCSubmit meeting tomorrow?
|