Which datasets to use for testing?
MT – I want to make a utility that helps make JSON blobs containing ibstore databases. These would be stored in the repo. Would like them to be in the ibstore database format.
LW – So you want to construct datasets on the fly and save them. Probably simplest to use json blobs early on, even though they’re not space-efficient.
MT – So save them or regenerate when running tests? You say save them.
LW – May be hard to automate for every scenario; every dataset will have different requirements.
JW –
MT – Talking about two questions:
How should users be able to access entries in datasets?
How should new datasets be created on the fly (should make a utility for this)?
MT – Could mock qcsubmit’s dataset.submit to get data
LW – No need to compute at all, could just save the qcsubmit collections that result.
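A minimal sketch of the save-and-reload idea above, using only the standard library. The dataset shape, file name, and helper names here are hypothetical stand-ins, not the real qcsubmit schema:

```python
import json
import tempfile
from pathlib import Path


def save_blob(records: dict, path: Path) -> None:
    # Serialize a (hypothetical) dataset to a JSON blob stored alongside the tests.
    path.write_text(json.dumps(records, indent=2))


def load_blob(path: Path) -> dict:
    # Reload a previously saved blob so tests never need to recompute anything.
    return json.loads(path.read_text())


# A toy stand-in for "the qcsubmit collections that result"
blob = {"entries": {"mol-1": {"smiles": "CCO", "energy": -1.23}}}

path = Path(tempfile.mkdtemp()) / "example.json"
save_blob(blob, path)
assert load_blob(path) == blob
```

JSON is not space-efficient, as noted above, but the round trip is trivial to implement and diff-friendly in a repo.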
…
JW – Can show how to run simple local jobs on QCF, we do that in qcsubmit’s tests
MT – I’m thinking of not even really running stuff, rather mocking the .submit call
…
JW – Wouldn’t need to mock .submit, instead could mock the dataset retrieval call.
MT – Would need to run .submit to get a template
JW – Could just run things using the RDKit force field, or Sage, to get a real-looking set of mockable data.
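One way the mock-the-retrieval idea above could look, using the standard library's `unittest.mock.patch`. The `Client` class, `get_dataset` method, and data shape are all hypothetical stand-ins, not the real QCFractal or qcsubmit API:

```python
from unittest.mock import patch


class Client:
    # Hypothetical stand-in for a QCFractal-style client; the real API differs.
    def get_dataset(self, name: str) -> dict:
        raise RuntimeError("would hit a live server")


def fetch_energies(client: Client, name: str) -> list[float]:
    # Code under test: pulls a dataset and extracts one value per entry.
    data = client.get_dataset(name)
    return [entry["energy"] for entry in data["entries"].values()]


# Canned "real-looking" data, e.g. saved from an earlier local run
canned = {"entries": {"mol-1": {"energy": -1.23}, "mol-2": {"energy": -4.56}}}

with patch.object(Client, "get_dataset", return_value=canned):
    energies = fetch_energies(Client(), "test-set")

assert energies == [-1.23, -4.56]
```

With this pattern the tests exercise everything downstream of retrieval without a server, which is the point of mocking the retrieval call rather than .submit.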
MT – I’ll probably work on updating to QC* 0.50 first, and then I’ll poke around with trying to mock the return values. Would love a sort of migration guide/blog post on how to update QCSubmit code.