PE and Marcus will be setting up some datasets
Then we need to figure out how to do the QCSubmit part
JR – wavefunctions or densities? It could be impractical to store all wavefunctions. We could come up with a strategy to choose some to save, e.g. minima or high-fidelity calculations
PE – we were initially going to store wavefunctions in SPICE 1, but it would have filled up QCA in 3 days. Storage is definitely the issue here
BP – we could store everything but do we want to? How big do you anticipate it being?
PE – v1 has ~1.1 million conformations. v2 will add roughly the same number of additional conformations with 40-50 atoms, so it will be about the same size as v1.
BP – we estimated 5-6 TB for wavefunctions for v1.
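(For scale: 5-6 TB spread over the ~1.1 million conformations mentioned above works out to roughly 5 MB per stored wavefunction.)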
JC – could we retrieve parts of the dataset instead of everything? Downloading 5-6 TB could be a lot. Could we split up the dataset?
BP – we can certainly store 6 TB.
JR – for xtb calculations we can throw the wavefunctions away. For some DFT, and for anything higher-level, we should store the density and/or wavefunction.
JR – for the CBS limit, we could run two levels with different basis sets and extrapolate
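(For reference, a common two-point form for extrapolating the correlation energy, assuming the two basis sets have cardinal numbers $X$ and $Y$:

$$E^{\mathrm{corr}}_{\mathrm{CBS}} \approx \frac{X^3 E^{\mathrm{corr}}_X - Y^3 E^{\mathrm{corr}}_Y}{X^3 - Y^3}$$

This applies to energies; as noted below, there is no comparably established scheme for densities or wavefunctions.)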
JR – not aware of good work on CBS extrapolation of densities or wavefunctions
JC – we will have to do multiple calculations on subsets anyway, so might as well?
JR – agree
JR – the best place to start is to store some subset of DFT wavefunctions, picking the last and most-optimised structure, or, for a series of conformers, saving only the lowest-energy one. I can volunteer to sketch out these heuristics
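A minimal sketch of what such a heuristic could look like, assuming final optimisation results are available as (molecule_id, conformer_id, final_energy) tuples; the record layout and function name are hypothetical, not QCArchive's actual API:

```python
def select_wavefunctions_to_keep(records):
    """Pick, per molecule, the single conformer whose wavefunction to retain.

    records: iterable of (molecule_id, conformer_id, final_energy) tuples,
    one per final (most-optimised) geometry. Returns {molecule_id: conformer_id}
    mapping each molecule to its lowest-energy conformer.
    """
    best = {}
    for mol_id, conf_id, energy in records:
        if mol_id not in best or energy < best[mol_id][1]:
            best[mol_id] = (conf_id, energy)
    return {mol_id: pair[0] for mol_id, pair in best.items()}


# Example: mol-1 keeps conformer c1 (lower energy); mol-2 keeps its only conformer.
records = [("mol-1", "c0", -76.42), ("mol-1", "c1", -76.45), ("mol-2", "c0", -40.51)]
print(select_wavefunctions_to_keep(records))  # {'mol-1': 'c1', 'mol-2': 'c0'}
```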
JC – would we want to figure out how to do this on SPICE 1.0 since we already have that?
JR – sure
JC – are ESPs of interest as well?
JR – we should only save the wavefunction and re-compute the ESP from it. It should be relatively easy to package code that does this for users
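A rough sketch of what such user-facing code might look like in Psi4, assuming the wavefunction was serialised with `wfn.to_file()`; the file names are placeholders, and exact call behaviour (including the units expected in grid.dat) should be checked against the Psi4 documentation for the installed version:

```python
import numpy as np
import psi4

# Load a previously stored wavefunction (assumed to have been written with wfn.to_file()).
wfn = psi4.core.Wavefunction.from_file("conformer_wfn.npy")

# Psi4's GRID_ESP property reads evaluation points from grid.dat
# (one "x y z" per line; verify the expected units in the Psi4 docs)...
grid_points = np.array([[0.0, 0.0, 1.0],
                        [0.0, 0.0, 2.0],
                        [0.0, 0.0, 3.0]])
np.savetxt("grid.dat", grid_points)

# ...and writes the electrostatic potential at those points to grid_esp.dat.
psi4.oeprop(wfn, "GRID_ESP")
esp = np.loadtxt("grid_esp.dat")
```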
BP – I can sit down and give you a storage quota. The new server has ~140 TB of space. We can also do an attached storage box, probably for an additional ~200 TB. It would be spinning disk.