Updates from MolSSI
DD: Ben and I coordinated on QCF-next for a December release. I'll be working with BP on preparing a set of openff-qcsubmit changes that will be immediately compatible with the next branch release. BP is leading this effort; I'm just providing implementation support. Plan to allocate time for this in November.
DD: Met with Chodera and Eastman on the SPICE issue. This concerns Nature not being satisfied with QCArchive acting as a long-term archive for datasets. "QCArchive" is something of a misnomer in that it doesn't guarantee archival, so tools like Zenodo and Figshare are better suited for this. JC and PE agreed and tasked me with creating the data artifacts and spreading these standards through OpenFF. This is also an opportunity to update the "Data" tab on the website, so I'm thinking of using the qcsubmit export functionality to make data artifacts to post on that tab (see the sketch below). BP has a postdoc whose role would seem to overlap with this (Zena(?)), and they're thinking of using a tool like ZenoPy for the Zenodo uploads. But overall, QCArchive isn't really an archive; it's several other things, but not that.
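As a rough sketch of what that export could look like, the snippet below pulls a completed dataset from QCArchive with openff-qcsubmit and writes it to a single JSON artifact suitable for deposition. It assumes the current (pre-next) qcportal FractalClient and qcsubmit's OptimizationResultCollection; the dataset name and output filename are placeholders, not real OpenFF artifacts.

```python
# Hedged sketch: export a completed QCArchive dataset to a single-file artifact
# with openff-qcsubmit. The dataset name and output path are placeholders.
from qcportal import FractalClient  # pre-"next" client; the next branch uses a different client class
from openff.qcsubmit.results import OptimizationResultCollection

client = FractalClient()  # connects to the public QCArchive instance by default

# Collect the final records for the named optimization dataset(s).
collection = OptimizationResultCollection.from_server(
    client=client,
    datasets=["OpenFF Example Optimization Dataset v1.0"],  # placeholder name
    spec_name="default",
)

# Serialize to a static, self-contained artifact for Zenodo/Figshare and the "Data" tab.
with open("openff-example-optimization-v1.0.json", "w") as handle:
    handle.write(collection.json(indent=2))
```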
PB – Regarding "using QCSubmit to make single-file artifacts", are you referring to QCA IDs or Records?
DD – I think the key insight is that QCA isn't really an archive and isn't immortal; in a lot of ways, we've externalized costs to them.
DD – BP would like to move towards a model where, since QCF will be easier to deploy for individual orgs, OpenFF would stand up its own QCF server. This would reduce the asymmetry of costs between OpenFF and QCF. Right now more than 90% of the data in QCA is OpenFF's.
JW: I still have to think it over; on one hand it seems fair for us to host it ourselves, and on the other hand MolSSI's objective might be to provide this service for the community. Perhaps we need to add a section on the costs of data archival to our future grants; there are a few options we need to assess and figure out.
DD – Do we use the export tool at present?
PB – So we'll host our own database/archive?
DD – Yes, and BP is planning to rebrand away from the "archive" name. QCF is more of a distributed compute system.
PB – Could we host the server on an academic cluster?
DD – Yeah, though there will be some additional work to set up multi-user behavior and external access. This also wouldn't be serviceable as a batch job.
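Purely as an illustration of how lightweight a QCF deployment can be, the sketch below spins up an ephemeral, single-process QCFractal instance with FractalSnowflake. This is only a local demo; a production OpenFF-hosted server would instead use the qcfractal-server CLI with a persistent PostgreSQL database, plus the multi-user and external-access configuration mentioned above. Names follow the classic QCFractal API and may differ on the next branch.

```python
# Hedged sketch: an ephemeral, single-process QCFractal instance for local testing.
# A production OpenFF deployment would use the qcfractal-server CLI with a
# persistent PostgreSQL database, user management, and externally reachable hosting.
from qcfractal import FractalSnowflake  # classic API; module layout may differ on "next"

server = FractalSnowflake()  # starts a temporary server and database in-process
client = server.client()     # a client connected to this local instance
print(client)                # confirm the connection by printing the client summary
```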
DD: Is it possible to include "hosting a server" in the Oracle proposal, or is it too late?
DD – Also worth mentioning that we don't have to allocate a ton of storage. We could just upload complete datasets to Zenodo and then EOL "live" datasets shortly after compute is complete (see the sketch below the action items).
JW – So the action items moving forward seem to be:
- DD will work on data archival practices and update the website with both datasets and practices.
- JW will bring "we will need to host our own QCF server at some point in the future" to the lead team. Options will include AWS hosting or academic center hosting. Also note that it will be possible to massively reduce storage size if we agree to EOL live access to datasets and just put static versions on Zenodo.
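A hedged sketch of what depositing one of these static artifacts on Zenodo could look like, using Zenodo's documented REST API directly via requests (ZenoPy, mentioned above, wraps the same endpoints). The access token, artifact filename, and metadata values are placeholders.

```python
# Hedged sketch: deposit a completed, static dataset artifact on Zenodo via its REST API.
# ZENODO_TOKEN, the artifact filename, and the metadata values are placeholders.
import os
import requests

ZENODO_TOKEN = os.environ["ZENODO_TOKEN"]
ARTIFACT = "openff-example-optimization-v1.0.json"  # e.g. the qcsubmit export above
params = {"access_token": ZENODO_TOKEN}

# 1. Create an empty deposition and note its id and file bucket.
deposition = requests.post(
    "https://zenodo.org/api/deposit/depositions", params=params, json={}
).json()
dep_id = deposition["id"]
bucket = deposition["links"]["bucket"]

# 2. Upload the artifact into the deposition's file bucket.
with open(ARTIFACT, "rb") as handle:
    requests.put(f"{bucket}/{ARTIFACT}", data=handle, params=params)

# 3. Attach minimal metadata (placeholder values).
requests.put(
    f"https://zenodo.org/api/deposit/depositions/{dep_id}",
    params=params,
    json={
        "metadata": {
            "title": "OpenFF Example Optimization Dataset v1.0",
            "upload_type": "dataset",
            "description": "Static export of a completed QCArchive dataset.",
            "creators": [{"name": "Open Force Field Initiative"}],
        }
    },
)

# 4. Publishing freezes the record and mints a DOI.
requests.post(
    f"https://zenodo.org/api/deposit/depositions/{dep_id}/actions/publish", params=params
)
```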