🗓 Date

👥 Participants

...

🗣 Discussion topics

Item

Notes

Updates from MolSSI

  • DD: Ben and I coordinated on QCF-next for a December release

    • I’ll be working with BP on preparing a set of openff-qcsubmit changes that will immediately be compatible with the next branch release. BP is leading this effort, I’m just providing implementation support. Plan to allocate time for this in November.

    • Met with Chodera and Eastman on the SPICE issue: Nature is not satisfied with QCArchive acting as a long-term archive for datasets. QCArchive is something of a misnomer, since it doesn’t guarantee archival, so tools like Zenodo and Figshare are better suited for this. JC and PE agreed and tasked me with creating the data artifacts and spreading these standards through OpenFF. This is also an opportunity to update the “Data” tab on the website, so I’m thinking of using the qcsubmit export functionality to make data artifacts to post on this tab.

    • BP has a postdoc whose role would seem to overlap this - Zena(?) - and they’re thinking of using a tool like ZenoPy to do this.

    • But overall, QCArchive isn’t really an archive - it’s several other things but not that.

  • PB – Regarding “using QCSubmit to make single-file artifacts”, are you referring to QCA IDs or Records?

    • DD – Records.

  • DD – I think the key insight is that QCA isn’t really an archive and isn’t immortal - in a lot of ways, we’ve externalized costs to them.

  • DD – BP would like to move towards a model where, since QCF will be easier to deploy for individual orgs, OpenFF would stand up its own QCF server. This would reduce the asymmetry of costs between OpenFF and QCF. Right now more than 90% of the data in QCA is OpenFF’s.

  • JW: I still have to think it over. On one hand, it is fair for us to host the data ourselves; on the other, MolSSI’s objective might be to provide this service for the community. Perhaps we need to add a section on data archival costs to our future grants. There are a few options we need to assess.

    • DD: Instead of piling data from different orgs onto MolSSI’s server, Ben is considering whether it is feasible to offer instances for individual orgs. That may shift the burden away from Ben as the core support person for all the data.

  • DD – Do we use the export tool at present?

    • CC – Yes, my training sets are about 2MB, since they’re just structures and energies.

    • DD – That’s not bad.

  • PB – So we’ll host our own database/archive?

    • DD – Yes, and BP is planning to rebrand away from the “archive” name. QCF is more of a distributed compute system.

    • PB – Could we host the server on an academic cluster?

    • DD – Yeah, there will be some additional work in setting up multi-user behavior and external access. This also wouldn’t be serviceable as a batch job.

  • DD: Is it possible to include “hosting a server” in the Oracle proposal, or is it too late?

    • JW: Not sure if they are interested in a datacenter-style application.

  • DD – Also worth mentioning that we don’t have to allocate a ton of storage. We could just upload complete datasets to Zenodo and then EOL “live” datasets shortly after compute is complete.

  • JW – So the action items moving forward seem to be:

    • DD will work on data archival practices and update the website with both the datasets and the practices

    • JW will bring the “we will need to host our own QCF server at some point in the future” up to the lead team. Options will include AWS hosting or academic center hosting. Also note that it will be possible to massively reduce storage size if we agree to EOL live access to datasets and just put static versions on Zenodo.

Infrastructure advances


Throughput status

  • OpenFF Protein Capped 3-mer Backbones v1.0

    • Opts: 293477 → 299557 → 310894

    • TDs: 16 → 19 → 20

  • Revival of old protein datasets after geometric release

    • https://github.com/openforcefield/qca-dataset-submission/pull/281

    • https://github.com/openforcefield/qca-dataset-submission/pull/290

    • CC: No movement on these this week.

    • DD: Need to add tracking label. Should be good to go now.

  • New submission from Ken Takabo; we may need to add them to the qca-dataset-submission repo

    • https://github.com/openforcefield/qca-dataset-submission/pull/317#issuecomment-1289469768
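
For reference, the increments between the successive counts reported for OpenFF Protein Capped 3-mer Backbones v1.0 can be worked out with a short sketch like the one below. The helper name `deltas` is purely illustrative and not part of any OpenFF or QCArchive tooling. The notes do not state the interval between reports, so the script makes no assumption about it.

```python
def deltas(counts):
    """Return the differences between consecutive counts."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Counts copied from the throughput notes above.
opts = [293477, 299557, 310894]
tds = [16, 19, 20]

opt_deltas = deltas(opts)
td_deltas = deltas(tds)

print("Opts increments:", opt_deltas)
print("TDs increments:", td_deltas)
```
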

User questions/issues

DD: Added the machine learning tag to the SPICE sets, but it seems to break the ML archive section of the QCA website. Need to discuss.

Science needs

✅ Action items

  •  

⤴ Decisions