2023-06-27 OpenFF QCA Working Group Meeting Notes

Participants

  • @Trevor Gokey

  • @Jeffrey Wagner

  • Ben Pritchard

  • Josh Rackers

  • @Lily Wang

  • @Pavan Behara

Goals

  • MolSSI QCArchive user group

    • user questions / issues / feature request

    • server instance statuses

      • QCArchive Legacy

      • QCArchive OpenFF

        • currently retains everything from Legacy

      • QCArchive ML

      • QCArchive Validation

    • compute resources statuses

    • call for new users

    • trainings

      • upcoming PortalClient trainings

      • upcoming compute manager trainings

    • deployed stack versions:

      • QCArchive Legacy

        • 0.15.8.1

      • QCArchive OpenFF

        • 0.50.0b11

      • QCArchive ML

        • 0.50.0b12

      • QCArchive Validation

        • 0.50.0b12

  • New datasets

    • SPICE 2.0

    • OpenFF Optimization Diverse Fragments with Iodine (w/ ESPs)

    • OpenFF Optimization Hypervalent Sulfurs (w/ ESPs)

    • OpenFF DNA

  • Updates from stakeholders

    • OpenFF

    • Genentech

    • MolSSI

  • QCFractal development : sprint begins …

    • QCFractal v0.50.0 - imminent

    • v0.70.0 milestone:

  • Additional business

    • MolSSI QCArchive Working Group start date: 8/29

Discussion topics

Notes

  • MolSSI QCArchive user group

    • user questions / issues / feature request

      • PB – Can we still use QCSubmit to retrieve data from the new server?

      • JW – I think we can still pull old results - BP is running a copy of the old server.

      • BP – Right, you can still pull results from an old copy of the server. But I don’t recommend writing to the old version of the server, since I’ve already completed the data migration.

      • DD – So there are 3 servers running on the host?

      • BP – Yes, 4 actually

        • QCArchive Legacy

        • QCArchive OpenFF (still has everything from the old DB)

          • BP – I’m cancelling all of the old/currently running tasks, to clean things up.

          • DD – That sounds good.

        • QCArchive ML (BP has started cleaning up non-ML datasets from this)

        • QCArchive Validation

    • server instance statuses

    • compute resources statuses

      • DD – No compute is running since we don’t have jobs in the queue, and we don’t have jobs in the queue since QCSubmit doesn’t work.

    • call for new users

      • LW + JW – New QM hires. Also PEastman.

      • DD – Let’s bring on new users once MolSSI training starts up.

      • JW – I’ll loop them into this meeting during onboarding.

    • trainings

      • upcoming PortalClient trainings

      • upcoming compute manager trainings

      • BP – I’ve got meetings/trainings coming up the next two weeks (site visit the week of the 10th, boot camp the next week), so I’d be available the week of the 24th. But then I’m booked after the week of the 24th.

      • Training will run 12 - 3 PM Los Angeles on July 25.

        • BP will aim to have user credentials prepared and can expect new hires to be proficient with conda.

      • DD – That’d be great

    • deployed stack versions:

      • Legacy - QCFractal 0.15.8.1

      • OpenFF - QCFractal 0.50 beta12

      • ML - QCFractal 0.50 beta13

      • Validation - QCFractal 0.50 beta12

  • DD – Could the next branch be released?

    • BP – Need to prepare docs and a README. Folks will have trouble understanding that this is backwards-incompatible, and I don’t have time to have a bunch of conversations. Users who have a new portal will be confused when connecting to a new vs. old server. Aiming to have this done by the time of the bootcamps. There are conda packages of the betas available at https://anaconda.org/qcarchive/qcfractal/files and new docs at https://molssi.github.io/QCFractal/user_guide/index.html

    • BP – There is some work started in a QCSubmit branch that can help.

    • JW – Yeah, I need to figure out what “success” looks like and see whether the current branch meets that.

    • BP – I recommend taking all of the caching out of QCSubmit and putting it into QCPortal, and generally deciding whether there are features in QCSubmit that should go into QCPortal.
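A minimal sketch of the client/server mismatch BP describes above, assuming the incompatibility falls at the 0.50 release (the Legacy server runs 0.15.8.1, the others run 0.50 betas). The function names and the `break_at` parameter are hypothetical and are not part of QCPortal or QCFractal.

```python
def major_minor(version):
    """Return (major, minor) from a version string such as '0.50.0b12'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_generation(client_version, server_version, break_at=(0, 50)):
    """True when client and server sit on the same side of the assumed break.

    break_at is an assumption for illustration: versions below it are treated
    as the old stack, versions at or above it as the new stack.
    """
    client_is_new = major_minor(client_version) >= break_at
    server_is_new = major_minor(server_version) >= break_at
    return client_is_new == server_is_new


if __name__ == "__main__":
    # A new portal against a new server, then against the Legacy server.
    print(same_generation("0.50.0b12", "0.50.0b11"))
    print(same_generation("0.50.0b12", "0.15.8.1"))
```

A check like this could be run before connecting, so that a user with a new portal gets a clear message rather than an unexplained failure against the old server.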

  • New datasets

    • JR – I’ve been talking to PEastman and some of the other SPICE folks to try and come up with a version 2.0. Happy to outline ideas for how to expand if folks here are interested.

      • A couple of things I’d like to do in version 2:

        • Build a massive dataset along the lines of “ML Big Data”

        • Include solvation effects on all molecules in the dataset (not just AAs)

        • Keep including more chemical space (bring in more of pubchem and other sources)

        • Generate more conformers

      • Interested in a multi-level approach:

        • Do everything in xTB

        • Do some subset at DFT level

        • Do a much smaller subset at coupled cluster level

      • I’ve been interested in OpenCatalyst work (similar project to develop a DB of QM calcs for materials, to develop catalysts). They host public challenges to evaluate models and update a leaderboard. So it’d be neat to have the QM community move past QM9 and have a challenge.
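The tiered plan JR outlines (everything in xTB, a subset at DFT, a much smaller subset at coupled cluster) could be sketched roughly as below. The fractions, the random selection, and the choice to nest the coupled-cluster subset inside the DFT subset are all assumptions for illustration; none of them were decided in the meeting.

```python
import random


def tiered_subsets(molecule_ids, dft_fraction=0.1, cc_fraction=0.01, seed=0):
    """Assign molecules to levels of theory in nested tiers.

    All molecules get xTB; a random fraction gets DFT; a smaller fraction,
    drawn from the DFT subset, gets coupled cluster. The fractions are
    placeholder values, not numbers from the SPICE 2.0 discussion.
    """
    ids = list(molecule_ids)
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)

    n_dft = int(len(ids) * dft_fraction)
    n_cc = min(int(len(ids) * cc_fraction), n_dft)

    dft = shuffled[:n_dft]
    cc = dft[:n_cc]  # nested: every coupled-cluster molecule also has DFT
    return {"xtb": ids, "dft": dft, "cc": cc}


if __name__ == "__main__":
    plan = tiered_subsets(range(1000))
    for level, members in plan.items():
        print(level, len(members))
```

Nesting the tiers means every molecule computed at a higher level also has lower-level results, which allows direct comparison between levels.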

    • BP – This could fit on the ML instance - That will have SPICE and would fit the mission of other ML datasets

    • DD – Where would the compute power come from/what resources could be marshalled? OpenFF helped with V1 but we may not be able to do a V2.

      • JR – Can pull in a hodgepodge from Genentech, also from NVIDIA and LightningAI. Also Facebook folks (who do OpenCatalyst) were interested. So if we come up with a solid plan for what’ll be in the dataset, we can likely marshal resources. My bigger concern is storage, and I’ve spoken with BP about this. I think it’ll be important to store densities and wavefunctions for at least a subset of the data. On the resources side, it’ll be good/necessary to have one person be the point person for the SPICE 2.0/openconformer initiative. Organizationally this will basically be necessary, and Genentech could kick in to fund this role, or if there’s someone interested they could lead it.

      • DD – Do you have suggestions for folks who could be a point person?

      • JR – Not really. PEastman is interested in contributing technically but not managing the project. I’m not sure about OMSF/MolSSI.

      • BP – At MolSSI I’m not sure we have someone with the expertise and the time. I’d love to have a dataset leader at MolSSI (ML or otherwise) since that’d take work off my shoulders.

      • DD – Would SPICE2.0 be a big enough draw to reprioritize folks at MolSSI?

      • BP – No, I don’t think so.

      • JR – I’ll ask about folks at Genentech.

      • JW – re: POSE grant; may be an avenue for at least a host org, funding; doesn’t solve finding the right person

      • DD – Probably the right next step is to talk to JC about whether there’s a good person we could get.

      • JW – Happy to have OpenFF contribute compute via PRP/NRP. Or to help MolSSI get on those platforms. So feel free to ask me for an introduction.

    • LW – We have some datasets we want to get in, but not urgently. Looking at diverse fragments containing iodine. Also hypervalent sulfurs. And a DNA dataset. All are relevant for vsites. But we don’t need them for the initial vsite FFs. Was hoping to submit once QCSubmit is updated, and once the new hires are up to speed, so in the next few months. The datasets aren’t finalized but I’ve been looking at some ideas.

      • PB – Would these be for ESPs?

      • LW – Iodine dataset and hypervalent sulfur yes, but DNA no. And these would be geometry optimizations.

      • (General) – Not actually storing the ESPs, but instead storing wavefunctions and reconstructing ESPs locally.

      • TG – Does it still store wavefunctions for the whole trajectory? Is there some way to just get it for first and last frames?

      • BP – There’s an option to store the initial and final structure in a trajectory. But I don’t think there’s a way to store wavefuncs for the initial and final and then also store the trajectory without wavefunctions.
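To make TG's question concrete, here is a client-side sketch that keeps wavefunctions only on the first and last frames of a trajectory while retaining every geometry. This is not a QCFractal option (BP notes above that he doesn't think one exists). The frame layout, a list of dicts with `geometry` and `wavefunction` keys, and the function name are hypothetical.

```python
def keep_endpoint_wavefunctions(trajectory):
    """Return a copy of the trajectory with wavefunctions on endpoints only.

    Each frame is assumed to be a dict with 'geometry' and 'wavefunction'
    keys (a made-up layout for illustration). Intermediate frames keep their
    geometry but have the wavefunction set to None.
    """
    last = len(trajectory) - 1
    pruned = []
    for index, frame in enumerate(trajectory):
        frame = dict(frame)  # shallow copy so the input is not modified
        if index not in (0, last):
            frame["wavefunction"] = None
        pruned.append(frame)
    return pruned


if __name__ == "__main__":
    traj = [{"geometry": f"geom{i}", "wavefunction": f"wfn{i}"} for i in range(4)]
    for frame in keep_endpoint_wavefunctions(traj):
        print(frame)
```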

  • Updates from stakeholders

    • OpenFF

      • JW – I need to get working on QCSubmit, but I haven’t started yet.

    • Genentech

    • MolSSI

      • BP – I may bring in the MolSSI ML person and postdocs. There was some overlap with what JR was talking about and our plans. I’ll try to invite them myself or will ping JW/DD if there’s difficulty.

  • QCFractal development : sprint begins …

    • QCFractal v0.50.0 - imminent

    • v0.70.0 milestone:

  • BP – I think we’re basically feature-complete here. And I’m tracking my work in Asana, so I don’t think I’ll open this up. Right now my “sprint” is basically bugfixes. I’m not adding any major new features, and I’ll prune the issue tracker and PR list once the next branch comes out.

  • DD – This meeting could be a good way to pull in new devs

    • BP – I’d love to get contributors for portal. Fractal not so much. My Asana is just very brief, personal notes. Major new development (0.70) will rely on me getting more funding to get folks to work on it.

  • Additional business

    • DD – Is there a MolSSI QCArchive Working Group start date?

    • BP – Could be the week of the trainings.

    • JW – Could go in the off-week from the current biweekly meeting.

    • DD – OpenFF could host the current meeting through the end of August?

      • BP + JW – That’d work.

    • DD – I’ll mark this through Aug 29.

    • BP – There’s a MolSSI user group meeting in November, like a mini symposium. We’re planning on MOPAC as a topic.

Action items

Decisions