2022-11-08 QC Meeting notes

Date

Nov 8, 2022

Participants

@Pavan Behara
@Jeffrey Wagner
@David Dotson
Benjamin Pritchard
@Chapin Cavender

Discussion topics

Item	Notes

Updates from MolSSI	BP: Running tests for qcf-next and everything seems good!
DD:
BP: I will cut a release earlier than I thought and may run the old and new servers simultaneously. Still contemplating options to host the server long term.
JW: Yeah, I started the conversation within OMSF. If we host our own, we’ll start EOLing datasets; happy to start this conversation sooner.
BP: New version has more support for views; you can dump datasets into a SQLite file. Forward compatibility will still be complicated. New version will also allow for deleting datasets.
DD: Do you have a granular permissions system for users?
BP: Not exactly, but deletions, etc., are restricted.
DD: I would prefer to have an additional userid just for dataset modifications like deletions, so that we won’t mess things up by accident.
BP: QCEl will also need updates. Not many other maintainers there; there’s LBurns and one other person who sometimes helps. The new postdoc, Mars, should also be working on it. We’ve got both postdocs now, working on onboarding. One will help with QCSchema, the other will be using QCA on some applications.
BP: Was there a complaint about server performance?
PB: We’re comparing dipole moments from the benchmarking molecules.
DD: Ah, that’s right, but the exporter is slow. There are 150k optimizations and it takes a long time to download. We’re not committed to maintaining/updating openff-benchmark, so it took some configuring. Then BSwope asked whether there’s a faster way to get this and I said yes, but I don’t have the time to make custom solutions for them. So I looped PB in on this since he has some familiarity with the area.
PB: Right, I need to write a script for this, will aim to do it this morning.
DD: …
PB: Does the molecule data contain the company identifier?
BP: It may be in the entry data.
DD: So for each molecule, take the entry on the collection, and it will point toward the other objects that may have the name.
BP: Could be cool to offer a method that just pulls down all the final molecules. I could do this server-side pretty efficiently.
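The entry-to-final-molecule lookup discussed above could look roughly like the sketch below. It is only an illustration in plain Python, not the QCPortal or QCFractal API: `Entry`, `OptimizationRecord`, `chunked`, and `final_molecule_ids_by_name` are hypothetical names, and the in-memory `records` dict stands in for a bulk server query. The point is the chain (entry name, then optimization record, then final molecule) and fetching in batches rather than one record at a time.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Entry:
    """Hypothetical dataset entry: a name plus a pointer to its optimization record."""
    name: str
    record_id: int


@dataclass
class OptimizationRecord:
    """Hypothetical record that points to the final molecule of an optimization."""
    record_id: int
    final_molecule_id: int


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield successive batches so each (hypothetical) server query stays small."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def final_molecule_ids_by_name(
    entries: List[Entry],
    records: Dict[int, OptimizationRecord],
    batch_size: int = 1000,
) -> Dict[str, int]:
    """Follow entry -> record -> final molecule, one batch of record ids at a time."""
    name_by_record = {entry.record_id: entry.name for entry in entries}
    result: Dict[str, int] = {}
    for batch in chunked(list(name_by_record), batch_size):
        # Stand-in for a single bulk query; a real client call would go here.
        for record_id in batch:
            record = records.get(record_id)
            if record is not None:  # skip records that are missing or incomplete
                result[name_by_record[record_id]] = record.final_molecule_id
    return result
```

Because the entry name travels with the record id, any identifier stored in the entry data stays attached to the final molecule without a second pass.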
Infrastructure advances	JW: Industry user asked for a specific change to qcsubmit to allow running without psi4. So there will be a release soon. https://github.com/openforcefield/openff-qcsubmit/pull/206
CC: Advice on making a test for this PR: https://github.com/openforcefield/openff-qcsubmit/pull/202
CC: Want to do a test on this, but many of the inputs will be too large to run this. So should I pull down an existing dataset or make a toy dataset? This is a blocker for the protein work.
PB: Are you already using this to do the 2D scans?
CC: Not sure I understand. This is only a problem when I try to retrieve results.
DD: Oh, I see. Sorry this sat for so long. This should be good to merge. I’ll merge it now.
Throughput status	OpenFF Protein Capped 3-mer Backbones v1.0
Opts: 310894 → 311229 → stuck at this, might be a longer run
TDs: 20 → 22 → stuck at this
OpenFF Protein Capped 1-mer Sidechains v1.2
TDs: 44 → 45 (remaining 1)
Stuck on 155215 for a week - might be a longer run or not
CC: I’ll take a look at these
DD: Should I leave lilac workers on? Or would it be better to have everything running on TSCC?
PB: Agree, let’s shut down all workers on lilac.
RNA Trinucleotide Single Point Dataset v1.0 - Almost complete!
57299 → 24362 → 7 remaining (might be longer runs, no status updates for the last three days)
DD: Let’s bump this to high-priority since we may be able to finish it off
(Later) DD: I tried running a worker locally and it’s not pulling down any of these jobs. BP, could you run your unsticking script?
BP: (Runs script)
DD: Awesome, a job just came through for me.
User questions/issues	BP: Refresh my memory? In geometry optimizations, for every gradient step, you’re storing various indices, and they take up a lot of space (partly because they’re duplicated: one set has underscores and the other doesn’t)
PB: These are the Mayer and Wiberg bond orders, I think.
BP: I’d like to delete one of these copies. Should I delete the one with or without underscores? As I’m migrating the server I’d like to clean this up.
JW: Could we check which one qcsubmit pulls? That’s probably the one that we use.
JW: Likely here: https://github.com/openforcefield/openff-qcsubmit/blob/396b2fbffd8fabfdd06a1460bec5702aed260361/openff/qcsubmit/common_structures.py#L78
JW: In March we’d decided to delete these for everything except first and last opt steps (incl torsiondrives) https://openforcefield.atlassian.net/wiki/spaces/MEET/pages/2261155854/2022-03-01+QC+Meeting+note
(Copied notes from that meeting)
* BP: From torsiondrives and optimizations, could I delete Mayer+Wiberg except for first and last conf for each?
* TG: Yes, any time we’re looking at trajectories or torsiondrives, we don’t need intermediate bond info except at beginning and end.
PB: I think it’s fine to delete Mayer+Wiberg info from those places
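The pruning rule agreed on here (keep bond-order data only on the first and last step of each optimization) could be sketched as below. This is a minimal illustration that assumes each trajectory step is a plain dict of stored properties. The key names in `BOND_ORDER_KEYS` are assumptions made for the example and should be checked against what qcsubmit actually reads before deleting anything. `prune_bond_orders` is a hypothetical helper, not part of QCFractal or qcsubmit.

```python
from typing import Any, Dict, List, Sequence

# Assumed key names, for illustration only. Both spellings are listed because
# the notes say one copy uses underscores and the other does not.
BOND_ORDER_KEYS = (
    "MAYER_INDICES",
    "MAYER INDICES",
    "WIBERG_LOWDIN_INDICES",
    "WIBERG LOWDIN INDICES",
)


def prune_bond_orders(
    trajectory: List[Dict[str, Any]],
    keys: Sequence[str] = BOND_ORDER_KEYS,
) -> int:
    """Delete bond-order entries from every step except the first and last.

    Modifies the steps in place and returns how many entries were removed.
    Trajectories with two or fewer steps are left untouched.
    """
    removed = 0
    for step in trajectory[1:-1]:  # intermediate steps only
        for key in keys:
            if key in step:
                del step[key]
                removed += 1
    return removed
```

For a torsiondrive, the same function would be applied to each constituent optimization trajectory separately, so that every optimization keeps its own first and last step.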
Science needs	PB: Once the next release is made, we should delete the iodine-containing molecules from the gen2 set, since they were done with the wrong basis, and we have the corrected versions run in a different dataset.
DD: We should be able to do that once the next release is out
JW: Can we not supersede this with a higher-version dataset?
PB: That could work too.
JW: Let’s discuss at a later meeting.
PB: OK
BP: We can discuss deleting older datasets as well.
