2021-05-17 Core Developers meeting notes

Participants

  • @Iván Pulido

  • @Lily Wang

  • @Pavan Behara

  • @David Dotson

  • @Simon Boothroyd

  • @Matt Thompson

  • @Jeffrey Wagner

Goals

Discussion topics

Item

Notes

Updates

  • SB

    • Mostly fitting work. Sage inputs repo is now up. Largely based on automated infrastructure.

    • Worked with MT on System object and identifying different ways to use pydantic

  • IP

    • Worked with LW and JW on residue perception and the topology refactor.

      • We could potentially load biomolecules in SDF format, so we’ll need a way to fill in residue info from that format.

      • Alternatively, we could load a biomolecule from PDB, in which case we’d need to fill in the cheminformatics information.

    • We’ve extracted SMARTS for each residue, and need to implement matching for the cases above. So far it works pretty well, but it gets confused by protonation states, e.g. for lysine and cysteine.

    • Worked on issue #926 from SB, where we try to canonically order the molecule before generating conformers. The straightforward solution works for OpenEye but not for RDKit.

  • MT

    • Made 0.1.0a1 release of System – Purely ceremonial, just a git tag

    • Met with SB and got a ton of useful feedback on data models. I’ve fixed all the easy stuff, but will need time for the harder stuff (like tracking partial charge provenance).

    • Starting conda-forge packaging, but will need packages for openff-units and openff-utils.

    • Started preparing to port create_openff_system into openff-toolkit, using all private methods. Not sure when this will get into a toolkit release.

    • Bunch of small improvements – Improvements to documentation, fixing other bugs.

    • Super confusing OE license bug – The GBSA energy bug is sporadic, and the times it’s been passing have been when the OE license is unavailable.

      • JW – I’ll try to check this out this week. Currently the failing tests are low-severity, but we’ll need to resolve it before the next release.

  • DD

    • Partner benchmark – We have 4/10 results tarballs.

    • Public optimizations are 34% complete. I exported these locally, roughly 300GB of data. Mostly JSON output, since we store the optimization trajectories.

      • JW – Do we know if these are the smallest molecules in the whole submission? It used to be the case that the whole set would have been error cycled a bunch of times, and the small molecules would finish first.

      • DD – The current results are almost entirely Boehringer Ingelheim, so I don’t think it’s looping over the entire dataset.

      • SB – The molecules that are done seem to be pretty big, so I don’t think it’s just small molecules that are being completed.

    • Putting in performance improvements for openff-benchmark export. Will open PR this week.

    • Opened up torsiondrive PR, Xavier Lucas is testing it out. Lorenzo is porting it to openff-geopt. There were some notable failures before that this should resolve.

    • Created openff-benchmark #77 with Lorenzo. We’re working on reproducing Lim paper results. Meeting weekly on Wednesday. First hurdle is that there are some different sorts of results exported from QCA. Will work with BP on this.

    • Working on a procedure doc for submitting large datasets to QCA (e.g. a ~500k molecule optimization set). At some point we hit the constraints of GH Actions. This will be necessary to submit an Enamine set.

      • DD – We’ll want to consider hosting our own larger runners on AWS, maybe once OMSF is established+running.

      • SB – That sounds good, but this may be something that MolSSI should handle. Same with error cycling. Some question as to whether it’s most efficient for that to be under MolSSI.

      • DD – Some unclear jurisdiction here – If an optimization fails 5 times and the server flags it as problematic, which human should manually re-tag it?

      • SB – MolSSI could provide a touch point to restart a whole dataset? That may be a good compromise.

      • DD – That’s a possibility. It seems a little dangerous to automate too much and have old datasets recycle without user visibility.

      • SB – True, though it seems like we run the risk of desynchronizing with the state of software at MolSSI. So it’s likely optimal for them to own more of this.

      • DD – There could be different boundaries for different options here. So, error cycling could be a configuration option on a dataset, and then the server would handle it. I could take this on, and that would let us retire our error cycling.

      • JW – Agree, now that we have the desired behavior more hammered out, this is something that we can start pushing MolSSI to own.

      • DD – Yes. I’ll try to find time to bring this into MolSSI’s ecosystem.

    • Protein-ligand benchmarks: Still haven’t had time to build up the original design document. Working with LD on this and on reproducing the Lim work.

      • (General) – Can run PL systems through lilac for now, this isn’t blocking for sage. The release timeline can be on the order of months.

      • DD – Goal is to support using Perses and PMX. The PLBenchmarks workflow is currently using PMX. The OpenMM implementation would use Perses. If needed, or if we hit difficulties with these, we can use OpenEye Orion cubes.

      • SB – For me, the best starting point would be using perses on the JACS set. Everything else would be nice, but this would be the most valuable.

  • JW

    • Working with IP and LW on Topology refactor.

    • mmcif parsing, working on deciding which library to use

    • Confusing error from Fox involving N1NC(=N)C(=N)1 and spiro compounds

    • Implicit H mapped smiles error

    • Blog post

    • Parameter deduplication, summer project planning with CD.

      • SB – AmberTools AM1 minimization could be an idea here.

      • JW – That may be too complex for an undergrad

      • IP – I’d put it on my plate in the previous sprint.

      • SB – Is the idea that we’d check for connectivity rearrangement, or apply restraints if a change happens?

      • JW – What’s the ideal solution?

      • SB – Initially we should raise an error if the connectivity changes. Hard to say whether restraints should always be used. OE always uses restraints, but this leads to other quality issues. So my ideal would be to use geomeTRIC to do the AM1 optimizations when a connectivity change happens.

      • JW – What’s the runtime of running sqm in a loop?

      • SB – Could be somewhat expensive – Would need to restart it repeatedly, and also there’d be file I/O. But the key is that we’d only do these geomeTRIC optimizations when something goes wrong.

      • (General) – Do we know how frequently this happens? Not really. It’s common enough that we should fix it.

      • SB – I have a script to wrap AM1 minimizations in geomeTRIC – It’s hacky but it works. In DM with OM+MT+CD+JW.

      • JW – In the short term, what’s desired behavior if connectivity rearranges?

        • SB – Could have an error, or try another conformer.

  • PB

    • Mostly spent last week on sulfonamide data. Presented results at FF release call and followup meeting.

  • LW

    • Met with Dennis Della Corte, spent about a day coming up with prototype code to parameterize a nonstandard AA inside a protein. Used openmm to do this. It’s really hard to write out a modified OpenMM forcefield. Trying to do this using ParmEd.

    • Trying to answer question of whether you can always break a big molecule apart, and then stitch it back together, and how to correct for charges at overlaps.

    • Found some weird behavior with OFFTK – If a SMARTS has a high degree of internal symmetry, then it can blow up the system memory.

      • JW – This may be a problem unique to librarycharges. Things like vdW would raise an error if an atom didn’t get parameters assigned, but librarycharges will silently skip over a molecule if there are any gaps.

        • LW – Is there a way to make this not be silent?

        • JW – We could do this in a fork of the OpenFF toolkit.

      • LW – Trying to understand the uniquify kwarg and how this could help us.

        • SB – OpenEye docs have a good explanation of uniquifying

      • SB – Issue was in a sizeable polymer, when we set max_matches, it wouldn’t assign parameters to the second half of the polymer. It’s also a problem that there wasn’t an error when big regions of the polymer were missing parameters.
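
A possible guard against the silent-skip behavior discussed above is an explicit coverage check after matching. The sketch below is not OpenFF Toolkit code; it is a minimal, toolkit-independent illustration. The names `find_unassigned_atoms`, `require_full_coverage`, and `MissingParametersError`, and the assumption that matches arrive as tuples of atom indices, are all hypothetical.

```python
from typing import Iterable, List, Sequence, Set


class MissingParametersError(ValueError):
    """Raised when some atoms received no parameters (hypothetical name)."""


def find_unassigned_atoms(n_atoms: int, matches: Iterable[Sequence[int]]) -> List[int]:
    """Return the sorted indices of atoms not covered by any match.

    `matches` is assumed to be an iterable of atom-index tuples, one per
    substructure match (for example, the result of a SMARTS search).
    """
    covered: Set[int] = set()
    for match in matches:
        covered.update(match)
    return [i for i in range(n_atoms) if i not in covered]


def require_full_coverage(n_atoms: int, matches: Iterable[Sequence[int]]) -> None:
    """Raise an error instead of silently skipping when coverage has gaps."""
    missing = find_unassigned_atoms(n_atoms, matches)
    if missing:
        raise MissingParametersError(
            f"{len(missing)} of {n_atoms} atoms have no assigned parameters: {missing}"
        )
```

A check like this would also surface the polymer case, where capping the number of matches left a large region without parameters.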

Checking for connectivity rearrangements

(JW + IP worked on this, IP will carry implementation forward to PR)
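
As a rough illustration of what such a check might look like (this is not the JW + IP implementation), the sketch below re-perceives bonds from the optimized coordinates using a simple distance criterion and compares them with the input bond graph. The covalent radii are approximate, the 1.3 tolerance factor is an arbitrary assumption, and all function and class names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

# Approximate single-bond covalent radii in angstroms (illustrative subset).
COVALENT_RADII: Dict[str, float] = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

Bond = FrozenSet[int]
Coord = Tuple[float, float, float]


class ConnectivityChangedError(RuntimeError):
    """Raised when the perceived bonds differ from the input bonds."""


def perceive_bonds(elements: Sequence[str], coords: Sequence[Coord],
                   tolerance: float = 1.3) -> Set[Bond]:
    """Call two atoms bonded if they sit within tolerance * (r_i + r_j)."""
    bonds: Set[Bond] = set()
    for i, j in combinations(range(len(elements)), 2):
        cutoff = tolerance * (COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]])
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.add(frozenset((i, j)))
    return bonds


def connectivity_changes(expected_bonds: Iterable[Tuple[int, int]],
                         elements: Sequence[str],
                         coords: Sequence[Coord]) -> Tuple[Set[Bond], Set[Bond]]:
    """Return (broken, formed) bonds relative to the expected bond graph."""
    expected = {frozenset(b) for b in expected_bonds}
    found = perceive_bonds(elements, coords)
    return expected - found, found - expected


def assert_connectivity_preserved(expected_bonds: Iterable[Tuple[int, int]],
                                  elements: Sequence[str],
                                  coords: Sequence[Coord]) -> None:
    """Raise if any bond was broken or formed during the optimization."""
    broken, formed = connectivity_changes(expected_bonds, elements, coords)
    if broken or formed:
        raise ConnectivityChangedError(
            f"broken: {sorted(map(sorted, broken))}, formed: {sorted(map(sorted, formed))}"
        )
```

Raising here matches the short-term behavior suggested in the discussion; a caller could catch the error and either try another conformer or fall back to a restrained optimization.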

Action items

Decisions