2021-03-15 Developers Coffee Meeting notes

Date

Mar 15, 2021

Participants

  • @Jeffrey Wagner

  • @David Dotson

  • @Simon Boothroyd

  • @Lorenzo D'Amore

  • @Matt Thompson

  • @Pavan Behara

Discussion topics

Item

Notes

Updates

  • LD

    • Received Janssen laptop on Friday. Waiting to get access to Linux workstation.

    • Had a call with DM and DH regarding future projects. Idea is to do torsional scans to continue on Chaya’s work. Would check biaryl systems using two datasets (detailed notes on other computer).

  • SB

    • Mostly generalizing bespokefit to do bespoke and general fits. Should deduplicate code used to make e.g. torsiondrive targets. This could become a much more significant route of FF fitting. Could ultimately become a schema-based replacement for OpenFF-forcebalance. All components are modular, so it would be possible to use e.g. QUBEKit instead of ForceBalance on command.

    • Evaluator and recharge are now on conda-forge. Only remaining things off it are QCSubmit and bespokefit

      • JW – Anything held back by fragmenter’s old OE constraint?

      • SB – QCSubmit has fragmenter as an optional dep. I’m refactoring bespokefit to have it as optional.

    • PB – When will this be released to production?

      • SB – No timeline yet. We’re hoping to have something working by the end of the week, but this wouldn’t be tied to a release.

    • PB – Another useful filter could be to exclude benchmarking set of molecules

      • SB – That’s a good idea. In general, it’d be good to have this as a general filter that could be applied at any stage.

    • JW – Saw a nice use of plotting tool last week

      • SB – It’s just using plotmol. Maybe 30 lines of code to make those plots. github: simonboothroyd/plotmol
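
A minimal sketch of the benchmark-exclusion filter PB suggested above, written so it can be applied at any stage as SB proposed. It assumes molecules are tracked as plain identifier strings (e.g. canonical SMILES); `exclude_molecules` and the example sets are hypothetical names and are not part of bespokefit or QCSubmit.

```python
from typing import Callable, Iterable, List

# Hypothetical type alias: a filter maps a list of molecule identifiers
# to a (possibly shorter) list of molecule identifiers.
MoleculeFilter = Callable[[List[str]], List[str]]


def exclude_molecules(excluded: Iterable[str]) -> MoleculeFilter:
    """Build a reusable filter that drops any molecule found in `excluded`."""
    excluded_set = set(excluded)

    def apply(molecules: List[str]) -> List[str]:
        # Preserve input order while removing excluded entries.
        return [mol for mol in molecules if mol not in excluded_set]

    return apply


# Illustrative data only (assumed identifiers, not a real benchmark set).
benchmark_set = ["CCO", "c1ccccc1"]
training_candidates = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]

drop_benchmark = exclude_molecules(benchmark_set)
filtered = drop_benchmark(training_candidates)
print(filtered)
```

Because the filter is a plain callable, the same `drop_benchmark` object could be reused before fragmentation, before submission, or before fitting.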

  • MT

    • Helped with 0.9.1 release

    • Made forcebalance package on conda-forge. No smoke yet but waiting to see what happens.

    • Tried to get into strawman repo, but wasn’t sure about where to contribute. I think I’ve got it now, and only have a few lines written. I’ll be continuing this.

    • Scattered work around toolkit, fixing things as they came up – Mostly on maintenance/reliability side.

    • Worked on interoperability – Mostly building out test suite, especially energy tests. Right now this is hard because 1) we don’t have importers (they’re last on the roadmap), 2) going back and forth with ParmEd is hard, and 3) the only data import route goes through the Toolkit. I’m considering making some simple readers to get through this block.

    • Met with Swails on potentially enlisting his help, but also expanding out scope somewhat. Meeting coming up this week to determine technical costs.

    • Met with Vanderbilt and had a working session. Was fairly productive on a technical level. I expect that we’ll have something productive ready for release in a few weeks.

    • SB – Roadmap for internal use of System object?

  • DD

    • Conference last week tues/weds.

    • Began working on TorsionDriveExecutor for benchmarking

    • Worked with Bill Swope to get him unstuck on exporting problem. General fix is merged and will be in the next release.

    • Finished connectivity rearrangement check PR + merged in benchmark.

      • SB – We should see if we can keep this from being a one-off, and reuse code from QCSubmit if possible

      • JW – We’re already using (copying code) and planning to use stuff from QCSubmit, so I’m planning to fully move over to it soon.

      • SB – We should make this a discussion at a QC* meeting. Same with a lot of the work like caching that DD is doing. It’d be good if there was a way for users to access this cache. JH would find a lot of benefit from this.

      • DD – Currently the caching is all an implementation detail and is not exposed to the user. I’ll need to spec out exactly what a user-exposed cache could look like.

    • Looking at making custom parameter sets/a more flexible method/basis input for qcengine. This will let us fill in e.g. a path to a SMIRNOFF file for a submission.

    • No major updates on QCA.

    • Need to prepare partner datasets this week.

    • Preparing for partner meeting on the 24th.

    • Will follow up with Hyesu to craft a way to evaluate single-point hessians on final molecules in optimizations. Want to incorporate this into our current automation.

      • SB – Hyesu thinks that, if we’re going to use the same training set as the 1.2.0 fit, those should all have hessians except the ones that failed. So it’ll be good to follow up with Hyesu about this.

  • PB

    • Was off Thursday and Friday.

    • Went down the rabbit hole of debugging MM energies, and trying to match how ForceBalance was doing the full MM calculations.

    • Updated scripts to do new ELF10 calculations.

    • Will continue working on WBO fits this week.

  • JW –

    • Moving this week, out Thursday and Friday

    • Dalke will likely be joining, need to finish reviewing code samples

    • Lily Wang will probably begin joining these calls

    • 0.9.1 and 0.8.4

    • Working with Connor Davel (CU Boulder) on performance issues. I think that, instead of making special case logic for typing polymers, we can fix a lot of problems at once by correctly implementing caching in toolkitwrappers.

      • SB – Caching can be really tricky – I see it as a band aid when there’s a deeper design problem.

    • This week, will work on clearing Toolkit PR backlog. Possibly refactoring fragmenter to work with new OETK.
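
A minimal sketch of the caching idea JW raised above, using only the standard library. The function names and the fake "search" are hypothetical stand-ins and do not reflect the real toolkit wrapper API; the point is only that repeated identical queries hit the slow backend once.

```python
from functools import lru_cache

# Counts how often the "expensive" backend is actually invoked.
backend_calls = 0


def _expensive_substructure_search(smiles: str, query: str) -> tuple:
    """Stand-in for a slow toolkit call (hypothetical, for illustration)."""
    global backend_calls
    backend_calls += 1
    # Fake result: positions where the first character of the query occurs.
    return tuple(i for i, ch in enumerate(smiles) if ch == query[0])


@lru_cache(maxsize=1024)
def cached_search(smiles: str, query: str) -> tuple:
    # Both arguments are immutable strings, so they can serve as the cache key.
    return _expensive_substructure_search(smiles, query)


# A polymer-like workload repeats the same query many times.
for _ in range(100):
    matches = cached_search("CCOCCOCCO", "O")

print(matches, backend_calls)
```

The sketch keys the cache on immutable strings. Caching on mutable molecule objects would need an invalidation strategy, which is one way SB's concern about caching being tricky could show up in practice.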

Other topics

  • Fragmenter update

    • LD may not have access to OE license – Would need full refactor to use. May do torsion scans and could use Fragmenter for this.

    • SB – How many torsion scans would there be in this case, and what kinds of molecules?

      • LD – Continuing Chaya’s(?) work. I saw it in a YouTube video.

    • SB – One option is to remove the OE license requirement; otherwise someone else could run the fragmentation on their machine and send you the results. It’s also worth considering whether the molecules would need to be fragmented.

    • JW – Two options:

      • Quick – Update fragmenter to use OE2020 toolkit. JW can diagnose whether this is possible in a few hours. If it’s possible, he can have a fix out in a day or two (in a stable release).

      • Slow – Refactor fragmenter to use OpenFF toolkit. This will require a major rewrite (1-2 engineer weeks), and there will probably be cases of different behaviors

    • SB – A lot of fragmenter is probably superfluous now. We could consider making the replacement a complete rewrite to only expose the important API points, and the internal methods that those require.

    • SB – Re: testing – JH has a big dataset that he ran through fragmenter. That could be used as a regression set.

    • DD – We could consider making an openff-fragmenter package.

    • JW – Hesitant about this, since I could imagine the code eventually going into openff toolkit. So then folks would need to follow a trail through two deprecated repos to find the functionality. So we could overwrite `master` in openforcefield/fragmenter

    • Three options

      • Overwrite openforcefield/fragmenter

      • make openforcefield/openff-fragmenter

      • Immediately make the rebuild inside of openff-toolkit

        • PB – This would be most useful to me

        • SB – This also runs into a parity issue (OE and AT will give different outputs)

        • PB – Would basing this off the toolkit by any avenue run into the parity issue?

        • JW – could force the use of OE backend initially

        • SB – We’ll know a lot more once we try.

    • JW – More generally with the “parity” issue, I’m not sure how to handle internal Toolkit changes that may affect the numerical output of a method. Is it OK if users get a different output from the same command with the same dependencies, where only the Toolkit version varies? I think this is ultimately necessary to accept.

    • (General) – We’re not sure which way to go here. Each one has pluses and minuses. Should check in with Chaya and John.

    • (General) – We’ll start by making a fresh branch on openforcefield/fragmenter as a “complete rewrite”. If Chaya and John don’t want the original repo overwritten, we’ll be able to make a new repo out of the code in this branch.

    • Necessary API points

      • WBOFragmenter in bespokefit

      • WBOFragmenter in QCSubmit

  • Moving to openff-units? When and how?

    • MT –

      • Who owns this effort?

      • Where should the code live? Currently I have a copy of it in OpenFF-System

      • When will other people within OpenFF adopt it? Will they?

      • Would we want to put this up as a conda package in the near future?

      • Would it be worthwhile to have evaluator switch to it immediately?

      • How complex will it be to switch OpenFF toolkit over to this?

        • It’d be great to use the molecule refactor as an opportunity to base the entire toolkit off of openff-units instead of simtk units.

      • SB – Organization/political aspect

    • DD – Can we make this opt-in?

      • MT – It’ll be much easier to do this for packages that aren’t in production or widely used.

    • JW – I think all adoption within the OFF Toolkit is dependent on discussions with John and Peter. So let’s push forward on that first.

    • MT – Setters should still be able to take simtk units. But getters will return pint units.

    • MT – Where are all the places where simtk units are used? Forcefield/parameters. Conformers + box vectors + partial charges. to/from_qcschema.

  • What’s the real case for Pint vs SimTK?

    • Easier to use different CODATAs

    • Slightly better serialization

    • Numpy compatibility (if you call np.sqrt on a pint array, you get the expected output. If you do the same with simtk, you get the units square-rooted.)

    • Unit/dimension reduction/cancellation (required for Evaluator, will get joule/calorie instead of 4.184, will also get floats when units cancel instead of Dimensionless)

  • How painful would it be to stay with simtk?

    • Evaluator requires pint for reasons above

    • Not sure whether the current openff-units repo could use simtk internally

      • SB – It’s likely to be possible, but painful.

  • How much does OpenMM’s choice affect ours? There’s some concern that, if we don’t have organization boundaries, then we can never move forward on any decision, because there’s an unbounded list of stakeholders.

  • We need to make a decision on how to move forward with this. JW will present this topic to the project leadership, along with possible options. JW will have a suggestion which will be the path forward if consensus isn’t reached.

Action items

Decisions