2020-08-10 Core Devs Meeting notes

Date

Aug 11, 2020

Participants

@Jeffrey Wagner
@Simon Boothroyd
@David Dotson
@Matt Thompson
@David Hahn

Discussion topics

Notes

- JW -- Amber impropers, OFF-pele integration, moving C-F migration forward
- DD --
  - QCArchive work. MM deployments. New QCA worker environment will pull in openforcefield(s).
  - Will be migrating to qca-dataset-submission. Will make a new conda channel and dockerhub org.
  - Worked on Cerutti's disaccharide submission.
  - JW will organize meeting with Cerutti on QM dataset formatting.
  - BP is organizing plan for major refactor of QCFractal, and I'll be helping out. Should cover things like submitting same molecule twice.
  - Reviewing evaluator docs/tutorials. Will try to begin work on pAPRika and Evaluator updates this week.
- MT --
  - Reviving benchmarking stuff, though likely to be passed on to Trevor. He has old machinery that will work for "single-pass" stuff, but isn't suitable for long-term datasets.
  - Met with DH last week, looked at data organization. Considering how to hold large/heterogeneous data in git trees vs. other sources.
  - Maybe FF releases and FF data generation should be uncoupled?
  - Worked on CLI -- Started making some utilities, dedicated repo
  - Started getting into matrix math of System object. Can now directly autodiff a loss function.
  - Big question is how much this will be involved in FF fitting
  - JW -- Could rank features by difficulty and give progress report where we ask for ranking
  - MT -- Might be better to rank by utility -- Things that already have a 90% solution probably don't need to be done. Making completely new functionality WOULD be really useful.
  - MT -- There's more than just "utility" and "difficulty". It might be something like "time", where the value of a feature changes based on when it comes online
  - DD -- Something like "business impact"
  - MT -- Some question about utility of moving away from OpenMM intermediate format
  - DD -- Could do a Wardley mapping to help figure out where to expend effort.
- SB --
  - Migrated dataset curation from nonbonded into evaluator, since it's become pretty stable. There's still a bit of new stuff in nonbonded (like data models), but most stuff is now in evaluator. This is the big new feature in Evaluator 0.2
  - Have ESP calcs in recharge. Made alpha release of recharge, which builds using its own conda smithy. Easy to make inputs and read outputs -- Handled using pydantic.
    - OpenFF 0.7.0 is not compatible with Psi4 on macOS, but is on Linux.
    - Likely due to pinning of the mkl library. Potentially a conflict with AmberTools.
    - This is known by Psi4 team, but it's unknown when they'll resolve it.
  - JW will upload recharge package to omnia/main
  - These early releases don't have a full GH release, but are attached to tags labeled with -alpha, -beta suffix. This keeps semantic versioning happy. In conda packages, `-alpha` has to be replaced with `a`. This is handled by jinja template.
- DH --
  - Finalized benchmarkFF paper, now updated on ChemRxiv (or will be when Mobley comes back)
  - Back on coding, talked with Matt about PLBenchmarks repo. Some questions left to resolve about data management and availability. Current plan is to include bulky data in releases. But questions remain (e.g., what if we only update one small file?).
  - JW -- I'd be in favor of keeping entire giant tarball for each release. No limit on file storage. Would love to join next chat on dataset provenance.
  - DH will organize meeting for later this week, loop in MT and JW.
- Open discussion
  - MT -- Virtual sites PR is large and hard to review. We should avoid gigantic PRs in the future.
  - JW -- I could take PR
  - MT -- Not necessary, I'm close to done. But I AM interested in knowing how to avoid giant PR reviews in the future
  - JW -- Two thoughts
    - One alternative is to do PRs into a "new feature" branch, and take them in bite sized chunks. Is there any other model?
    - Trevor and I talked a lot about the behavior and API of this, so this limits the PR in scope
  - MT -- Still doesn't address what to do in the general case, especially if we expect more of these giant PRs
  - DD -- Could think of strategies for rebalancing this work.
  - MT -- This would have benefitted from defining milestones at the beginning.
  - SB -- Lots of changes in PR that aren't code -- Lots of formatting in diffs. Some method/function renamings. Much of this could be fixed by black/isort. API changes could be clearer.
  - MT -- On large code/behavior journeys, small reviews will make sure that things stay on the rails.
  - SB -- Agree that small reviews would make this easier and more comprehensive.
  - SB -- Re blacking, flake8ing, and isorting the toolkit, I'd suggest opening a new PR and blacking, but not isorting or flake8ing yet
  - DD + MT -- Flake8 is a bigger step than black -- It won't automatically fix things, but will raise an error if it's not happy.
  - SB -- isort and black will sometimes fight each other, if there are comments mixed in with import statements. Other possible issue is that isort never actually runs the code, so if there's an import in __init__ that depends on order, isort will naively rearrange them to be alphabetical and break things.
- MT -- CMILES just got merged in.
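
For reference, the tag-to-conda version mapping SB described (`-alpha` has to be replaced with `a`) can be sketched in Python. This is only an illustration, not the actual jinja template used for recharge. The function name `tag_to_conda_version`, the optional leading `v`, the `-beta` to `b` mapping, and the handling of a trailing pre-release number are all assumptions made for the example.

```python
import re

# Hypothetical mapping from semantic-versioning pre-release labels to the
# short suffixes used in the conda version string. Only "alpha" -> "a" is
# stated in the notes; "beta" -> "b" is an assumption by analogy.
_PRERELEASE_SUFFIX = {"alpha": "a", "beta": "b"}

# Matches tags like "0.0.1", "0.0.1-alpha", or "0.2.0-beta.1".
_TAG_PATTERN = re.compile(r"(\d+(?:\.\d+)*)(?:-(alpha|beta)(?:\.?(\d+))?)?")


def tag_to_conda_version(tag: str) -> str:
    """Convert a git tag into a version string without hyphens."""
    # Drop an optional leading "v" (e.g. "v0.0.1-alpha").
    version = tag[1:] if tag.startswith("v") else tag

    match = _TAG_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"Unrecognized tag format: {tag!r}")

    release, label, number = match.groups()
    if label is None:
        # A plain release tag needs no rewriting.
        return release
    return f"{release}{_PRERELEASE_SUFFIX[label]}{number or ''}"


if __name__ == "__main__":
    for example in ("0.0.1-alpha", "v0.2.0-beta.1", "1.0.0"):
        print(example, "->", tag_to_conda_version(example))
```

Raising on unrecognized tags, rather than passing them through, is a design choice for the sketch: a malformed tag then fails at packaging time instead of producing an oddly versioned package.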
