May 2020 Developers Week Notes

Date

11 May 2020

Participants

Goals

Discussion topics

Notes

Virtual devs week organization

15 minute break each hour
End early if possible

Day 1:

Round table updates
Feedback on development and work practices
Interest in physical developers week meetup
Namespace reorganization
Create subgroup for QC dataset organization
(maybe) Slack channel discussion

Day 2:

Round table updates

Jaime

From Spain. Master in bioinformatics, PhD in computational chem (metal ions in biology).
Worked a bit on automation in parameterizing/running metal-biological simulations.
Got interested in scientific pipelines.
Worked on porting AmberTools20 to conda-forge, other ugly infrastructure work

David Hahn

From southwest Germany. PhD in molecular dynamics method development and applications.
Currently OpenFF postdoc at Janssen.
Work on benchmarking FF parameters WRT P-L binding free energy, developing benchmarking dataset, free energy calculation workflow

Josh Horton

From Northeast UK. PhD in bespoke FF parameterization. Now a postdoc with OFF working on bespoke pipeline.

Josh Fass

From Egypt. Senior PhD with Chodera. Working on pilot experiments for Bayesian sampling in models where property calculations are cheap.
Working on different ways to explore atom type environments, to alleviate need for brute force sampling

Simon Boothroyd

From central UK. Working on OFF Evaluator, lots of infrastructure work. Migrated PropertyEstimator to OFF Evaluator. Significant amount of documentation written, examples on Colab.
Created nistdataselection – main repo for curating phys prop datasets.
Last week, made a new repo that rethinks the nistdataselection structure, for a better-architected way to access said data.

Matt Thompson

PhD in chemical engineering with Peter Cummings, with no biology whatsoever. Studied properties of ionic liquids. Didn’t care much for FFs. Worked on classical atomistic MD (GROMACS+LAMMPS) of materials.
Software scientist with OpenFF. Working here for 6 weeks. Working on small tasks to get onboarded.
Will work on creating System object. Expecting cooperation with MoSDeF group/GMSO code. Will be highly interoperable with both MD engines and machine learning frameworks.

David Dotson

From St. Louis. PhD on large protein systems. Worked in healthcare industry and did devops/data engineering on the side.
Met with Daniel Smith at SciPy, heard about OpenFF.
Started at the same time as Matt, working 50% time. Working on implementing proper torsion interpolation.
Looking at accelerating performance of property calculations.

Hyesu Jang

From Korea. Now grad student with Lee-Ping.
Started by making a package for RESP calculation.
Now focusing on running fits and generating new FF parameters.
Currently working with Jessica on improving valence parameters.

Jeff Wagner

From Los Angeles. First research internship at national lab modifying LAMMPS. Confused by biomolecules. Wanted to be a doctor.
PhD at UCSD, worked on a mix of methods development and benchmarking, also some docking/screening.
Bothered by inaccuracy in field, disconnect between development and application.
Now maintaining/developing OpenFF toolkit, doing general organization infrastructure stuff.

Jeff Setiadi

From Sydney, Australia. Did condensed matter physics. Went directly to PhD, looking at protein-ligand interactions.
P-L is hard, would be useful to work on lower-entropy systems.
Now working on pAPRika

Jessica Maat

From San Diego. Studied math+chem in undergrad. Now 4th year grad student with Mobley.
Have worked on a bunch of OFF work. Now looking at trivalent nitrogens. Have done experiments on assigning trivalent nitrogen parameters.
Have also made a tool to select diverse nitrogen compounds, dataset generation for FF release.

Tobias Huefner

Bachelor’s, Master’s, and PhD in comp chem in Germany. Focus on drug discovery, biomolecular solvation.
Now in Gilson lab. Will be working on benchmarking of docking programs (CELPP).
Looking at understanding docking performance as a function of molecular features.
Also looking at atom typing, in the context of how it can limit accuracy, and whether atom types can be optimized using physics-based methods.

Development practices

SB – Writing code fast with Owen, we didn’t do a lot of testing. I’d do PRs and not merge without reviews, and I think that was the right way to go. Other repos, especially data-focused ones, have a big need for a quality version history, so we need to handle it on a case-by-case basis.

HJ – Used single jupyter notebook for generation of datasets; used PRs for QCA dataset submission.

SB - The kind of PRs we were doing for the data curation / choices etc: https://github.com/openforcefield/nistdataselection/projects/1

JW - Justification for QCA submission approach - needed a way to document and keep track of what we did, be able to evolve approach over time. Can later try and capture the best approach.

JH - Took a few rounds to figure out pattern in QCA submission. Information was all spread out. Needed to synthesize it all.

JM - May be useful to have scripts we all use. Functions that are common for generating the JSON, etc.

JW - QCSubmit should be able to capture many of the lessons we’ve learned. Are there specific things we can list that we’ve learned from this?

Namespace reorganization

Building support for something like (from openff import toolkit, evaluator, ...) -- at least covering how we want the final namespaces/imports to look, and maybe getting into implementation of the changeover
Full proposal: Infrastructure Architecture
- Before I finalize everything with releasing the re-branded OpenFF Evaluator framework and commit to the new API naming conventions, I wanted to suggest we should invest some time to clean up the software stack offered by OpenFF. While everything exists under the same GitHub org, there is almost no consistency between our packages. This will only get worse over time, and equally, will only get much harder to reverse as the user base expands. I.e. currently we have
```
from evaluator import ...
from openforcefield import ...
from cmiles import ...
...
```
  while it would be much more cohesive to have an overall architecture similar to
```
from openff.evaluator import ...
from openff.toolkit import ...
from openff.fractal import ...
...
```
  In practice this seems obtainable through an implicit namespace file structure like https://packaging.python.org/guides/packaging-namespace-packages/#native-namespace-packages while still maintaining individual repositories. This style of architecture / design would seem to lend itself to creating smaller, more focused repos / packages (similar to more of a set of software 'microservices').
  I understand this would initially cause a large amount of disruption and possible confusion among users, but the end result would be a cohesive, elegant stack, with all the software we build being connected and identifiable under the same umbrella. Moreover, I believe it would push us to build software which more rigidly follows a single responsibility pattern, rather than monolithic packages which 'do everything', which the toolkit seems to be heading towards (especially if it simply absorbs things like fragmenter and the QC submission frameworks).
  It would be fantastic to start moving away from a style similar to a zip file of disconnected tools, and to start planning longer term about how we want our software to look and be interacted with.
Which packages should be under OFF namespace?
What should namespace be called?
When should the migration happen?
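
For reference during the discussion, a minimal sketch of how the native (implicit) namespace package layout from the proposal behaves. Everything here is an assumption for illustration: `openff_demo` is a hypothetical stand-in for the proposed `openff` namespace, and the two temporary directories play the role of two separately maintained repositories. The only requirement is that the shared namespace directory has no `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the proposed `openff` namespace.
NAMESPACE = "openff_demo"


def make_repo(root: Path, repo_name: str, subpackage: str) -> Path:
    """Create a fake repository that contributes one subpackage to NAMESPACE."""
    repo = root / repo_name
    package_dir = repo / NAMESPACE / subpackage
    package_dir.mkdir(parents=True)
    # Only the subpackage gets an __init__.py. The namespace directory itself
    # must not have one, or it stops being a native namespace package.
    (package_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")
    return repo


workdir = Path(tempfile.mkdtemp())
repos = [
    make_repo(workdir, "toolkit-repo", "toolkit"),
    make_repo(workdir, "evaluator-repo", "evaluator"),
]

# Putting both repo roots on sys.path mimics installing two distributions.
sys.path[:0] = [str(repo) for repo in repos]
importlib.invalidate_caches()

toolkit = importlib.import_module(f"{NAMESPACE}.toolkit")
evaluator = importlib.import_module(f"{NAMESPACE}.evaluator")
namespace = sys.modules[NAMESPACE]

# Both subpackages resolve under one namespace, although they live in
# different directories; the namespace's __path__ spans both repos.
print(toolkit.NAME, evaluator.NAME)
print(list(namespace.__path__))
```

In a real migration each repository would declare its own subpackage in its packaging configuration; the sketch only shows that the import side works without a shared top-level package.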

Determining best practices for QC dataset naming and organization

Migrating packages over to GitHub Actions and unifying under one OE license

Reorganizing/defining/consolidating the many development-related slack channels

Deciding upon a consistent approach and theme for each repo's docs

Making a contributor community
