Infrastructure Roadmap 2022

List of infrastructure tasks for 2022. Each task should be linked to its Confluence or GitHub page with more information. See also Science Roadmap 2021.

Labels

Category	Labels
Priority	HIGH | MEDIUM | LOW
Effort	HIGH | MEDIUM | LOW
Status	Not started | In Progress | PROTOTYPE | Completed | BLOCKED

Roadmap

Infrastructure tasks	Priority	Effort	Blocking science?	Infrastructure Dependencies	Start date	End/Due date	Status	Driver
Architecture / General infrastructure
95%+ core package uptime and deployment (OpenFF TK, s99F, OpenFFs)	High	High			Ongoing		Completed	@John Chodera @Jeffrey Wagner @Matt Thompson @David Dotson
Add `openff-qcsubmit`, `openff-evaluator`, and `bespokefit` to core packages	High
`openff-qcsubmit` and `bespokefit` user’s guide	MEDIUM							@Josh Mitchell
Refresh, consolidate, and prune OpenFF toolkit examples	MEDIUM							@Josh Mitchell
SQM AM1 optimization connectivity change handling	MEDIUM							@Jeffrey Wagner @Connor Davel
`openff-benchmark` refactor (make components be Python-first and more modular to enable more flexible workflows)	High							@Jeffrey Wagner
Streamline `ForceBalance` CI and pre-release testing	MEDIUM
`ForceBalance` developer’s guide	LOW
Refactor `espaloma` for production use or implement in OFF Toolkit	Unresolved – Don’t know timeline for adoption – Bring up in leadership/gov board meeting. Infra team will look into completeness of tests/reference values to estimate refactor cost
Accept, reject, or request specific feedback for SMIRNOFF spec proposals within 4 weeks of submission	High							@Jeffrey Wagner @David Mobley @John Chodera @Simon Boothroyd
Automated upstream RC tests	MEDIUM						IN PROGRESS	@Matt Thompson
“Did I break something else?” tests against `master`/`main` branches of OpenFF packages	MEDIUM							@Matt Thompson @Jeffrey Wagner
QCA Standards v3 implementation	High							new qca hire?
QCA in-server, policy-based error cycling	MEDIUM
QCA 2D torsiondrive support (may just need to verify that this works)	High
Psi4 to conda-forge	High
QCA chained operations	LOW (may increase if found to be blocking)
General “reproducible computation” records and data infrastructure	High			Interoperable molecule class				@Simon Boothroyd @Joshua Horton
Bayesian infrastructure: ML frameworks			Bayesian Fitting	Analytically Differentiable System Object			BLOCKED
Off-site charges (support for conversion to other packages)	MEDIUM			Hard to spec without VirtualSite Handler implementation	June 2021		IN PROGRESS	@Matt Thompson
Define and maintain specific goals for Bespokefit deployability/stability (succeeds on 95% of minidrugbank? In under some set number of CPU-hours? Regression test suite incorporated into CI?)	High							@Jeffrey Wagner @Matt Thompson @Joshua Horton
Local torsiondrive executor (default qc, ANI, and XTB - a more formalized version of this command line command in `openff-benchmark`)	MEDIUM							@David Dotson
Interchange: Have `ForceField.create_openmm_system` either be deprecated or wrap an Interchange call	MEDIUM			GBSA support in Interchange; plugin support in Interchange; and a few months of lead time for scientists who need to port ParameterHandler plugins				@Jeffrey Wagner @Matt Thompson
Interchange: System combination	MEDIUM						PROTOTYPE	@Matt Thompson
Interchange: AMBER export	High			Biopolymer topologies; SMIRNOFF updates			IN PROGRESS	@Matt Thompson
Interchange: GROMACS export	High			Biopolymer topologies; SMIRNOFF updates			PROTOTYPE	@Matt Thompson
Interchange: OpenMM export	High						PROTOTYPE	@Matt Thompson
Interchange: LAMMPS export	LOW			SMIRNOFF updates				@Matt Thompson
Interchange: ParmEd export	MEDIUM			Biopolymer topologies; SMIRNOFF updates			PROTOTYPE (not to be advertised for production use)	@Matt Thompson
Interchange: AMBER import	MEDIUM			Biopolymer topologies			IN PROGRESS	@Matt Thompson
Interchange: GROMACS import	LOW			Biopolymer topologies			IN PROGRESS	@Matt Thompson
Interchange: OpenMM import	LOW			Biopolymer topologies			PROTOTYPE	@Matt Thompson
Interchange: ParmEd import	MEDIUM			Biopolymer topologies			PROTOTYPE (not to be advertised for production use)	@Matt Thompson
Interchange: Track parameter provenance on import (i.e., hold a single value for shared GAFF parameters)	LOW			Biopolymer topologies				@Matt Thompson
Interchange: Interfacing with ML-based fitting			Blocked by lack of specification / needs to be broken into more discrete deliverables				BLOCKED	@Matt Thompson
Interchange: “Book” documentation/user’s guide	MEDIUM						IN PROGRESS PROTOTYPE	@Josh Mitchell @Matt Thompson
CLI tool infrastructure	HIGH				June 2020		IN PROGRESS PROTOTYPE	@Matt Thompson @Jeffrey Wagner
Remove `smirnoff_hack.py`	MEDIUM							@Jeffrey Wagner
Implement CachingToolkitWrappers	High						PROTOTYPE IN PROGRESS	@Jeffrey Wagner @Connor Davel
Toolkit
AMBER-derived SMIRNOFF-format FF	HIGH		Biopolymer fitting		March 2020		PROTOTYPE	@Jeffrey Wagner @Chapin Cavender
Polarizability ParameterHandler	LOW		Polarizable fitting
Custom GBSA handler	(Follow up with @Jeffry Setiadi @Michael Gilson to understand long term plans and infrastructure needs)						PROTOTYPE
WBOs for improper torsions			Waiting on research results to assign priority
A deep dive into toolkit parametrization differences (Josh Fass SMIRKS differences) / Automate complaining about cases where incoming molecule/chemistry is bad/misformatted	High						IN PROGRESS	@Connor Davel @Jeffrey Wagner
Refactor/make our own Exception hierarchy, implement some problems as catch-able warnings.	MEDIUM						IN PROGRESS	@Matt Thompson @Simon Boothroyd @Jeffrey Wagner
Implement friendly default behavior when loading large molecule datasets/high-volume pipelines, with option for custom validation logic. Consider making moleculefixer for common data problems.	MEDIUM
openforcefield-core/pydantic refactor (possibly driving a SMIRNOFF spec update)	High			Aromaticity refactor; Stereochemistry refactor
Remove OpenFF-Toolkit’s hard dependency on OpenMM (migrate to pint/openff-units) (patch)	MEDIUM				Sep 2021	0.11.0 release	IN PROGRESS	@Matt Thompson
Protonation state enumeration	LOW			RDKit doesn’t have helpful protonation state enumeration; need to publicize and see if community wants to contribute there (https://github.com/openforcefield/openforcefield/issues/526). Could use Epik from the Schrödinger suite? Example in OpenMolTools.	Mar 2020	July 2020 (incomplete)	PROTOTYPE BLOCKED
Interoperable molecule/stereochemistry/aromaticity refactor	MEDIUM			Need to decide on desired behavior for how stereochemistry and aromaticity is handled. Also need to decide on which molecule formats should be losslessly round-trippable.				@Jeffrey Wagner
Biopolymer infrastructure (SMARTS typing optimization)	High		Biopolymer fitting			Dec 31 2020	IN PROGRESS PROTOTYPE	@Jeffrey Wagner @Connor Davel
Biopolymer infrastructure (infra improvement/Topology refactor/automated polymer unit recognition)	High		Biopolymer fitting	Should discuss design with OpenEye			IN PROGRESS	@Jeffrey Wagner @Iván Pulido
Biopolymer infrastructure (graph charges and/or other scalable solution)	High		Biopolymer charge fitting
CMAP torsions in OFFTK/SMIRNOFF spec	LOW		CMAP fitting
Fitting
Migrate FF optimization to ML framework	HIGH
QM-MM / iPolQ solvent calcs on QCA	MEDIUM Needs research cycle: Could another QC program offer performant continuum solvent model and be compatible with QCF?			(Maybe: Way to generate solvent configuration in QCA? Or is this cheap enough to do outside QCA?)				@Chapin Cavender Maybe @new-qca-hire @Trevor Gokey @Jeffrey Wagner
Single-point QM-MM of a subset of packed/folded protein on QCA	LOW			Need to decide on a QC program and ensure feature support/QCA compatibility
Benchmarking
H-G calculations in OpenFF-Evaluator	Medium							@Jeffry Setiadi
P-L benchmarking (repo)	Medium				Mid 2019?		IN PROGRESS PROTOTYPE	@David Hahn @David Dotson
Protein Xtal/NMR observable based benchmarking and fitting (chemical shifts/scalar couplings/RDCs/Kirkwood-Buff integrals/etc.) (specific effort to be directed by @Chapin Cavender)	HIGH							@Chapin Cavender @Jeffrey Wagner
PL Benchmarking on Folding@Home (aligning with architecture from `bespokefit` where possible/appropriate)	HIGH							@David Dotson
Automated benchmarking + dashboard. May include geometry tools (MM minimization, conformer generation, torsion scanning, conformer scoring)	HIGH			(Optional) Reliable QCMol → OFFMol conversion/CMILES deviation checks	???		IN PROGRESS	Dashboard: @Jaime Rodríguez-Guerra (Deactivated) @David Dotson @John Chodera @Trevor Gokey
Documentation / Community / Training
Reference energies data package (curation write-up)	MEDIUM	MEDIUM		SMIRNOFF updates	Sep 2021		IN PROGRESS	@Matt Thompson
CHARMM-GUI integration / validation	LOW			A way to create CHARMM residue template files (ParmEd Issue #1103)