Approach III: training to FF data only

A valence fit to existing QM protein data only, experimenting with initial values from both AMBER and Sage.

Overview

Summary	One reason our protein force field candidates may not approach the performance of AMBER force fields is that the AMBER force fields were trained (over many years and generations) almost solely to protein data, since GAFF is used to handle small molecules. Dataset issues on our end could include too little protein data or overweighted small molecule data. This experiment trains only to protein data to see whether that improves performance. We experiment with two starting points: one from AMBER initial values, the other from Sage.
GitHub repo/branch
Status	Not started In progress Completed Won't progress

1 Overview
2 Milestones and metrics
3 Progress and findings
- 3.1 Curated data (or similar title)

Milestones and metrics

Stage	Milestone/Benchmark	Contributors	Deadline	Status
Fit Sage from protein data only (Null FF)	Starting from an up-to-date version of the protein-param-fit FF ( ), re-fit to protein data only, from Sage initial values (modified Seminario method (MSM) for bonds/angles, Sage torsions for torsions).	@Chapin Cavender (updating environment) @Brent Westbrook	Nov 22	Not started
Fit Sage from protein data only (Specific FF)	Starting from an up-to-date version of the protein-param-fit FF ( ), re-fit to protein data only, from Sage initial values (MSM for bonds/angles, Sage or AMBER torsions for torsions).	@Chapin Cavender (updating environment) @Brent Westbrook	Nov 22	Not started
Validate AMBER port on all training and testing data	For the output OpenMM systems, check that the SMIRNOFF port assigns every parameter identically to the OpenMM AMBER ff14SB force field.	@Lily Wang		Not started
Fit AMBER from protein data only	Fit the above force field to protein data only	@Lily Wang		Not started
Small molecule benchmarks	Run small molecule QM benchmarks for Null + Specific			Not started
Protein stability benchmarks	Run GB3 benchmarks for Null, Specific, and retrained AMBER. We can likely decide whether to proceed based on performance over 5 µs.			Not started
Helix folding benchmark	Run helix folding benchmark for Null, Specific, and retrained AMBER.			Not started
Smaller peptide benchmarks	Run NMR scalar coupling benchmarks on smaller peptides.			Not started
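
For the "Validate AMBER port" milestone, the comparison could look roughly like the sketch below. It is a minimal illustration, not the project's validation script: the function names (canonical_key, compare_terms) and the example numbers are hypothetical, and it only covers bonds and angles, where each atom tuple carries a single set of parameters. It assumes the per-term parameters have already been pulled out of the two OpenMM systems into plain dicts (for bonds, for example, by looping over HarmonicBondForce.getBondParameters and stripping units).

```python
import math


def canonical_key(atoms):
    """Return an order-independent key: a bond or angle and its reverse are the same term."""
    atoms = tuple(atoms)
    return min(atoms, atoms[::-1])


def compare_terms(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two {atom-index tuple: parameter tuple} dicts.

    `reference` would hold terms from the AMBER-parametrized system and
    `candidate` those from the SMIRNOFF port (assumed inputs, unitless floats).
    Returns terms missing from the candidate, extra terms in the candidate,
    and terms whose parameters differ beyond the tolerances.
    """
    ref = {canonical_key(k): tuple(v) for k, v in reference.items()}
    cand = {canonical_key(k): tuple(v) for k, v in candidate.items()}
    report = {
        "missing": sorted(set(ref) - set(cand)),
        "extra": sorted(set(cand) - set(ref)),
        "mismatched": [],
    }
    for key in sorted(set(ref) & set(cand)):
        same = len(ref[key]) == len(cand[key]) and all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(ref[key], cand[key])
        )
        if not same:
            report["mismatched"].append((key, ref[key], cand[key]))
    return report


# Illustrative numbers only: (equilibrium length in nm, force constant in kJ/mol/nm^2).
amber_bonds = {(0, 1): (0.1090, 284512.0), (1, 2): (0.1522, 265265.6)}
smirnoff_bonds = {(1, 0): (0.1090, 284512.0), (1, 2): (0.1530, 265265.6)}

report = compare_terms(amber_bonds, smirnoff_bonds)
for key, ref_params, cand_params in report["mismatched"]:
    print(f"atoms {key}: reference {ref_params} vs port {cand_params}")
```

Torsions would need a different key, since one atom quadruple can carry several periodicities and improper torsions are not reversal-symmetric; nonbonded parameters could be compared per particle in the same way.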

Progress and findings

Curated data (or similar title)

Action items

Ask Chapin to check protein-param-fit is up-to-date
Ask Chapin to prioritise null vs specific
Ask Chapin for the Specific input files (and double-check the hard-coded params in b7s26)