2025 Cross Training

Motto: “Everyone fits!”

Goals

  • Increase team capacity to perform mission-critical work

  • Identify pain points in the existing fitting workflow

  • Improve team ability to troubleshoot fitting tools

  • Advance some scientific features toward release

Logistics

Dates

Aug 11, 2025 - Aug 22, 2025

Compute resources:

  • UCI HPC3 (talk to DM about using free queue vs. dedicated)

  • NRP (though the Dask operator there is flaky)

  • Amazon (max 2 GPUs)

Summer camp plans

 


Lily

Camp counselor

Jeff

Something QCSubmitty, BespokeFitty

Figure out project ideas here (minimal “basis set” torsion fits?)

Jen

Something general forcefield-y

Figure out project ideas here

Matt

Something Evaluator-y and, if possible, YAMMBS-y and Interchange-y

(tentative) Small nonbonded refit (in FB / evaluator)

Josh

(tentative) Protein FF refit with SMEE

 

smee vs. ForceBalance

New NAGL model fitting (would take too long)

Replicate lipids or water refit

Protein-ligand benchmarking?

Some sort of adversarial “game the benchmarks” competition

Looking at really bad outliers in benchmarking results

 

Spitballing ideas

  • Re-fitting vdW with additional parameters (requires: GPU access, Evaluator, ForceBalance for now)

    • Difficulty: Medium/hard (requires Dask wizardry; lower on SLURM)

    • The Dask operator on NRP is flaky and requires frequent requests to admins to get things reset.

    • There are improvements we can make with ForceBalance to make re-fitting possible

    • If running on NRP became more reliable, it would help our collaborators too

    • Could run large-scale phys prop benchmarks to look for more outliers/possible errors in ThermoML

  • Do nonbonded fitting with smee/descent/evaluator stack

    • Difficulty: Hard

    • smee does its own phys prop calcs; not sure whether that would implicitly make Evaluator obsolete, or whether it can scale up. It might be best to hook Evaluator up to smee to preserve the old behavior.

  • Doing a refit with smee and expanded torsions to small molecule data (requires: 1 GPU access, smee)

    • Difficulty: Easy

  • Experimenting with minimal torsion initial fits and a “basis set” of minimal data before expanding (smee, GPU)

    • Difficulty: Easy, modulo picking datasets (would require using QCSubmit)

    • QC data: minimal torsion-driven/optimized fragments

  • A protein re-fit with smee

    • Difficulty: Easy/medium

    • QC data: Phosphorylated/modified AAs? 2D dipeptide torsion scans (if not done already)? Not really sure here.

  • Diagnosing current FF issues with current benchmarks (e.g. outlier bonds, angles)

    • Difficulty: Easy, possibly tedious (lots of interacting with YAMMBS)

    • QC data: Molecules which are likely to be outliers

  • Adversarial benchmarking – looking for edge cases where the metrics look good but the FF is bad

    • Difficulty: ? - Could task everyone with keeping an eye out for ways to game benchmarks

Core competencies that everyone will end up with

  • The conceptual stages of FF fitting

    • Comprehension check: Maybe have everyone outline the FF fitting process in about 1 page (or take a free-response quiz), and have ChatGPT evaluate everyone’s understanding relative to Lily’s

  • Creating and submitting QC datasets

    • Comprehension check: PR review, maybe short presentations from everyone on their datasets

  • Setting up and running valence fits with FB

  • Setting up and running valence fits with smee

    • Comprehension check: Everyone presents on differences between these two and speculates about causes/which is better

  • Running a nonbonded fit with FB/Evaluator for a subset of phys props

    • One project MIGHT look at differences between this and smee/descent stack

  • Running standard benchmarks using YAMMBS

  • Running standard nonbonded benchmarks (maybe on reduced datasets if compute-constrained)

  • New set of benchmarking data of hard cases
