2021-07-22 Meeting notes

Date

Jul 22, 2021

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Simon Boothroyd

  • @Michael Gilson

  • @Michael Shirts

  • @Matt Thompson

  • @Jeffrey Wagner

  • @Lily Wang

Goals

  • Review project plan

  • Identify infrastructure needs

  • Brainstorm training datasets and validation datasets

  • Status of LiveCoMS review article

Slides

Discussion topics

Item

Presenter

Notes

Project plan

@Chapin Cavender

  • MS – Re: “Prioritize consistency with small molecule FF” – We want nonbonded to be the same, as well as bonds and angles. But we’d kinda expect dihedrals to differ.

    • CC - Agree

    • SB – Agree

    • JW – Agree

    • MS - The way this is described to other groups is going to be critical

    • MKG - Shouldn’t be too hard to justify, as people always ask, “Well, is your SMFF compatible with X protein FF?”

  • (protein ff models slide)

    • SB – We should put the protein-specific LJ refit goal above CMAP – LJ compatibility should be a high goal.

    • CC – So protein FF should be an extension of small molecule FF

    • MS – Yes. One question is “how much surrounding chemical context needs to be present for a ‘protein’ parameter to be applied?”

      • SB – Same with LJ

    • MG – I think it’s the specificity…

    • CC – So maybe the right way to brand it is “polypeptide torsion”?

      • MG – I think we should avoid even saying “peptide specific” or “protein specific”. Maybe instead “refined torsion types”. This way we’d avoid surprising people if these parameters were applied to a small molecule.

    • JW – CMAPs at least 6+ months out – Current infrastructure goals are pretty ambitious and don’t include CMAPs yet.

    • SB – People previously have also expressed hesitation about CMAPs.

  • (Proposed timeline slide)

    • MS – What’s the difference between new torsions and torsion CMAPs?

      • CC – There may be more CMAP terms

    • MS – And should the QC dataset for training torsion parameters be different from the one for training CMAPs, or should they be the same?

      • JW – It seems like they should be the same.

      • CC – I think they could be different – There may be a need to train on a different number of grid points for torsion parameters vs CMAPs.

      • MS – So, maybe the initial scans should be done really finely, so we can subsample more easily.

      • SB – The QC* infrastructure is very slow for 2D scans, so we may want to do widely-spaced grid points initially.

    • MS – We’ll want the fitting infrastructure+benchmarking infrastructure to be very automated. So we should coordinate with infrastructure team.

      • JW – This will depend on where we’re running. Will lilac be our compute center for this?

      • SB – Lilac is probably the largest cluster we’ll have access to. Folding@Home will be for RBFE calcs.

      • MS – Could use compute at Oak Ridge if needed, but it’d be on PowerPC.

      • MG – Would XSEDE time help? Could get startup allocations very quickly.

      • CC – I put through an XSEDE proposal in my PhD, so I could take the lead on this.

    • JW – If we want CMAP parameters ready for August, when would we want the infrastructure to be ready?

      • CC – By ~May 2022? Can be flexible
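
The scan-resolution discussion above (fine scans that can be subsampled later vs. widely-spaced initial grids) can be illustrated with a minimal sketch. The 15 and 30 degree spacings and the function names are hypothetical, chosen only for illustration; no grid spacing was decided in the meeting.

```python
import itertools


def torsion_grid(spacing_deg):
    """Return all (phi, psi) points in [-180, 180) at the given spacing."""
    if 360 % spacing_deg != 0:
        raise ValueError("spacing must divide 360 evenly")
    angles = range(-180, 180, spacing_deg)
    return list(itertools.product(angles, angles))


def subsample(grid, fine_spacing, coarse_spacing):
    """Keep the points of a fine grid that also lie on a coarser grid."""
    if coarse_spacing % fine_spacing != 0:
        raise ValueError("coarse spacing must be a multiple of the fine spacing")
    return [
        (phi, psi)
        for phi, psi in grid
        if (phi + 180) % coarse_spacing == 0 and (psi + 180) % coarse_spacing == 0
    ]


# Hypothetical spacings: a 15-degree scan subsampled to 30 degrees.
fine = torsion_grid(15)
coarse = subsample(fine, 15, 30)
print(len(fine), len(coarse))
```

A fine grid contains every coarser grid whose spacing is a multiple of its own, so a fine scan can serve both fits; the cost is that the point count of a 2D scan grows quadratically as the spacing shrinks.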

Infrastructure

@Chapin Cavender

  • JW - We currently have library charges available for all natural amino acids, which may be an alternative to generating charges. Reach out if you want more info.

  • CC – Do we need stereo-specific SMIRKS?

    • JW - Where this will likely have the most impact is partial charges, which will then have a knock-on effect for torsions. If all the stereocenters flip then the partial charges should be the same, so the parameters could be the same. But if only some stereocenters flip then the parameters wouldn’t transfer.

    • (General) – We’ll keep looking into this

  • MT – Are CMAPs implemented the same way in all engines? Or is there a significant risk that the energies in other engines would be different?

    • CC - Not sure, but people have probably looked into this. Details about CMAP implementations in AMBER are likely in the chamber program.

    • MT – Thanks. My biggest concern is about interoperability - This is separate from the openff toolkit implementation

Training datasets

@Chapin Cavender

  • MS - How do the listed properties relate to experiments? How are they computed? Which are derived, etc.?

    • CC – KB integrals are derived directly from experimental measurements. The KB integrals are at the interface of macroscopic and microscopic properties.

    • MS – Chemical potentials are just solvation free energies carried out at a particular concentration. And KB integrals aren’t the directly measured physical property, so let’s list that here. I’m mostly concerned about data provenance, so we should start with a raw, versioned dataset, and then do any needed transformations in our infrastructure.

    • MS – And these pairwise distribution functions and stuff would be for small molecules, not proteins

      • CC – Yes, maybe dipeptides and stuff.

      • MG – This is really promising, this could be high quality data for fitting.

    • MS – Protein HDX and xtal simulations?

      • CC – Those are longer-term goals, more for testing rather than validation

      • MS – Let’s make this timeline info publicly known, so people can give us feedback before they come in as a manuscript review or community criticism of work

    • JW – Is there actually a tool to derive NMR info from simulations? I’ve never gotten these to work personally.

      • (General) – Unknown, we expect to source tools/workflows for this from the LiveCoMS article.
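
For reference on the KB (Kirkwood-Buff) integrals discussed above: for species i and j with pair distribution function g_ij(r), the integral is G_ij = 4π ∫ (g_ij(r) − 1) r² dr over all r. The sketch below evaluates a truncated version of this integral from a tabulated g(r) using the trapezoid rule. The function name and the toy input are hypothetical; this is not the group's workflow, and integrals from finite simulation boxes commonly need finite-size corrections that are not shown here.

```python
import math


def kb_integral(r, g):
    """Truncated Kirkwood-Buff integral 4*pi * int (g(r) - 1) r^2 dr (trapezoid rule)."""
    if len(r) != len(g) or len(r) < 2:
        raise ValueError("r and g must have the same length (at least 2 points)")
    integrand = [(gi - 1.0) * ri ** 2 for ri, gi in zip(r, g)]
    total = 0.0
    for k in range(1, len(r)):
        total += 0.5 * (integrand[k] + integrand[k - 1]) * (r[k] - r[k - 1])
    return 4.0 * math.pi * total


# Toy input (hypothetical): g(r) = 0 on [0, 1], i.e. pure excluded volume.
# The analytic result for this case is -4*pi/3.
r_toy = [i * 0.001 for i in range(1001)]
g_toy = [0.0] * len(r_toy)
print(kb_integral(r_toy, g_toy), -4.0 * math.pi / 3.0)
```

Keeping the raw, versioned g(r) or experimental data as the stored input and applying a transformation like this downstream is in line with the data-provenance point raised above.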

Validation datasets

@Chapin Cavender

  • MG – CC and I talked for a while about dipeptides, tripeptides, tetrapeptides, and the time-cost of the runs. And can we reduce the combinatorial explosion by flanking the AA of interest with “representative”

    • JW – This seems to be what Cerutti did, which adds some weight to the “representative” idea

  • JW – We can also submit partially-overlapping datasets with varying “priority” levels for computation. Then we can reduce the effect of uncertainty wrt runtime – We just do test fits with whatever’s available when we reach the deadline.
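
The combinatorial explosion mentioned above can be made concrete with a short count. This is a rough sketch that assumes the 20 canonical amino acids and ignores capping groups, protonation states, and stereochemistry; the choice of 3 representative flanking residues is hypothetical and was not discussed in the meeting.

```python
N_AMINO_ACIDS = 20  # the 20 canonical amino acids


def n_all_peptides(length):
    """Number of distinct sequences of the given length."""
    return N_AMINO_ACIDS ** length


def n_flanked_peptides(n_flanking_positions, n_representatives):
    """Sequences with one fully varied residue and restricted flanking residues."""
    return N_AMINO_ACIDS * n_representatives ** n_flanking_positions


for length in (2, 3, 4):
    print(length, n_all_peptides(length))

# Hypothetical: tripeptides with the central residue varied over all 20
# amino acids and each of the 2 flanking positions limited to 3 choices.
print(n_flanked_peptides(2, 3))
```

Full enumeration grows as 20^n, while the flanked count grows only with the number of representatives chosen per flanking position, which is what makes the “representative” idea attractive for limiting run time.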

LiveCoMS article

@Chapin Cavender

  • LW – I’d be interested in being involved with this / coauthoring

Action items

Decisions