2021-10-07 Meeting notes

Date

Oct 7, 2021

Participants

@Chapin Cavender
@Pavan Behara
@Lily Wang
@Michael Gilson
@Daniel Cole
@Jeffrey Wagner
@Joshua Horton
@David Mobley
@Simon Boothroyd
@Matt Thompson

Goals

Anticipated Rosemary infrastructure
Decisions for protein library charges

Discussion topics

Item	Presenter	Notes

Anticipated Rosemary infrastructure

@Chapin Cavender

Infrastructure team is updating their roadmap for the next year. Do we anticipate any infrastructure needs for the biopolymer force field beyond the minimum viable product that we should have on the roadmap?
Support for QM/MM calculations in QCArchive/QCSubmit
- Explicit representation of solvent sampled from MD trajectory, like IPolQ
  - Solvate di/tri/tetrapeptides in specific water model (and possibly salt ions), run MD until solvent ESP converges, fit small number of point charges to reproduce solvent ESP
  - QM solute, with the MM environment represented by the fitted point charges
- Sample snapshots from a protein trajectory obtained with candidate force field parameters for subsequent retraining
  - Choose snapshots by clustering or filling gaps in phase space absent from existing training data
  - QM for subset of residues, MM for rest of protein and/or solvent
  - JW – Is the MM contribution primarily electrostatics?
    - CC – Yes, but vdW contacts could be important in, e.g., stacking.
- MG – The idea is analogous to LWang’s experiment from a few weeks ago, which looked at changes in peptide charges as a function of flanking residues.
- SB – What sorts of terms would be affected by errors from gas-phase calcs? Would this approach really solve them?
  - CC – Electrostatics, and torsions as a result
  - SB – That makes sense.
- SB – This could be handy, but it’s not clear that this is going to be needed/high priority. Would it be better to start with something like RESP2?
  - CC – That makes sense. I had the impression that implicit solvent models weren’t fast enough for our needs, so another idea would be to make the existing models more performant.
- DM – We’re talking about two infrastructure needs:
  - The “iPolQ approach” – Not a huge lift, we’ll eventually want to support this
  - The “QM/MM of a packed/folded protein approach” – Not clear that we’d end up using this.
- The former may offer the better cost/benefit tradeoff.
- JW – Both of these may be the same or largely overlapping in infra needs. I’ll put these as candidates for addition to the roadmap, but I don’t really have much of an idea about the difficulty. I’ll begin working with Dotson to see how difficult this could be.
  - DM – This could overlap with the new QCA developer that’s coming in
- MG – This could also be remedied by polarizability.
  - CC – We don’t yet have a polarizable charge model, so how do we represent the fixed charge model?
SB – I’d like to make sure that there’s a good scientific rationale and a driver before we add this. So it’d be good to have done a feasibility study and to have a plan to make the small molecule and protein FFs self-consistent.
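One way to make "filling gaps in phase space absent from existing training data" concrete is a greedy max-min (farthest-point) selection. The sketch below assumes each snapshot is summarized by its backbone (phi, psi) dihedrals in degrees; the feature choice and the function names are hypothetical illustrations, not something agreed on in this meeting.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(x, y):
    """Euclidean distance between two dihedral vectors, respecting 360-degree periodicity."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(x, y)))


def select_gap_filling(candidates, training, n_select):
    """Greedy max-min selection of snapshots far from existing training data.

    candidates, training: lists of dihedral tuples, e.g. (phi, psi) in degrees.
    Returns indices into `candidates`, in the order they were picked.
    """
    reference = list(training)
    chosen = []
    for _ in range(min(n_select, len(candidates))):
        best_idx, best_dist = None, -1.0
        for i, cand in enumerate(candidates):
            if i in chosen:
                continue
            # Distance to the nearest point already covered (training data or earlier picks)
            d = min((torsion_distance(cand, ref) for ref in reference), default=math.inf)
            if d > best_dist:
                best_idx, best_dist = i, d
        chosen.append(best_idx)
        reference.append(candidates[best_idx])
    return chosen
```

Each pick is added to the reference set, so later picks avoid regions that were just covered; clustering (the other option mentioned) would instead pick representatives of dense regions.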
- CC – Makes sense
DM – A week or so ago, we were discussing external electric fields in Psi4. What was the context for that, and does it overlap with this?
- (For the notes, here’s the thread where Willa’s work was discussed: https://openforcefieldgroup.slack.com/archives/CJQ4DCWN8/p1632352959026000)
- MG – This was Willa Wang’s project. It’s being used to generate training data for polarizability. It came up because we had been doing calcs outside of the global QCA, and we decided it would be better to do it inside the global QCA. WWang mentioned that there may be inconsistencies in how wavefunctions get mapped to ESPs.
  - SB – It looks like WWang’s dataset is moving through the submission pipeline. In terms of reconstructing the ESP from the wavefunction, I’m not aware of any problems. Please have WWang contact me if there’s an issue.
- DM – Ok, so it’d be good to have a clear idea of what’s already available and a specific plan of what we’ll additionally need.
- CC – Big thing to me would be running MD and getting solvent distributions, then submitting those for calculation.
- SB – That sounds like something where we should do the sampling locally to begin with, then learn exactly which behavior we want, and then once that’s established, to try and find a place where it should be refactored and live in the longer run.
- CC – That makes sense. I’ll start scouting this out.
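The core fitting step of the IPolQ-style workflow above ("fit small number of point charges to reproduce solvent ESP") can be sketched as a constrained linear least-squares problem. This is a minimal sketch assuming atomic units (ESP at grid point k is the sum of q_j / r_kj), a plain fit with no RESP-style restraints, and a total-charge constraint enforced with a Lagrange multiplier; site placement, grid choice, and the function names are assumptions.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_point_charges(sites, grid, esp, total_charge=0.0):
    """Fit charges at `sites` so their ESP on `grid` matches `esp` in the least-squares sense,
    subject to the charges summing to `total_charge`."""
    n = len(sites)
    # Design matrix: A[k][j] = 1 / |grid_k - site_j|
    design = [[1.0 / math.dist(g, s) for s in sites] for g in grid]
    # Normal equations (A^T A) q + lambda * 1 = A^T v, bordered by the constraint 1^T q = Q
    lhs = [[sum(row[i] * row[j] for row in design) for j in range(n)] + [1.0] for i in range(n)]
    lhs.append([1.0] * n + [0.0])
    rhs = [sum(row[i] * v for row, v in zip(design, esp)) for i in range(n)] + [total_charge]
    solution = solve_linear(lhs, rhs)
    return solution[:n]  # drop the Lagrange multiplier
```

In practice the target `esp` would be the MD-averaged solvent ESP at points around the solute, and the fitted charges would then stand in for the solvent in the QM calculation.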
CC – I had thought think
JW – Would people want a “point at a residue and get it out as a capped molecule” functionality? This is on our roadmap but I’m not sure how high-priority it is.
- CC – I could see that being handy for setting up QM/MM calculations. But for now I’m building things from the bottom up.
- DM – LW, have you made this before?
- LW – Not yet. Currently I do this manually.
- MG – What’s this needed for?
- DM – This would be for parameter fitting/charge assignment to new polymer units.
- LW – Oh, I do have a parameter/charge generator for new polymer subunits, but it’s kinda manual to say how and where to cap.
- DM – So I see this as three problems:
  - How do I break this thing into repeating units?
  - How do I cleave out a single unit?
  - How do I cap this thing that I cleaved out?
    - LW – This is currently a manual step.
- JW – I could see this being used for mostly standard proteins with PTMs, where the PTM would get excised, get charges assigned, and then those would be assigned back to the subunit in the protein.
  - LW – Currently polymetrizer tries to do this, but the cap addition could be more sophisticated.
  - CC – So it’s hard to infer what an appropriate cap would be? Like, this is where user input is required?
  - LW – Yes
  - DM – So an algorithm could be “figure out if this is a protein, if so then do ACE/NME, otherwise don’t / just use methyl”. Instead of the “is this a protein” check, fragmentation could be based on WBOs, where the relevant neighboring environment could be included in the excised fragment.
  - LW – Where would that go?
  - JW – I think this should live in its own repo until we understand the desired behavior.
- SB – So the root of this discussion is “How do we assign charges to an unexpected thing in an otherwise standard residue chain”?
  - DM – Yes, where most cases will be covalent modifications of a protein.
- SB – Looking at the question of “how do we assign charges?” One option is “cut and cap”, but another option is to have a quick neural network which has been trained both on small molecules and proteins run on the entire protein. The featurization of this network would dictate how many bonds out it looks, and so the resulting parameters would be self-consistent. I think this should be considered alongside the “cut and cap” options.
  - DM – That’s not a bad idea, but we should totally do the “traditional” cut and cap as well for comparison.
- SB – Beyond charges, the parameters themselves would come from a single FF – We shouldn’t think of it as “small molecule parameters” mixing into the “protein force field”. It’s going to be a single self-consistent force field.
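DM's cut-and-cap heuristic ("if this is a protein, do ACE/NME, otherwise just use methyl", with WBOs deciding how much neighboring environment to keep) could look roughly like the sketch below. The molecule is a toy bond graph with precomputed WBOs; the rule of never cutting a bond whose WBO exceeds a threshold, the threshold value of 1.3, and all function names are hypothetical illustrations rather than an existing tool's behavior.

```python
def excise_fragment(seed_atoms, bonds, wbo_threshold=1.3):
    """Grow a fragment from `seed_atoms`, never cutting a bond whose WBO exceeds the threshold.

    bonds: dict mapping (atom_a, atom_b) -> Wiberg bond order.
    Returns (fragment atom set, list of cut bonds as (inside_atom, outside_atom)).
    """
    fragment = set(seed_atoms)
    changed = True
    while changed:
        changed = False
        for (a, b), wbo in bonds.items():
            # A high-WBO bond crossing the boundary pulls its outside atom into the fragment
            if wbo > wbo_threshold and (a in fragment) != (b in fragment):
                fragment.update((a, b))
                changed = True
    cut_bonds = [(a, b) if a in fragment else (b, a)
                 for (a, b) in bonds if (a in fragment) != (b in fragment)]
    return fragment, cut_bonds


def choose_caps(cut_bonds, atom_names, is_protein):
    """ACE on a cut backbone N, NME on a cut backbone C for proteins; methyl everywhere else."""
    caps = {}
    for inside, outside in cut_bonds:
        if is_protein and atom_names.get(inside) == "N":
            caps[(inside, outside)] = "ACE"
        elif is_protein and atom_names.get(inside) == "C":
            caps[(inside, outside)] = "NME"
        else:
            caps[(inside, outside)] = "methyl"
    return caps
```

For a residue with a conjugated side-chain modification, the fragment grows across the double bond and the single bond beyond it gets a methyl cap, while the backbone cuts get ACE/NME.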
DM – So, LW should make the automated cut-and-cap method available.
- LW – I can refine my method by running on more peptides and ensuring that the residue processing works cleanly.
SB – Benchmarking infrastructure
- Comparing to observables, NMR, xtal
- Pair distribution functions
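A pair distribution function g(r) for one particle type can be sketched with a minimum-image histogram normalized by the ideal-gas pair count. This assumes a cubic periodic box, r_max no larger than half the box length, and positions already extracted from the trajectory; production code would use an analysis library instead of this O(N^2) pure-Python loop.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between two points in a cubic periodic box of side `box`."""
    d2 = 0.0
    for x, y in zip(a, b):
        dx = x - y
        dx -= box * round(dx / box)
        d2 += dx * dx
    return math.sqrt(d2)


def radial_distribution(frames, box, r_max, n_bins):
    """g(r) for a single particle type; requires r_max <= box / 2.

    frames: list of frames, each a list of (x, y, z) positions.
    Returns (bin_centers, g).
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    n_pairs_total = 0
    for positions in frames:
        n = len(positions)
        n_pairs_total += n * (n - 1) // 2
        for i in range(n):
            for j in range(i + 1, n):
                r = minimum_image_distance(positions[i], positions[j], box)
                if r < r_max:
                    counts[min(int(r / dr), n_bins - 1)] += 1
    volume = box ** 3
    centers, g = [], []
    for k in range(n_bins):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Expected pair count in this shell for an ideal gas at the same density
        ideal = n_pairs_total * shell / volume
        centers.append(0.5 * (r_lo + r_hi))
        g.append(counts[k] / ideal)
    return centers, g
```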

Decisions for protein library charges

@Chapin Cavender

ELF10 library charges for amino acids from @Lily Wang obtained by averaging over Ace-Val-X-Y-Z-Val-Nme
What residues should have library charges?
- Alternate protomer/tautomers?
  - DM – We definitely want to support the standard AMBER protomers/tautomers: HIP/HIE/HID and CYS/CYX (maybe CYM)
    - CYM?
      - DM – CYM can be used for complexing metals without covalent linkages/restraints
      - JW – Metal support is a can of worms
      - CC – Could treat it as an unnatural AA with the “cut and cap” method
      - CC – So, let’s support all protomers and tautomers from FF14SB, maybe excepting CYM
    - Selenocysteine/selenomethionine?
      - (General) – Let’s not cover this
- Non-capped termini?
  - (General) – Also let “cut and cap” handle these as if they were unnatural AAs?
What SMIRKS strings should we use?
- SMIRNOFF port of Amber ff14SB?
- Should we specify stereochemistry?
    - JW – Not for charges. All-L and all-D polymers should have identical charges. Torsions may be “contaminated” if the whole training set is one chirality.
  - DM – Unresolved science, should do more experiments on this
  - MG – Torsions may indeed be incorrectly fit
How do we handle non-integer averaged charges?
- Normalize like toolkit assign_partial_charges()?
  - (General) – So, this is if the charges are assigned by doing charge assignment to a 5-mer, but some electron density is donated to/withdrawn from the central residue of interest, and so the point charges don’t sum to the formal charge.
  - DM + JW + CC + SB - normalize by residue
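A minimal sketch of averaging library charges over sequence contexts and then normalizing by residue. It assumes every sample reports the same atom names and that the correction is spread uniformly over the residue's atoms; whether this matches the toolkit's assign_partial_charges() normalization exactly is not checked here, and the charge values in the test are toy numbers.

```python
from collections import defaultdict


def average_library_charges(samples):
    """Average per-atom charges (e) for one residue type over many sequence contexts.

    samples: list of dicts mapping atom name -> partial charge, one per context
    (e.g. one per Ace-Val-X-Y-Z-Val-Nme peptide).
    """
    sums = defaultdict(float)
    for sample in samples:
        for name, q in sample.items():
            sums[name] += q
    return {name: total / len(samples) for name, total in sums.items()}


def normalize_residue(charges, formal_charge):
    """Shift every atom by the same amount so the residue sums to its formal charge."""
    offset = (formal_charge - sum(charges.values())) / len(charges)
    return {name: q + offset for name, q in charges.items()}
```

A uniform shift is the simplest choice; an alternative would be to scale the correction by each atom's variance across contexts, which would need an explicit decision.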
SB – Infrastructure needs for benchmarking?
- CC – I’m thinking these will be simple: we just need to calculate Kirkwood-Buff integrals, pair distribution functions, and NMR/xtal observables.
- JW – Could you send me a slightly more detailed version of this so that we can get these on the roadmap?
  - CC – I’ll send you a list of these
- PB – Would these go in evaluator?
  - CC – I think so.
  - SB – Let’s sync up on that and get the groundwork laid in the next 1-2 months.
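Once g(r) is available, a running Kirkwood-Buff integral is a single quadrature. This sketch is the naive truncated form, G(R) = 4π ∫ (g(r) - 1) r² dr from 0 to R, evaluated with the trapezoidal rule; it assumes g(r) is already tabulated and applies none of the finite-size corrections often used because truncated integrals converge slowly.

```python
import math


def kirkwood_buff_integral(r, g):
    """Running truncated Kirkwood-Buff integral G(R) = 4*pi * int_0^R (g(r) - 1) r^2 dr.

    r, g: samples of the radial distribution function (spacing may be uneven).
    Returns G evaluated at every r, using the trapezoidal rule.
    """
    running = [0.0]
    for k in range(1, len(r)):
        f_lo = (g[k - 1] - 1.0) * r[k - 1] ** 2
        f_hi = (g[k] - 1.0) * r[k] ** 2
        running.append(running[-1] + 0.5 * (f_lo + f_hi) * (r[k] - r[k - 1]))
    return [4.0 * math.pi * value for value in running]
```

As a sanity check, g(r) = 0 up to R (a perfectly excluded volume) gives G(R) = -4πR³/3.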
