2022-02-24 Protein FF meeting note

Participants

@Chapin Cavender
@Matt Thompson
@Pavan Behara
@Jeffrey Wagner
@Michael Shirts

Discussion topics

Item	Notes


BPS poster feedback

CC – People were very excited about the OpenFF poster and asked when they can use it.
- Lots of excitement about covalent modifications
- I was right next to the MDAnalysis poster, which I think attracted a nice crowd of folks interested in big simulations.
- People asked whether Interchange was going to support TINKER. I don’t know what our plans are there.
  - MT – I’d be happy to support TINKER if they put resources in for my time, but right now it’s not a priority for me. They’re a little lacking in specification and they don’t have as large a userbase as AMBER/GROMACS, which are still in our plans.
  - MS – Agree
- A lot of folks working on the CHARMM-DRUDE FF came by. I don’t think we have plans to include DRUDE in our toolkit right now, but I did mention that we’re pursuing some polarizability models. The folks that I talked to were from Justin Lemkul’s group at VT (lead author of the 2nd gen DRUDE FF; the lab is looking into making parameters for covalently modified AAs).
  - JW – I think DC lab is working on polarizable parameter generation? Not certain, but this could be a way for folks to get early access to it before it’s supported in our flagship packages.
  - MT – Polarizability is a huge pain infrastructure-wise, this won’t be coming to our infrastructure any time soon.
  - CC – Agree that it’s a huge pain. For now our best path forward is to stick with additive models.
  - MS – Yeah, it’ll be hard to find a polarizable model that is supported in all engines, there may not be such a thing. I anticipate this will be hard.
  - MS – Maybe there’s two questions – When will this get into the infrastructure, and when will this get into the FFs?
    - Infrastructure would come first, or at least the ability to extend it
    - No hard plans/deadline to include in flagship FF line
- MS – What are current plans for PTM support?
  - JW – If people can get library charges for the PTM, then they can simulate PTMs with the Sage + ff14SB port on day one of the biopolymer release. We don’t have plans to provide a convenient official way to get the library charges for PTMs. Rosemary may be able to handle this in an “all in one” solution, but that relies on having graph net charges.
CC – Got some questions about benchmarking from a lipid FF person. I talked about our plans to benchmark to observables and they seemed satisfied.
JW – Are we planning to use F@H for observable benchmarking? Or will they run locally?
- MS – I don’t know
- CC – I’m also not sure, info would be in the LiveCoMS review. I can look into this next week.
- MS – We’ll eventually want this, but maybe not initially.
- JW – Hm, we’ll need to get this into the project requirements by next Tuesday if we need it in the first generation. Would this just be unbiased simulation at a certain temperature and pressure?
  - CC – We may need enhanced sampling.
  - MS – Talk to VVoelz and/or GBowman. I don’t think we can do replica exchange because that requires parallel communication.
  - JW – So, requirements would be
    - (high) Unbiased simulation at a set temperature and pressure, getting trajectories back from F@H.
    - markov state modeling
  - JW – Could we write this up in fah-alchemy issue tracker?
    - MS – I’ll draft this and tag folks from this call
  - CC – Bowman group uses a method called FAST, which provides seed conformations for F@H jobs.
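For reference, the "Markov state modeling" requirement above could start from something as simple as the following. This is only a minimal sketch, assuming the trajectories returned from F@H have already been discretized into integer state labels; the function name `estimate_transition_matrix` and the toy trajectories are hypothetical and not part of any F@H or fah-alchemy API.

```python
def estimate_transition_matrix(dtrajs, n_states, lag=1):
    """Row-normalized transition matrix from discrete trajectories.

    dtrajs   : list of lists of integer state labels (assumed input format)
    n_states : number of discrete states
    lag      : lag time in frames (must be >= 1)
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    # Count transitions i -> j separated by `lag` frames, per trajectory,
    # so that no transition is counted across independent F@H runs.
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1
    # Normalize each row to probabilities; unvisited states stay all-zero.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy example with two states and two short, independent trajectories.
T = estimate_transition_matrix([[0, 0, 1, 1, 0], [1, 1, 1, 0]], n_states=2)
print(T)  # [[0.5, 0.5], [0.4, 0.6]]
```

A real analysis would also need a state-discretization step and a lag-time check, neither of which is shown here.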
CC – Talked to Greg Bowman’s group, consensus there was that the problem isn’t FFs, it’s sampling. They think most sims are undersampled by a factor of 1000.
- MS – Some problems are due to undersampling, but some aren’t. We can’t say which problems are due to undersampling until we DO have adequate sampling.

Talk notes

Heard a talk from John Jumper, lead dev on AlphaFold. A lot of folks said that they just put more data in than anyone else, but John credits the network structure instead.
- He went into a lot of detail about how the structure prediction works. They treat the backbone as a set of beads (3 beads per AA), and the network learns constraints based on the relative positioning of these beads. Then the later, additional atom placement steps are treated as another puzzle. John says that the network doesn’t understand context. It just tries to make a folded structure as hard as it can, so it doesn’t react correctly to mutations like putting a charged AA in the core, which should lead to unfolding.
- One cool thing they found is that the “confidence score” is well-correlated with the propensity of a protein to form secondary structure. So the confidence score can incidentally be used as a high-performing disordered-region-detector.
- Also found that AlphaFold does a good job of predicting dimers. They later trained a model for multimers; this model is pretty good but misses some interfaces.
- Credited CASP and the surrounding community as critical to solving the problem, by giving a strict definition of the problem and a lot of data.
- MT – Do you think that they did an adequate job of explaining how the network actually solves the problem?
  - CC – They did mention specific layers in the network in describing different stages of the folding process. I’m not sure that he fully justified his explanations, but it seemed reasonable.
  - MT – This still sounds a bit black-boxy, but it clearly does work well, so maybe understanding all the intricacies isn’t necessary.
- JW – I wonder if they’re working in a world of biased physics - not just “here’s the entire world of protein physics”, but rather “here’s the world of protein physics for proteins THAT CRYSTALLIZE”
  - CC – They were aware of this bias in many methods. In response, they mention that the confidence score correlates very well with disordered proteins/regions, and the model doesn’t generate predictions for those regions.
- JW – How did the team dynamics at AlphaFold look?
  - CC – There were 15-20 authors, so the team was pretty large. They were gracious – Credited CASP community. Not sure what their future plans are business-wise. I think this is seen more as an advertisement “hey, we solved protein folding, we can solve your business problem too”
  - MT – When AlphaZero was released a few years ago and beat other chess engines, people were excited by a bunch of followup questions. There were a bunch of followup papers about possibilities, but the Google team moved on and didn’t come back.
Talk from Philip Biggin’s lab at Oxford.
- Studied ABFEs for lipid-exposed binding pockets, like in bilayers/membranes. So “unbound state” was defined in lipid, not water.
- Only used one protein target and one FF combination (ff99SB-ILDN and GAFF2), with 27 ligands. Found that they underestimate binding energies by about 2.8 kcal/mol.
- Hypothesized that ligands are less flexible in lipids compared to water. This might interfere with protocols that are used in current binding energy calculations. Found that more aggressively forcing sampling of ligand conformational space brought RMSE down from 2.8 to 1.6 kcal/mol.
- PB – WRT the lipid simulations, which FFs did they use for this?
  - CC – Probably CHARMM FFs. They have an all-atom model, and the coarse-grained MARTINI model
- CC – People asked whether, in the apo state, the binding pocket remained in a conformation that allowed the ligand to successfully get lambda'd into the binding pocket. Like, if your lipid FF is wrong, maybe your binding pocket is collapsing/going to a different conformation so the ligand is unable to get into its correct conformation? Authors replied that the apo state seemed to be modeled well.
- CC – This seems like a generally interesting question to investigate, since mostly people look at solvent-exposed binding pockets.
- JW – It’s a little interesting that they look at the lipid-binding pocket transition, instead of solvent → lipid → binding pocket
  - CC – The full cycle for binding would indeed include transitions from solvent to lipid, that’s an active area of research. But they’re looking at a different part of the cycle, and it would add unnecessary noise to consider the whole cycle instead of just the part that they studied.
Talk from Lars Bock, from the Grubmüller lab at the Max Planck Institute in Göttingen.
- This was in response to a talk about using cryo-EM grids to predict solvent distribution (by Holger Stark?). So the different classes separated from a cryo-EM ensemble could represent a transition where the solvent rearranges, and you could do math with the changes in density to calculate thermodynamics.
- Speaker did some rough math and MD simulations to show that the cooling process before cryo-EM would keep this from being a useful source of solvent info. The cooling process, if it takes more than a few hundred ns, gives the molecules time to relax to a lower energy state, and so they’re not representative of solvent structures. There’s a significant subpopulation of the field that agrees that the cooling process disrupts the solvent distribution to the point where it should not be used.
Cool FF talk from the PROSECCO group in Prague.
- ECC stands for Electronic Continuum Correction. It tries to deal with the fact that molecules are parameterized in gas phase, but simulations actually happen in condensed phase, so there’s a big permittivity difference. The standard way of dealing with this is to use a polarizable FF, but that’s difficult to implement. RESP2 and iPolQ give ways to adjust fixed charges to represent condensed phase environments. This talk argues that you can separate permittivity into nuclear and electronic contributions.
- They do some math using a factor of 1/sqrt(permittivity) to correct for this. So if you have a system with zero charges, sometimes you end up modeling components with nonzero total charge.
- Did some validation but they went through it pretty fast, I’ll need to read the manuscript.
- JW – Do you know if they have software available to generate these charges? We could experiment with them here.
  - They’re working on refitting a FF with PROSECCO charges, so they must have software, but the rest of the FF will be CHARMM36. So I’m not sure whether there’s software to just get the charges out.
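The 1/sqrt(permittivity) factor mentioned above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the PROSECCO group's software: the function name `ecc_scale_charges` is hypothetical, and the default electronic permittivity of 1.78 (roughly the square of water's refractive index) is an assumption chosen for illustration.

```python
import math


def ecc_scale_charges(charges, eps_electronic=1.78):
    """Scale fixed charges by 1/sqrt(eps_electronic).

    eps_electronic is the electronic (high-frequency) part of the
    permittivity; 1.78 is an assumed, water-like value.
    """
    if eps_electronic <= 0:
        raise ValueError("eps_electronic must be positive")
    factor = 1.0 / math.sqrt(eps_electronic)
    return [q * factor for q in charges]


# Hypothetical example: a monovalent cation/anion pair.
scaled = ecc_scale_charges([1.0, -1.0])
print([round(q, 3) for q in scaled])  # [0.75, -0.75]
```

With this assumed value the scaling factor is about 0.75, so the ion pair stays neutral overall while each ion carries a non-integer charge of roughly ±0.75 e.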
CC – Lots of talks on MSMs and enhanced sampling methods, using HDX to constrain/inform simulation (seems like there’s still work to be done)
- MS – What is the consensus on how to predict NMR observables from simulations?
- CC – Robert Best from NIH supports a model
- MS – Some ML models too (Andrew White)
- CC – There’s a tuning parameter for how much the observable data affects the simulation, they did cross-validation of this parameter value.
