2021-12-16 Meeting notes

Date

Dec 16, 2021

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @David Mobley

  • @Michael Gilson

  • @Daniel Cole

  • @Simon Boothroyd

  • @Lily Wang

  • @Michael Shirts

Goals

  • Results of pilot study for dipeptide 2-D TorsionDrives

Slides

Discussion topics

Item

Presenter

Notes

Dipeptide 2-D TorsionDrives

@Chapin Cavender

  • 2D torsion-drive on 2 proper torsions

  • Current status: constraint not propagated on QCSubmit jobs due to version error – resubmitting

  • Alanine:

    • higher levels of theory not considered, for consistency with other calculations (currently: B3LYP-D3BJ/DZVP?)

    • low-energy basins of the QM energy map are at gauche conformations, likely controlled more by sterics than by secondary-structure effects so far

  • Tryptophan rotamers

    • peaks and minima look like they’re in the same places, but there are differences in the rotamers, e.g. in the beta region

    • DC: pointing out apparent jumps in energy in the QM map, asking if it’s hysteresis

      • MG: side-chain of rotamer is not restrained, so it’s possible the apparent jumps result from side-chains moving around. Future drives will be constrained

      • CC: yeah, tryptophan has a few degrees of freedom near the backbone and a bulky aromatic group at the end

  • Proline rotamers

    • CC: Only sampled up to ~60 degrees phi angle, as ring strain makes it difficult to go higher, as shown by the high energy barriers on either side of the basin

    • CC: rotamers are fairly similar, differences only in barrier heights

  • MM torsion profiles compared to offset QM profiles (OpenFF 2.0)

    • minimized molecule with torsion atoms frozen (sidechains are not restrained)

    • Alanine – similar to QM

    • Tryptophan – barrier heights different, region is a lot more “flat”

    • Proline likewise looks similar

  • CC: Fourier series fits to the target (which would be the target in ForceBalance)

    • SB: how does this differ from FB loss function?

    • CC: This is essentially going to be the FB loss function, except without weighting factors

    • CC: basically taking away the rest of the force field and only working with the torsions

  • Concerns about minimisation of MM conformers

    • SB: might still have contributions from, e.g. angle gradients, more work might be needed to zero out the other terms

    • MG: wonders about goodness of other terms, as angle terms will also contribute to the stiffness of a torsion

    • DM: yes, but as a first goal, we are trying for a first pass set of torsions that works with other terms

    • SB: raises concerns that side-chain motion during MM minimization could confound the torsion fit, making it look worse

    • DM: how do other force fields deal with this? e.g. AMBER

    • CC: AMBER doesn’t let the MM structure minimize, they use the QM conformation

    • DM: Chris Bayly has pointed out that without optimization, you can end up with stiffer barriers from steric contribution

    • SB: suggests restraints on most internal degrees of freedom, wonders if you can track the force contributed by restraints

    • CC: other terms in Sage/Parsley were derived with minimisation

  • Results of MM targets

    • CC: alanine looks periodic in 2D

    • Comparing tryptophan rotamers

      • Differences seen, and jumps in energy, e.g. around -180 phi and 0 psi – CC attributes to side-chain contributions as discussed above

    • CC: proline looks about periodic

  • Comparing torsion profiles by shape and magnitude

    • RMSE and normalized RMSE from superimposed profiles

    • CC: alanine and tryptophan are more similar to each other than either is to proline

    • CC: In general, profiles differ more between side-chains/residues than between rotamers

      • MG: ignoring proline, because proline is always weird, differences between side-chains look similar to differences between rotamers. Low pro rotamer difference is understandable b/c it’s pretty rigid

      • CC: included tryptophan because expected the biggest difference between rotamers of this particular AA

      • DM: what’s the big picture?

      • CC: know there’s coupling between side-chain and backbone dihedrals. This is a question about dataset generation – do we need to enumerate rotamers? existing protocols only use one

      • CC: looking at normalized RMSE, probably need at least 2 rotamers each for a useful fitting target. Differences from ala and trp to pro are ~20%

      • MG: interested to see what happens when side-chains are restrained

      • MG: should we have side-chain dependent BB torsions one day?

      • MS: that sounds like basically CMAPs

    • CC: tryptophan rotamers similar in most places, differ mostly in angles around linear

  • MS: what are next steps?

    • CC: now need to scale up generation of QM datasets. This was a pilot feasibility study

    • CC: resubmit with constrained side-chains, then decide if want to include other rotamers

    • MS: can these torsions be applied to other small molecules that have the same chemical environments?

    • CC: probably. An open question is if we want to make these stereospecific – do we want to give people the amide generic torsion or the protein-specific one?

    • MG: well, if it’s a mirror image protein it should behave the same

    • MS: differences between DDD and DLD stereo protein chains should be from sterics

    • CC: agrees with proposal to not write in specific stereochemistry

    • MS: cyclic peptides, instead of being treated as small molecule, should be treated as proteins

    • DM: also agrees, no chiral SMARTS

  • Practical considerations

    • MG: how long do you think this will take?

    • CC: it’ll probably go faster with the restraints. We’re getting about 2000 optimizations per day. About 600 grid points. We want 26 side-chains, estimates ~50 days

    • DM: suggests more compute resources. How soon do we want this? Helps with juggling free vs paid compute

    • CC: next week or so

    • DM: suggests CC ping internal after meeting and ask to spin up more compute, possibly enlist Trevor Gokey

  • SB: this is all training data, right? What are the plans for benchmarking?

    • CC: looking at NMR observables for small peptides, which we have from the LiveCoMS review, and working out which are most helpful for us

    • CC: will reach out to SB in January to start working on infrastructure needs with e.g. Evaluator

    • SB: also need to work with the software scientists on this, will be a huge need

    • CC: current plan is to write input files for external program and get external software to run it

    • SB: also need to consider packaging and distribution, that might also need software scientist time

    • CC: simple plan is to use ShiftX for chemical shifts and think about how to improve on that later, but that’s an accepted standard

    • MS: suggests ML predictions of chemical shifts later, but for now ShiftX as a benchmark. Mentions Andrew White as an interesting alternative

    • CC: Yes, ShiftX will be easier to compare with existing benchmarks for now
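
The RMSE and normalized RMSE comparison of superimposed profiles, noted under "Comparing torsion profiles by shape and magnitude", could be sketched roughly as below. This is only an illustration, not the analysis shown in the meeting. It assumes that "superimposed" means removing a constant energy offset, that the RMSE is normalized by the energy range of the reference map, and that unsampled grid points (as for proline at high phi) are stored as NaN. The function names, the 15-degree grid, and the synthetic energy maps are all hypothetical.

```python
import numpy as np


def superimpose(profile, reference):
    """Shift `profile` by a constant so that its mean matches `reference`.

    Assumption: superposition is a constant energy offset, computed only
    over grid points sampled in both maps (NaN marks unsampled points).
    """
    mask = ~np.isnan(profile) & ~np.isnan(reference)
    offset = np.mean(reference[mask] - profile[mask])
    return profile + offset, mask


def rmse_and_normalized_rmse(profile, reference):
    """Return (RMSE, RMSE / energy range of the reference) after superposition.

    Assumption: normalization by the range of the reference map; other
    choices (mean, standard deviation) are equally plausible.
    """
    shifted, mask = superimpose(profile, reference)
    diff = shifted[mask] - reference[mask]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    spread = float(reference[mask].max() - reference[mask].min())
    nrmse = rmse / spread if spread > 0 else float("nan")
    return rmse, nrmse


# Synthetic 2-D (phi, psi) maps on a hypothetical 15-degree grid.
phi = np.deg2rad(np.arange(-180, 180, 15))
psi = np.deg2rad(np.arange(-180, 180, 15))
PHI, PSI = np.meshgrid(phi, psi, indexing="ij")

map_a = 2.0 * (1 + np.cos(PHI)) + 1.0 * (1 + np.cos(2 * PSI))
map_b = map_a + 3.0                      # same shape, constant offset only
map_c = 1.5 * map_a                      # same shape, larger magnitude
map_c[PHI > np.deg2rad(60)] = np.nan     # mimic a region that was not sampled

rmse_ab, nrmse_ab = rmse_and_normalized_rmse(map_b, map_a)
rmse_ac, nrmse_ac = rmse_and_normalized_rmse(map_c, map_a)
print(f"a vs b: RMSE={rmse_ab:.3f}, normalized RMSE={nrmse_ab:.3f}")
print(f"a vs c: RMSE={rmse_ac:.3f}, normalized RMSE={nrmse_ac:.3f}")
```

With this definition, two maps that differ only by a constant offset give an RMSE of essentially zero, so the metric responds to differences in shape and barrier height rather than to the arbitrary energy reference.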







Action items

Decisions