2021-08-05 Meeting notes

Date

Aug 5, 2021

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Simon Boothroyd

  • @Jeffrey Wagner

  • @David Mobley

  • @Michael Gilson

  • @Lily Wang

Goals

  • Progress on LiveCoMS review

  • Pilot dataset of dipeptide 2-D TorsionDrive

  • Plans for future QC datasets

Slides

Discussion topics

Item

Presenter

Notes

LiveCoMS review

@Chapin Cavender

  • Emails sent to existing and new contributors inviting them to:

    • Edit outline to include key references and specific datasets

    • Volunteer to contribute text

  • Update on responses

  • CC – No responses from most participants.

Dipeptide 2-D TorsionDrive

@Chapin Cavender

  • Pilot dataset to test feasibility of 2-D TorsionDrives for dipeptides

  • Generated all rotamers of 27 dipeptides (Ace-X-Nme)

    • CC – Used OpenEye for rotamer generation

      • JW – Is there an open-source alternative for the rotamer database and rotamer generation? We should keep an eye out for one, but this is complex enough that it’s not essential.

  • 2-D TorsionDrive to scan phi and psi for:

    • Alanine dipeptide

    • Two rotamers of tryptophan dipeptide

    • CC – Working with Dotson to prepare this submission to QCA, will run on dedicated cluster in San Diego (Triton Shared Computing Cluster)
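To give a rough sense of the cost of the 2-D scans above, here is a minimal sketch that enumerates the (phi, psi) grid points for one molecule. The 15-degree spacing and the helper name `torsion_grid` are assumptions for illustration and were not stated in the meeting. The sketch only counts grid points; it does not call TorsionDrive or QCSubmit, and it does not reproduce how TorsionDrive actually traverses the grid.

```python
from itertools import product


def torsion_grid(spacing_deg=15):
    """Return all (phi, psi) grid points in degrees on [-180, 180).

    spacing_deg is an assumed value; the notes do not specify a spacing.
    """
    if 360 % spacing_deg != 0:
        raise ValueError("spacing_deg must divide 360 evenly")
    angles = range(-180, 180, spacing_deg)
    # Every pairing of a phi value with a psi value is one constrained geometry.
    return list(product(angles, angles))


grid = torsion_grid(15)
print(len(grid))  # 24 * 24 grid points per molecule at 15-degree spacing
```

Under that assumed spacing, the three scans listed above (alanine dipeptide plus two tryptophan rotamers) would cover three such grids.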

Future QC datasets

@Chapin Cavender

  • Compare 2-D TorsionDrive to grid optimization (grid search with fast method followed by constrained optimization with accurate method)

    • JW – Some technical/theoretical/organizational issues with ANI implementation, I can’t put a timeline on resolution so I wouldn’t rely on this.

    • DM – Maybe XTB?

    • CC – Has anyone done this before?

      • SB – JMaat kinda did 2D grid optimizations but not supported by QCSubmit yet.

      • DM – Haven’t done large-scale stuff on this topic before. But it wasn’t the same sort of torsion scan

      • DM – Cerutti writeup may have useful input here. Conceptual, not necessarily implementation-level

      • CC – Agree. We’re also able to explore different strategies by running the same tasks with different levels of theory or other settings that affect runtime and possibly accuracy

  • What should we include in the training dataset? We want to construct a hierarchy of high priority (small) to low priority (large) datasets.

    • How many side chains in QC molecules?

      • MG – The core idea seems to be that, if you want each residue to “experience” every other residue before and after it, you have to do 8000 jobs (20 × 20 × 20 sequences). This can be reduced a lot by using some sort of “representative” of each neighboring residue, but gathering the data to select those representatives would be a project unto itself. It seems like Cerutti had made this decision.

      • MG – There are two levels of sensitivity: “big sensitivity”, where we get substantially different results out, and “limited sensitivity”, where we get numerical differences but they’re dependent on the other parameters.

      • CC – There are surveys of xtal structures where they cluster the Ramachandran plots of each AA and get 6 clusters: things like glycine, things that precede a proline, etc.

      • MG – For steric-likely AAs, like beta-branched ones, if you see a cluster in the xtal data/Ramachandran plots, do you need special terms for that, or will the sterics drive the needed change?

      • CC – Simmerling basically argues that CMAPs were still necessary here, meaning that there is a need for different parameters.

      • MG – Well, one issue is “should we use CMAPs?” and the other one is “should we use different torsion types?”

        • CC – I don’t think that CMAPs fully explore the idea of perceiving whether a 1D torsion can accurately describe the energy surface

        • MG – I also wonder whether torsion splitting for beta branching could ALTERNATIVELY be fixed by adjusted LJ parameters.

      • JW – It’s possible that a large region of sequence space would lead to proton transfers, and could be excluded or downweighted.

        • MG – Would solvent stabilize these in real life?

        • JW – Yeah, it would, good point. Unfortunately we have problems with solvent in QM, so these will likely have to be in vacuum.

      • MG – I doubt that charges on distant sidechains would affect the backbone torsions

      • CC – The dielectric environment might solve this

      • MG – The good news is that there aren’t often charges buried inside proteins; the bad news is that when they are there, they’re important.

    • Which dihedrals to scan?

    • How many rotamers per side chain?

    • For tetrapeptides Ace-X-Y-Z-Nme

      • Which side chains should be included as central residues (Y)?

        • CC – We could base this off ff19SB paper CMAP groupings (Table S3)

          • MG – These seem arbitrary, and I’m not sure that I agree with all of them.

          • DM – Can talk to the lead author, Chuan Tian; he’s on our advisory board

        • CC – I think I’ll follow that scheme closely, but maybe will put HIS into its own group

      • Which side chains should be included as flanking residues (X and Z)?

        • CC – I’m not sure that I followed Cerutti’s logic. I’d suggest 4 flanking residues (Val, Ser, Asp, Lys) and maybe also Pro and a beta-branched one

          • MG – Maybe also glycine

        • JW – Would we expect to see energetics of neighboring Pro and Gly reflected in their xtal Ramachandran plots? Or might the xtal plots be kinda evolution driven, where the backbone looks a certain way because nature “wants” to end a helix there?

          • CC + MG – Agree, the only way to know will be to generate the set

      • CC – How much should we sample sidechain rotamers? I’m inclined to not worry about them in the first pass. We can base a lot of decisions on the dipeptide results

        • MG – Agree

        • DM – Agree, it’s probably a good idea to keep a “bucket of ideas I could study”, but focus on the minimum viable product.
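The sequence-count argument in the tetrapeptide discussion can be sketched as a small enumeration. This is only an illustration under stated assumptions: the residue codes, the helper name `tetrapeptides`, and the choice to keep all 20 amino acids as central residues (rather than the ff19SB CMAP groupings, which are not reproduced here) are placeholders, not decisions from the meeting.

```python
from itertools import product

# The 20 standard amino acids as three-letter codes (assumed naming convention).
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def tetrapeptides(flanking, central):
    """Build capped sequence labels Ace-X-Y-Z-Nme for X, Z in flanking and Y in central."""
    return [
        f"Ace-{x}-{y}-{z}-Nme"
        for x, y, z in product(flanking, central, flanking)
    ]


# Every residue flanked by every other residue on both sides.
full = tetrapeptides(AMINO_ACIDS, AMINO_ACIDS)

# Reduced set using the four flanking residues suggested in the discussion.
flanking = ["VAL", "SER", "ASP", "LYS"]
reduced = tetrapeptides(flanking, AMINO_ACIDS)

# Optionally adding proline and glycine as flanking residues.
extended = tetrapeptides(flanking + ["PRO", "GLY"], AMINO_ACIDS)

print(len(full), len(reduced), len(extended))
```

Restricting the central residues to a few representatives per group would shrink these counts further, and each sequence would still be multiplied by the number of side-chain rotamers and torsion grid points chosen.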

Action items

Decisions