2021-07-20 Cavender/Behara/Gokey/Chodera check in

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Trevor Gokey

  • @John Chodera

Goals

  • Review OpenFF compute resources with Chapin

  • Review QCA datasets with Chapin

  • Discuss feasibility of future peptide datasets

Discussion topics

Item

Notes

Compute resources

  • What are OpenFF’s available compute resources? Are there shared allocations on e.g. XSEDE or is everything run on local/institutional resources?

    • Local resources; most datasets finish in < 1 week

    • Entos OrbNet Denali training dataset (available in QCEngine soon)

  • For shared resources, what is the process for getting QM calculations approved?

Status of peptide datasets

  • Peptide unconstrained Optimization dataset. Looks like 6689/6709 complete from this PR.

  • Peptide TorsionDrive dataset. Looks like 823/845 complete from this PR.

  • PEPCONF dataset. Looks like 2892/7560 complete from this PR.

    • Many calculations need an additional compute spec that uses second-order SCF methods

  • Is there documentation on how the conformations were chosen for Dave Cerutti’s datasets?

    • Generated with mdgx; not reproducible

  • Are we planning to troubleshoot errored calculations or are these datasets complete enough as is?

    • Small error rate is expected and acceptable
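As a quick check on the completion counts quoted above, the following sketch computes the fraction complete for each dataset. The counts are copied from these notes; the dictionary and function names are hypothetical, and "not complete" here lumps together errored and still-pending calculations, since the notes do not separate them.

```python
# Completion counts as recorded in these notes: (complete, total).
datasets = {
    "Peptide unconstrained Optimization": (6689, 6709),
    "Peptide TorsionDrive": (823, 845),
    "PEPCONF": (2892, 7560),
}


def completion_summary(counts):
    """Return {name: (records_not_complete, percent_complete)}."""
    return {
        name: (total - done, 100.0 * done / total)
        for name, (done, total) in counts.items()
    }


summary = completion_summary(datasets)
for name, (remaining, pct) in summary.items():
    print(f"{name}: {pct:.1f}% complete, {remaining} not complete")
# Peptide unconstrained Optimization: 99.7% complete, 20 not complete
# Peptide TorsionDrive: 97.4% complete, 22 not complete
# PEPCONF: 38.3% complete, 4668 not complete
```

On these numbers, the first two datasets fit the "small error rate" description, while PEPCONF is mostly outstanding, consistent with the note that it needs an additional compute spec.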

Expense of QC datasets

  • What is the expected cost (in wall time or core-hours) of each conformation in an Optimization dataset and a 1-D TorsionDrive dataset? How does cost scale with number of atoms or electrons?

    • 25 heavy atoms, 700 torsions, completed in ~1 month

    • 140 optimizations per TorsionDrive for large molecules

  • How do you decide the number of conformations per molecule?

    • 10 conformations per molecule

  • What is the cost of a 2-D TorsionDrive? Is a 2-D TorsionDrive with spacing N-by-N equivalent to N 1-D TorsionDrives with spacing N?

    • 1-4 interactions complicate optimization trajectories

  • What is the expected cost of including an implicit solvent model, e.g. PCM?

    • Very slow in Psi4; including polarizability in the MM model may be more expedient
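The 1-D versus 2-D question above can be framed by counting grid points. The sketch below assumes a 15° grid spacing, which is an illustrative value not stated in these notes, and the helper names are hypothetical. It only counts grid points: as noted above, 1-4 interactions complicate the optimization trajectories, so equal grid-point counts do not imply equal cost. The last line multiplies the two figures from the notes (700 torsions, 140 optimizations per TorsionDrive) as a rough upper estimate, assuming the large-molecule figure applied to every torsion.

```python
def grid_points_1d(spacing_deg):
    """Number of grid points in a full 360-degree 1-D scan."""
    if 360 % spacing_deg != 0:
        raise ValueError("spacing must divide 360 evenly")
    return 360 // spacing_deg


def grid_points_2d(spacing_deg):
    """Number of grid points in a 2-D scan with the same spacing on both axes."""
    return grid_points_1d(spacing_deg) ** 2


SPACING_DEG = 15  # assumption for illustration, not taken from the notes

n_1d = grid_points_1d(SPACING_DEG)
n_2d = grid_points_2d(SPACING_DEG)
print(n_1d, n_2d, n_2d // n_1d)  # 24 576 24

# Rough upper estimate from the figures in the notes, assuming every one of
# the 700 torsions needed the large-molecule count of 140 optimizations.
N_TORSIONS = 700
OPTS_PER_TORSIONDRIVE = 140
total_opts = N_TORSIONS * OPTS_PER_TORSIONDRIVE
print(total_opts)  # 98000
```

Under this assumption an N-by-N 2-D scan has as many grid points as N 1-D scans, so the grid-point count sets a floor on the cost comparison rather than answering it.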

Feasibility of peptide datasets

  • 20 canonical amino acids + 7 alternate protonation states/tautomers

    • Pilot datasets with ANI-2x, a few days of wall time

    • New version of ANI from Simon

  • Optimization datasets for PTMs

    • Complete set of filtered SMILES strings from Ligand EXPO

  • 1-D TorsionDrives of backbone torsions and χ1 for all dipeptides

  • 1-D TorsionDrives of backbone torsions and χ1 for subset of Ace-X-Y-Z-Nme tetrapeptides

  • 2-D TorsionDrives of backbone torsions for subset of dipeptides
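To see why only a subset of tetrapeptides is proposed, the sketch below counts the sequence space implied by the list above. It assumes any of the 27 residue forms (20 canonical plus 7 alternate) can occupy each of the X, Y, and Z positions in Ace-X-Y-Z-Nme. The residue labels are placeholders because the notes do not name the 7 alternate forms, and the random pick with a fixed seed and a pilot size of 100 is purely hypothetical, not the selection method discussed in the meeting.

```python
import itertools
import random

N_CANONICAL = 20  # from the notes
N_ALTERNATE = 7   # from the notes; the specific forms are not listed

# Placeholder labels; real residue names would replace these.
forms = [f"res{i:02d}" for i in range(N_CANONICAL + N_ALTERNATE)]

# Every ordered (X, Y, Z) combination for Ace-X-Y-Z-Nme.
tetrapeptides = list(itertools.product(forms, repeat=3))
n_tetrapeptides = len(tetrapeptides)
print(len(forms), n_tetrapeptides)  # 27 19683

# Hypothetical subset selection: a reproducible random sample.
PILOT_SIZE = 100  # illustrative value only
rng = random.Random(0)
subset = rng.sample(tetrapeptides, PILOT_SIZE)
```

With 27 forms there are 27 single-residue systems but 19,683 tetrapeptide sequences, so covering every tetrapeptide with TorsionDrives would be far larger than the existing datasets listed above.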

Action items

Decisions