Driver

Alexandra McIsaac Jennifer A Clark

Approver

Lily Wang Brent Westbrook (Unlicensed)

Contributors

Other stakeholders

David Mobley , Michael Shirts , Daniel Cole

Objective

A neural network charge model that can assign conformer-independent charges to both small molecules and large systems, at a higher level of theory than AM1-BCC

Time frame

?Genentech, Chodera Lab

Objective

Provide the Chodera lab and the Genentech group with the QM data necessary for training a machine-learned force field.

Time frame

12/01/2024 - 12/01/2025

Key outcomes

A neural network charge model that:

  • Is trained on data with a higher level of QM theory than AM1-BCC, with polarization effects from a solvent model

  • Can accurately assign charges to small molecules and large systems at a reasonable speed

  • Assigns charges that perform better in simulation than AM1-BCC

  • Corrects issues with sulfur and phosphorus charges

A force field incorporating:

  • NAGL2 charges

  • re-trained vdW terms

  • re-trained valence terms

Key metrics

  • Good reproduction of the underlying data, defined as equivalent or better testing error on ESPs, dipoles, and quadrupoles at the NAGL2 level of theory, compared to NAGL’s testing error on AM1-BCC

  • Improved performance on “real-world” benchmarks compared to NAGL/AM1BCC-ELF10 (e.g. solvation free energies, protein-ligand benchmarks, or other similar targets), especially for hypervalent atoms

A dataset that covers:

  • Relevant levels of theory

  • Support for metal centers of interest

  • Coverage of ligand chemical space

  • QM output with properties of interest

Key metrics

  • ?

Status

In progress

GitHub repo

Slack channel

https://openforcefieldgroup.slack.com/archives/CDR1P66Q2C085GQ8BCNB

Designated meeting

TM FF fitting meetingMeeting

Released force field / datasets

Publication

🤔 Problem Statement and Objective

AM1-BCC charges are trained to reproduce RESP charges, which are calculated at a low level of QM theory (HF/6-31G*) and rely on that level’s overpolarization to fortuitously approximate charge polarization in solution. This level of theory is particularly poorly suited to sulfur and phosphorus, which can be hypervalent, as well as to some other functional groups. Additionally, HF/6-31G* has been shown not to overpolarize charges by a consistent amount across systems, and within a given system it erroneously polarizes solvent-accessible and buried atoms by the same amount. These polarization errors grow with system size, so they cause more problems for large systems than for small molecules.

To model electrostatics accurately, we wish to train a graph neural network (GNN) charge model that addresses these problems. We will train the GNN on data from a higher level of QM theory, to capture the electrostatics of complicated systems, such as those containing hypervalent atoms, more accurately. We will model the effects of solvent polarization directly by using a solvent model.

🎯 Scope

...

Must have:

...

  • Neural network charge model that performs better than or equivalent to AM1BCC-ELF10 on very small molecules, small molecules, proteins, lipids, and nucleic acids

  • Minimum element set includes all currently covered elements

  • Charge assignment must scale better than AM1-BCC

  • Assigned charges must reproduce QM ESPs and dipoles better than NAGL1/AM1-BCC

  • Assigned charges must reproduce “real world” benchmarks like solvation free energies and protein-ligand binding better than NAGL1/AM1-BCC

  • Must provide reasonable/physical charges for “buried atoms”, i.e. atoms that are not solvent accessible and are often assigned unphysical charges by unrestrained ESP fitting methods
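
The ESP and dipole requirements above can be made concrete with a small sketch. It is illustrative only and is not the project's evaluation code: the function names, the toy two-atom geometry, the grid points, and the bare e/Å units (no conversion constant) are all assumptions, and the "reference" charges stand in for QM-derived reference values. It uses the standard point-charge expressions, dipole μ = Σ q_i r_i and ESP V(p) = Σ q_i / |p − r_i|.

```python
import math

def dipole(charges, coords):
    """Dipole vector of a set of point charges: mu = sum_i q_i * r_i."""
    return tuple(sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3))

def esp(charges, coords, grid):
    """ESP at each grid point from point charges: V(p) = sum_i q_i / |p - r_i|."""
    return [sum(q / math.dist(p, r) for q, r in zip(charges, coords)) for p in grid]

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(ref))

# Hypothetical toy system: a neutral diatomic, so the dipole is origin-independent.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_charges = [0.4, -0.4]   # stand-in for a QM-derived reference
model_charges = [0.3, -0.3]       # stand-in for charges from a charge model
grid = [(3.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (0.5, 3.0, 0.0)]

esp_error = rmse(esp(model_charges, coords, grid),
                 esp(reference_charges, coords, grid))
dipole_error = math.dist(dipole(model_charges, coords),
                         dipole(reference_charges, coords))
```

In practice the reference ESP would be evaluated from the QM electron density on a grid around the molecule rather than from a second set of point charges; the comparison logic stays the same.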

...

Nice to have:

...

  • Expand element coverage to include B, Si, maybe metals?

  • Incorporating virtual sites

  • Confidence metric returned directly by neural network

...

Not in scope:

...

🎯 Scope

Must have:

  • Dataset at an agreed-upon model chemistry (suggest BP86/def2-TZVP for primary and B3LYP-D3BJ)

  • Support for metal centers of primary interest: Pd, Fe, Zn, Mg, Cu, Li
    with organic compound elements: C, H, P, S, O, N, F, Cl, Br

  • Coverage of ligand chemical space

  • Coverage of charged molecules {+1, 0, −1}e

  • QM output with properties such as: energies, forces, partial charges, multipole moments

Nice to have:

  • Dataset with model chemistry overlapping with SPICE (i.e., ωB97M-D3BJ/def2-TZVPPD) and OpenFF (i.e., B3LYP-D3BJ/DZVP) standards.

  • Support for metal centers of secondary interest: Rh, Ir, Pt, Ni, Cr, Ag
    Better: Make it element agnostic

  • Coverage of charged molecules {+3, +2, +1, 0, −1, −2, −3}e etc.

  • Desiderata: High-spin Fe complexes (e.g. up to 5/2)

  • Desired: higher charge states

  • QM output with properties such as: atomic spin density, orbital energies, electronic structure of complexes

Not in scope:

  • Organometallic complexes with multiple metal centers
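
As a rough illustration of how the must-have dataset scope could be checked for a candidate complex, the sketch below encodes the element and charge lists given above. The function name and the "exactly one metal center" rule are assumptions made for this example (the page only states that multiple metal centers are out of scope), and no real toolkit API is implied.

```python
# Element and charge sets copied from the "Must have" list above.
PRIMARY_METALS = {"Pd", "Fe", "Zn", "Mg", "Cu", "Li"}
ORGANIC_ELEMENTS = {"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"}
ALLOWED_CHARGES = {-1, 0, 1}

def in_must_have_scope(elements, total_charge):
    """Return True if a complex fits the must-have scope (hypothetical helper).

    elements: list of element symbols, one per atom.
    total_charge: net charge of the complex in units of e.
    Assumption: a complex must contain exactly one primary metal center.
    """
    metals = [e for e in elements if e in PRIMARY_METALS]
    others = [e for e in elements if e not in PRIMARY_METALS]
    return (
        len(metals) == 1
        and all(e in ORGANIC_ELEMENTS for e in others)
        and total_charge in ALLOWED_CHARGES
    )
```

For example, `in_must_have_scope(["Zn", "Cl", "Cl"], 0)` passes, while a complex with two Fe atoms, a Rh center, or a net charge of +2 is rejected under these assumptions.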

⚙️ Project Approaches


📖 References