Initial plan and approach: unconstrained fit to existing data

👀 Overview

Summary

This initial approach started from the scripts and data left by Simon. Unfortunately, re-training the vdW parameters gave worse performance for Br-containing molecules and pyridines. Some experimentation with vdW site splitting showed that splitting the N vdW term into two improved performance. However, on examining the charges, the pyridine virtual site was found to carry a charge greater than 1 in magnitude. On closer inspection, the documentation left by Simon was slightly incomplete, which resulted in training to only some of the data, so a new re-fit and approach was started in Approach II.

GitHub repo/branch

Status: Won't progress

🚩 Milestones and metrics

Stage | Milestone/Benchmark | Contributors | Deadline | Status
----- | ------------------- | ------------ | -------- | ------
Train virtual sites and BCCs to existing HF/6-31G* data | Re-fit 2.1.0 BCCs to ESP data | Lily Wang | July 2023 | Completed
 | Re-fit 2.1.0 BCCs and virtual sites to ESP data | Lily Wang | July 2023 | Completed
Re-fit valence and vdW parameters to condensed phase properties | Re-fit FF terms for no-vsites-candidate | Lily Wang | Aug 2023 | Completed
 | Re-fit FF terms for vsites-candidate | Lily Wang | Aug 2023 | Completed
Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data | Lily Wang | Sept 2023 | Failed
 | Experiment with vdW site splitting to see if that improves benchmarks | Lily Wang | Sept 2023 | Completed
 | Experiment with fitting to dimer energies | Lily Wang | Sept 2023 | Completed
Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data | Lily Wang | Sept 2023 | Failed

📊 Progress and findings

Training data

The training data for each virtual site is as follows:

  • C-Cl (3280 training ESPs)

    • "[#6A:2]-[#17:1]"

    • "[#6a:2]-[#17:1]"

  • C-Br (2149 training ESPs)

    • "[#6A:2]-[#35:1]"

    • "[#6a:2]-[#35:1]"

  • Lone pair off N (796 training ESPs)

    • "[#6X3H1a:2]1:[#7X2a:1]:[#6X3H1a:3]:[#6X3a]:[#6X3a]:[#6X3a]1"
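To make the relative sizes of these training sets easier to compare, the list above can be written as a small lookup table. This is only an illustrative sketch: the dictionary layout and the names `VSITE_TRAINING_DATA`, `total_esps` and `esp_fractions` are assumptions and do not come from the fitting scripts. The SMIRKS patterns and ESP counts are copied from the list. The total assumes the three sets do not overlap, which this page does not confirm.

```python
# Illustrative layout only (hypothetical names); the SMIRKS patterns and
# ESP counts are copied verbatim from the list above.
VSITE_TRAINING_DATA = {
    "C-Cl": {
        "smirks": ["[#6A:2]-[#17:1]", "[#6a:2]-[#17:1]"],
        "n_esps": 3280,
    },
    "C-Br": {
        "smirks": ["[#6A:2]-[#35:1]", "[#6a:2]-[#35:1]"],
        "n_esps": 2149,
    },
    "Lone pair off N": {
        "smirks": [
            "[#6X3H1a:2]1:[#7X2a:1]:[#6X3H1a:3]:[#6X3a]:[#6X3a]:[#6X3a]1"
        ],
        "n_esps": 796,
    },
}


def total_esps(data):
    """Sum the ESP counts, assuming the per-site sets do not overlap."""
    return sum(entry["n_esps"] for entry in data.values())


def esp_fractions(data):
    """Return each virtual site's share of the summed ESP count."""
    total = total_esps(data)
    return {name: entry["n_esps"] / total for name, entry in data.items()}


if __name__ == "__main__":
    for name, fraction in esp_fractions(VSITE_TRAINING_DATA).items():
        print(f"{name}: {fraction:.1%}")
```

Under that assumption, the pyridine lone-pair site has the fewest training ESPs of the three virtual site types.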

Training to QM ESPs

...

Training appears to converge after roughly 1000 epochs.

[Figure: Screenshot 2024-10-11 at 1.15.10 pm.png]
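The convergence statement above could also be checked numerically. The sketch below is not the criterion used in the fitting scripts, which this page does not describe. The function name `has_converged` and its `window` and `rel_tol` parameters are hypothetical, and the loss curve in the usage section is synthetic.

```python
import math


def has_converged(losses, window=100, rel_tol=1e-3):
    """Compare the mean loss of the last `window` epochs with the mean of
    the `window` epochs before them. Report convergence when the relative
    change is below `rel_tol`. Hypothetical helper, for illustration only.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    if previous == 0:
        return recent == 0
    return abs(previous - recent) / abs(previous) < rel_tol


if __name__ == "__main__":
    # Synthetic loss: exponential decay onto a plateau (not real training data).
    synthetic = [1.0 + math.exp(-epoch / 100) for epoch in range(2000)]
    for n_epochs in (300, 1000, 2000):
        print(n_epochs, has_converged(synthetic[:n_epochs]))
```

Comparing window means rather than single epochs keeps the check from reacting to epoch-to-epoch noise in the loss.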

Training to condensed phase properties

Performance for molecules containing Cl improves on retraining to QM ESPs, and improves slightly further when virtual sites are added.

...

Unfortunately, performance decreases for properties of molecules with bromine and pyridine virtual sites.

Splitting out vdW terms

Splitting the N vdW term into two and fitting to dimer energies resulted in improved performance on training data.
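As a rough illustration of what splitting a vdW term means, the sketch below contrasts one shared nitrogen type with two nitrogen types that can be fit independently. The type names `N`, `N_a` and `N_b` and all numerical values are placeholders invented for this example. The 12-6 Lennard-Jones form with Lorentz-Berthelot mixing is also an assumption here, and this page does not say which nitrogen environments the two new terms cover.

```python
import math


def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for one atom pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lorentz_berthelot(a, b):
    """Geometric mean for epsilon, arithmetic mean for sigma."""
    epsilon = math.sqrt(a["epsilon"] * b["epsilon"])
    sigma = 0.5 * (a["sigma"] + b["sigma"])
    return epsilon, sigma


# Placeholder values only (hypothetical; not fitted parameters).
SINGLE_N = {"N": {"epsilon": 0.17, "sigma": 3.3}}
SPLIT_N = {
    "N_a": {"epsilon": 0.17, "sigma": 3.3},
    "N_b": {"epsilon": 0.12, "sigma": 3.1},
}


def pair_energy(params, type_i, type_j, r):
    """Lennard-Jones energy between two atom types drawn from `params`."""
    epsilon, sigma = lorentz_berthelot(params[type_i], params[type_j])
    return lj_energy(r, epsilon, sigma)


if __name__ == "__main__":
    r = 3.5
    print("shared N type:", pair_energy(SINGLE_N, "N", "N", r))
    print("split, N_a/N_a:", pair_energy(SPLIT_N, "N_a", "N_a", r))
    print("split, N_a/N_b:", pair_energy(SPLIT_N, "N_a", "N_b", r))
```

With a single type, every nitrogen shares one epsilon and one sigma. After the split, the optimizer has two extra degrees of freedom, so each nitrogen environment can move independently during the fit.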

...