VS1 Approach I: initial plan

 

Initial plan and approach: unconstrained fit to existing data

Overview

Summary

This initial approach started from the scripts and data left by Simon. Unfortunately, re-training the vdW parameters gave worse performance for Br-containing molecules and pyridines. Some experimentation with splitting out virtual sites showed that splitting the N vdW term into two improved performance. On examining the charges, the pyridine virtual site was found to carry a charge greater than 1 in magnitude. On further investigation, the documentation left by Simon turned out to be slightly incomplete, which had resulted in training to only some of the data, so a new re-fit and approach was started in Approach II.

GitHub repo/branch

 

Status

Won't progress

Milestones and metrics

| Stage | Milestone/Benchmark | Contributors | Deadline | Status |
| --- | --- | --- | --- | --- |
| Train virtual sites and BCCs to existing HF/6-31G* data | Re-fit 2.1.0 BCCs to ESP data | @Lily Wang | July 2023 | Completed |
| | Re-fit 2.1.0 BCCs and virtual sites to ESP data | @Lily Wang | July 2023 | Completed |
| Benchmark | Improved or equivalent performance on ESPs for test set | @Lily Wang | July 2023 | Passed |
| Re-fit valence and vdW parameters to condensed phase properties | Re-fit FF terms for no-vsites-candidate | @Lily Wang | Aug 2023 | Completed |
| | Re-fit FF terms for vsites-candidate | @Lily Wang | Aug 2023 | Completed |
| Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data | @Lily Wang | Sept 2023 | Failed |
| | Experiment with vdW site splitting to see if that improves benchmarks | @Lily Wang | Sept 2023 | Completed |
| | Experiment with fitting to dimer energies | @Lily Wang | Sept 2023 | Completed |
| Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data (simulation properties) | @Lily Wang | Sept 2023 | Failed |
| Benchmark | Solvation free energies | @Lily Wang | | |

Progress and findings

Training data

The training data for each virtual site is as follows:

  • C-Cl (3280 training ESPs)

    • "[#6A:2]-[#17:1]"

    • "[#6a:2]-[#17:1]"

  • C-Br (2149 training ESPs)

    • "[#6A:2]-[#35:1]"

    • "[#6a:2]-[#35:1]"

  • Lone pair off N (796 training ESPs)

    • "[#6X3H1a:2]1:[#7X2a:1]:[#6X3H1a:3]:[#6X3a]:[#6X3a]:[#6X3a]1"
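To put the counts above in proportion, the following minimal sketch computes each virtual-site type's share of the training ESPs. It assumes the three sets do not overlap, which the notes above do not state; the dictionary keys and the `share` helper are illustrative names only.

```python
# Training ESP counts per virtual-site type, copied from the list above.
training_esps = {
    "C-Cl": 3280,
    "C-Br": 2149,
    "N lone pair (pyridine)": 796,
}

# Assumption: the three sets are disjoint, so the counts can be summed.
total = sum(training_esps.values())


def share(count: int, total_count: int) -> float:
    """Return count as a percentage of total_count, rounded to one decimal."""
    return round(100 * count / total_count, 1)


shares = {name: share(n, total) for name, n in training_esps.items()}

for name, pct in shares.items():
    print(f"{name}: {training_esps[name]} ESPs ({pct}% of {total})")
```

Under that assumption the pyridine lone-pair site has by far the smallest share of the training data, which may be worth keeping in mind when reading the pyridine results below.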

 

Training to QM ESPs

The loss function over epochs

Training appears to converge after roughly 1000 epochs.
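The convergence judgement above was made by eye from the loss curve. A minimal sketch of how such a check could be made quantitative is given here; the function name, the window length, the tolerance, and the synthetic loss curve are all assumptions for illustration and are not taken from the training scripts.

```python
import math


def first_plateau_epoch(losses, window=100, rel_tol=1e-3):
    """Return the first epoch whose loss has improved by no more than
    rel_tol (relative) compared with the loss `window` epochs earlier.

    Returns None if no such epoch exists. A loss that has increased over
    the window also counts as "no longer improving".
    """
    for epoch in range(window, len(losses)):
        previous = losses[epoch - window]
        current = losses[epoch]
        if previous - current <= rel_tol * abs(previous):
            return epoch
    return None


# Synthetic stand-in for a loss curve: exponential decay onto a floor.
# These numbers are placeholders, not the real training losses.
synthetic_losses = [1.0 + 10.0 * math.exp(-epoch / 150) for epoch in range(3000)]

plateau = first_plateau_epoch(synthetic_losses)
print(f"Loss stops improving meaningfully at epoch {plateau}")
```

The window and tolerance control how strict the criterion is: a longer window or smaller tolerance reports convergence later.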

 

Figure: the pyridine charges that result from the fitting.
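Since a virtual-site charge of magnitude greater than 1 was the warning sign noted in the summary, a simple automated check on fitted charges may be useful. The sketch below flags any charge above a magnitude threshold; the function name and the example values are hypothetical placeholders, not the fitted charges from this work.

```python
def flag_large_charges(charges, threshold=1.0):
    """Return the subset of charges whose magnitude exceeds threshold.

    charges maps a site label to its partial charge in elementary charges.
    """
    return {label: q for label, q in charges.items() if abs(q) > threshold}


# Hypothetical placeholder values for illustration only; these are NOT
# the charges obtained from the fit described on this page.
example_charges = {
    "N": 0.95,
    "N virtual site": -1.20,
    "C2": 0.10,
    "C6": 0.10,
}

flagged = flag_large_charges(example_charges)
for label, q in flagged.items():
    print(f"Suspicious charge on {label}: {q:+.2f}")
```

The threshold of 1.0 mirrors the observation in the summary; a stricter value could be chosen if smaller charges are expected.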

 

Training to condensed phase properties

Properties of molecules containing Cl improve after retraining to QM ESPs, and improve slightly with the addition of virtual sites.

Unfortunately, performance decreases for properties of molecules with bromine and pyridine virtual sites.

 

Splitting out vdW terms

Splitting the N vdW term into two and fitting to dimer energies improved performance on the training data.
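To illustrate what splitting a vdW term changes, the sketch below evaluates a 12-6 Lennard-Jones pair energy with Lorentz-Berthelot combining rules. It assumes that functional form and those combining rules apply here, as is standard for SMIRNOFF-style force fields. The two N types and all parameter values are hypothetical placeholders, not fitted values.

```python
import math


def lorentz_berthelot(sigma_i, epsilon_i, sigma_j, epsilon_j):
    """Combine per-atom LJ parameters: arithmetic mean of sigma,
    geometric mean of epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(epsilon_i * epsilon_j)


def lj_energy(r, sigma, epsilon):
    """12-6 Lennard-Jones energy at separation r (same length unit as sigma)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Hypothetical (sigma in angstrom, epsilon in kcal/mol) for two N types
# after a split, plus a probe atom. Placeholder values, not fitted ones.
n_types = {"N (pyridine)": (3.20, 0.15), "N (other)": (3.30, 0.17)}
probe = (3.40, 0.10)

for name, (sigma_n, eps_n) in n_types.items():
    sigma, eps = lorentz_berthelot(sigma_n, eps_n, *probe)
    r_min = 2 ** (1 / 6) * sigma  # separation at the energy minimum
    print(f"{name}: r_min = {r_min:.3f}, E(r_min) = {lj_energy(r_min, sigma, eps):.4f}")
```

With a single N type both lines would be identical; after a split, each N environment gets its own well position and depth, which is what allows the dimer energies to be fitted separately.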