VS1 Approach I: initial plan
Initial plan and approach: unconstrained fit to existing data
Overview
Summary | This initial approach started from the scripts and data left by Simon. Unfortunately, re-training the vdW parameters gave worse performance for molecules containing Br and for pyridines. Some experimentation with splitting out virtual sites showed that splitting the N vdW term into two improved performance. However, examining the charges showed that the charge of the pyridine virtual site was greater than 1 in magnitude. Further investigation showed that the documentation left by Simon was slightly incomplete, so the fit had been trained on only some of the data. A new re-fit and approach was therefore started in Approach II. |
---|---|
GitHub repo/branch | |
Status | Won't progress |
Milestones and metrics
Stage | Milestone/Benchmark | Contributors | Deadline | Status |
---|---|---|---|---|
Train virtual sites and BCCs to existing HF/6-31G* data | Re-fit 2.1.0 BCCs to ESP data | @Lily Wang | July 2023 | Completed |
| Re-fit 2.1.0 BCCs and virtual sites to ESP data | @Lily Wang | July 2023 | Completed |
Benchmark | Improved or equivalent performance on ESPs for test set | @Lily Wang | July 2023 | PASSED |
Re-fit valence and vdW parameters to condensed phase properties | Re-fit FF terms for no-vsites-candidate | @Lily Wang | Aug 2023 | Completed |
| Re-fit FF terms for vsites-candidate | @Lily Wang | Aug 2023 | Completed |
Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data | @Lily Wang | Sept 2023 | Failed |
| Experiment with vdW site splitting to see if that improves benchmarks | @Lily Wang | Sept 2023 | Completed |
| Experiment with fitting to dimer energies | @Lily Wang | Sept 2023 | Completed |
Benchmark | Improved or equivalent performance for molecules with virtual sites added (Cl, Br, pyridines) on training data (simulation properties) | @Lily Wang | Sept 2023 | Failed |
Benchmark | Solvation free energies | @Lily Wang | | |
Progress and findings
Training data
The training data for each virtual site is as follows:
C-Cl (3280 training ESPs)
"[#6A:2]-[#17:1]"
"[#6a:2]-[#17:1]"
C-Br (2149 training ESPs)
"[#6A:2]-[#35:1]"
"[#6a:2]-[#35:1]"
Lone pair off N (796 training ESPs)
"[#6X3H1a:2]1:[#7X2a:1]:[#6X3H1a:3]:[#6X3a]:[#6X3a]:[#6X3a]1"
Training to QM ESPs
The training appears to converge after roughly 1000 epochs.
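For readers unfamiliar with what a fit to ESPs involves, the following is a minimal sketch of the idea and not the project's training code. It assumes a hypothetical toy system (one atom plus one virtual site at a fixed position), hypothetical grid points, a synthetic reference ESP generated from a known charge, and plain gradient descent. The names `esp`, `fit_vsite_charge`, `ATOM`, `VSITE`, `GRID` and `Q_TRUE` are all illustrative. The only physics used is the Coulomb potential of point charges in atomic units, V(r) = sum_i q_i / |r - r_i|.

```python
import math


def esp(charges, positions, point):
    """Coulomb potential (atomic units) at `point` from a set of point charges."""
    return sum(q / math.dist(pos, point) for q, pos in zip(charges, positions))


def fit_vsite_charge(atom, vsite, grid, reference, n_epochs=1000, lr=100.0):
    """Fit a single charge q placed on the virtual site by gradient descent.

    The atom carries -q so that the total charge is conserved.
    Returns the fitted q and the mean-squared ESP error at every epoch.
    The step size is large because the basis values are small in this toy setup.
    """
    # Change in ESP at each grid point per unit of charge moved atom -> site.
    basis = [1.0 / math.dist(vsite, p) - 1.0 / math.dist(atom, p) for p in grid]
    q = 0.0
    history = []
    for _ in range(n_epochs):
        residuals = [q * b - v for b, v in zip(basis, reference)]
        history.append(sum(r * r for r in residuals) / len(grid))
        grad = 2.0 * sum(r * b for r, b in zip(residuals, basis)) / len(grid)
        q -= lr * grad
    return q, history


# Hypothetical toy geometry (bohr) and synthetic reference data.
ATOM = (0.0, 0.0, 0.0)
VSITE = (0.0, 0.0, 1.0)
GRID = [(0.0, 0.0, 4.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0),
        (0.0, 0.0, -4.0), (3.0, 0.0, 3.0)]
Q_TRUE = 0.3  # placeholder value used only to generate the reference ESP
REFERENCE = [esp([-Q_TRUE, Q_TRUE], [ATOM, VSITE], p) for p in GRID]

if __name__ == "__main__":
    q_fit, losses = fit_vsite_charge(ATOM, VSITE, GRID, REFERENCE)
    print(f"fitted q = {q_fit:.4f}, first loss = {losses[0]:.3e}, "
          f"last loss = {losses[-1]:.3e}")
```

Tracking the per-epoch loss in `history` is what makes a convergence statement like the one above checkable: the fit can be considered converged once the loss stops decreasing appreciably.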
Training to condensed phase properties
Molecules containing Cl improve after retraining to QM ESPs, and improve slightly further when virtual sites are added.
Unfortunately, performance decreases for properties of molecules with bromine and pyridine virtual sites.
Splitting out vdW terms
Splitting the N vdW term into two and fitting to dimer energies improved performance on the training data.
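To illustrate why splitting one vdW term into two gives the fit more freedom, here is a small sketch using the standard 12-6 Lennard-Jones form, with Lorentz-Berthelot combining rules assumed. The parameter names (`N_type_a`, `N_type_b`, `probe`) and all numerical values are placeholders invented for illustration; they are not the fitted parameters from this work.

```python
import math


def lorentz_berthelot(sigma_i, epsilon_i, sigma_j, epsilon_j):
    """Combine per-atom LJ parameters: arithmetic mean sigma, geometric mean epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(epsilon_i * epsilon_j)


def lj_energy(r, sigma, epsilon):
    """12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Placeholder (sigma in angstrom, epsilon in kcal/mol) values, NOT fitted parameters.
# Before the split, both nitrogen environments would share a single entry.
VDW = {
    "N_type_a": (3.20, 0.17),
    "N_type_b": (3.30, 0.12),
    "probe": (3.00, 0.10),
}


def pair_energy(type_i, type_j, r, vdw=VDW):
    """LJ energy between two atom types at separation r (angstrom)."""
    sigma, epsilon = lorentz_berthelot(*vdw[type_i], *vdw[type_j])
    return lj_energy(r, sigma, epsilon)


if __name__ == "__main__":
    for n_type in ("N_type_a", "N_type_b"):
        print(n_type, pair_energy(n_type, "probe", 3.5))
```

With a single shared term, both nitrogen environments are forced onto the same interaction curve. After the split, each can be adjusted independently against dimer energies.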