Virtual site perception

The ability to determine when and where to model electronic anisotropy in nonpolarizable forcefields is challenging. This is primarily due to the issue of transferability, since electrostatics is sensitive to which molecules are used to train, in addition to the quality of the structural data. Here, in this rough outline, we aim to discover when anisotropy should be modeled by inserting virtual sites in the SMIRNOFF-style force field. The rationale behind when to add anisotropy has been less explored in most work in the literature except for specific cases. For example, the halogen sigma hole is well known as well as the anisotropy of pyrodine. In order to design virtual sites for a fully transferrable force field, a general method that is compatible with the SMIRKS encoding of parameter assignment must be designed. This work aims to design a method to automatically determine where virtual sites should be included that improves the quality of fit in areas that can not be approached by other parameters, e.g. on-site charges and/or valence parameters.

Methods

The first task of this problem is to determine which SMIRKS pattern should be used. However, concurrently this SMIRKS pattern depends on the type of virtual site in addition to the parameters used to place the particle in space relative to the anchor atoms. In this case, we must first find a group of atoms across all molecules which appear to exihibit a similar pattern in anisotropy, then try to find a pattern that matches all groups.

Because each virtual site definition in the SMIRNOFF spec uses differing atoms and parameters, the first general task is therefore is deciding which virtual site definition is appropriate. Each site has the ability to span a certain space. For bond charge sites which place the charge along the axis defined by two atoms, it is only feasible to use this definition if the anisotropy exists along this axis. This can also be apparent from the fact that bond charge sites only use one spatial parameter i.e. the distance. This is also the case for trivalent sites. For a divalent site, there are two spatial parameters and therefore can be placed on a plane. Monovalent sites, using three parameters, are able to span Cartesian space.

Multipoles, specifically dipoles and quadrupoles, also have a set of axes, called a frame, similar to the virtual site definition. The molecular multiples can be partitioned onto each atom center using Distributed Multipole Analysis (DMA) from Stone. We therefore interpret the atomic multiples to be an indicator of local anisotropy, and use the dipole and quadrupole frames to place virtual sites. The type of virtual site used depends on how well the virtual site frames overlaps with the pole frames. For example, in water there are dipoles existing between the OH bonds. In this case, we can detect that a bond charge site would be appropriate because its axis (the OH bond) overlaps with the dipole axis.

The loss function to determine the overlap is the length of the difference vector between each pole axis and its projection onto the virtual site

(Insert equations)

noting that for a dipole there is one axis, and for quadrupoles there are three axes.

In addition to this loss function, we also introduce a preference in virtual sites since multiple sites may produce the same loss, e.g. a bond charge and a divalent could both describe the CC bond in ethane. In general, the preference is for sites which the least complexity: bond charge, divalent, trivalent, monovalent. Monovalent are last specifically because they can always produce a loss of 0, and therefore would always be preferred over the others.

Once the virtual site type is chosen, we initialize the spatial parameters to be the expected parameters for a test charge of 1, but threshholded if the parameters place the point outside of the molecular surface. The default of DMA is to solve the poles on each atom a radius of .65 (A?), so this is likely a hyperparameter to play with. We might want to threshhold the distance to this number.

DMA Dipoles of pyridine using B3LYP-D3BJ/DZVP. Colors of the frames are opposite of the monopole of the atom center. For example, the monopole of the nitrogen is negative, so the dipole points towards a positive test charge
DMA quadrupoles of pyridine using B3LYP-D3BJ/DZVP. Colors of the frames are opposite of the monopole of the atom center.

 

Details and Implementation:

The first goal will be fit just a divalent charge first to pyrodine, then start to automate the procedures to fit in a transferable way.

TODO

Priority 1: Fitting pyrodine

Code to perform the difference between a virtual site frame and the projection of a pole axis
Code to extract the DMA data from a (QCArchive) psi4 output into pol frame per atom
Basic parsing of raw dma.out for dipoles and quadrupoles to xyz
Prebake a divalent charge parameter for pyrodine in recharge (just use a CNC SMARTS)
Code to set angle and distance based on projection
Show that a CCC, HCC, HCN SMARTS inferior to CNC based on multipole overlaps and and ESP fit
Need to show that multipole overlap agrees most with a CNC SMARTS, meaning N has the largest anisotropy
From images above; easily shown by dipoles; carbon quadrupoles are larger but out of molecular plane
Incorporate into the pyridine example e.g. evaluate ESP fit per FF parameterization/optimization
Example psi4 input to run pyridine:
memory 600 mb molecule pyridine { 0 1 pubchem:pyridine noreorient nocom symmetry c1 } set basis dzvp optimize('scf') optimize('b3lyp-d3bj') grad, wfn = gradient('b3lyp-d3bj', return_wfn=True) gdma(wfn) dma_results = variable('DMA DISTRIBUTED MULTIPOLES') tot_results = variable('DMA TOTAL MULTIPOLES') pyridine.save_xyz_file("mol.xyz", True) dma_results.print_out() dma_results.save("dma.out", saveLowerTriangle=False) arr = dma_results.to_array(dense=True) print(arr)