Virtual Site notes

Expected release 8/24

Final tasks:

Add an “exclusion_policy” to top-level Virtual Site handler
naming? Policies up to “parents” are implemented
Remove parameters (InPlaneAngle in divalent vsite) which do not belong/do anything.
Flesh out the API for vsites (accessing particles, use of public/protected members)
_add_*_virtual_site from parameter handler
this seems like the only acceptable case. We are not keen on exposing an API to add virtual sites/modify the topology, so it will be allowed
Move the typing logic from forcefield to parameters.py::PH._add_parameters (inside the loop)
Write test cases
Particle iterators
virtualsite add parameter (in base ParameterHandler add_parameter; see tests/test_parameters:test_add_parameters)
Correct positions
Correct energy
Validation of offxml input (no angles in BondCharge, etc.)
Correct hierarchy in selection (last wins) based on names, smirks, and type
make sure create_openmm_system uses the molecule api to create oMM particles
make sure particles show up in the oMM when using the molecule API
Equality operators
Figure out the xmltodict error?
Merge vsite branch into master 8/26
Docstrings of all new functions added
Other documentation tasks?
‘__ all__’ ← put new objects in here
docs/typing.rst
In the release notes, add 5-15 sentences which highlight difficult edge cases (that we correctly handle)
Documentation by 8/31

Meeting notes:

  • We should allow multiple matches for virtualsites, and multiple of the exact same SMIRKS

  • When initializing a ForceField, should we deduplicate VirtualSites with same SMIRKS, type, angles, distances, chargeincrements?

    • We should not deduplicate VirtualSites during VirtualSiteType or VirtualSiteHandler initialization

  • Does OpenMM have a problem if two virtualsites are really close to each other (eg. do they cause an energy explosion?) Under what conditions does OpenMM automatically deduplicate or apply nonbonded exceptions between virtualsites?

    • Unresolved -- We should check

    • (2020/04/10 JW edit): Previous context – Recognized that this could be a problem, but didn’t provide a solution

  • Should we throw an exception if two virtualsites are defined on top of each other (same reference atoms, angles, distances) during system creation?

    • Yes, but add kwarg to create_openmm_system for `allow_overlapping_virtual_sites`

We should change the tagname of the “VirtualSite” elements to be distinct -- eg “BondTypeVirtualSite” etc. We should change the SMIRNOFF spec docs accordingly.

Notes from meeting today:

  • OpenMM Topology doesn't contain VSites, so we don't need to worry about them in OFFTop.from_openmm

  • If isomoprhism/deduplication gets complicated, we could remove VSites from the public OFFMol API, and only let them be added during parameterization

  • Should we provide a function to "label" an OFFTopology with virtualsites? This seems like a good idea, and it'd be most direct to repurpose create_openmm_system for this. create_openmm_system could return BOTH the resulting OMMSystem AND a modified copy of the input OFFTopology. This resulting OFFTopology would have COPIES of reference molecules and vsites with final charges. I'm not sure what would happen with constraints. Returning the resulting OFFTopology should be triggered by a new kwarg, return_topology=False, which will run return system, topology at the end of create_openmm_system is True. Should be trivial to add, since we already make a deep copy of the topology at the beginning of create_openmm_system.

  • When do we want to deduplicate SMARTS matches (by permuting tuples of match indices)? Is it absolutely necessary to add a unique option to virtualsites?

    • For BondCharge sites, it's clear that we do NOT want to attempt deduplication -- N2 should have symmetrically-applied vsites.

    • For DivalentLonePairs, deduplication would SOLVE unavoidable duplication issue with TIP4P, but would make TIP5P require two separate vsite SMIRKS (+N and -N degrees). So it seems that DivalentLonePairs should be deduplicated.

    • For TrivalentLonePairs, deduplication is definitely wanted, since it will otherwise unavoidably add 6 copies of the virtual site for NH3's lone pair. This deduplication might be performed by taking the match index tuple (a,b,c,d), and deduplicating permutations of b, c, and d. This... SHOULD be safe, though I wonder if there are pathological cases that would cause trouble here.

    • I'm not sure about deduplicating matches between DIFFERENT parameters. SMIRNOFF's "hierarchy" philosophy would make me think that we could add, for example, a "generic" trivalent nitrogen lone pair using N([*])([*])([*]), and then define a more specific lone pair for secondary nitrogen centers using N([C])([C])([*]) further down in the list, expecting the second to override the first when it applies. However, there are also other situations where I'd expect virtualsites to be additive. So, for now, the best way forward is probably to NOT attempt deduplication between different parameters, and just caution users to craft exclusive SMARTS if they want them to apply exclusively.

Immediate action items:

As the items indicate, deduplication depends on the vsite definition. Therefore the matching needs to include a parameter of whether to transform the dictionary keys. I have already massaged this to work, so it is close to being done. It appears difference exists whether the vsite is in the mirror plane or not. BondCharge and Monovalent have planes of symmetry between atoms, and so placing vsites on one side must be mirrored. where Trivalent and Divalent do not have symmetry between atoms, and the vsite does not need to be mirrored. Arbitrary vsites, such as a COM, are assumed to have c1 symmetry and don’t need to be mirrored (dedupe = True where all permutations are collapsed).

 

The physical representation of vsites dictates that no two sites shall overlap, since it is subject to 1/r potentials. It is simple to check if two vsites are defined exactly twice, but it generally possible to determine if two different (e.g. two monovalents with 0-1-2 and 0-1-3 definitions) define a vsite at the exact same point. These are dependent on the geometry, so r = 0 will only happen in specific configurations. It is possible to determine which combinations will always lead to r = 0, such as:

These can be explicitly checked, but will need to be checked if a hook to do this exists. Note that, for chargeincrements, it is probably best to ignore in uniqueness. Two sites that match any of the above but have different chargeincrements are considered duplicates and and error raised. Allowing multiple chargeincrements will create a complex “patch” behavior, and readability and reproducibility will suffer.

 

The normal parameters follow a hierarchy, where the most specific pattern matches first. Likely, this approach should be similar. Choose the parameter with the least number of wildcards (as simple as smirks str parsing for *, then breaking ties with ~). However, Jeff is in the mindset that vsites can be stacked, even if they are not unique. This will fail the above requirements for the above uniqueness strategy.

Matching procedure (proposed):

  1. Collect all sites which match the smirks, wildcard or otherwise. For e.g. BondCharge, do NOT deduplicate.

  2. Sort based on who “:1” is (separate the intent), this is two distinct set of vsites for steps 3-5

    1. Sort based on name

  3. Determine which sites are the most specialized (e.g. sort by atomic number of “:1”)

  4. Determine uniqueness (make sure none have const r=0) # Can it be removed?

  5. Apply the surviving sites (maybe the ones at the end of the list)

 

H2CBr
C:2-*:1

Br:1-*:2 d=-1

Br:1-C:2 bondcharge d=1

Does deduplication consider atom labels? [C:1]-[Br:2] vs [Br:2]-[C:1]. Would deduplication collapse these?

if yes: need to separate distinct “:1” cases, then deduplicate groups separately

if no: continue as normal

The idea of having an owning atom in the vsite definition is arbitrary (especially if just choosing the first atom), but simplifies quite a bit. It essentially appears to tackle the aspect of oriented smirks matching, since it puts an “origin” in each smirks.

Expanding 3 (checking specialization)
[C]-[Br]

Before deduplication, will get 0,1 and 1,0. Is it clear that both matches will be [C:1]-[Br:2]?

[*]-Br : will * always be *:1?

Misc. Notes:

Want to avoid patterns where we have explicit matching and anti-matching, e.g. C-Br and C-!Br since specializing wildcards in this scheme will be hard to maintain.

Matching regular torsions has a slot idea, which does not apply here.

We need to allow use of wildcards, which forces the need of a hierarchy. In the current FF, the hierarchy appears hand-made, and the last match wins. We shall avoid the case where specialized vsites are diffs to underlying vsites, and instead let the last match win. HOWEVER, we need the ability to specify multiple vsites on a single atom. This is at odds with the slot idea, since we do not know how many vsites shall be allowed.

OpenMM adds an exclusion to the vsite and the [arbitrarily chosen] first atom in the match. This means the forces between the “owning” atom and the vsite are not included. (Does this mean pushing on the vsite pushes the owning atom?)

Use cases:

  1. Virtual Site specialization

<vsite smirks=”Br-*” type='bondcharge' d=1.0 chargeincrement=.2>

<vsite smirks=”Br-C” type='bondcharge' d=.95 chargeincrement=.22>

This means we have a generic vsite for all Bromine, but we have specialized the term, improving the forcefield, by modifying the Br-C term. Therefore we want to replace the Br-* site with the Br-C site, where we change the distance and chargeincrement. This contrasts with the idea of stacking, where there would now be TWO vsites because of the distance. This is undesirable.

2. Multiple unique matches

<vsite smirks=”Br-*” type='bondcharge' d=1.0 chargeincrement=.1>

<vsite smirks=”Br-*” type='bondcharge' d=1.3 chargeincrement=.1>

<vsite smirks=”Br-C” type='bondcharge' d=.95 chargeincrement=.11>

<vsite smirks=”Br-C” type='bondcharge' d=1.2 chargeincrement=.12>

The Br-C types win, throwing out the wildcard sites. The two matches are checked to make sure they are not equivalent. They are not equivalent since the distances are different. Since they match on the same hierarchy level, apply both. This will result in two sites. If more specialize set comes in, throw out all Br-C sites and apply the more specialized versions.

3. Multiple duplicate matches

<vsite smirks=”Br-*” type='bondcharge' d=1.0 chargeincrement=.1>

<vsite smirks=”Br-*” type='bondcharge' d=1.3 chargeincrement=.1>

<vsite smirks=”Br-C” type='bondcharge' d=1.0 chargeincrement=.11>

<vsite smirks=”Br-C” type='bondcharge' d=1.0 chargeincrement=.12>

The winning vsites are equivalent, and only the last one will be applied. It is permissible to throw an error or warning of this situation. Note that chargeincrement is not used for uniqueness. It is important that both chargeincrements are not applied.

4. Symmetric matching

<vsite smirks=”[Br:1]-[*:2]” type='bondcharge' d=1.0 chargeincrement=.1>

<vsite smirks=”[C:2]-[*:1]” type='bondcharge' d=1.0 chargeincrement=.1>

(With proposed matching procedure above) These would be put into the same grouping, since the “:1” match Br. They would be identified as the same parameter, and only one match (or

As a note, a smirks like C-[*:1] is complicated. It is saying everything matching to a carbon has some radially extending hole on each atom. Since the bond between C-* is variable, defining the distance is quite loose. In this particular case, we assume Br:1-* is more specialized, and should replace that single match. Why is it more specialized? It is because the origin is not a wildcard… hmm. We need to check whatever [.+:1] was (or the first index in the tuple if no match) to form the hierarchy. Thus, we compare * with Br, and say that match wins.

Both sites match C-Br. This case is still up for discussion.

The general desired use case is to apply both, since their intents are clearly different based on smirks. Both will be presented during the processing, so will be pretty much indistinguishable with current architecture. Are they unique? Yes even though d and ci are same. We need a specialization scheme.

With the labels, these have the same intent. For BondCharge, label 1 specifies the origin, and these are therefore the same vsite. Apply one. If they are different (distances not equal), we need a hierarchy

Without the labels, they are different since the origin is not the same.

Thus we get conflicts if the labels are the same. Perhaps this is how we should specialize.

[Br:1]-[C:2] == [Br]-[C] == [Br:1]-[C] == [Br]-[C:2] and all wildcard incantations matching the corresponding atoms

 

Dangling items and notes:

5/08

“Index” formalism with respect to virtualsite atom indexing

Example:

I have a hierarchical definition of vsites in my FF. I put a generic one for nitrogen centers on top, and a more specific one for a nitrogen center with one chlorine below. How do I specify that

  1. I want the second to override the first?

  2. I want them BOTH to apply?

    1. Just give it a different name –> done

 

FF:

[*:1][N:2]([*:3])([*:4]) name="VS1" index=0 [*:1][N:2]([Cl:3])([*:4]) name="VS1" index=0 ci1=0.05 ci2=0.1 ci3=-0.05 ci4=-0.1

Molecule is (in order of atom index) Cl-N(-H)(-H)

Cl-N(-Cl)(-H)

First one SMARTS matches: [1,2,3,4],[1,2,4,3],[3,2,1,4],[3,2,4,1],[4,2,1,3],[4,2,3,1]

Second SMARTS matches: [3,2,1,4],[4,2,1,3]

 

 

Match ordering:

C-C-C-N

SMARTS [C:1]-[C:2]-[C:3]-[N:4] – > find_matches → (1,2,3,4)

SMARTS [C:4]-[C:3]-[C:2]-[N:1] – > find_matches → (4,3,2,1)

SMARTS [C:2]-[C:1]-[C:4]-[N:3] – > find_matches → (2,1,4,3)

 

Midway though a Topology:

C10-C11-C12-N13

SMARTS [C:1]-[C:2]-[C:3]-[N:4] – > find_matches → (10,11,12,13)

Can use topology.atom(N).atom.molecule_atom_index to translate from Topology atom index to reference molecule atom index

 

How to ensure that indexed chargeincrements line up with atom match tuples?

in the smarts [C:1]-[C:2]-[C:3]-[N:4], [C:1] will always match up with chargeincrement1.

So, if the molecule is N1-C2-C3-C4, then running find_matches with the above SMARTS will return {some_tuple_we_can_throw_in_the_trash: a_chemical_env_match_object}. The tuple that we care about is a_chemical_env_match_object.environment_match.topology_atom_indices. THIS will contain the match tuple (4,3,2,1), which correctly indicates that chargeincrement1 should be applied to topology particle 4

 

 

The virtualsite should do ExcludeWith applied to AT LEAST the “origin” atom. It would be nice to do it with all tagged atoms, but this gets complicated if we’re 1-3 according to one tagged atom and 1-4 from another.

The virtualsite should exclude nonbonded interaction (coulomb and vdW) completely with all tagged atoms

 

 

4/31

Talking with Jeff; need to order by topology index since wildcards can flip mapped indices and change index ordering. The input could be something like index=”(0,1,2,3)” or “1-2-3-4” or just keep index as index=0 to specify first ordering

Checked.. it does this anyways

4/29

Use an index key in the elements to select the tuple permutation. Uniqueness is defined by (type, index, name)

4/17

We need to assume where the owning atom will be, since weird cases can pop up.

BondCharge: owning atom is always first

Monovalent: owning atom is always first, middle atom defines the angle origin

Divalent: owning atom is always second, second atom defines the angle origin

Trivalent: owning atom is always second: second atom defines the angle origin

So we need to ignore labels. A weird case is Trivalent, where we put [H:1]-N-(H)-H. Since we deduplicate, only one H would be applied.

So the take away is that it seem reasonable to ignore atom labels for now.

C:3-N:2#N:1-C

1,2

2,1

HCN

Monovalent smirks HCH

H:1-C:2-H:3

 

 

4/10

JW: If users wants them to be exclusive, they should say:

<vsite smirks=”Br-[!C]” type='bondcharge' d=1.0 chargeincrement=.1>

<vsite smirks=”Br-C” type='bondcharge' d=1.0 chargeincrement=.2>

JW: I’ve changed my mind. There’s no need to EVER constrain the user to have exactly one SMARTS match for a virtualsite. Both Override-land and Combine-land don’t have this constraint, and have sensible behavior.

Molecule = CN(Cl)(Br)

*-N(*)(*) vsite params

*-N(Cl)(*) vsite params

*-N(Br)(*) vsite params

*-N(Br)(Cl) vsite params <--- Needed only if we are in override-land, AND we deduplicate based on “sorted” match tuples

If the above were an Improper section in the FF, the parameters would be competing to apply in three slots:

1-2(-3)(-4)

1-2(-4)(-3)

3-2(-1)(-4)

(You start with 6 matches, and delete any where tuple[0] > tuple[3])

Do we want the same for vsite deduplication?

If two virtual sites with the same (def?) atom tuple and geometric parameters match to a topology:

  • Are their charge increments combined?

  • Does the second override the first?

If we want to allow two BondChargeVirtualSites to match C-Br, does this mean that

  • There is no “sensible” way to enforce a “max” number of slots for BondChargeVirtualSites? Yes

  • Does this extend to all types of virtualsites? Yes

In override-land, if we want a C-Br parameter to override a *-Br parameter, is this possible?

  • Does this ALWAYS require looking at factors OTHER than atom tuple?

In override-land, if C-Br overrides *-Br, but C-Br has different DISTANCES, should the *-Br sites remain, or be deleted?

  • TG: In EITHER land, this means the *-Br should be deleted, and only the C-Br should remain (assuming it’s higher-priority in the hierarchy)

Is it possible to have a FF with wildcard vsites, and then other parameters that “modify” or “Specialize” on top of the wild card vsites?

  • JW: No. As soon as the “Specilaized” vsite has a different GEOMETRY, then it becomes impossible to know what a user intends by “deduplication” with the wildcard one (should it be REPLACED? should some parameters be COMBINED and the geometry updated? Should the wildcard one be REMOVED altogether?). Any attempt we make on this will require meaningfully comparing SMARTS, and that is impossible.

  • TG:

<vsite smirks=”Br-*” type='bondcharge' d=.8 chargeincrement=.2>

<vsite smirks=”Br-C” type='bondcharge' d=1.0 chargeincrement=.2>

Results in one vsite, since it is implied that Br-C is a better vsite than the wildcard counterpart.
<vsite smirks=”[Br:1]-*” type='bondcharge' d=.8 chargeincrement=.2>

<vsite smirks=”[Br:1]-*” type='bondcharge' d=-0.2 chargeincrement=.2>

<vsite smirks=”[Br:1]-C” type='bondcharge' d=1.0 chargeincrement=.2>

<vsite smirks=”[Br:1]-C” type='bondcharge' d=-0.3 chargeincrement=.2>

<vsite smirks=”[*:1]-C” type='bondcharge' d=-0.25 chargeincrement=.2>

JW: Critical problem with override-land is: It’s impossible to distingush between ACTUALLY wanting two bondchanrgevirtualsites on C-Br, vs. wanting to voerride *-Br with a more specific C-Br param

JW: I believe in combine-land

In bondhandler, how does Br-* and Br-C get resolved? Does C-Br and Br-C crash?

 

Override-land (according to hierarchy)

Combine-land (no hierarchy, multiple vsites can be added for an atom tuple)

 

Override-land (according to hierarchy)

Combine-land (no hierarchy, multiple vsites can be added for an atom tuple)

Supports wildcard base, and modification on top?

Y

Y

Requires comparing SMARTS?

 

 

Requires comparing “parent atom”?

N/A

N/A

Requires comparing raw atom tuple?

Depends on number of “slots” we allow

N

Requires comparing “deduplicated” atom tuple?

Depends on number of “slots” we allow

Y

Unique if chargeincrement changed?

 

 

Unique if SMARTS changed?

 

 

Unique if radius changed?

 

 

Unique if epsilon changed?

 

 

Unique if angle/distance changed?

 

 

 

 

 

 

 

 

Generally, is there ANY scheme where we can support vsites defined with wildcards, but are overrideable by more specific parameters.

  • Is there ANY scheme that satisfies the above, and DOES NOT rely on “stacking” or “adding” overlapping virtual sites?

For analytical differentiation, it ISN’T the end of the world if some properties of a vsite depend on TWO FF parameters. The jacobian can handle this.

 

JW: Preferred behavior:

  • Add vsites together This is actually a really bad idea, will break optimization and rely on “comparing geometry”, which is sketchy

  • OpenMM crashes because two vsites are on top of each other

  • OpenMM computes energies weirdly but correctly for some reasonable definition of “correct”

JW: Not preferred behavior:

  • We silently delete one of the vsites

JW/20-04-10