Operations on arbitrary graphs

Suppose we have two angles in two different environments, meaning two graphs. These graphs have metadata on the nodes and edges, in my case, a bit vector that encodes a SMARTS pattern.

Since we will be iterating over these, we need to align them in some way. This is necessary for mapping bits between one graph to the other, when we e.g. union or intersect them.

When the number of nodes was known, for example when when an angle was always 3 vertices, aligning was easy since we check ABC or CBA to the reference. We need something looser than isomorphism to perform the alignment.

 

 

Let us take the example of taking the union of the angle of graph X and the angle of graph Y. ABC and GHI are the “roots”, meaning alignment will take precedence. Then, for the intersection, we first align

ABC to GHI or IGH, whichever gives the best mapping in terms of bit vector overlap. Let us assume that ABC maps to GHI. Then we map F to either L or M, and DE to JK or KJ. We also make a ghost vertex on X that maps to M. We know have a one-to-one mapping between X and Y. The operations are now straightforward: the resulting graph of a union is a union between the mapped vertices and edges, and similar to intersection, and all other operations. The issue then is finding this one-to-one mapping.

In the mapping algorithm, we give precedence to the closest vertices to the core. This means if F has vertices, will consider those mappings after all mappings to the current representation (X and Y).

Perhaps the first step is to generate all of the vertices, such that all graphs in considering are “syntactically similar,” meaning the graphs have the same number nodes, edges, and topology. They would be isomorphic in this respect. Then, we do the branched scoring.

So this means instead, we do a greedy, recursive approach (P.2). If we find that AG+CI > AI + CG, then that means A maps to G, and C maps to I, and vice versa. If they are equal, then we look to the successors to break ties. In this case, we know have to select the best score in each sub block, then the score of the sub blocks. In this case, we evaluate:

  1. score AG = max(FL + _F, FM + _L) = max(FL, FM)

  2. score AI = max(FJ + _K, FK + _J) = max(FJ, FK)

  3. score CG = max(DL + EM, DM + EM)

  4. score CI = max(DJ + EK, DK + EJ)

Then, we calculate the final score as

  1. max(AG + CI, AI + CI) = max( max(FL, FM) + max(DL + EM, DM + EM), max(FJ, FK) + max(DJ + EK, DK + EJ) )

Now, the _ signifies a blank vertex, which means the score will be 0 (no intersection; X | null). We can then zero out the corresponding row and simplify the above equations.

In this case, if the score is still a tie, then we say that the alignments are the same, and we take the first. In general this should not happen as we take out the trees as far as possible, which eventually can take the entire molecule, which creates a unique tree for that specific molecule.

In the case where we have another layer, or there are more than two successor atoms, then will generate a R,R matrix at layer P, where R = max(# atoms at layer P of graph X, # atoms at layer P of graph Y).

 

We can always take the layers out to the maximum to find the complete one-to-one mapping, which is needed for the graph operations. In this case, we build every matrix layer, then determine which blocks to use based on the results of the blocks higher in the hierarchy. The final result then is the mapping, and the operations can be do in a direct manner. For example, if the mapping at layer 0 is [A:G, C:I], then the next layer mapping is either one of the two choices depending on the scores, whether F:L or F:M, and whether D:J, E:K or D:K E:J.

Finally, because it was not stated explicitly, the values of the matrix are the bit vector overlaps, or intersection, e.g. A & G (after aligning the bitvector first). We have to take into account both the bond info and the atom info during this operation. This means we would be checking -F to -L or -M, for example, meaning the sum of the intersection between the vertices and the edges.

 

Finally the rules for the various internal coordinates are as follows:

  1. vdW: 1:1

  2. bond: either 1:2 or 2:1

  3. angle: 2:2, 1:3,3:1 or 1:1,3:3

  4. torsion: 1:1,2:2,3:3,4:4 or 1:4,2:3, 3:2,4:1

  5. outofplane: a few! 2:2 and 1:{3,4} and the remaining permutations