Using Espaloma to discover areas to improve parameters

Background and goal

Espaloma trains a graph neural network (GNN) to generate values for the traditional parameters within a molecular dynamics force field: bond lengths, force constants, etc. It can generate both bonded and non-bonded terms, such as bonds, angles, dihedrals, impropers, partial charges, and Lennard-Jones parameters. It has shown good performance on free energy benchmarks.

While OpenFF has not yet moved to a fully neural-network force field in the style of Espaloma, Espaloma may still be useful as a reference for finding areas where OpenFF parameters need improvement. For example, there may be cases where OpenFF uses a single parameter to encode a particular chemistry that Espaloma splits into many different values. Trevor Gokey’s work on automated parameter generation could come in handy here for partitioning the Espaloma data. Cases where the assigned parameter values differ significantly between Espaloma and OpenFF would also be worth exploring.
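One crude way to look for such splits is to collect, for each Sage parameter, every value Espaloma assigns to the terms that parameter matches, and flag parameters whose values spread widely. The sketch below assumes those values have already been gathered into a plain dictionary keyed by parameter ID (the labeling step itself is not shown), uses the population standard deviation as the spread measure, and uses an arbitrary threshold. It is only a simple stand-in, not Trevor Gokey's partitioning method, and the IDs and numbers are toy values.

```python
from statistics import mean, pstdev


def flag_split_candidates(espaloma_values, spread_threshold):
    """Return {id: (mean, spread)} for Sage parameters whose matched
    Espaloma values have a population standard deviation above
    `spread_threshold`. Parameters with fewer than two matches are skipped."""
    flagged = {}
    for param_id, values in espaloma_values.items():
        if len(values) < 2:
            continue  # one value gives no information about spread
        spread = pstdev(values)
        if spread > spread_threshold:
            flagged[param_id] = (mean(values), spread)
    return flagged


# Toy bond force constants grouped by the (hypothetical) Sage parameter that matched them.
espaloma_values = {
    "bA": [300.0, 305.0, 298.0],         # tightly clustered
    "bB": [250.0, 420.0, 260.0, 410.0],  # two distinct groups
}
flagged = flag_split_candidates(espaloma_values, spread_threshold=20.0)
print(sorted(flagged))  # ['bB']
```

A large spread alone does not show that the values fall into distinct groups; a clustering step over the flagged values would be needed before proposing an actual split of the parameter.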

Experiments

Big Torsion Deviations

As a first attempt, I labeled a data set with both Espaloma and Sage 2.1.0 and compared the values each assigned, averaging the Espaloma values over all the terms matched by a given Sage parameter. Two of the torsions, shown below, had deviations of more than 10 kcal/mol between the Sage force constant and the average Espaloma value. These correspond to torsion IDs t129 and t140, respectively.
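The comparison can be sketched as follows, assuming the Sage torsion force constants and the per-term Espaloma values have already been extracted into dictionaries keyed by torsion ID. This is a simplification: a real torsion parameter can carry several periodicity terms, each with its own force constant, and those would need separate keys. The IDs and values here are illustrative only, not the actual t129/t140 data.

```python
from statistics import mean


def large_deviations(sage_k, espaloma_k, threshold=10.0):
    """Return {id: (sage_k, mean_espaloma_k)} for torsions whose Sage force
    constant differs from the mean Espaloma value by more than `threshold`
    kcal/mol. Torsions with no Espaloma matches are skipped."""
    flagged = {}
    for param_id, k_sage in sage_k.items():
        matched = espaloma_k.get(param_id)
        if not matched:
            continue  # parameter not used anywhere in the labeled data set
        k_esp = mean(matched)
        if abs(k_sage - k_esp) > threshold:
            flagged[param_id] = (k_sage, k_esp)
    return flagged


# Toy force constants in kcal/mol, keyed by hypothetical torsion IDs.
sage_k = {"tA": 1.5, "tB": 14.0, "tC": 0.2}
espaloma_k = {"tA": [1.2, 1.8], "tB": [2.0, 3.0, 1.0], "tC": []}
print(large_deviations(sage_k, espaloma_k))  # {'tB': (14.0, 2.0)}
```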

For these two torsions, I replaced the Sage value in the force field with the average Espaloma value and ran benchmarks on the OpenFF Industry Benchmark Season 1 v1.0 data set, yielding the plots below. I didn't expect much difference from changing only two parameters, but it is encouraging that the change didn't make anything worse. The modified force field (esp-tors-10) might even be very slightly better, as hoped.
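A minimal sketch of the substitution step, editing OFFXML text directly with Python's standard `xml.etree.ElementTree` so that it has no dependencies. The fragment below is a hypothetical, heavily simplified stand-in for a real force field file, and the sketch assumes force constants are stored as strings of the form `value * mole**-1 * kilocalorie`; check the actual file before relying on that. With the OpenFF Toolkit installed, loading the file with `ForceField` and editing parameters through `get_parameter_handler("ProperTorsions")` would be the more usual route.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified OFFXML fragment for illustration only.
TOY_OFFXML = """<SMIRNOFF>
  <ProperTorsions>
    <Proper smirks="[*:1]~[#6:2]-[#6:3]~[*:4]" id="tA" periodicity1="3"
            phase1="0.0 * degree" k1="14.0 * mole**-1 * kilocalorie" idivf1="1.0"/>
    <Proper smirks="[*:1]~[#6:2]-[#8:3]~[*:4]" id="tB" periodicity1="3"
            phase1="0.0 * degree" k1="0.5 * mole**-1 * kilocalorie" idivf1="1.0"/>
  </ProperTorsions>
</SMIRNOFF>"""


def replace_torsion_k(offxml_text, new_k, term="k1"):
    """Return OFFXML text with the force constant `term` of each listed
    torsion ID set to a new value in kcal/mol."""
    root = ET.fromstring(offxml_text)
    for param_id, k in new_k.items():
        proper = root.find(f".//ProperTorsions/Proper[@id='{param_id}']")
        if proper is None:
            raise KeyError(f"no proper torsion with id {param_id!r}")
        # Only the number changes; the unit string format is an assumption.
        proper.set(term, f"{k} * mole**-1 * kilocalorie")
    return ET.tostring(root, encoding="unicode")


updated = replace_torsion_k(TOY_OFFXML, {"tA": 2.0})
print(ET.fromstring(updated).find(".//Proper[@id='tA']").get("k1"))
# 2.0 * mole**-1 * kilocalorie
```

Raising on an unknown ID, rather than silently skipping it, makes a typo in the list of torsions to replace fail loudly instead of producing an unmodified force field.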

All Parameters

With these results in hand, I next repeated the process, this time replacing every Sage parameter with the corresponding average Espaloma value. As shown below, the results change more than the esp-tors-10 results did, as expected. Encouragingly, esp-full appears to perform a bit better on all three metrics. This is without any re-fitting, so Espaloma’s average parameters for our SMIRKS patterns appear to perform slightly better than our re-fit Sage 2.1.0 values.
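Replacing every parameter raises a question that the two-torsion experiment did not: what to do with Sage parameters that match nothing in the Espaloma-labeled data set, since there is no Espaloma value to average. The sketch below assumes such parameters simply keep their Sage value and reports which ones did; whether esp-full handled them this way is not stated here. Parameter values are assumed to be pre-extracted into dictionaries keyed by parameter ID, and the IDs and numbers are toy values.

```python
from statistics import mean


def espaloma_average_parameters(sage_params, espaloma_values):
    """Replace each Sage value with the mean Espaloma value over all terms
    that parameter matched. Parameters with no matches keep their Sage
    value (an assumption). Returns (new_params, kept_ids)."""
    new_params = {}
    kept_ids = []
    for param_id, sage_value in sage_params.items():
        matched = espaloma_values.get(param_id, [])
        if matched:
            new_params[param_id] = mean(matched)
        else:
            new_params[param_id] = sage_value  # nothing to average
            kept_ids.append(param_id)
    return new_params, kept_ids


# Toy values; different parameter types carry different units in reality,
# but this sketch only handles the numbers.
sage = {"bA": 300.0, "aA": 60.0, "tA": 0.5}
espaloma = {"bA": [310.0, 320.0], "tA": [0.25, 0.75]}
params, kept = espaloma_average_parameters(sage, espaloma)
print(params)  # {'bA': 315.0, 'aA': 60.0, 'tA': 0.5}
print(kept)    # ['aA']
```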