...
Code Block |
---|
# in terminal git clone git@github.com:openforcefield/sage-2.1.0.git cd sage-2.1.0/inputs-and-outputs/data-sets/ # in python from openff.qcsubmit.results import TorsionDriveResultCollection td_result_collection = TorsionDriveResultCollection.parse_file( "td-set-for-fitting-2.1.0.json" ) |
Solution
t49 applies to any aromatic CN bond. t84 applies to any CN bond where the N has 2 substituents and the C has 3, or if the N has 3 substituents and one is O. t49 should apply to any molecule which has an aromatic right with a charged N+ in the ring, where the N has 3 substituents but none are O.
We can verify this by inspecting the training dataset to see which molecules need t49:
Code Block | ||
---|---|---|
| ||
from openff.toolkit import Molecule, ForceField, Topology
forcefield = ForceField("openff-2.1.0.offxml")
import tqdm
from qcportal.client import FractalClient
from openff.qcsubmit.results import TorsionDriveResultCollection
# Create a client which allows us to connect to the main QCArchive server.
qcarchive_client = FractalClient()
td_result_collection = TorsionDriveResultCollection.parse_file(
"sage-2.1.0/inputs-and-outputs/data-sets/td-set-for-fitting-2.1.0.json"
)
records_and_molecules = td_result_collection.to_records()
use_t49 = [] # list of molecules that use t49
for _, molecule in tqdm.tqdm(records_and_molecules, desc="checking"):
all_labels = forcefield.label_molecules(molecule.to_topology())
for mol_idx, mol_forces in enumerate(all_labels):
for force_tag, force_dict in mol_forces.items():
for atom_indices, parameter in force_dict.items():
atomstr = ""
for idx in atom_indices:
atomstr += "%3s" % idx
# for some reason this adds each molecule 4 times for each appearance of t49
if parameter.id == 't49':
use_t49.append(molecule.to_smiles())
# Create unique list of molecules that use t49
set_list = set(use_t49)
use_t49_unique = list(set_list)
# Visualize the molecules
Molecule.from_smiles(use_t49_unique[0])
Molecule.from_smiles(use_t49_unique[1])
Molecule.from_smiles(use_t49_unique[2]) |
t123
t123 "[*:1]~[#15:2]-[#6:3]-[*:4]" in Sage 2.1.0 seems entirely contained in t123a and t124 - should t123 be deleted? The V1 value is suspicious too.
...
A good way to tackle this would be the “coverage” approach above – seeing what kind of training data was used for this parameter might explain the relatively steep force constant.
...
Solution
t123 corresponds to a C that is single bonded to a P and at least one other group. t123a corresponds to a C that is single bonded to a P and one other substituent, and has 1 other substituents that can have any kind of bond. t124 corresponds to a C that is single bonded to a P and has 2 other substituents that can have any kind of bond.
It is hard to imagine a molecule that has a C with two single bonds that doesn't have either one or two other substituents (e.g.
...
a double bond or two single bonds), though maybe it could happen in some exotic molecule.
We can verify by checking whether any molecules in the training set use this parameter:
Code Block |
---|
from openff.toolkit import Molecule, ForceField, Topology
forcefield = ForceField("openff-2.1.0.offxml")
import tqdm
from qcportal.client import FractalClient
from openff.qcsubmit.results import TorsionDriveResultCollection
# Create a client which allows us to connect to the main QCArchive server.
qcarchive_client = FractalClient()
td_result_collection = TorsionDriveResultCollection.parse_file(
"sage-2.1.0/inputs-and-outputs/data-sets/td-set-for-fitting-2.1.0.json"
)
records_and_molecules = td_result_collection.to_records()
use_t123 = []
for _, molecule in tqdm.tqdm(records_and_molecules, desc="checking"):
all_labels = forcefield.label_molecules(molecule.to_topology())
for mol_idx, mol_forces in enumerate(all_labels):
for force_tag, force_dict in mol_forces.items():
for atom_indices, parameter in force_dict.items():
atomstr = ""
for idx in atom_indices:
atomstr += "%3s" % idx
# for some reason this adds each molecule 4 times for each appearance of t49
if parameter.id == 't123':
use_t123.append(molecule.to_smiles())
print(use_t123) # empty list |
It appears that no molecules in the training set use t123, which would make it redundant (at least based on this dataset).