Why does the OpenFF toolkit require SDF/MOL2 instead of PDB?
IP – SDF/MOL2 can carry the richer chemical information needed to assign parameters to molecules, such as bond orders, charges, atomic weights, and SMILES representations, among others. PDB cannot represent these.
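To make the bond-order point concrete, here is a minimal sketch that reads the bond block of a hand-written V2000 molblock (the record used inside SDF files). The molecule, its coordinates, and the helper name `read_bonds` are made up for illustration; a real workflow would use a toolkit parser rather than this one.

```python
# Hand-written V2000 molblock for acetaldehyde (heavy atoms only).
# Coordinates are arbitrary; only the connectivity matters here.
molblock_lines = [
    "acetaldehyde (heavy atoms only)",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.1000    1.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  2  0",
    "M  END",
]


def read_bonds(lines):
    """Return (atom1, atom2, bond_order) tuples from a V2000 molblock."""
    counts = lines[3]  # 4th line holds the atom and bond counts
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    first_bond = 4 + n_atoms  # bond block follows the atom block
    bonds = []
    for line in lines[first_bond:first_bond + n_bonds]:
        # Fixed-width fields: first atom, second atom, bond type
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return bonds


bonds = read_bonds(molblock_lines)
print(bonds)
```

The third field of each bond line is the bond order (1 for C–C, 2 for C=O). A PDB ATOM record has no equivalent field.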
In the above example, where does the force field file (OFFXML/SMIRNOFF format) ultimately come from (link to repo/file)?
IP – QM simulations, approximations w/ empirical methods.
Where do the partial charges come from in the above example?
IP – AM1BCC. But I’m not sure where that selection is specified in the inputs.
How does this know to use AM1BCC?
IP – I believe it is encoded in the OpenFF “Parsley” forcefield specification file, under the ToolkitAM1BCC tag.
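One way to check this is to look for the charge-related sections in the OFFXML file itself. The sketch below uses an abbreviated, illustrative excerpt rather than the real Parsley file. The attribute values and the variable names are assumptions; the section names come from the SMIRNOFF specification.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative OFFXML excerpt (NOT the full Parsley file).
offxml_excerpt = """
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
  <Bonds version="0.3"/>
  <Angles version="0.3"/>
  <ToolkitAM1BCC version="0.3"/>
</SMIRNOFF>
"""

# Sections that can assign partial charges in a SMIRNOFF force field.
CHARGE_SECTIONS = {"ToolkitAM1BCC", "LibraryCharges", "ChargeIncrementModel"}

root = ET.fromstring(offxml_excerpt.strip())
charge_tags = [child.tag for child in root if child.tag in CHARGE_SECTIONS]
print(charge_tags)
```

Running the same loop over a real OFFXML file (via `ET.parse(path).getroot()`) would show which charge methods that force field requests.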
Which code is executed to calculate the partial charges?
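IP – Constructing ForceField() loads the charge method specified in the force field. As far as I can tell, the charges themselves are assigned when the system is created from it (ForceField.create_openmm_system).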
Which example from the directory above should be used to set up a protein-ligand simulation? Where do the protein parameters come from? How does a SMIRNOFF ligand get combined with a protein parameterized using OpenMM and a different force field?
Protein-ligand sim:
Protein parameters come from the line that loads the AMBER force field
System combination is done by ParmEd with the (extremely stylish) + operator
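As a rough picture of what combining two parameterized structures with + involves, here is a toy sketch. `ToyStructure` is a hypothetical stand-in written for these notes, not ParmEd's class or implementation. It only shows the key bookkeeping step: the second structure's atom indices must be shifted so its bonds still point at the right atoms after concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStructure:
    """Hypothetical stand-in for a parameterized structure (not ParmEd)."""
    atoms: list = field(default_factory=list)  # atom names
    bonds: list = field(default_factory=list)  # (i, j) indices into atoms

    def __add__(self, other):
        # Shift the other structure's indices past our own atoms.
        offset = len(self.atoms)
        shifted = [(i + offset, j + offset) for i, j in other.bonds]
        return ToyStructure(self.atoms + other.atoms, self.bonds + shifted)


protein = ToyStructure(["N", "CA", "C"], [(0, 1), (1, 2)])
ligand = ToyStructure(["C1", "O1"], [(0, 1)])

complex_structure = protein + ligand
print(complex_structure.atoms)
print(complex_structure.bonds)
```

Neither input is modified, so the protein and ligand objects can be reused after the combination.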
How does each of the following packages define “Molecule”, “Topology”, and “System”? (Some of the packages below only define one or two, don’t spend more than 5 minutes on a term if you can’t find it)
RDKit
Molecule: Atoms and bonds (optional bookmarks and arbitrary metadata) – through rdkit.Chem.rdchem.Mol
No explicit Topology or System. Can read groups of molecules in an SDF using Chem.SDMolSupplier.
MDTraj
Topology: Can be read from a PDB file. Includes atoms, chain, residues and connectivity information – through traj.topology – no charge information.
No explicit Molecule or System.
OpenMM
No explicit Molecule.
Topology: Similar to mdtraj’s. Atom, chains, residues and connectivity information. No charge information. Through pdb.topology and simtk.openmm.app.topology.Topology
System: Joins ForceField and Topology in a single object which has the mathematical information on how to do a MM simulation (forces, atoms, interactions, methods, box distance/type, constraints, etc.) – through forcefield.createSystem and simtk.openmm.openmm.System
OpenFF
Molecule: openff.toolkit.topology.Molecule – Object with atoms, bonds, torsions, charges, coordinates… basically everything you need to know from a chemical molecule at an atomistic level.
Topology: Ordered group of Molecules – openff.toolkit.topology.Topology
System: No explicit definition.
ParmEd
Molecule: Not explicitly; it does have parmed.topologyobjects.Residue, which is composed of groups of atoms.
No explicit definition for Topology or System. Many classes are used to define a topology (Atoms, Residues, Bond, Angle…).
In the above packages, are molecules required to have coordinates?
mdtraj & OpenMM: No specific Molecule definition, but they use conformations – I’d say Yes.
RDKit & OpenFF – Not required. Can be added by adding conformers.
ParmEd – Not required in Atoms or Residues.
If the above SMIRNOFF_simulation example can simulate two molecules, why can’t we just plug a protein in as a third molecule? Why use OpenMM/ParmEd?
IP – I can see that openff-toolkit’s Molecule(s) need detailed bond information that is not available in protein structures. Crystallography experiments are frequently unable to resolve parts of the protein (especially hydrogen atoms).
Technical reasons?
We don’t have the required bond information from the protein structure (PDB).
We need some tool connecting both worlds, small-molecule world with SMIRNOFF/OpenFF forcefield (closer to QM) and “large”-molecule world with different classical forcefield (AMBER, GROMOS, CHARMM, etc.).
Scientific/theoretical reasons?
My understanding is that protein behavior can, up to a point, be described well with classical physics, so proteins are commonly modeled with classical potentials at atomistic resolution. Small (drug-like) molecules require higher-resolution structures (electron density maps, for example) and detailed chemical information that is commonly obtained only through QM or close-to-QM calculations.
Algorithms such as AM1BCC would never finish with big protein-like molecules.
PDB format reasons?
The PDB format only specifies atom types and indices, residue types and indices, chain, coordinates, B-factor and occupancy. It carries no detailed chemical information (bond orders, aromaticity, stereochemistry, etc.).
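The fields listed above can be seen by slicing a single ATOM record at its fixed columns. This is a minimal sketch with a made-up alanine CA atom; the values and the dictionary keys are illustrative, and the column positions follow the published PDB format.

```python
# One ATOM record, assembled piece by piece so the fixed columns are explicit.
line = (
    "ATOM  "      # cols 1-6   record name
    "    1"       # cols 7-11  atom serial number
    " "
    " CA "        # cols 13-16 atom name
    " "           # col 17     alternate location
    "ALA"         # cols 18-20 residue name
    " "
    "A"           # col 22     chain ID
    "   1"        # cols 23-26 residue sequence number
    " "           # col 27     insertion code
    "   "
    "  11.104"    # cols 31-38 x
    "   6.134"    # cols 39-46 y
    "  -6.504"    # cols 47-54 z
    "  1.00"      # cols 55-60 occupancy
    " 20.00"      # cols 61-66 B-factor
    "          "
    " C"          # cols 77-78 element symbol
    "  "          # cols 79-80 charge (blank here)
)

record = {
    "serial": int(line[6:11]),
    "name": line[12:16].strip(),
    "res_name": line[17:20],
    "chain": line[21],
    "res_seq": int(line[22:26]),
    "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    "occupancy": float(line[54:60]),
    "b_factor": float(line[60:66]),
    "element": line[76:78].strip(),
}
print(record)
```

Nothing in the record describes bond orders, aromaticity, or stereochemistry, so a reader has to infer them from residue templates or geometry.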
We will eventually start turning this into aspirational code, and then tests, and then finally we’ll build an implementation that makes the tests pass. We’ll get to the aspirational code later this week/next week, but for now we’ll focus on understanding the fundamentals of the software/concepts.
This is for your interest if you’d like to look at it, but there aren’t any deliverables for this yet.
IP – What about the OpenEye license? How should I obtain one?
How would we add TopologyAtom.element?
from openff.toolkit.topology import Molecule
mol = Molecule.from_smiles('CCO')
print(mol.atoms[0].atomic_number)
print(mol.atoms[0].element)
top = mol.to_topology()
print([at for at in top.topology_atoms][0].atomic_number)
print([at for at in top.topology_atoms][0].element)
JW – I messed up and didn’t ever add the last line of code’s element access method. Where would this method go? What would the code look like?
IP – The element method should be part of the TopologyAtom class and the code should look something like:
@property
def element(self):
    """
    Get the element name of this atom

    Returns
    -------
    str
    """
    return self._atom.element
Tested successfully.
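The delegation pattern behind this property can be tried without the toolkit installed. In the sketch below, `StandInAtom` and `StandInTopologyAtom` are hypothetical stand-ins for the toolkit's classes, and the element is stored as a plain string for simplicity (an assumption; the real attribute may be a richer object).

```python
class StandInAtom:
    """Hypothetical stand-in for the toolkit's Atom class."""

    def __init__(self, atomic_number, element):
        self.atomic_number = atomic_number
        self.element = element


class StandInTopologyAtom:
    """Hypothetical stand-in: wraps an atom and forwards its properties."""

    def __init__(self, atom):
        self._atom = atom

    @property
    def atomic_number(self):
        # Existing accessor: forwarded to the wrapped atom.
        return self._atom.atomic_number

    @property
    def element(self):
        # New accessor: forwarded the same way as atomic_number.
        return self._atom.element


top_atom = StandInTopologyAtom(StandInAtom(6, "C"))
print(top_atom.atomic_number)
print(top_atom.element)
```

Because the property reads through to the wrapped atom on every access, it stays in sync if the underlying atom changes.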
Action items
@Iván Pulido will send Jeff a website picture, ORCID (if you have one), full name as you’d like it on the website, linkedin page, twitter, github, google scholar (all if applicable)