2021-04-05 Pulido onboarding Meeting notes

Date

Apr 5, 2021

Participants

@Jeffrey Wagner
@Iván Pulido

Discussion topics

Item	Notes

Organization	Professors
	  John Chodera (New York/San Francisco)
	  David Mobley (Los Angeles)
	  Michael Shirts (Colorado)
	  Mike Gilson (San Diego)
	Full-time people
	  Science-y people
	    Simon Boothroyd (UK) – force field czar
	    David Hahn (Belgium/Germany) – Joint position with Janssen, FF benchmarking
	    Lorenzo D’Amore (Spain/Belgium) – Joint position with Janssen, replacing DH
	  Infrastructure team
	    Jeff Wagner – OFF toolkit and general project support
	    Matt Thompson – OFF System (parmed/OpenMM System substitute)
	    David Dotson (50% OpenFF) – QCFractal development/management and benchmarking infrastructure
Slack	Added to all relevant channels, including #check-ins. Iván will post a check-in each working day.
GSuite	Email: ivan.pulido@openforcefield.org Calendar
Confluence	Iván made a personal page
Website	Made to-do list for website, listed below
Meetings	Invited to sprint planning. Should also attend biopolymer meetings.
Github	Added
Initial work	Conceptual questions:
	Run through SMIRNOFF simulation example
	  Why does the OpenFF toolkit require SDF/MOL2 instead of PDB?
	    IP – SDF/MOL2 allows the option of specifying richer chemical information that is needed to assign parameters to molecules, such as bond orders, charges, atomic weights, SMILES representations, among others. These are not possible with PDB.
	  In the above example, where does the force field file (OFFXML/SMIRNOFF format) ultimately come from (link to repo/file)?
	    IP – QM simulations, approximations w/ empirical methods. https://raw.githubusercontent.com/openforcefield/openff-forcefields/1.2.0/openforcefields/offxml/openff-1.2.0.offxml
	  Where do the partial charges come from in the above example?
	    IP – AM1BCC. But I’m not sure where that selection is specified in the inputs.
	  How does this know to use AM1BCC?
	    IP – I believe it is encoded in the OpenFF “Parsley” forcefield specification file, under the `ToolkitAM1BCC` tag.
	  Which code is executed to calculate the partial charges?
	    IP – Calling Forcefield() does it using the charges as specified in the forcefield.
	Which example from the directory above should be used to set up a protein-ligand simulation?
	  Where do the protein parameters come from?
	  How does a SMIRNOFF ligand get combined with a protein parameterized using OpenMM and a different force field?
	    Protein-ligand sim:
	    Protein parameters come from AMBER force field line
	    System combination is done by ParmEd with the (extremely stylish) `+` operator
	How does each of the following packages define “Molecule”, “Topology”, and “System”? (Some of the packages below only define one or two, don’t spend more than 5 minutes on a term if you can’t find it)
	  RDKit
	    Molecule: Atoms and bonds (optional bookmarks and arbitrary metadata) – through `rdkit.Chem.rdchem.Mol`
	    No explicit Topology or System. Can read groups of molecules in an SDF using `Chem.SDMolSupplier`.
	  MDTraj
	    Topology: Can be read from a PDB file.
	      Includes atoms, chains, residues and connectivity information – through `traj.topology` – no charge information.
	    No explicit Molecule or System.
	  OpenMM
	    No explicit Molecule.
	    Topology: Similar to MDTraj’s. Atoms, chains, residues and connectivity information. No charge information. Through `pdb.topology` and `simtk.openmm.app.topology.Topology`
	    System: Joins ForceField and Topology in a single object which has the mathematical information on how to do an MM simulation (forces, atoms, interactions, methods, box distance/type, constraints, etc.) – through `forcefield.createSystem` and `simtk.openmm.openmm.System`
	  OpenFF
	    Molecule: `openff.toolkit.topology.Molecule` – Object with atoms, bonds, torsions, charges, coordinates… basically everything you need to know about a chemical molecule at an atomistic level.
	    Topology: Ordered group of Molecules – `openff.toolkit.topology.Topology`
	    System: No explicit definition.
	  ParmEd
	    Molecule: Not explicitly; it does have `parmed.topologyobjects.Residue`, which is composed of groups of atoms.
	    No explicit definition for Topology or System. Many classes are used to define a topology (Atoms, Residues, Bond, Angle…).
	In the above packages, are molecules required to have coordinates?
	  MDTraj & OpenMM: No specific Molecule definition, but they use conformations – I’d say Yes.
	  RDKit & OpenFF – Not required. Can be added by adding conformers.
	  ParmEd – Not required in Atoms or Residues.
	If the above `SMIRNOFF_simulation` example can simulate two molecules, why can’t we just plug a protein in as a third molecule? Why use OpenMM/ParmEd?
	  IP – I can see that openff-toolkit’s Molecule(s) need detailed bond information that is not available in protein structures. Crystallography experiments are frequently not able to resolve parts of the proteins (especially hydrogen atoms).
	  Technical reasons?
	    We don’t have the required bond information from the protein structure (PDB).
	    We need some tool connecting both worlds: the small-molecule world with the SMIRNOFF/OpenFF forcefield (closer to QM) and the “large”-molecule world with a different classical forcefield (AMBER, GROMOS, CHARMM, etc.).
	  Scientific/theoretical reasons?
	    My understanding is that protein behavior can, up to a point, be described well by classical physics, and therefore by classical potentials, and that proteins are commonly described at atomistic resolution. Small molecules (drug-like) require higher resolution structures (electron density maps, for example) and detailed chemical information that is commonly only achieved through QM or close-to-QM calculations. Algorithms such as AM1BCC would never finish with big protein-like molecules.
	  PDB format reasons?
	    PDB format only specifies atom types and indices, residue types and indices, chain, coordinates, b-factor and occupancy. No information for detailed chemical properties (bond orders, aromaticity, stereochemistry, etc.)
	Project background:
	  Previous meeting on polymer topology / place where we’ve started collecting requirements: 2021-02-11 Topology refactor Meeting notes
	  We will eventually start turning this into aspirational code, and then tests, and then finally we’ll build an implementation that makes the tests pass.
	  We’ll get to the aspirational code later this week/next week, but for now we’ll focus on understanding the fundamentals of the software/concepts. This is for your interest if you’d like to look at it, but there aren’t any deliverables for this yet.
	Engineering warmup:
	  Task: Make a development build https://open-forcefield-toolkit.readthedocs.io/en/latest/developing.html#setting-up-a-development-environment
	    IP – What about the OpenEye license? How should I obtain one?
	  How would we add `TopologyAtom.element`?
	      from openff.toolkit.topology import Molecule
	      mol = Molecule.from_smiles('CCO')
	      print(mol.atoms[0].atomic_number)
	      print(mol.atoms[0].element)
	      top = mol.to_topology()
	      print([at for at in top.topology_atoms][0].atomic_number)
	      print([at for at in top.topology_atoms][0].element)
	    JW – I messed up and never added the `element` access method used in the last line of code. Where would this method go? What would the code look like?
	    IP – The `element` method should be part of the `TopologyAtom` class and the code should look something like:
	      @property
	      def element(self):
	          """
	          Get the element name of this atom
	          Returns
	          -------
	          str
	          """
	          return self._atom.element
	    Tested successfully.
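Editor's sketch for the “PDB format reasons?” answer above: a minimal, standard-library-only illustration of what one fixed-width PDB ATOM line carries. `PdbAtomRecord`, `format_atom_line` and `parse_atom_line` are hypothetical helper names, not part of any package mentioned in these notes, and the column handling is simplified (atom-name alignment, alternate locations and insertion codes are ignored). It shows only that the per-atom record has no field for bond orders, aromaticity or stereochemistry.

```python
from dataclasses import dataclass, fields


@dataclass
class PdbAtomRecord:
    """Fields read from one fixed-width PDB ATOM line (hypothetical helper)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def format_atom_line(rec: PdbAtomRecord) -> str:
    """Write a record using the fixed ATOM column layout (simplified)."""
    return (
        f"ATOM  {rec.serial:>5} {rec.name:<4} {rec.res_name:>3} "
        f"{rec.chain_id}{rec.res_seq:>4}    "
        f"{rec.x:8.3f}{rec.y:8.3f}{rec.z:8.3f}"
        f"{rec.occupancy:6.2f}{rec.b_factor:6.2f}          {rec.element:>2}"
    )


def parse_atom_line(line: str) -> PdbAtomRecord:
    """Slice the fixed-width columns back into typed fields."""
    return PdbAtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Made-up alanine C-alpha atom, used only to exercise the helpers.
example = PdbAtomRecord(1, "CA", "ALA", "A", 1, 11.104, 6.134, -6.504, 1.0, 0.0, "C")
line = format_atom_line(example)
print(line)
print([f.name for f in fields(parse_atom_line(line))])
```

Every field in the record is positional or bookkeeping information, which is why a reader has to infer chemistry (bond orders, aromaticity) from residue templates or geometry rather than read it from the file.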
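Editor's sketch for the engineering warmup above: the proposed `element` property is a plain delegation from the topology-level atom to the atom it wraps. The toy `Atom` and `TopologyAtom` classes below are stand-ins written for this note, not the toolkit's real classes, and returning the element symbol as a string is an assumption made for illustration. They make it possible to check the pattern without a toolkit installation.

```python
class Atom:
    """Toy stand-in for a molecule atom (not the toolkit's class)."""

    # Tiny lookup table for illustration; a real implementation would
    # rely on a full periodic table.
    _SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}

    def __init__(self, atomic_number: int):
        self._atomic_number = atomic_number

    @property
    def atomic_number(self) -> int:
        return self._atomic_number

    @property
    def element(self) -> str:
        return self._SYMBOLS[self._atomic_number]


class TopologyAtom:
    """Toy wrapper that refers back to an Atom instead of copying its data."""

    def __init__(self, atom: Atom):
        self._atom = atom

    @property
    def atomic_number(self) -> int:
        # Existing accessor: forwards to the wrapped atom.
        return self._atom.atomic_number

    @property
    def element(self) -> str:
        # The accessor proposed in the notes: same forwarding pattern.
        return self._atom.element


# Heavy atoms of ethanol (SMILES 'CCO'), wrapped one-to-one.
ethanol_atoms = [Atom(6), Atom(6), Atom(8)]
topology_atoms = [TopologyAtom(a) for a in ethanol_atoms]
print(topology_atoms[0].atomic_number, topology_atoms[0].element)
```

Because the wrapper only forwards the lookup, the topology-level value cannot drift from the molecule-level one, and a regression test can simply compare the two.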

Action items

@Iván Pulido will send Jeff a website picture, ORCID (if he has one), his full name as he’d like it on the website, and LinkedIn, Twitter, GitHub and Google Scholar pages (all if applicable)
