2021-04-05 Pulido onboarding Meeting notes


Date

Apr 5, 2021

Participants

  • @Jeffrey Wagner

  • @Iván Pulido

Discussion topics

Item

Notes

Organization

  • Professors

    • John Chodera (New York/San Francisco)

    • David Mobley (Los Angeles)

    • Michael Shirts (Colorado)

    • Mike Gilson (San Diego)

  • Full time people

    • Science-y people

      • Simon Boothroyd (UK) – force field czar

      • David Hahn (Belgium/Germany) – Joint position with Janssen, FF benchmarking

      • Lorenzo D’Amore (Spain/Belgium) – Joint position with Janssen, replacing DH

    • Infrastructure team

      • Jeff Wagner – OFF toolkit and general project support

      • Matt Thompson – OFF System (parmed/OpenMM System substitute)

      • David Dotson (50% OpenFF) – QCFractal development/management and benchmarking infrastructure

Slack

  • Added to all relevant channels, including #check-ins. Iván will post a check-in each working day.

GSuite

Confluence

  • Iván made a personal page

Website

  • Made to-do list for website, listed below

Meetings

  • Invited to sprint planning

  • Should also attend biopolymer meetings

GitHub

  • Added

Initial work

Conceptual questions:

  • Run through SMIRNOFF simulation example

  • Why does the OpenFF toolkit require SDF/MOL2 instead of PDB?

    • IP – SDF/MOL2 can carry the richer chemical information needed to assign parameters to molecules, such as bond orders, charges, atomic weights, and SMILES representations, among others. PDB cannot express these.

  • In the above example, where does the force field file (OFFXML/SMIRNOFF format) ultimately come from (link to repo/file)?

  • Where do the partial charges come from in the above example?

    • IP – AM1BCC. But I’m not sure where that selection is specified in the inputs.

    • How does this know to use AM1BCC?

      • IP – I believe it is encoded in the OpenFF “Parsley” force field specification file, under the ToolkitAM1BCC tag.

    • Which code is executed to calculate the partial charges?

      • IP – Calling ForceField() does it, using the charges as specified in the force field.

  • Which example from the directory above should be used to set up a protein-ligand simulation? Where do the protein parameters come from? How does a SMIRNOFF ligand get combined with a protein parameterized using OpenMM and a different force field?

    • Protein-ligand sim:

    • Protein parameters come from AMBER force field line

    • System combination is done by ParmEd with the (extremely stylish) + operator

  • How does each of the following packages define “Molecule”, “Topology”, and “System”? (Some of the packages below only define one or two, don’t spend more than 5 minutes on a term if you can’t find it)

    • RDKit

      • Molecule: Atoms and bonds (optional bookmarks and arbitrary metadata) – through rdkit.Chem.rdchem.Mol

      • No explicit Topology or System. Can read groups of molecules in an SDF using Chem.SDMolSupplier.

    • MDTraj

      • Topology: Can be read from a PDB file. Includes atoms, chain, residues and connectivity information – through traj.topology – no charge information.

      • No explicit Molecule or System.

    • OpenMM

      • No explicit Molecule.

      • Topology: Similar to mdtraj’s. Atoms, chains, residues and connectivity information. No charge information. Through pdb.topology and simtk.openmm.app.topology.Topology

      • System: Joins ForceField and Topology in a single object which has the mathematical information on how to do a MM simulation (forces, atoms, interactions, methods, box distance/type, constraints, etc.) – through forcefield.createSystem and simtk.openmm.openmm.System

    • OpenFF

      • Molecule: openff.toolkit.topology.Molecule – Object with atoms, bonds, torsions, charges, coordinates… basically everything you need to know from a chemical molecule at an atomistic level.

      • Topology: Ordered group of Molecules – openff.toolkit.topology.Topology

      • System: No explicit definition.

    • ParmEd

      • Molecule: Not explicitly; it does have parmed.topologyobjects.Residue objects, which are composed of groups of atoms.

      • No explicit definition for Topology or System. Many classes are used to define a topology (Atom, Residue, Bond, Angle…).

  • In the above packages, are molecules required to have coordinates?

    • mdtraj & OpenMM: No specific Molecule definition, but they use conformations – I’d say Yes.

    • RDKit & OpenFF – Not required. Can be added by adding conformers.

    • ParmEd – Not required in Atoms or Residues.

  • If the above SMIRNOFF_simulation example can simulate two molecules, why can’t we just plug a protein in as a third molecule? Why use OpenMM/ParmEd?

    • IP – I can see that openff-toolkit’s Molecule objects need detailed bond information that is not available in protein structures. Crystallography experiments are frequently unable to resolve parts of proteins (especially hydrogen atoms).

    • Technical reasons?

      • We don’t have the required bond information from the protein structure (PDB).

      • We need some tool connecting both worlds, small-molecule world with SMIRNOFF/OpenFF forcefield (closer to QM) and “large”-molecule world with different classical forcefield (AMBER, GROMOS, CHARMM, etc.).

    • Scientific/theoretical reasons?

      • My understanding is that protein behavior can, up to a point, be described well with classical physics and therefore classical potentials, commonly at atomistic resolution. Small (drug-like) molecules require higher-resolution structures (electron density maps, for example) and detailed chemical information that is commonly only obtained through QM or close-to-QM calculations.

      • Algorithms such as AM1BCC would never finish with big protein-like molecules.

    • PDB format reasons?

      • PDB format only specifies atom types and indices, residue types and indices, chain, coordinates, B-factor and occupancy. It carries no detailed chemical information (bond orders, aromaticity, stereochemistry, etc.).
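To make the PDB format point concrete, here is a minimal sketch that slices one PDB ATOM record and one MOL (V2000) bond line by their fixed columns. Both records are made-up illustrative data, and the helper names parse_pdb_atom and parse_mol_bond are hypothetical, written only for this note; a real workflow would rely on an established parser rather than hand slicing.

```python
def parse_pdb_atom(line):
    """Slice the fixed-width columns of a PDB ATOM/HETATM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


def parse_mol_bond(line):
    """Slice a V2000 bond line: two atom indices, then the bond type."""
    return {
        "begin": int(line[0:3]),
        "end": int(line[3:6]),
        "order": int(line[6:9]),
    }


# Made-up alanine C-alpha record, built field by field so the columns line up.
pdb_line = (
    f"{'ATOM':<6}{1:>5} {' CA':<4}{'':1}{'ALA':>3} {'A':1}{1:>4}{'':1}{'':3}"
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.0:6.2f}{20.0:6.2f}{'':10}{'C':>2}{'':2}"
)

# Made-up bond line: atom 1 bonded to atom 2 with bond type 2 (double).
mol_bond_line = "  1  2  2  0"

atom = parse_pdb_atom(pdb_line)
bond = parse_mol_bond(mol_bond_line)

print(sorted(atom))   # the ATOM record has no bond-order field at all
print(bond["order"])  # the MOL bond line states the bond order explicitly
```

The ATOM record yields names, residue and chain labels, coordinates, occupancy and B-factor, while only the MOL bond line carries an explicit bond order.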

 

 

Project background:

 

Engineering warmup:

  • Task: Make a development build

  • How would we add TopologyAtom.element?

    • from openff.toolkit.topology import Molecule
      mol = Molecule.from_smiles('CCO')
      print(mol.atoms[0].atomic_number)
      print(mol.atoms[0].element)
      top = mol.to_topology()
      print([at for at in top.topology_atoms][0].atomic_number)
      print([at for at in top.topology_atoms][0].element)

       

    • JW – I messed up and didn’t ever add the last line of code’s element access method. Where would this method go? What would the code look like?

      • IP – The element method should be part of the TopologyAtom class and the code should look something like:

        @property
        def element(self):
            """
            Get the element name of this atom

            Returns
            -------
            str
            """
            return self._atom.element

        Tested successfully.
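The delegation pattern in Iván's answer can be tried in isolation, without a development build. The sketch below uses stand-in classes: FakeAtom and FakeTopologyAtom are hypothetical names invented for this note, not the toolkit's real Atom and TopologyAtom. The only detail taken from the notes is that the wrapper keeps its underlying atom in _atom and forwards attribute access to it.

```python
class FakeAtom:
    """Hypothetical stand-in for a molecule-level atom."""

    def __init__(self, atomic_number, element):
        self.atomic_number = atomic_number
        self.element = element


class FakeTopologyAtom:
    """Hypothetical stand-in for a topology-level wrapper around an atom."""

    def __init__(self, atom):
        self._atom = atom

    @property
    def atomic_number(self):
        # Forward to the wrapped atom instead of storing a copy.
        return self._atom.atomic_number

    @property
    def element(self):
        # Same pattern as the proposed element property.
        return self._atom.element


carbon = FakeTopologyAtom(FakeAtom(6, "C"))
print(carbon.atomic_number, carbon.element)
```

Because the properties define no setter, the wrapper stays a read-only view: assigning to carbon.element raises AttributeError, so the value can only change through the wrapped atom.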

Action items

@Iván Pulido will send Jeff a website picture, ORCID (if you have one), full name as you’d like it on the website, and LinkedIn, Twitter, GitHub, and Google Scholar pages (all if applicable)

Decisions