2022-04-14 Davel/Madin/Wagner Meeting notes

Date

Apr 14, 2022

Participants

@Jeffrey Wagner
@Connor Davel
@Owen Madin

Discussion topics

Item	Notes

General updates

JW – Biopolymer release delays continue.
- Working on topology interop (picking back up after a few months).
- Vsites will be removed from API (but I don’t think this affects you)
CD – Took a small break from polymer assignment to look into the charge problem. Talked with MShirts, and he said that we’d have library charges to cover the whole polymer.
Methods of charge assignment:
- Compute the charges right now
  - QM(ish)-based
    - AM1BCC
    - ChargeIncrementHandler
    - RESP
  - Not-QM-based (like graph/ML based)
    - Gasteiger
    - mmff
    - espaloma
- Charges were already computed, we just need to copy-paste them to the right place
  - charge_from_molecules
  - LibraryCharges
CD – My first attempt was AM1BCC-based. I’d look through the molecule and get all the unique monomers. Then I’d compute the AM1BCC charges for each unique monomer and apply them to all the other instances of it. I did this for the rubber structure I’m looking at: vanilla AM1BCC on the whole polymer took 27 seconds, whereas breaking up the polymer and running AM1BCC on the unique monomers took 15 seconds.
- JW – I’d recommend not automating this whole workflow, but rather having a breakpoint/process separation where the monomer charges get saved to LibraryCharges. There are going to be a TON of knobs to turn and parameters to optimize in the breaking-up-the-monomer-and-calculating-charges thing, and we’ll want that to be inspectable.
- OM – Agree, the inclusion of polymer linkages/connection points in the charge-assignment substructures will be super tricky.
- CD – Agree, I haven’t looked at the effect of different handling of connection points. Right now I’m capping monomers with a methyl or alkyl chain.
- JW – This capping/neighboring monomer-included-in-charge-substructure question is huge. See Lily Wang’s polymetrizer:
- https://openforcefield.atlassian.net/wiki/spaces/~9498511/pages/2083717123
- https://openforcefield.atlassian.net/wiki/spaces/~9498511/pages/2091417601
- https://openforcefield.atlassian.net/wiki/spaces/~9498511/pages/2082799617
- CD – Is it possible to define LibraryCharges that REQUIRE some neighboring atoms to be present, but without ASSIGNING charges to those neighboring atoms?
  - JW – Yes, you can put noncapturing atoms in LibraryCharges, like [H][O:1][H], which would only assign the charge for the oxygen
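A minimal sketch of the two-stage workflow discussed above (charges computed once per unique monomer, saved to an inspectable file, then copied onto every instance). Everything here is an assumption made for illustration: monomers are represented by plain string labels, the JSON layout is hypothetical and is not the OpenFF LibraryCharges format, and `compute_charges` stands in for whatever method (e.g. AM1BCC on a capped monomer) is actually used. Handling of caps and connection points is deliberately left out.

```python
import json
from typing import Callable, Dict, List, Sequence


def build_charge_library(
    monomers: Sequence[str],
    compute_charges: Callable[[str], Sequence[float]],
) -> Dict[str, List[float]]:
    """Stage 1: run the (expensive) charge calculation once per unique monomer."""
    library: Dict[str, List[float]] = {}
    for monomer in monomers:
        if monomer not in library:
            library[monomer] = list(compute_charges(monomer))
    return library


def save_library(library: Dict[str, List[float]], path: str) -> None:
    """Breakpoint: write the charges to disk so they can be inspected or edited."""
    with open(path, "w") as handle:
        json.dump(library, handle, indent=2)


def load_library(path: str) -> Dict[str, List[float]]:
    """Read a previously saved (and possibly hand-edited) charge library."""
    with open(path) as handle:
        return json.load(handle)


def assign_charges(
    monomers: Sequence[str], library: Dict[str, List[float]]
) -> List[float]:
    """Stage 2: copy the stored charges onto every monomer instance, in order."""
    charges: List[float] = []
    for monomer in monomers:
        charges.extend(library[monomer])  # KeyError means a monomer is missing
    return charges
```

Keeping the saved file between the two stages means the charge calculation can be swapped or tuned without touching the code that applies the charges.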
CD – MShirts mentioned automatic template generation - Basically looking through a polymer and breaking it into monomers. Could the OpenFF toolkit support this?
- JW – I think the toolkit code could be modified to handle this, but I would refuse to merge this into production code, since automated monomer perception is harder than simple charge assignment to druglike molecules, and there are still druglike molecules that we don’t support. So feel free to make a fork of the toolkit or a ParameterHandler plugin, but I can’t merge that into production code without a lot of coordination with the science team and likely more validation than we could do over the summer.
- OM – Let’s talk to MShirts again to understand the scope of this, and the potential overlap with NIST work.
JW – If we’re going to treat this as a science project, we should get the science lead involved, but Simon’s time is extremely limited right now. So I recommend breaking the charge assignment component into its own script/program right now, and ensuring that it just has basic functionality, but that we can come back to it in the future if we want charge assignment to be really accurate. So for now I think we should look at the infrastructure side, where we try to handle processing a wider variety of polymers.
- OM – Basically agree, we should define the scope as not being super concerned with the accuracy of charges.
- CD – So, if we don’t worry about the charges much, the engineering side of the problem should be confined to “generating library charges”
- JW – Yes, I think the LibraryCharge generation should be somewhat straightforward if you already have it coded internally. So it would be good to come up with a “smarter” way of loading polymers from PDB, and to define a usable interface for external users who want to try using other types of polymers.
- CD – So, like, figuring out how a new user could define monomers and end groups. It seems like there are two options -
  - “Bottom up”: A complete pipeline where the user defines monomers and caps, and we use that to load PDBs
    - Could be a Jupyter notebook that reads
      - some SMARTS or SDF files
    - Then the user identifies monomers by:
      - NGLView selection
      - identifying atom indices in a 2D structure
      - Identifying atom indices programmatically (like a for loop over the atoms, with an “if” statement to decide whether they’re in a monomer)
    - Then the output would be
      - A modified substructure dictionary file that can read polymer pdbs consisting of the identified monomers and caps
    - There could be a section at the end of the notebook where it can do a test-load of a PDB of interest
      - Can highlight the sections that are missing atom/bond parameters.
    - Additionally, if there are cases where the residue name needs to be known to assign chemistry from a PDB, then we could discuss making a new API point for molecule loading that has this functionality, but I’d need a bit of convincing that such a case exists.
  - “Top down”: A pipeline where we load existing polymers and analyze them to learn about their monomers.
    - JW – Tough thing about this is, if our goal is to load polymers from PDB, then this method will require an SDF of the same molecule. But if we already have an SDF, then we don’t need the PDB in the first place.
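A rough sketch of two pieces of the “bottom up” notebook idea: selecting monomer atoms with a loop and an “if” test, and listing atoms that no monomer or cap covered after a test-load. The `Atom` record and both function names are hypothetical and are not part of any existing toolkit API; a real notebook would work on toolkit molecule objects and substructure matches instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Set


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record; a real toolkit atom carries more data."""
    index: int
    element: str


def select_atoms(
    atoms: Sequence[Atom], in_monomer: Callable[[Atom], bool]
) -> List[int]:
    """Loop over the atoms and keep the indices accepted by the user's test."""
    selected: List[int] = []
    for atom in atoms:
        if in_monomer(atom):
            selected.append(atom.index)
    return selected


def unassigned_atoms(
    atoms: Sequence[Atom], matches: Iterable[Iterable[int]]
) -> List[int]:
    """Return indices of atoms not covered by any matched monomer or cap."""
    covered: Set[int] = set()
    for match in matches:
        covered.update(match)
    return [atom.index for atom in atoms if atom.index not in covered]
```

The indices returned by `unassigned_atoms` are the ones a notebook could highlight as missing coverage after a test-load.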
CD – I have an end-of-semester presentation coming up. Should I talk about the polymer work, or stick to the stuff I’ve done before?
- JW – Probably good to recap what you’ve done before, but it would be cool to show off the monomer identification notebook or experimental API, and possible graph-theory underpinnings of easy/medium/hard cases and math notation/proofs.
