Participants

...

Discussion topics

Item

Notes

General updates

  • JW – Polymer loading functionality is implemented in OE, sitting in a PR right now. I’m hoping that I can look at this again after the annual meeting and get it merged. This doesn’t yet handle noncapturing atoms but we can add that in a later release.

    • CD – Should I keep using networkx?

    • JW – Yes, for now

Project page review

Polymer Loading Functionality

  • JW – It seems like there are discrete steps where this can increase in complexity:

    • Handling more than two connection points

    • Handling heteropolymers

    • Allowing manual intervention

  • Why do we go through the intermediate of substructure dictionaries?

    • JW – This makes the graph assignment inspectable, and gives us substructures for a large number of downstream uses

  • Problem statement - What does “estimate parameters” mean?

  • What’s the goal of substructure definition?

    • Building polymers from building blocks

    • Building polymers with geometry

    • Generating librarycharges

      • JW – The substructures needed to load PDBs

    • Loading from PDB

  • The minimal scope of this project is a PDB → SDF converter

Pysimm review

Polymer substructure creation automation

  • Challenging case: “Learn” the amino acids:

    • Input SMARTS: Take all mol2s from the MainChain/ folder in https://github.com/openforcefield/amber-ff-porting/blob/master/AllTripeptides.tar.gz, run fix_carboxylate_bond_orders on the offmols, and convert them to tagged SMARTS.

      • Harder starting point: Same as above but only with AllDipeptides from the same folder

    • Input PDB: https://github.com/openforcefield/openff-toolkit/blob/master/openff/toolkit/data/proteins/T4-protein.pdb

    • Ideal output: something very similar to the existing substructure dictionary, with each pattern looking like a single AA substructure

  • CD – Is it useful to generate substructures in the first place?

    • JW – In the context of PDB loading, they’re useful in a few ways

      • Inspectability

      • “caching” expensive isomorphism computations - If we can train on a “minimal” polymer, then that may avoid really nasty scaling issues with isomorphisms on larger polymers

        • CD – The substructures that are generated for a small PDB are only guaranteed to work on the small PDB file.

      • Interoperating with existing substructures (e.g., a protein with one unnatural AA)

        • JW – If we get this working in a user-friendly way I’ll be really happy, even if we don’t end up with a substructure for the modified AA
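
The inspectability and “caching” points above can be illustrated with a toy graph-matching example. This is only a minimal sketch under stated assumptions, not the toolkit's implementation: the element-labelled graph representation, the SUBSTRUCTURES dictionary, the "CO_unit" pattern, and the make_graph and find_matches helpers are all hypothetical names introduced here, and a real workflow would more likely rely on networkx or toolkit isomorphism routines than on this hand-written backtracking search.

```python
from typing import Dict, List, Set, Tuple

# A graph is (elements, bonds): elements[i] is the element of atom i,
# and bonds is a set of frozenset({i, j}) pairs.
Graph = Tuple[List[str], Set[frozenset]]


def make_graph(elements, bonds) -> Graph:
    """Build the toy graph representation from element labels and bond pairs."""
    return list(elements), {frozenset(b) for b in bonds}


def find_matches(pattern: Graph, target: Graph) -> List[Dict[int, int]]:
    """Return every mapping pattern-atom -> target-atom that preserves
    element labels and all pattern bonds (simple backtracking search)."""
    p_elems, p_bonds = pattern
    t_elems, t_bonds = target
    matches: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int]) -> None:
        i = len(mapping)  # index of the next pattern atom to place
        if i == len(p_elems):
            matches.append(dict(mapping))
            return
        for t in range(len(t_elems)):
            if t in mapping.values() or t_elems[t] != p_elems[i]:
                continue
            # Every pattern bond from atom i to an already-placed atom
            # must also exist between the corresponding target atoms.
            if all(
                frozenset((mapping[j], t)) in t_bonds
                for j in range(i)
                if frozenset((j, i)) in p_bonds
            ):
                mapping[i] = t
                extend(mapping)
                del mapping[i]

    extend({})
    return matches


# Hypothetical substructure dictionary: name -> pattern graph.
SUBSTRUCTURES = {"CO_unit": make_graph(["C", "O"], [(0, 1)])}

# Toy "polymer": a six-atom C-O-C-O-C-O chain.
polymer = make_graph(
    ["C", "O", "C", "O", "C", "O"],
    [(i, i + 1) for i in range(5)],
)

# The assignment is a plain dictionary, so it can be inspected directly.
assignments = {
    name: find_matches(pattern, polymer)
    for name, pattern in SUBSTRUCTURES.items()
}
for name, hits in assignments.items():
    print(name, hits)
```

The pattern matches the chain in five overlapping ways, which is one reason an inspectable intermediate is useful: the matcher alone does not decide which C-O pairing constitutes a repeat unit.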

Previous to dos

  • MS will send workflow from collaborator group that takes a mol2 and capping atom indices as inputs, and produces a polymer.

    • Done

  • CD will draft a project page with goals in order and specified milestones

  • CD will experiment with an automated process that handles “monomer information type 1”. The ultimate method signature will be make_substructures([monomer_info_sources], [pdbs_to_load]) --> substructure_information. The output format isn’t super well specified, but should have equivalent information content to CD’s current substructure dict format (with noncapturing atoms allowed)

To dos

  • CD will try to “learn” amino acids using instructions above

  • CD will determine whether there is a real need for substructures as an output, or whether it’ll always be fine to just go from PDB to SDF without a substructure dictionary as an intermediate

  • CD will try loading all existing homopolymer PDBs with two connection points (except vulcanized rubber)

Action items

Decisions