JW – Polymer loading functionality is implemented in OE, sitting in a PR right now. I’m hoping that I can look at this again after the annual meeting and get it merged. This doesn’t yet handle noncapturing atoms but we can add that in a later release.
JW – It seems like there are discrete steps where this can increase in complexity:
Handling more than two connection points
Handling heteropolymers
Allowing manual intervention
Why do we go through substructure dictionaries as an intermediate?
JW – This makes the graph assignment inspectable, and gives us substructures for a large number of downstream uses
Problem statement - What does “estimate parameters”
What’s the goal of substructure definition?
Building polymers from building blocks
Building polymers with geometry
Generating LibraryCharges
JW – The substructures needed to load PDBs
Loading from PDB
The minimal scope of this project is a PDB → SDF converter
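A PDB → SDF converter has to start by recovering the atom graph from the PDB file, because that graph is what substructures get matched against. The following is a minimal sketch, assuming only ATOM/HETATM and CONECT records matter and using the fixed column layout from the wwPDB format specification. The names `PDBAtom` and `read_pdb_graph` are hypothetical. This step does not recover bond orders or formal charges, which is the information substructure matching is meant to supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDBAtom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    element: str


def read_pdb_graph(lines):
    """Return (atoms keyed by serial, set of undirected bonds) from PDB lines.

    Only ATOM/HETATM and CONECT records are read; bond orders and formal
    charges are NOT recovered here.
    """
    atoms = {}
    bonds = set()
    for line in lines:
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            serial = int(line[6:11])  # columns 7-11
            atoms[serial] = PDBAtom(
                serial=serial,
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21:22].strip(),     # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                element=line[76:78].strip(),   # columns 77-78
            )
        elif record == "CONECT":
            origin = int(line[6:11])
            # Up to four bonded serials, each in a fixed 5-character field.
            for start in (11, 16, 21, 26):
                field = line[start:start + 5].strip()
                if field:
                    a, b = sorted((origin, int(field)))
                    bonds.add((a, b))
    return atoms, bonds
```

Storing each bond as a sorted pair collapses the duplicate CONECT entries (PDB files usually list a bond from both ends) into one undirected edge.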
Pysimm review
Polymer substructure creation automation
Challenging case: “Learn” the amino acids:
Input SMARTS: Take all mol2s from the MainChain/ folder in , run fix_carboxylate_bond_orders on the resulting offmols, and convert them to tagged SMARTS.
Harder starting point: Same as above but only with AllDipeptides from the same folder
Input PDB:
Ideal output: something very similar to the existing substructure dictionary, with each pattern looking like a single AA substructure
CD – Is it useful to generate substructures in the first place?
JW – In the context of PDB loading, they’re useful in a few ways
Inspectability
“Caching” expensive isomorphism computations: if we can train on a “minimal” polymer, that may avoid severe scaling issues with isomorphism checks on larger polymers
CD – The substructures that are generated for a small PDB are only guaranteed to work on the small PDB file.
Interoperating with existing substructures (eg, protein with one unnatural AA)
JW – If we get this working in a user-friendly way I’ll be really happy, even if we don’t end up with a substructure for the modified AA
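The “caching” idea can be sketched as memoizing the per-residue match. This is a minimal sketch, assuming a hypothetical template table (`TEMPLATES`) and a brute-force, element-labeled graph isomorphism check that stands in for a real toolkit's substructure search. With `functools.lru_cache`, the first copy of a monomer pays for the search and later identical copies reuse the answer.

```python
from functools import lru_cache
from itertools import permutations

# Hypothetical template library: residue name -> (element per atom index,
# bonds as sorted index pairs). A real one would be built from substructures.
TEMPLATES = {
    "ETH": (("C", "C", "O"), frozenset({(0, 1), (1, 2)})),
}


def _isomorphic(elems_a, bonds_a, elems_b, bonds_b):
    """Brute-force element-labeled graph isomorphism (exponential; toy sizes only)."""
    if len(elems_a) != len(elems_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(elems_a)
    for perm in permutations(range(n)):
        # perm maps atom i of graph A onto atom perm[i] of graph B.
        if any(elems_a[i] != elems_b[perm[i]] for i in range(n)):
            continue
        mapped = {tuple(sorted((perm[i], perm[j]))) for i, j in bonds_a}
        if mapped == bonds_b:
            return True
    return False


@lru_cache(maxsize=None)
def match_residue(elements, bonds):
    """Return the name of the template matching this residue graph, or None.

    Results are cached on the residue graph as written, so repeats of the
    same monomer (atoms listed in the same order) skip the isomorphism search.
    """
    for name, (t_elems, t_bonds) in TEMPLATES.items():
        if _isomorphic(elements, bonds, t_elems, t_bonds):
            return name
    return None
```

The cache key here is the residue graph exactly as written, so the same monomer with its atoms in a different order is a cache miss (though it still matches). A canonical atom ordering would make the key more robust. As CD notes above, this only helps for residues that the minimal polymer actually covered.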
Previous to-dos
MS will send workflow from collaborator group that takes a mol2 and capping atom indices as inputs, and produces a polymer.
Done
CD will draft a project page with goals in order and specified milestones
CD will experiment with an automated process that handles “monomer information type 1”. The ultimate method signature will be make_substructures([monomer_info_sources], [pdbs_to_load]) -> substructure_information. The output format isn’t well specified yet, but it should have the same information content as CD’s current substructure dict format (with noncapturing atoms allowed).
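A typed sketch of the planned `make_substructures` signature might look like the following. The output type (residue name → tagged SMARTS → atom names, with `None` marking a noncapturing atom) is a hypothetical stand-in for CD’s substructure dict format, not its actual definition. The function body is also a placeholder. Instead of learning substructures, it merges monomer entries that are assumed to already be in the output format and keeps only residues that appear in the PDBs.

```python
from pathlib import Path
from typing import Mapping, Optional, Sequence, Union

# Hypothetical output format: residue name -> tagged SMARTS -> atom names,
# where None marks a noncapturing atom (matched for context, not assigned).
SubstructureInfo = dict[str, dict[str, list[Optional[str]]]]


def _residue_names_in_pdb(path: Path) -> set[str]:
    """Collect residue names from ATOM/HETATM records (PDB columns 18-20)."""
    names = set()
    with path.open() as handle:
        for line in handle:
            if line[0:6].strip() in ("ATOM", "HETATM"):
                names.add(line[17:20].strip())
    return names


def make_substructures(
    monomer_info_sources: Sequence[Mapping[str, Mapping[str, list]]],
    pdbs_to_load: Sequence[Union[str, Path]],
) -> SubstructureInfo:
    """Placeholder: merge known monomer entries needed by the given PDBs.

    The planned function should *derive* substructures; here each source is
    assumed to already be in the output format. Only residues that occur in
    the PDBs are kept, and for a SMARTS present in several sources the later
    source wins.
    """
    needed = set()
    for pdb in pdbs_to_load:
        needed |= _residue_names_in_pdb(Path(pdb))
    result: SubstructureInfo = {}
    for source in monomer_info_sources:
        for res_name, patterns in source.items():
            if res_name in needed:
                result.setdefault(res_name, {}).update(
                    {smarts: list(names) for smarts, names in patterns.items()}
                )
    return result
```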
To-dos
CD will try to “learn” amino acids using the instructions above
CD will determine whether there is a real need for substructures as an output, or whether it’ll always be fine to go directly from PDB to SDF without a substructure dictionary as an intermediate
CD will try loading all existing homopolymer PDBs with two connection points (except vulcanized rubber)