2022_06_23 Davel/Madin/Wagner Check in

Participants

@Connor Davel
@Owen Madin
@Jeffrey Wagner

Discussion topics

Item	Notes

General updates	JW – Polymer loading functionality is implemented in OE and is sitting in a PR right now. I’m hoping to look at it again after the annual meeting and get it merged. It doesn’t yet handle noncapturing atoms, but we can add that in a later release.
	CD – Should I keep using networkx?
	JW – Yes, for now.
Project page review	Polymer Loading Functionality
	JW – It seems like there are discrete steps where this can increase in complexity:
		- Handling more than two connection points
		- Handling heteropolymers
		- Allowing manual intervention
	Why do we go through the intermediate of substructure dictionaries?
		JW – This makes the graph assignment inspectable, and gives us substructures for a large number of downstream uses.
	Problem statement
		- What does “estimate parameters” mean?
		- What’s the goal of substructure definition?
			- Building polymers from building blocks
			- Building polymers with geometry
			- Generating librarycharges
			- JW – The substructures needed to load PDBs
	Loading from PDB
		- The minimal scope of this project is a PDB → SDF converter.
Pysimm review
Polymer substructure creation automation	Challenging case: “Learn” the amino acids:
		- Input SMARTS: Take all mol2s from the `MainChain/` folder in , run fix_carboxylate_bond_orders on the offmols, and convert them to tagged SMARTS.
			- Harder starting point: Same as above, but only with AllDipeptides from the same folder
		- Input PDB:
		- Ideal output: something very similar to the existing substructure dictionary, with each pattern looking like a single AA substructure
	CD – Is it useful to generate substructures in the first place?
	JW – In the context of PDB loading, they’re useful in a few ways:
		- Inspectability
		- “Caching” expensive isomorphism computations. If we can train on a “minimal” polymer, then that may avoid really nasty scaling issues with isomorphisms on larger polymers.
			- CD – The substructures that are generated for a small PDB are only guaranteed to work on that small PDB file.
		- Interoperating with existing substructures (e.g., a protein with one unnatural AA)
	JW – If we get this working in a user-friendly way I’ll be really happy, even if we don’t end up with a substructure for the modified AA.
Previous to dos	- MS will send workflow from collaborator group that takes a mol2 and capping atom indices as inputs, and produces a polymer. (Done)
	- CD will draft a project page with goals in order and specified milestones.
	- CD will experiment with an automated process that handles “monomer information type 1”. Ultimate method signature will be `make_substructures([monomer_info_sources], [pdbs_to_load])--> substructure_information`. The output format isn’t super well specified, but should have equivalent information content to CD’s current substructure dict format (with noncapturing atoms allowed).
To dos	- CD will try to “learn” amino acids using the instructions above.
	- CD will determine whether there is a real need for substructures as an output, or whether it’ll always be fine to just go from PDB to SDF without a substructure dictionary as an intermediate.
	- CD will try loading all existing homopolymer PDBs with two connection points (except vulcanized rubber).
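
As a rough illustration of the substructure-matching step discussed above (finding every place a small monomer pattern maps onto a larger polymer graph), here is a minimal sketch. It is an assumption-laden toy, not the PR's implementation: it uses a stdlib-only backtracking search instead of networkx or the toolkit, matches on element symbols only (no bond orders), and the names `adjacency` and `find_matches` are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical minimal graph representation (not the toolkit's):
# atom index -> element symbol, and atom index -> set of bonded atom indices.
Elements = Dict[int, str]
Adjacency = Dict[int, Set[int]]


def adjacency(n_atoms: int, bonds: List[Tuple[int, int]]) -> Adjacency:
    """Build an undirected adjacency map from a list of bonds."""
    adj: Adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_matches(
    sub_elems: Elements,
    sub_adj: Adjacency,
    mol_elems: Elements,
    mol_adj: Adjacency,
) -> Iterator[Dict[int, int]]:
    """Yield each mapping {substructure atom -> molecule atom} that
    preserves element symbols and every bond of the substructure."""
    order = sorted(sub_elems)

    def extend(mapping: Dict[int, int]) -> Iterator[Dict[int, int]]:
        if len(mapping) == len(order):
            yield dict(mapping)  # copy, since mapping is mutated below
            return
        s = order[len(mapping)]  # next substructure atom to place
        used = set(mapping.values())
        for m in mol_elems:
            if m in used or mol_elems[m] != sub_elems[s]:
                continue
            # Every already-placed neighbor of s must be bonded to m.
            if all(mapping[n] in mol_adj[m] for n in sub_adj[s] if n in mapping):
                mapping[s] = m
                yield from extend(mapping)
                del mapping[s]

    yield from extend({})


# Toy "polymer": heavy atoms of a C-O-C-O-C-O chain.
mol_elems = {0: "C", 1: "O", 2: "C", 3: "O", 4: "C", 5: "O"}
mol_adj = adjacency(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])

# Toy "monomer" pattern: a single C-O bond.
sub_elems = {0: "C", 1: "O"}
sub_adj = adjacency(2, [(0, 1)])

matches = list(find_matches(sub_elems, sub_adj, mol_elems, mol_adj))
```

The search cost grows quickly with the size of both graphs, which is one way to read the "caching" argument above: matching a handful of small patterns once and reusing them may be cheaper than repeating isomorphism checks on a whole polymer.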

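The `make_substructures` signature mentioned in the previous to dos could be written down as a typed stub along these lines. This is only a placeholder sketch: the notes say the output format is not well specified, so the `SubstructureInfo` shape (name mapped to a list of tagged SMARTS strings), the use of file paths as inputs, and the file names in the usage line are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Union

PathLike = Union[str, Path]

# Assumed output shape (hypothetical): substructure name -> tagged SMARTS patterns.
SubstructureInfo = Dict[str, List[str]]


def make_substructures(
    monomer_info_sources: Sequence[PathLike],
    pdbs_to_load: Sequence[PathLike],
) -> SubstructureInfo:
    """Placeholder for the planned entry point.

    A real implementation would derive substructure patterns from the
    monomer sources and check them against the PDBs; this stub only
    validates its inputs and returns one empty entry per monomer source.
    """
    if not monomer_info_sources:
        raise ValueError("at least one monomer information source is required")
    if not pdbs_to_load:
        raise ValueError("at least one PDB to load is required")
    # Key each (still empty) entry by the source file's stem, e.g. "ALA".
    return {Path(src).stem: [] for src in monomer_info_sources}


# Hypothetical file names, used only to show the call shape.
info = make_substructures(["ALA.mol2", "GLY.mol2"], ["peptide.pdb"])
```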