2022-05-12 Davel/Madin/Wagner Meeting notes

 Date

May 12, 2022

 Participants

  • @Jeffrey Wagner

  • @Connor Davel

  • @Owen Madin

 Discussion topics

Item

Notes

General updates

  • JW – Biopolymer pre-alpha release is available. Should have identical behavior for loading from PDBs and reading the substructure dict.

  • JW – HOWEVER, we’re looking at faster/safer ways to load from PDB, and these may change the format of the substructure dict (e.g., to include [#7;D3;H1]). The approach under consideration is here:

    • JW – May be safer for identifying context at linkers/termini. Also there’s a problem of applying chemical info to bonds linking monomers.

    • CD – We had previously discussed doing chemical info assignment based on tagged atoms (so, like LibraryCharges, allowing “context”/non-capturing atoms in chemical info assignment patterns). What’s the status of that?

    • JW – I think the approach in the PR is better, but I can’t put my finger on why. One problem I had thought about in the “context”/non-capturing atoms approach is how we deal with inter-monomer bonds (like, bonds where one atom is tagged, but the other one is context). But I can’t really say why I think the approach above would fix it.

Davel progress

  • CD – End-of-semester presentation was delayed to unknown point in the future (early summer).

  • CD – I’ve been working on monomer-and-library-creation stuff

    • CD – I started using py3Dmol because of its more powerful selection tools (nglview only allowed selecting 4 atoms).

    • CD – Can currently load a PDB and click on atoms to select a monomer, then make a SMARTS from them and assign double bonds by clicking.

    • UI feedback

    • Functionality feedback

      • JW – It’s a bit interesting that this STARTS with a PDB, and then asks the user to assign double bond info. I’d think that a more common use case would be loading an SDF and identifying the monomer units in it. In the long run it would be great if this notebook could do both start-from-PDB AND start-from-SDF, but once one is complete the other is probably pretty easy to add, and it’s not important which order you go in.

      • JW – Does this change the graph upon assigning double bonds?

        • CD – No, this just updates the edge info. It won’t delete protons.

        • JW – Ok, great. We should be clear that this functionality will NEVER sanitize or sanity check the chemistry

      • JW – Would be good to also allow setting single, double, and triple bonds, and formal charges from -2 to +2. The extent of the chemistry that we’d need to assign would just be bond order and formal charges

      • JW – It’d be great if this notebook could finish by actually trying to load+parameterize the PDB based on the chemical info

      • Charge generation

        • OM + JW – It would be good to get a proof-of-concept to do quick capping and charging of monomers to get LibraryCharges. These don’t have to be “good” initially.

        • Could have an API entry point calculate_charges_for_substructures(substructure_list) → list of LibraryCharge. The underlying functionality could just be naive capping and AM1BCC charging.

        • CD – Does this assume that the substructure list used for chemical information assignment will be the same as the one used for charge assignment?

          • JW – Yes, for now. But in the future we’d be able to have them deviate from each other. This would just be sufficient for an MVP.
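A possible shape for the calculate_charges_for_substructures entry point discussed above, as a rough sketch only. The Substructure and LibraryChargeEntry containers, the charge_model argument, and the uniform placeholder model are all assumptions made for illustration and are not part of any existing toolkit. A real version would replace the placeholder with naive capping plus AM1BCC (or similar) charging.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Substructure:
    """Hypothetical record for one capped monomer (not an existing toolkit class)."""
    smarts: str               # pattern used for chemical information assignment
    n_atoms: int              # atom count of the capped monomer, caps included
    monomer_atoms: List[int]  # indices of the atoms that belong to the monomer itself
    net_charge: int = 0       # total charge the monomer atoms should carry


@dataclass
class LibraryChargeEntry:
    """Hypothetical stand-in for a LibraryCharge: a pattern plus per-atom charges."""
    smarts: str
    charges: List[float]


ChargeModel = Callable[[Substructure], Sequence[float]]


def placeholder_charge_model(sub: Substructure) -> List[float]:
    """Placeholder only: spreads the net charge evenly over the capped monomer."""
    return [sub.net_charge / sub.n_atoms] * sub.n_atoms


def calculate_charges_for_substructures(
    substructure_list: Sequence[Substructure],
    charge_model: ChargeModel = placeholder_charge_model,
) -> List[LibraryChargeEntry]:
    entries = []
    for sub in substructure_list:
        charges = charge_model(sub)
        if len(charges) != sub.n_atoms or not sub.monomer_atoms:
            raise ValueError("need one charge per atom and at least one monomer atom")
        # Drop the cap atoms, keeping only the charges on the monomer itself.
        kept = [charges[i] for i in sub.monomer_atoms]
        # Shift the kept charges evenly so they sum to the requested net charge.
        excess = sum(kept) - sub.net_charge
        kept = [q - excess / len(kept) for q in kept]
        entries.append(LibraryChargeEntry(sub.smarts, kept))
    return entries
```

Passing the charge model in as an argument keeps the MVP assumption (one substructure list for both chemical info and charges) while leaving room to swap in a better charge method later.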

    • Architecture feedback

      • JW – So, has this modified the source code of py3Dmol?

        • CD – No. This just sits on top of existing libraries.

      • CD – For py3Dmol to tell the Python kernel about the selection, it currently dumps stuff into the HTML of the page, and then the Python process reads the HTML of the page. It’s super janky.

        • JW + OM – We could have the javascript program dump info to a file on disk, and then have python read that. This should be “good enough”

      • CD – Should we make this a webapp?

        • JW + OM – No, not because it’s a bad idea, just because nobody else here knows how to maintain a webapp. So if there were a problem in the future you’d be the only person who could fix it.

      • Modularity

        • JW + CD – It’ll be good to keep all the javascript in one class, and all the rdkit in another. This should be pretty close to the final architecture.

        • (General) – What if we have a program use heuristics to identify the monomers automatically?

          • This should be possible to slot in later: instead of a 3D visualization, a program could try to provide the atom selections that define monomers.
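The file-based hand-off suggested under Architecture feedback might look roughly like this on the Python side. This is a sketch under assumptions: the file name, the "selected_atoms" key, and the polling approach are all hypothetical, and it says nothing about how the JavaScript side gets the file onto disk.

```python
import json
import time
from pathlib import Path
from typing import List, Optional


def read_selection(path: Path, timeout: float = 5.0, poll: float = 0.1) -> Optional[List[int]]:
    """Poll for a JSON selection file and return the selected atom indices.

    Returns None if no readable selection appears before the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            try:
                data = json.loads(path.read_text())
                return [int(i) for i in data["selected_atoms"]]
            except (ValueError, KeyError, TypeError):
                # The file may be half-written or malformed; try again shortly.
                pass
        time.sleep(poll)
    return None
```

Because the function only needs a path, an automatic monomer-finding program could write the same file format in place of the 3D viewer.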

To dos

  • Immediate

    • Get the notebook able to work on a polymer - Should be able to go from loading PDB → OpenMM system in 10 minutes.

      • Implement formal charge and single/triple bond assigner

      • It’s OK if we need to copy and paste SMARTS after clicking on the structure

      • Try having components talk using file system instead of page html

      • Implement calculate_charges_for_substructures(substructure_list) (could use AM1BCC or Gasteiger, we just need to have something)

      • Make (at least) two classes, and have all the javascript be in one, and all the rdkit be in another.
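For the bond order and formal charge assigner, a minimal sketch of the "edit edge info only, never sanitize" behavior discussed in the functionality feedback. MonomerGraph and its methods are hypothetical names used for illustration. The real notebook would presumably apply the same edits to a toolkit molecule.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

ALLOWED_BOND_ORDERS = (1, 2, 3)           # single, double, triple
ALLOWED_FORMAL_CHARGES = range(-2, 3)     # -2 through +2


@dataclass
class MonomerGraph:
    """Hypothetical stand-in for a molecular graph: atoms by index, bonds by atom pair."""
    formal_charges: Dict[int, int] = field(default_factory=dict)
    bond_orders: Dict[FrozenSet[int], int] = field(default_factory=dict)

    def add_atom(self, idx: int) -> None:
        self.formal_charges[idx] = 0

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        if a not in self.formal_charges or b not in self.formal_charges:
            raise KeyError("both atoms must exist before bonding them")
        self.bond_orders[frozenset((a, b))] = order

    def set_bond_order(self, a: int, b: int, order: int) -> None:
        """Update an existing bond; never adds/removes atoms or checks valence."""
        key = frozenset((a, b))
        if key not in self.bond_orders:
            raise KeyError(f"no bond between atoms {a} and {b}")
        if order not in ALLOWED_BOND_ORDERS:
            raise ValueError(f"bond order must be one of {ALLOWED_BOND_ORDERS}")
        self.bond_orders[key] = order

    def set_formal_charge(self, idx: int, charge: int) -> None:
        """Update one atom's formal charge; no chemistry sanity check is done."""
        if idx not in self.formal_charges:
            raise KeyError(f"no atom with index {idx}")
        if charge not in ALLOWED_FORMAL_CHARGES:
            raise ValueError("formal charge must be between -2 and +2")
        self.formal_charges[idx] = charge
```

The only validation is that the target exists and the value is in range, so assigning a double bond leaves every atom, including protons, in place.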

  • More distant

    • Same workflow but starting from SDF

    • Defining modular components and their interfaces/APIs

    • Trying to have a program identify monomers

    • Better charge assignment

    • 3D visualization that shows which parts of the structure are covered by the already-defined substructures

 Action items

 Decisions