
Goals

Discussion topics

Item

Notes

Project scope

  • JW – It seems like we’re not sure exactly where we want this project to go: MS is thinking about possible future grants, CD has found some interesting avenues to explore, and JW is still working out exactly what our polymer loading functionality will look like. So it’s probably best to follow our curiosity and see where it leads.

  • CD – Agreed. I’m going to keep treating the scope of this project as “make a PDB to SDF converter”.

  • JW – Before, I’d been fairly adamant that we must use a substructure library as an intermediate. But since there are so many places this project should go, I don’t think that needs to be a requirement.

  • CD – Sounds good. I think there are cases where the substructure library makes sense as an intermediate, but in some of the cases I checked, like protein loading, the substructure library became huge.

Updates from Connor

  • Pushing into heteropolymer areas

    • more advanced “isomorphism fitting” algorithm

      • CD will post slides here

    • JW – When making a substructure dictionary, how does this decide how many neighboring (“context”) atoms to include?

      • CD – It includes the complete neighboring monomer on each side as context.

    • JW – Very cool. What do we anticipate will happen when we start encountering “ties”? Can we say anything about how to avoid them?

  • Follow-up discussion points:

    • Are substructures necessary? Are they useful for caching?

      • CD – Maybe, but only for homopolymers and simple heteropolymers

        • useless for T4-protein

      • JW – Let’s say this isn’t necessary. If we move this toward production use in the future we can reconsider this question, but for now I don’t see it as needed.

      • CD – Ok, I’ll keep the code around in case it’s needed later. We can also use the code to help generate library charges.

    • CD – Is the way I input amino acids consistent with biology? Are there any proteins this would not work on?

      • JW – The protein backbone should end in C(=O)([O-]), a charged carboxylate at the C terminus. I wonder how that charge is getting assigned.

      • CD – I’m not sure; I’ll look into this.

      • Breadcrumbs for how the toolkit handles this are here:

        https://github.com/openforcefield/openff-toolkit/blob/e321fe9d9d805172275083ebfd1ed469ab817e5e/openff/toolkit/utils/_cif_to_substructure_dict.py#L536-L555

    • For inter-monomer bonds, I currently assume single bonds, and may allow users to specify the bond order manually, like:

      • custom_bonds = {("mon1", "mon2"): Chem.rdchem.BondType.DOUBLE}

      • JW – That’s fine, just document the limitation/special case.

      • CD – Sounds good. There are lots of ways that this could be worked around.
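The custom_bonds idea above could look roughly like the following minimal sketch. The helper name inter_monomer_bond_type and its lookup rule (a monomer pair matches in either order, and any unlisted pair falls back to a single bond) are assumptions for illustration, not existing toolkit code. The small BondType enum is a stand-in for RDKit's Chem.rdchem.BondType so the example runs on its own.

```python
from enum import Enum


class BondType(Enum):
    """Stand-in for rdkit.Chem.rdchem.BondType so this sketch runs without RDKit."""
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


def inter_monomer_bond_type(mon_a, mon_b, custom_bonds=None):
    """Return the bond type between two monomers, defaulting to SINGLE.

    Hypothetical helper: custom_bonds is keyed by (monomer, monomer) name
    pairs, and a pair matches regardless of the order it was written in.
    """
    custom_bonds = custom_bonds or {}
    # Check both orderings so users only need to list each pair once.
    for key in ((mon_a, mon_b), (mon_b, mon_a)):
        if key in custom_bonds:
            return custom_bonds[key]
    # Documented default: inter-monomer bonds are assumed to be single bonds.
    return BondType.SINGLE


custom_bonds = {("mon1", "mon2"): BondType.DOUBLE}
print(inter_monomer_bond_type("mon2", "mon1", custom_bonds))  # BondType.DOUBLE
print(inter_monomer_bond_type("mon1", "mon3", custom_bonds))  # BondType.SINGLE
```

With RDKit available, the enum could be dropped in favor of Chem.rdchem.BondType directly. Making the lookup order-independent is one way to keep the single-bond default and its override documented in one place.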

Previous and future to-dos

  • CD will draft a project page with goals in order and specified milestones

  • CD will experiment with an automated process that handles “monomer information type 1”. The ultimate method signature will be make_substructures([monomer_info_sources], [pdbs_to_load]) -> substructure_information. The output format isn’t well specified yet, but it should carry information equivalent to CD’s current substructure dict format (with noncapturing atoms allowed).

  • CD will try to “learn” amino acids using the instructions above

  • CD will determine whether there is a real need for substructures as an output, or whether it’ll always be fine to go directly from PDB to SDF without a substructure dictionary as an intermediate

  • CD will try loading all existing homopolymer PDBs with two connection points (except vulcanized rubber)

  • Check/fix what happens to protein C terminal oxygen charge (or make terminal group behavior more robust)

  • Extend tools to help users “debug” PDBs that can’t be fully loaded. For example, if a user tries to load a PDB with a PEGylated amino acid, the program could output a view of the unrecognized atoms with some surrounding context and ask the user to fill in the missing information. This doesn’t need to use the interactive GUI; writing 2D images to a PDF or similar would be fine.

    • Example post-translationally modified protein:

labeled.pdb (attached file)

More context:

ncaa_param.zip (attached file)
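As a rough pre-check for the homopolymer loading to-do, the number of connection points per residue can be counted from the bonds a PDB file declares. This is a minimal sketch, assuming fixed-column ATOM/HETATM records and that every inter-residue bond appears in a CONECT record (many PDB files omit CONECT records for standard residues, so this won’t hold in general). residue_connection_points is a hypothetical helper, not part of the toolkit.

```python
def residue_connection_points(pdb_text: str) -> dict[tuple[str, int], int]:
    """Count bonds that cross residue boundaries, per (chain ID, residue number).

    Hypothetical helper: assumes fixed-column PDB records and that all
    inter-residue bonds are listed in CONECT records.
    """
    atom_residue: dict[int, tuple[str, int]] = {}
    bonds: set[tuple[int, int]] = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            serial = int(line[6:11])    # columns 7-11: atom serial number
            chain_id = line[21]         # column 22: chain identifier
            res_seq = int(line[22:26])  # columns 23-26: residue sequence number
            atom_residue[serial] = (chain_id, res_seq)
        elif record == "CONECT":
            # Columns 7-11 hold the origin atom; up to four bonded atoms follow in 5-column fields.
            fields = [line[i:i + 5] for i in range(6, 31, 5)]
            serials = [int(f) for f in fields if f.strip()]
            if not serials:
                continue
            origin, partners = serials[0], serials[1:]
            for partner in partners:
                # CONECT usually lists each bond from both ends; store it once in sorted order.
                bonds.add((min(origin, partner), max(origin, partner)))

    # Every residue starts at zero so unbonded residues are still reported.
    counts = {residue: 0 for residue in atom_residue.values()}
    for a, b in bonds:
        res_a, res_b = atom_residue[a], atom_residue[b]
        if res_a != res_b:
            counts[res_a] += 1
            counts[res_b] += 1
    return counts
```

For a linear homopolymer, interior residues would be expected to report 2 and the two chain ends 1; branch points or crosslinks (as in vulcanized rubber) would show up as counts above 2.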

Action items


Decisions