General updates | JW JM Offline next week
Thinking about the experience of discovering OpenFF: I’d like to prominently display a list/table of what we can simulate. E.g., directly say “if you have an organic small molecule, we can simulate it”, and have that link to the relevant datasets and benchmarks. Possibly attach a piece of metadata to each FF that describes what it’s recommended for (e.g., conjugated systems).
JW – Good idea. Not sure it’s top priority compared to everything else, but I’ll add it to the team backlog.
JM – Even communicating that we have a product would be an improvement.
|
Loading census | JM – Tested the first 1,000 structures; MDAnalysis can handle about 25% of them. Almost all errors were too-many-bonds errors (on Hs and Cs); the others were the toolkit choking on radicals. It might be that this is a challenging test set (rough initial geometries from heavy-atom replacement and hydrogen addition). Given that 75% raised errors, I wonder how much of the remaining 25% we can trust, so I’m updating my loader to carry the reference information (protonation states, etc.) to compare against.
JW – Hm, so even if we get rid of 90% of the loading errors by minimizing or doing process optimizations, we’ll still have 7.5% errors (10% of the 75%), which is way too high. Do you know what percentage the toolkit could load (i.e., how many are totally vanilla proteins)?
(General) – The MDA loader isn’t a clear winner here.
JW – I need to figure out:
    How much of the census we want to continue doing/what to report to boards
    How to structure/prioritize the josh-loader
JM – Target API is:
    class Topology:
        def from_pdb(
            file: PathLike | TextIO,
            use_canonical_names: bool = False,
            unique_molecules: list[Molecule] = [],
            residue_database: Mapping[
                str, list[ResidueDefinition]
            ] = CCD_RESIDUE_DEFINITION_CACHE,
        ) -> Topology:

    class ResidueDefinition:
        def from_smiles(
            resname: str,
            mapped_smiles: str,
            atom_names: Mapping[int, str],
        ) -> ResidueDefinition
        def from_molecule()
        def from_capped_molecule()  # For AAs, nucleotides, etc
JW – I’ll need to think a little about the ResidueDefinition creation pathway. I’m not sure what I’d expect users to have on hand for their NCAA.
Test set design
    The PDBFixer set is good
    Be sure to include strained conformations
    Include PDBs from as many different software packages as possible (Amber, GROMACS, PyMOL, OpenMM, …)
    Capped and uncapped (both neutral and charged) AAs
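As a rough illustration of the from_smiles pathway in the target API, the sketch below uses a hypothetical stand-in class (ResidueDefinitionSketch, not the OpenFF implementation) to show how mapped_smiles and atom_names might fit together. The validation rule (every atom map index in the SMILES gets exactly one unique atom name) and the regex-based index extraction are assumptions made for the example; HOH is used only because its mapped SMILES is short.

```python
import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ResidueDefinitionSketch:
    """Hypothetical stand-in for the proposed ResidueDefinition."""

    resname: str
    mapped_smiles: str
    atom_names: Mapping[int, str]

    @classmethod
    def from_smiles(cls, resname, mapped_smiles, atom_names):
        # Collect atom map indices, e.g. the 1 in "[O:1]".
        indices = [int(i) for i in re.findall(r":(\d+)\]", mapped_smiles)]
        if len(indices) != len(set(indices)):
            raise ValueError("duplicate atom map index in mapped_smiles")
        # Assumed rule: every mapped atom has a name and vice versa.
        if set(indices) != set(atom_names):
            raise ValueError("atom_names must cover exactly the mapped atoms")
        # Assumed rule: atom names are unique within one residue.
        if len(set(atom_names.values())) != len(atom_names):
            raise ValueError("atom names must be unique within a residue")
        return cls(resname, mapped_smiles, dict(atom_names))


water = ResidueDefinitionSketch.from_smiles(
    resname="HOH",
    mapped_smiles="[H:2][O:1][H:3]",
    atom_names={1: "O", 2: "H1", 3: "H2"},
)
```

A user with an NCAA would need to supply the same three pieces: a residue name, a fully mapped SMILES, and a name for each mapped atom. That is the part of the pathway JW flags as needing more thought.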
Behavior/specification
    Support residue/atom name “synonyms”? E.g., loading AMBER atom/residue names?
    Pre-populated residue library (to prevent air-gapped computers from having connectivity issues)? Or would this get too big (hundreds of MB)?
        JM – It’s very likely to be too large, though we could ship a common subset.
        JW – Could print an error message in this case.
mmCIF and PDBx
All edge cases:
    Missing atoms compared to all residue definitions for resname
    Extra atoms compared to all residue definitions
    No match for residue name
    Stereo is unspecified in residue template
    Residue template has aromatic bond(s)
    Residue name and atom names match template, but formal charge, element, or CONECT record does not match
    Multiple residue definitions match a residue (e.g., two identical ALAs are defined)
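A minimal sketch of how a loader might sort a residue into some of the edge cases listed above, assuming a residue database that maps each resname to a list of templates. Everything here is hypothetical: MatchResult, classify_residue, and the representation of a template as a bare set of atom names are illustration only. It covers only the atom-name cases (no resname match, missing atoms, extra atoms, multiple matches); stereo, aromaticity, formal charge, element, and CONECT checks are out of scope.

```python
from enum import Enum
from typing import FrozenSet, Mapping, Sequence


class MatchResult(Enum):
    OK = "exactly one definition matches"
    NO_RESNAME_MATCH = "no match for residue name"
    MISSING_ATOMS = "missing atoms"
    EXTRA_ATOMS = "extra atoms compared to all definitions"
    AMBIGUOUS = "multiple definitions match"


def classify_residue(
    resname: str,
    atom_names: Sequence[str],
    database: Mapping[str, Sequence[FrozenSet[str]]],
) -> MatchResult:
    """Compare one residue's atom names against every template for resname."""
    templates = database.get(resname, ())
    if not templates:
        return MatchResult.NO_RESNAME_MATCH
    observed = frozenset(atom_names)
    exact = [t for t in templates if t == observed]
    if len(exact) > 1:
        return MatchResult.AMBIGUOUS
    if exact:
        return MatchResult.OK
    # Simplification: if every template lacks some observed atom, report
    # "extra atoms", even if atoms are also missing.
    if all(observed - t for t in templates):
        return MatchResult.EXTRA_ATOMS
    # Otherwise some template is a strict superset of the observed atoms.
    return MatchResult.MISSING_ATOMS


# Heavy atoms only, to keep the example short (an assumption, not a real template).
ALA = frozenset({"N", "CA", "C", "O", "CB"})
database = {"ALA": [ALA]}
result = classify_residue("ALA", ["N", "CA", "C", "O"], database)
```

Here result is MatchResult.MISSING_ATOMS, because CB is absent. A database with two identical ALA entries would give MatchResult.AMBIGUOUS for a complete alanine.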
|