2024-08-26 Mitchell/Wagner Check-in meeting notes

Participants

  • @Josh Mitchell

  • @Jeffrey Wagner

Discussion topics

Item

Notes

General updates

  • JM –

    • Been porting code I wrote in Python to Rust. Can load the CCD (Chemical Component Dictionary) in about one minute. Determined it’s not worth keeping the whole CCD around; better to download parts as needed. Have a system that streams from disk, so it can process the whole PDB without running out of memory. It has an API for accessing records in a structured way. Have a pathway to load the PDB into records, process them, and finally end up in Python.

    • The loader so far is pretty strict and errors out if anything ambiguous is found, so we’ll need to confront those issues at some point, e.g.:

      • Atom index/serial numbers being reused, which happens once a structure exceeds 99,999 atoms (the serial field is only five columns wide) and confounds CONECT records.

      • There’s also a charge column, but its semantics are weird: blank (` `) and 0 mean the same thing, and there’s no way to represent “unknown”.

      • Also need to handle missing atoms/altlocs

    • JW –

      • Let’s review big-picture plans. I was under the impression that we wanted to show the advisory board what current loaders can do, and then use that as evidence for making a new one. Also, I don’t want us to ship Rust code.

        • JM – I’m thinking the same thing; I just haven’t been able to get the workings of a better implementation out of my head. Rust seems like the right way to write a performant, optimized version of this, but its advantages don’t seem that impressive here, and Python would be fine.

        • JW – You and BW are really big on Rust. Maybe the employee growth assessments are a good way to pass this up to project management: we’re supposed to be helping people grow and learn new skills, and maybe that means adopting some Rust as an org.

        • JM – It’s an interesting language, but its benefits don’t come through in this case. If performance were crucial it could make sense, but that’s probably not the case here. Deployment would also be a little more complicated (though it’d be quite smooth if we deployed using pip).

      • Current status -

        • Workflow through PDBFixer: OFFTK is far too slow to process the whole PDB, so we need to use a stride or some other sampling.

        • Haven’t tried using the MDAnalysis guesser yet.

        • Pathways -

          • PDBFixer → Topology.from_pdb

          • PDBFixer → MDA guesser → RDKit mol → OFF Mol/top

          • The new loader being written

        • JW – We should assume that we’re given an explicit protonation state (we don’t add Hs)

        • JM – Should we look up SMILES/SDF for small mols being loaded?

          • JW – No, in the OFF loader pathway this should just fail in the first run of this study.

        • JM – Is there a database of PDB files that aren’t straight from the PDB (and don’t have “correct” ligand names)?

          • JW – Our protein-ligand benchmark dataset (all from Schrödinger)

        • JM – Next up, I’ll put the workflows above together and run the census on the PDB (or a representative slice thereof). I’ll try to get this to you for our next meeting.

  • JW –

    • Next Mon is a US holiday, then I’m in Europe the two Mondays following it (giving a talk at the RDKit UGM in two weeks!). Could we skip next week’s meeting and reschedule the following two?

      • Cancelled next week’s; rescheduled the others to 5 PM Canberra / 9 AM Berlin.

    • Any tips for hot new material for the RDKit talk?

      • JM – Interchange run file exports + better interop (resnames and stuff)

    • Still working on FB debugging. Reported a bug to GROMACS; the fix should be in the upcoming release.

    • We currently use Docker Hub, but something about our Docker subscription lapsed, so I’ll need to move the QC worker Docker images and conda env YAMLs somewhere else, likely GitHub.

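JM’s loader is written in Rust. Purely to illustrate the streaming, structured-record idea in Python (where the pipeline ends up), here is a minimal sketch assuming the fixed-column PDB ATOM/HETATM layout. AtomRecord and iter_atom_records are hypothetical names, not JM’s actual API, and the sketch ignores CCD lookups entirely.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class AtomRecord:
    """Structured view of one ATOM/HETATM line (PDB fixed-width columns)."""
    record: str   # "ATOM" or "HETATM"
    serial: int   # columns 7-11
    name: str     # columns 13-16
    altloc: str   # column 17 ("" when blank)
    resname: str  # columns 18-20
    chain: str    # column 22
    resseq: int   # columns 23-26
    x: float      # columns 31-38
    y: float      # columns 39-46
    z: float      # columns 47-54
    element: str  # columns 77-78


def iter_atom_records(handle: TextIO) -> Iterator[AtomRecord]:
    """Lazily yield atom records one line at a time, so memory use stays flat
    regardless of file size. Malformed numeric fields raise ValueError
    (strict, in the spirit of erroring out on anything ambiguous)."""
    for line in handle:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, CONECT, END, etc.
        yield AtomRecord(
            record=record,
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            altloc=line[16:17].strip(),
            resname=line[17:20].strip(),
            chain=line[21:22].strip(),
            resseq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            element=line[76:78].strip(),
        )


# Usage: one ATOM line assembled column by column, wrapped in a tiny file.
atom_line = "".join([
    "ATOM  ", "    1", " ", " N  ", " ", "ALA", " ", "A", "   1", " ", "   ",
    "  11.104", "   6.134", "  -6.504", "  1.00", "  0.00", " " * 10, " N", "  ",
])
pdb_text = "HEADER    EXAMPLE\n" + atom_line + "\nEND\n"
records = list(iter_atom_records(io.StringIO(pdb_text)))
print(records[0].serial, records[0].name, records[0].resname, records[0].z)
# prints: 1 N ALA -6.504
```

Passing an open file handle instead of io.StringIO streams a file from disk the same way, since only one line is held at a time.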
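The serial-reuse problem from JM’s update can be made concrete with a small check. This is a sketch assuming atom serials are read from columns 7–11 and CONECT serials from columns 7–31; find_reused_serials and ambiguous_conects are hypothetical helper names. Once a serial occurs on more than one atom, any CONECT record naming it can’t be resolved to a single atom, which is exactly where a strict loader would have to error out.

```python
from collections import defaultdict
from typing import Iterable


def find_reused_serials(lines: Iterable[str]) -> dict[int, list[int]]:
    """Map each ATOM/HETATM serial that occurs more than once to the
    (0-based) line indices where it occurs."""
    seen: dict[int, list[int]] = defaultdict(list)
    for i, line in enumerate(lines):
        if line[0:6] in ("ATOM  ", "HETATM"):
            seen[int(line[6:11])].append(i)
    return {serial: idxs for serial, idxs in seen.items() if len(idxs) > 1}


def ambiguous_conects(lines: list[str]) -> list[int]:
    """Return indices of CONECT lines that name a reused serial, i.e. bonds
    that cannot be assigned to a unique atom."""
    reused = find_reused_serials(lines)
    flagged = []
    for i, line in enumerate(lines):
        if line.startswith("CONECT"):
            # CONECT serials sit in five 5-character fields, columns 7-31.
            fields = [line[start:start + 5] for start in range(6, 31, 5)]
            serials = {int(f) for f in fields if f.strip()}
            if not serials.isdisjoint(reused):
                flagged.append(i)
    return flagged


# Usage: serial 1 appears twice, e.g. if a writer wraps numbering past 99,999.
lines = [
    "ATOM  " + f"{1:>5}" + "  N   ALA A   1",
    "ATOM  " + f"{2:>5}" + "  CA  ALA A   1",
    "HETATM" + f"{1:>5}" + "  O   HOH W   1",
    "CONECT" + f"{1:>5}{2:>5}",
    "CONECT" + f"{2:>5}",
]
print(find_reused_serials(lines))  # {1: [0, 2]}
print(ambiguous_conects(lines))    # [3]
```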
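The charge-column ambiguity can be illustrated with a small parser. This sketch assumes the formal charge sits in columns 79–80 as a digit followed by a sign (e.g. "1+", "2-"); parse_pdb_charge and its blank_as_unknown flag are hypothetical, and the flag is only one possible way to keep blank and "0" distinct, since the format itself can’t say “unknown”.

```python
from typing import Optional


def parse_pdb_charge(line: str, blank_as_unknown: bool = False) -> Optional[int]:
    """Parse the formal-charge field (columns 79-80, e.g. '1+' or '2-').

    By default a blank field and "0" both come out as 0, matching the notes:
    the format has no spelling for "unknown". blank_as_unknown=True is a
    hypothetical opt-in that preserves the distinction by returning None.
    """
    field = line[78:80].strip()
    if field == "":
        return None if blank_as_unknown else 0
    if field == "0":
        return 0
    if len(field) == 2 and field[0].isdigit() and field[1] in "+-":
        magnitude = int(field[0])
        return magnitude if field[1] == "+" else -magnitude
    # Strict: anything else (e.g. "+1") is rejected rather than guessed at.
    raise ValueError(f"unrecognised charge field {field!r}")


# Usage: pad to column 78, then append the two charge characters.
pad = " " * 78
print(parse_pdb_charge(pad + "1+"), parse_pdb_charge(pad + "2-"),
      parse_pdb_charge(pad + " 0"), parse_pdb_charge(pad))  # 1 -2 0 0
print(parse_pdb_charge(pad, blank_as_unknown=True))  # None
```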
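For the census JM plans for the next meeting, a minimal harness might look like the following. It is a sketch under assumptions: stride_sample and run_census are hypothetical names, and the loaders are stand-ins. The real pathways (PDBFixer → Topology.from_pdb, PDBFixer → MDA guesser → RDKit mol → OFF Mol/top, and the new loader) would each be wrapped as a callable taking a PDB ID. Failures are tallied by exception type rather than stopping the run, which fits JW’s point that the OFF pathway should simply fail on unresolved small molecules in the first run.

```python
from collections import Counter
from typing import Callable, Sequence


def stride_sample(ids: Sequence[str], stride: int, offset: int = 0) -> list[str]:
    """Take every `stride`-th ID starting at `offset`: a cheap, deterministic
    slice of the PDB for pathways too slow to run on every entry."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(ids[offset::stride])


def run_census(ids: Sequence[str],
               pathways: dict[str, Callable[[str], object]]) -> dict[str, Counter]:
    """Run every loading pathway on every ID and tally outcomes per pathway:
    'ok' on success, otherwise the exception class name."""
    tallies = {name: Counter() for name in pathways}
    for pdb_id in ids:
        for name, load in pathways.items():
            try:
                load(pdb_id)
            except Exception as exc:  # a census records failures instead of stopping
                tallies[name][type(exc).__name__] += 1
            else:
                tallies[name]["ok"] += 1
    return tallies


# Usage with hypothetical stand-in loaders.
def always_loads(pdb_id: str) -> str:
    return pdb_id


def rejects_unknown_ligand(pdb_id: str) -> str:
    if pdb_id.endswith("L"):  # pretend IDs ending in "L" carry an unknown ligand
        raise ValueError(f"{pdb_id}: no template for ligand")
    return pdb_id


ids = ["1ABC", "2DEF", "3GHL", "4JKL", "5MNO", "6PQL"]
sample = stride_sample(ids, stride=2)
print(sample)  # ['1ABC', '3GHL', '5MNO']
print(run_census(sample, {"stand_in_a": always_loads,
                          "stand_in_b": rejects_unknown_ligand}))
```

A fixed stride keeps the slice reproducible between runs, so the pathways can be compared on exactly the same entries.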

Trello

https://trello.com/b/dzvFZnv4/infrastructure

Action items

Decisions