2024-04-23 BW/LW Check-in

Participants

  • @Brent Westbrook

  • @Lily Wang

Goals

  •  

Discussion topics

Item

Presenter

Notes

General updates and discussion on projects

 

  • Torsion MSM outcomes

    • BW is getting complex numbers and high k values

    • BW pasted script in Slack:

    • Probably not worth spending too much extra time on this

    • BW: the scaling difference is what worries me most. It’s probably not a unit issue since it’s not constant, which also means coming up with some kind of scaling modifier wouldn’t be straightforward

    • Scripts + necessary files for searchability

    • BW: one possibility is espaloma

  • Torsion splitting in 2.2

    • BW: split the torsions and set up the valence force field for it

    • BW: ended up with ~38 new parameters

    • LW: next step: check that everything has data

      • Training and benchmark

    • LW: there are some torsions in Sage that don’t have commas but could cover multiple multiplicities, which weren’t picked up earlier by Meghan so worth going back to have a look

    • BW: I have access to the folder. Haven’t looked at them myself. Should be pretty straightforward

    • LW: next step after that probably a refit + benchmark
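
One way to shortlist the comma-free torsions LW mentions is a quick scan of the force field file. The following is a minimal sketch, not the method used in the project: it assumes that "could cover multiple multiplicities" can be approximated by "a central atom (:2 or :3) has no explicit connectivity (X<n>)", and it only uses the standard library. The names `tagged_atoms`, `find_unsplit_candidates`, and `EXAMPLE_OFFXML`, and the parameter ids in the toy file, are hypothetical and are not real Sage parameters.

```python
import re
import xml.etree.ElementTree as ET

# Toy SMIRNOFF-style file; the SMIRKS and ids are made up for illustration.
EXAMPLE_OFFXML = """
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
  <ProperTorsions version="0.4">
    <Proper smirks="[*:1]-[#6X4:2]-[#6X4:3]-[*:4]" id="t-a"/>
    <Proper smirks="[*:1]-[#6X4,#6X3:2]-[#7:3]-[*:4]" id="t-b"/>
    <Proper smirks="[*:1]-[#6:2]-[#8X2:3]-[*:4]" id="t-c"/>
    <Proper smirks="[*:1]~[#7:2]-[#6X3;$([#6]=[#8]):3]~[*:4]" id="t-d"/>
  </ProperTorsions>
</SMIRNOFF>
"""


def tagged_atoms(smirks):
    """Map tag index -> atom body for each top-level bracket atom ending in :N."""
    atoms = {}
    depth = 0
    start = 0
    for i, ch in enumerate(smirks):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                body = smirks[start:i]
                match = re.search(r":(\d+)$", body)
                if match:
                    atoms[int(match.group(1))] = body[: match.start()]
    return atoms


def find_unsplit_candidates(offxml_text):
    """Return ids of comma-free Proper torsions with an unconstrained central atom."""
    root = ET.fromstring(offxml_text.strip())
    flagged = []
    for proper in root.iter("Proper"):
        smirks = proper.get("smirks", "")
        if "," in smirks:
            continue  # comma patterns are assumed to be covered by the earlier pass
        atoms = tagged_atoms(smirks)
        central = [atoms.get(2, ""), atoms.get(3, "")]
        # Heuristic (assumption): no "X<digit>" means connectivity is unconstrained.
        if any(not re.search(r"X\d", body) for body in central):
            flagged.append(proper.get("id"))
    return flagged


print(find_unsplit_candidates(EXAMPLE_OFFXML))
```

The flagged ids are only candidates for manual review. The `X\d` check is crude: it also matches connectivity written inside a recursive `$(...)` environment, so confirming against actual molecule matches is still needed.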

  • QCA dataset statuses.

    • All done!

  • Fragment dataset curation code

    • BW: inserting RecapDecomposition at the start of the process

    • BW: previously loading molecules from ChEMBL, storing SMILES in database. Then run query for SMARTS of interest, doing fragmentation on the fly. However, Recap is very slow. 180 molecules took 15 min.

    • LW: last time we discussed a project using a different fragmentation algorithm. RecapDecomposition gives us larger molecules or fragments than we’d like. Something that fragments more on rotatable bonds may give us smaller fragments. Ideally what we’re after is something like what XtalPi used to generate their elementary and secondary fragments

    • BW: all my dataset processing code is in Rust. I can write a Python interface to the database.

    • LW: OpenFF will find it hard to maintain any code outside Python.

    • BW: it’s a little hard for me to wrap my head around preferring Python. When I was first looking at dataset curation, I couldn’t parse the SDF file with our toolkit. Using RDKit directly, processing the SDF file and converting everything to RDKit Molecules + sanitizing would take 36-48 hours. Whereas using C++ or Rust, it would have taken 8 minutes.

    • LW: let’s stick to Python for now.

    • LW: next steps – focus on fragmentation algorithms, take small subset of 1000 molecules or so and check which algorithm gives best results. May have to write our own.
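
The comparison on a ~1000-molecule subset could be organised along these lines. This is a hedged sketch of a harness only: `compare_fragmenters`, its parameters, and the two placeholder fragmenters are hypothetical, and the placeholders are plain string operations with no chemical meaning. In practice each entry would wrap a real algorithm (Recap, a rotatable-bond splitter, or a custom one) and `size_fn` would return something like a heavy-atom count.

```python
import random
import statistics
import time


def compare_fragmenters(smiles_list, fragmenters, size_fn, n_sample=1000, seed=0):
    """Run each fragmenter on the same random subset and summarise its output."""
    rng = random.Random(seed)  # fixed seed so every algorithm sees the same subset
    subset = rng.sample(smiles_list, min(n_sample, len(smiles_list)))
    report = {}
    for name, fragment in fragmenters.items():
        start = time.perf_counter()
        fragments = [frag for smiles in subset for frag in fragment(smiles)]
        elapsed = time.perf_counter() - start
        sizes = [size_fn(frag) for frag in fragments]
        report[name] = {
            "seconds": elapsed,
            "n_fragments": len(fragments),
            "n_unique": len(set(fragments)),
            "mean_size": statistics.mean(sizes) if sizes else 0.0,
        }
    return report


# Placeholder fragmenters (assumptions for illustration, not real chemistry).
placeholder_fragmenters = {
    "whole": lambda smiles: [smiles],             # no fragmentation at all
    "split": lambda smiles: smiles.split("-"),    # cut at explicit single bonds
}

example_report = compare_fragmenters(
    ["CC-OC", "CCC-N-CC", "CC-OC"],
    placeholder_fragmenters,
    size_fn=len,  # string length as a crude stand-in for fragment size
)
for name, stats in example_report.items():
    print(name, stats["n_fragments"], stats["n_unique"], stats["mean_size"])
```

Recording wall time alongside fragment size keeps both concerns from the discussion visible in one table: Recap's speed and the preference for smaller fragments.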

  • Torsion shapes

    • LW: I put my code up here:

      • Sage 2.2 (currently)

      • Sage 2.2-TM (some split out)

      • Sage 2.2-ultraTM (everything split out)

      • Sage 2.2-ultraTM-reshaped (changed shapes)

    • BW: happy to look into torsion shapes

    • LW: I’ll finish putting up code on GitHub and let BW know when I do so. I’ll also figure out a way to send analysed datasets to save on re-doing the analysis

Event



  • Smart casual – shirt

  • LW: might be asking you for particular images for slides for the talk

  • BW: things we can highlight:

    • progress on torsion multiplicity



Action items

Decisions