2024-04-23 BW/LW Check-in

Participants

@Brent Westbrook
@Lily Wang

Goals

Discussion topics

Item	Presenter	Notes

General updates and discussion on projects

Torsion MSM outcomes
  - BW is getting complex numbers and high k values
  - BW pasted script in Slack:
  - Probably not worth spending too much extra on this
  - BW: the scaling difference is what worries me most. It's probably not a unit issue, since the difference is not constant, which also means coming up with some kind of scaling modifier wouldn't be straightforward
  - Scripts + necessary files for searchability
  - BW: one possibility is espaloma

Torsion splitting in 2.2
  - BW: split the torsions and set up the valence force field for it
  - BW: ended up with ~38 new parameters
  - LW: next step: check that everything has data
  - Training and benchmark
  - LW: there are some torsions in Sage that don't have commas but could cover multiple multiplicities. These weren't picked up earlier by Meghan, so it's worth going back to have a look
  - BW: I have access to the folder. Haven't looked at them myself. Should be pretty straightforward
  - LW: next step after that is probably a refit + benchmark

QCA dataset statuses
  - All done!

Fragment dataset curation code
  - BW: inserting RecapDecomposition at the start of the process
  - BW: previously I was loading molecules from ChEMBL and storing SMILES in the database, then running a query for SMARTS of interest and doing fragmentation on the fly. However, Recap is very slow: 180 molecules took 15 min.
  - LW: last time we discussed a project using a different fragmentation algorithm. RecapDecomposition gives us larger molecules or fragments than we'd like. Something that fragments more on rotatable bonds may give us smaller fragments. Ideally what we're after is something like what XtalPi used to generate their elementary and secondary fragments
  - BW: all my dataset processing code is in Rust. I can write a Python interface to the database.
  - LW: OpenFF will find it hard to maintain any code outside Python.
  - BW: it's a little hard for me to wrap my head around preferring Python. When I was first looking at dataset curation, I couldn't parse the SDF file with our toolkit. Using RDKit directly, processing the SDF file and converting everything to RDKit Molecules + sanitizing would take 36-48 hours, whereas using C++ or Rust it would have taken 8 minutes.
  - LW: let's stick to Python for now.
  - LW: next steps: focus on fragmentation algorithms. Take a small subset of 1000 molecules or so and check which algorithm gives the best results. May have to write our own.

Torsion shapes
  - LW: I put my code up here:
      - Sage 2.2 (currently)
      - Sage 2.2-TM (some split out)
      - Sage 2.2-ultraTM (everything split out)
      - Sage 2.2-ultraTM-reshaped (changed shapes)
  - BW: happy to look into torsion shapes
  - LW: I'll finish putting up code on GitHub and let BW know when I do so. I'll also figure out a way to send analysed datasets to save on re-doing the analysis
Event
  - Smart casual (shirt)
  - LW: might be asking you for particular images for slides for the talk
  - BW: things we can highlight: progress on torsion multiplicity
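
Note on the torsion MSM scaling point: BW's reasoning that a non-constant scaling difference rules out a simple unit issue can be checked numerically. The sketch below is only an illustration, not the script BW pasted in Slack. The function names, the 5% tolerance, and any input values are hypothetical. It compares two lists of force constants and reports whether their ratio is close to constant, which is what a pure unit mismatch would produce.

```python
from statistics import mean, pstdev


def scaling_ratios(msm_k, ref_k):
    """Return the per-parameter ratios msm_k[i] / ref_k[i].

    Both inputs are hypothetical lists of force constants for the same
    parameters in the same order. Reference values must be non-zero.
    """
    if len(msm_k) != len(ref_k):
        raise ValueError("both lists must have the same length")
    return [m / r for m, r in zip(msm_k, ref_k)]


def looks_like_unit_issue(msm_k, ref_k, rel_tol=0.05):
    """Return True if the ratios are nearly constant.

    A unit mismatch multiplies every value by the same factor, so the
    relative spread of the ratios should be close to zero. rel_tol is an
    assumed threshold, not a value from the meeting.
    """
    ratios = scaling_ratios(msm_k, ref_k)
    avg = mean(ratios)
    if avg == 0:
        return False
    relative_spread = pstdev(ratios) / abs(avg)
    return relative_spread <= rel_tol
```

If this returns False for the MSM values against the reference values, a single scaling modifier would not bring the two sets into agreement, which matches BW's concern.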
