2024-08-12 Chemical Perception meeting notes

Participants

  • @Brent Westbrook

  • @David Mobley

  • @Alexandra McIsaac

  • @Chapin Cavender

  • @Lily Wang

  • @Trevor Gokey

Goals

  •  

Discussion topics

Item

Presenter

Notes


  • Slides will be uploaded

  • TG: the examples in besmarts were recently broken and do not currently work, but should be fixed very soon (within hours of this meeting)

  • TG: the number of bonds and angles is higher than in Sage, but of the same order of magnitude

  • TG: for torsions, splitting on periodicities finished, but splitting on k values did not; it got to ~4k torsions. Reduced chi values indicate we could be overfitting the data.

    • TG: Split n finished with 400 torsions; Split k split them further, reaching 4k torsions.

    • LW: so Sage has fewer torsions but is more overfit than Split n?

    • TG: gives a possible explanation (~5 min into recording). It may be because the periodicities of Sage are reset and a threshold was set at 5 (for k?), and many torsions of the Sage set may just go to 5.

    • TG: BES is the Split n set; BESv2 is the Split k set.

  • DM: are you training and testing on the same data, or is it transferable?

    • TG: I am fitting to Gen 2 and looking at performance on that as well. I may look at splitting up the Gen 2 dataset to do some cross-validation.

    • DM: If training and testing on the same set, we may wind up overfitting without seeing it

    • DM: could fit to Hessians and test on other benchmarks

    • TG: some issues: the Industry benchmark set has more sulfonamides than the training set, for example.

    • LW: XFF 20% dataset has reasonable coverage of ChEMBL

    • TG: MSM requires Hessians so we should still be generating them

  • TG: on the fitting protocol, what do people think of replacing torsion drives with ab initio targets?

    • LW: …

    • AMI: my experiments involved replacing all targets with ab initio (AI) targets. I saw worse performance, but I didn’t use torsiondrive data.

    • BW: I ran some experiments with smee: one that broadly repeated AMI’s experiment (without TD data) and one with TD data. I saw generally worse performance, but improvements with TD.

    • CC: My experience is that using ab initio targets instead of TD gave worse results. I tried switching to a pairwise energy target and that appeared to give improved performance.

    • TG: what was the relative weight compared to other objectives?

    • CC: … (~48 min into recording)

    • TG: sounds like I could get away with single points?
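For reference on the cross-validation idea raised above (splitting the Gen 2 dataset so that fitting and testing use different molecules), here is a minimal sketch of a k-fold split. It is not the besmarts workflow or anything presented in the meeting: the function name `k_fold_splits`, the `mol-…` identifiers, and the choice of 5 folds are all hypothetical placeholders. Splitting by molecule rather than by individual record is assumed here as a common way to avoid leaking conformers of one molecule into both sets.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    Hypothetical helper for illustration only; not part of besmarts.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    # Fixed seed so the split is reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Deal items round-robin into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder molecule identifiers standing in for Gen 2 entries (assumption).
molecule_ids = [f"mol-{i:03d}" for i in range(23)]
splits = list(k_fold_splits(molecule_ids, k=5))

for fold_index, (train, test) in enumerate(splits):
    # In a real run, fit on `train` and evaluate the objective on `test`.
    print(fold_index, len(train), len(test))
```

Each molecule lands in exactly one test fold, so comparing the fitted objective on the training folds against the held-out fold would show overfitting that a same-set evaluation can hide.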

 

 

 

Action items

Decisions