2022-03-14 Chemical Perception meeting notes

Participants

  • @Trevor Gokey

  • @David Mobley

  • @Chapin Cavender

  • Caitlin Bannan

  • @Jeffrey Wagner

  • @Pavan Behara

Discussion topics

Item

Notes

Fitting alkane torsions and introducing importance weighting

  • CB – (slide 4) Is frequency the number of times this parameter is applied across the training dataset?

    • TG – Yes. Choosing a coefficient C_w close to 0 (weight proportional to 1/f) turns the objective function into an average over parameters, which makes it insensitive to which molecules make up the training dataset

  • TG – importance weighting identifies infrequent parameters that are far from optimal but don’t contribute much to the total objective

    • CB – I’m concerned about the choice for splitting these parameters. E.g. slide 10, this SMARTS will hit a methyl 3 times but also tetravalent carbons bonded to other heavy atoms

    • DM – (slide 10) Maybe the methyl rotor is getting over-emphasized because we’re looking at torsion deviation instead of RMSD? CB asked “is this worth fixing?”; if it’s just a methyl rotor, it may not be important. We could answer this more definitively by looking at cases with alkyl chains.

    • TG – It could be due to a lack of molecules in the dataset. I agree that, if this is only catching methyls, then it’s not as important.

    • DM – we could use this automated procedure to identify potential problems and then only fix the ones we think are important

  • CB – (slide 13) why do your SMARTS sometimes contain semicolons and sometimes not? Do you include ORs?

    • TG – Yes, I include ORs. These start from wild cards, so this is the easiest way to split things

    • CB – Sounds good. Just note that if your programmatically generated SMARTS can mix ANDs and ORs, the operators apply at different precedence levels, so you can get confusing outcomes.

  • JW – (slide 15) the first two operations add a new parameter and then delete it, but the objective drops by 1%. Why?

    • DM & TG – this is likely due to optimization of other parameters. Adding the new parameter allows FB to escape the local minimum.

  • CB – This format is great for internal presentations, but be careful with this sort of detailed technical table in external presentations.

  • TG – Also, during the optimizations, there was a torsion parameter (one that applies only to an angle where the central bond is in a 3-membered ring) that didn’t get split/combined, but its k value got set to 50 kcal/mol. This probably happened because the fit didn’t have a regularizer.

  • TG – I’m not sure how best to fit torsion parameters. E.g. in a previous iteration of this project, simulations with these parameters didn’t transition between conformations as much as I expected, so there’s some concern that the barriers are too high

    • CB – You should fit to torsion scans, not just optimized geometries

    • TG – I want to minimize the number of torsion scans I have to do. Espaloma has shown that you can use the information from the optimization trajectory to fit barriers.

    • PB – I agree with Trevor. With espaloma, they tried fitting to optimization trajectories, but here it’d probably be better to fit to torsiondrives.

    • TG – I could do torsiondrives for every torsion I find.

    • DM – Could run bespokefit on the molecules in the set and then merge the QM results

    • TG – So, like, run torsiondrives for each split?

    • CB – Could check out existing torsiondrives to see if there’s already data for your molecules.

  • JW – The t17 split showed impressive improvement. Is it enough to make sage-2.1.0?

    • TG – I’ll put myself on the ff-release call agenda for Thursday and present on this
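
Sketch of the frequency weighting discussed for slide 4, added for reference. The functional form 1/(f + C_w) is an assumption chosen only because it reduces to the 1/f weighting TG described when C_w is close to 0; it is not taken from the slides. The names `parameter_contributions` and `objective`, the parameter labels and the error values are hypothetical.

```python
from collections import defaultdict


def parameter_contributions(residuals, c_w=0.0):
    """Per-parameter contribution to a frequency-weighted objective.

    residuals: iterable of (parameter_id, squared_error), one entry per
    application of a parameter in the training set.
    c_w: weighting coefficient (ASSUMED form: weight = 1 / (f + c_w)).
    """
    by_param = defaultdict(list)
    for param_id, sq_err in residuals:
        by_param[param_id].append(sq_err)

    contributions = {}
    for param_id, errs in by_param.items():
        f = len(errs)  # frequency: times the parameter is applied
        contributions[param_id] = sum(errs) / (f + c_w)
    return contributions


def objective(residuals, c_w=0.0):
    """Sum of the per-parameter contributions."""
    return sum(parameter_contributions(residuals, c_w).values())


# Hypothetical data: a frequent, well-fit torsion and a rare, poorly fit one.
common = [("t_common", 1.0)] * 100
rare = [("t_rare", 4.0)] * 2

print(objective(common + rare))            # c_w = 0: average over parameters
print(objective(common * 5 + rare))        # five times more "common" molecules
print(parameter_contributions(common + rare, c_w=1e6))  # near-uniform weights
```

With c_w = 0 the first two objectives are identical, which is the insensitivity to dataset composition noted above, and the rare parameter carries the larger contribution. With a large c_w the weights become nearly uniform and the frequent parameter dominates again.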

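Sketch of the precedence caveat CB raised for slide 13. In SMARTS atom expressions, `!` binds tightest, then `&` (high-precedence AND), then `,` (OR), then `;` (low-precedence AND). The toy evaluator below is not a SMARTS parser: it handles only explicit operators over named true/false primitives, and the function name `evaluate` and the example atom are hypothetical.

```python
def evaluate(expr, props):
    """Evaluate a SMARTS-style logical expression over boolean primitives.

    expr: primitives joined by '!', '&', ',' and ';' (no implicit AND,
    no recursive SMARTS, no brackets); this is a toy, not a real parser.
    props: dict mapping each primitive token to True/False for one atom.
    """
    def primitive(token):
        token = token.strip()
        if token.startswith("!"):
            return not primitive(token[1:])
        return props[token]

    # ';' is split first (lowest precedence), then ',', then '&' (highest).
    return all(
        any(
            all(primitive(p) for p in alternative.split("&"))
            for alternative in low_and.split(",")
        )
        for low_and in expr.split(";")
    )


# Hypothetical methyl carbon: it is a carbon, not a nitrogen, and does not
# have exactly one hydrogen.
methyl_carbon = {"#6": True, "#7": False, "H1": False}

print(evaluate("#6,#7;H1", methyl_carbon))  # (#6 OR #7) AND H1
print(evaluate("#6,#7&H1", methyl_carbon))  # #6 OR (#7 AND H1)
```

The two patterns differ only in which AND is used, yet the first rejects the methyl carbon and the second matches it. This is the kind of confusing outcome that can appear when generated patterns mix ANDs and ORs.
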
Action items

Decisions