2022-03-14 Chemical Perception meeting notes

Participants

@Trevor Gokey
@David Mobley
@Chapin Cavender
Caitlin Bannan
@Jeffrey Wagner
@Pavan Behara

Discussion topics

Item	Notes

Fitting alkane torsions and introducing importance weighting

CB – (slide 4) Is frequency the number of times this parameter is applied across the training dataset?
- TG – Yes. Choosing a coefficient C_w close to 0 (weight proportional to 1/f) turns the objective function into an average over parameters, which makes the fit insensitive to which molecules make up the training dataset.
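A minimal sketch of why 1/f weighting turns the objective into an average over parameters. It assumes, hypothetically, that each parameter contributes one squared residual per application in the training set and that the weight has the form 1/(f_p + C_w), so C_w close to 0 gives a weight of roughly 1/f_p. The data layout, the weight formula, and the function names are illustrative assumptions, not the actual implementation.

```python
from collections import defaultdict


def weighted_objective(terms, c_w=0.0):
    """Importance-weighted sum of squared residuals (illustrative sketch).

    terms: list of (parameter_id, residual) pairs, one entry per application
    of a parameter in the training set (hypothetical data layout).
    Each parameter's summed contribution is divided by f_p + c_w, where f_p
    is its application count; with c_w = 0 this is the mean per parameter.
    """
    freq = defaultdict(int)      # f_p: applications of each parameter
    sq_sum = defaultdict(float)  # summed squared residuals per parameter
    for param, residual in terms:
        freq[param] += 1
        sq_sum[param] += residual ** 2
    return sum(sq_sum[p] / (freq[p] + c_w) for p in freq)


def unweighted_objective(terms):
    """Plain sum of squared residuals, for comparison."""
    return sum(residual ** 2 for _, residual in terms)


# "t1" is applied 10 times, "t2" once; both have the same per-application error.
data = [("t1", 1.0)] * 10 + [("t2", 1.0)]
# Add more molecules that only exercise "t1".
more_t1 = data + [("t1", 1.0)] * 10

print(unweighted_objective(data), unweighted_objective(more_t1))  # 11.0 21.0
print(weighted_objective(data), weighted_objective(more_t1))      # 2.0 2.0
```

In the unweighted sum, adding molecules that only hit "t1" doubles that parameter's influence; with C_w = 0 each parameter contributes its mean error once, regardless of how often it appears.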
TG – importance weighting identifies infrequent parameters that are far from optimal but don’t contribute much to the total objective
- CB – I’m concerned about the choice of how to split these parameters. E.g., on slide 10, this SMARTS will match a methyl 3 times, but it will also match tetravalent carbons bonded to other heavy atoms.
- DM – (slide 10) Maybe the methyl rotor is getting over-emphasized because we’re looking at torsion deviation instead of RMSD? CB asked “is this worth fixing?”, and if it’s just a methyl rotor, it may not be important. Instead, we could answer this more definitively by looking at cases with alkyl chains.
- TG – It could be due to a lack of molecules in the dataset. I agree that if this is only catching methyls, then it’s not as important.
- DM – we could use this automated procedure to identify potential problems and then only fix the ones we think are important
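One way to see DM's point about torsion deviation versus RMSD, as a rough geometric sketch under simplifying assumptions: ideal tetrahedral geometry, a C-H bond length of 1.09 Å, only the three methyl hydrogens move (rigid rotation about the C-C axis), and no re-alignment before computing RMSD. The function name and the 20-atom molecule size are hypothetical. A 30° methyl rotation shows up as a full 30° torsion deviation, but it moves only three light atoms that sit close to the rotation axis.

```python
import math


def methyl_rotation_rmsd(delta_phi_deg, n_atoms, d_ch=1.09, angle_hcc_deg=109.5):
    """RMSD (Å) over n_atoms when only the three methyl hydrogens rotate
    rigidly by delta_phi_deg about the C-C axis and all other atoms stay
    fixed (no re-alignment). Geometry defaults are idealized assumptions."""
    # Distance of each H from the C-C rotation axis
    r = d_ch * math.sin(math.radians(angle_hcc_deg))
    # Chord length traced by each H for a rotation of delta_phi_deg
    disp = 2.0 * r * math.sin(math.radians(delta_phi_deg) / 2.0)
    # Three displaced atoms, averaged over the whole molecule
    return math.sqrt(3.0 * disp ** 2 / n_atoms)


# Hypothetical 20-atom molecule: a 30° torsion deviation on the methyl rotor
print(f"{methyl_rotation_rmsd(30.0, 20):.2f} Å")  # 0.21 Å
```

A torsion-deviation metric weights this 30° change the same as a 30° change in a backbone torsion that moves half the molecule, whereas RMSD scales with how many atoms move and how far.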
CB – (slide 13) why do your SMARTS sometimes contain semicolons and sometimes not? Do you include ORs?
- TG – Yes, I include ORs. These start from wildcards, so this is the easiest way to split things.
- CB – Sounds good. Just note that if your programmatically generated SMARTS can mix ANDs and ORs, the operators have different precedence (high-precedence AND "&" binds tighter than OR ",", which binds tighter than low-precedence AND ";"), so you can get confusing outcomes.
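To make the precedence issue concrete: in a SMARTS atom expression, "&" (AND) binds tightest, then "," (OR), then ";" (low-precedence AND). So "[#6,#7;X3]" means "(C or N) with three connections", while "[#6,#7&X3]" means "C, or N with three connections". The toy evaluator below is a hypothetical sketch for intuition only: it supports just the "#n", "Xn", and "*" primitives with explicit "&" (no implicit AND by juxtaposition, no "!" negation), and atoms are plain dicts rather than real molecule objects. Real SMARTS matching should go through a proper toolkit.

```python
def match_primitive(prim, atom):
    """Match one primitive against an atom described by a dict."""
    if prim == "*":
        return True
    if prim.startswith("#"):
        return atom["atomic_num"] == int(prim[1:])
    if prim.startswith("X"):
        return atom["connectivity"] == int(prim[1:])
    raise ValueError(f"unsupported primitive: {prim!r}")


def match_atom_expr(expr, atom):
    """Evaluate a bracket-free SMARTS atom expression with the standard
    precedence: '&' (highest) > ',' (OR) > ';' (lowest)."""
    return all(  # ';' : low-precedence AND, evaluated last
        any(  # ',' : OR
            all(match_primitive(p, atom) for p in term.split("&"))  # '&' : AND
            for term in clause.split(",")
        )
        for clause in expr.split(";")
    )


carbon_x4 = {"atomic_num": 6, "connectivity": 4}
nitrogen_x3 = {"atomic_num": 7, "connectivity": 3}

# (C or N) AND X3: the X4 carbon is rejected
print(match_atom_expr("#6,#7;X3", carbon_x4))    # False
# C OR (N AND X3): the X4 carbon matches through the '#6' branch
print(match_atom_expr("#6,#7&X3", carbon_x4))    # True
print(match_atom_expr("#6,#7;X3", nitrogen_x3))  # True
```

The two expressions differ by one character, which is why programmatically mixing ANDs and ORs can silently broaden or narrow what a parameter hits.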
JW – (slide 15) the first two operations add a new parameter and then delete it, but the objective drops by 1%. Why?
- DM & TG – this is likely due to optimization of other parameters. Adding the new parameter allows FB to escape the local minimum.
CB – This format is great for internal presentations, but be careful with this level of detail (e.g., technical tables) in external presentations.
TG – Also, during the optimizations, there was a torsion parameter (one that applies only to dihedrals whose central bond is in a 3-membered ring) that didn’t get split or combined, but its k value went to 50 kcal/mol. This probably happened because the fit didn’t have a regularizer.
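A minimal sketch of why a regularizer would help, assuming a one-parameter quadratic model: the data term a·(k − k_data)² has small curvature a when the training data barely constrain k, so nothing stops the optimizer from following it to an extreme value. Adding an L2 (Gaussian-prior-style) penalty ((k − k0)/σ)² pulls weakly determined parameters back toward their starting value k0 while barely affecting well-determined ones. All numbers and names here are hypothetical and are not taken from the actual fit.

```python
def optimal_k(a, k_data, k0, sigma=None):
    """Closed-form minimizer of a * (k - k_data)**2 + ((k - k0) / sigma)**2.

    a:      curvature of the data term (how strongly the data constrain k)
    k_data: value of k favored by the data alone
    k0:     initial/prior value of k
    sigma:  prior width; None means no regularizer
    """
    if sigma is None:
        return k_data
    lam = 1.0 / sigma ** 2
    # The derivative is proportional to a*(k - k_data) + lam*(k - k0); set it to 0
    return (a * k_data + lam * k0) / (a + lam)


# Hypothetical numbers (k in kcal/mol): data favor k = 50, starting value k0 = 1
print(optimal_k(0.01, 50.0, 1.0))                         # 50.0 (no regularizer)
print(round(optimal_k(0.01, 50.0, 1.0, sigma=1.0), 2))    # 1.49 (weak data, prior wins)
print(round(optimal_k(100.0, 50.0, 1.0, sigma=1.0), 2))   # 49.51 (strong data win)
```

The prior width σ sets how much evidence the data must provide before a parameter is allowed to move far from its starting value.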
TG – I’m not sure how best to fit torsion parameters. E.g., in a previous iteration of this project, simulations with these parameters didn’t transition between conformations as often as I expected, so there’s some concern that the barriers are too high.
- CB – You should fit to torsion scans, not just optimized geometries.
- TG – I want to minimize the number of torsion scans I have to do. Espaloma has shown that you can use information from the optimization trajectory to fit barriers.
- PB – I agree with Trevor. With Espaloma, they tried fitting to optimization trajectories, but here it’d probably be better to fit to torsiondrives.
- TG – I could do torsiondrives for every torsion I find.
- DM – Could run bespokefit on the molecules in the set and then merge the QM results
- TG – So, like, run torsiondrives for each split?
- CB – Could check out existing torsiondrives to see if there’s already data for your molecules.
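As a sketch of what fitting directly to a torsion scan involves: with phases fixed at 0, the SMIRNOFF-style periodic torsion energy E(φ) = Σ_n k_n (1 + cos(n·φ)) is linear in the force constants k_n, so they can be recovered from a 1D scan profile by linear least squares, with a constant offset because scan energies are relative. The scan below is synthetic, generated from hypothetical k values; real torsiondrive energies also contain contributions from other force field terms, which this sketch ignores, and the helper names are illustrative rather than any existing API.

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with
    partial pivoting (inputs are not modified)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def basis(phi, periodicities):
    # Constant offset column, then one (1 + cos(n*phi)) column per k_n
    return [1.0] + [1.0 + math.cos(n * phi) for n in periodicities]


def fit_torsion(phis, energies, periodicities=(1, 2, 3)):
    """Least-squares fit of k_n (phase 0) to a torsion scan profile."""
    rows = [basis(phi, periodicities) for phi in phis]
    p = len(rows[0])
    # Normal equations: (A^T A) x = A^T E
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    ate = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(p)]
    return solve(ata, ate)[1:]  # drop the offset, keep k_n


# Synthetic 15° scan from hypothetical k_1, k_2, k_3 (kcal/mol) plus an offset
true_k = (0.5, -1.2, 0.3)
phis = [math.radians(d) for d in range(-180, 180, 15)]
energies = [
    2.0 + sum(k * (1 + math.cos(n * phi)) for n, k in zip((1, 2, 3), true_k))
    for phi in phis
]
print([round(k, 3) for k in fit_torsion(phis, energies)])  # [0.5, -1.2, 0.3]
```

Because the scan samples the whole rotation, including the high-energy regions, it constrains barrier heights directly, which minimum-energy geometries alone do not.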
JW – The t17 split showed impressive improvement. Is it enough to make it into sage-2.1.0?
- TG – I’ll put myself on the ff-release call agenda for Thursday and present on this
