Fitting alkane torsions and introducing importance weighting | CB – (slide 4) Is frequency the number of times this parameter is applied across the training dataset? TG – Importance weighting identifies infrequent parameters that are far from optimal but don’t contribute much to the total objective. CB – I’m concerned about the choice for splitting these parameters. E.g., on slide 10, this SMARTS will hit a methyl 3 times, but it will also hit tetravalent carbons bonded to other heavy atoms. DM – (slide 10) Maybe the methyl rotor is getting over-emphasized because we’re looking at torsion deviation instead of RMSD? So, to CB’s question “is this worth fixing?”: if it’s just a methyl rotor, it may not be important, and we could answer this more definitively by looking at cases with alkyl chains. TG – It could be due to a lack of molecules in the dataset. I agree that if this is only catching methyls, then it’s not as important. DM – We could use this automated procedure to identify potential problems and then fix only the ones we think are important.
CB – (slide 13) Why do your SMARTS sometimes contain semicolons and sometimes not? Do you include ORs? TG – Yes, I include ORs. These start from wildcards, so this is the easiest way to split things. CB – Sounds good. Just note that if your programmatically generated SMARTS can mix ANDs and ORs, the operators apply at different priorities, so you can get confusing outcomes.
JW – (slide 15) The first two operations add a new parameter and then delete it, but the objective drops by 1%. Why? CB – This format is great for internal presentations, but be careful with this sort of detailed technical table in external presentations. TG – Also, during the optimizations, there was a torsion parameter (one that applies only to torsions whose central bond is in a 3-membered ring) that didn’t get split or combined, but its k value got set to 50 kcal/mol. This probably happened because the fit didn’t have a regularizer.
TG – I’m not sure how best to fit torsion parameters. E.g., with a previous iteration of this project, simulations with these parameters didn’t transition between conformations as much as I expected, so there’s some concern that the barriers are too high. CB – You should fit on torsion scans and not just optimized geometries. TG – I want to minimize the number of torsion scans I have to do. Espaloma has shown that you can use the information from the optimization trajectory to fit barriers. PB – I agree with Trevor. With espaloma, they tried fitting to optimization trajectories, but here it’d probably be better to fit to torsiondrives. TG – I could do torsiondrives for every torsion I find. DM – You could run bespokefit on the molecules in the set and then merge the QM results. TG – So, run torsiondrives for each split? CB – You could check existing torsiondrives to see whether there’s already data for your molecules.
JW – The t17 split showed impressive improvement. Is it enough to make it into sage-2.1.0?
|
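TG described importance weighting (slide 4) as a way to surface infrequent parameters that are far from optimal but contribute little to the total objective. The notes don't give the actual formula, so the sketch below assumes one simple, hypothetical score: mean error per application, i.e. a parameter's total objective contribution divided by its frequency (the number of times it is applied across the training set). `ParameterStats`, `importance_rank`, and the example numbers are all illustrative, not TG's implementation or real fit data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParameterStats:
    """Hypothetical per-parameter summary collected after a fit."""
    param_id: str       # hypothetical label, not a real Sage parameter ID
    frequency: int      # times the parameter is applied across the training set
    total_error: float  # summed contribution of its applications to the objective


def importance_rank(stats: list[ParameterStats]) -> list[str]:
    """Order parameters by mean error per application (total_error / frequency).

    Ranking by total_error alone favours common parameters; dividing by
    frequency brings rare but poorly fit parameters to the top.
    """
    scored = [s for s in stats if s.frequency > 0]  # skip parameters never applied
    scored.sort(key=lambda s: s.total_error / s.frequency, reverse=True)
    return [s.param_id for s in scored]


# Hypothetical numbers chosen only to illustrate the ordering.
stats = [
    ParameterStats("t_common", frequency=400, total_error=40.0),  # 0.1 per use
    ParameterStats("t_mid", frequency=50, total_error=10.0),      # 0.2 per use
    ParameterStats("t_rare", frequency=5, total_error=5.0),       # 1.0 per use
]

print(importance_rank(stats))  # ['t_rare', 't_mid', 't_common']
```

Ranked by `total_error` alone, the same three parameters come out in the opposite order (`t_common`, `t_mid`, `t_rare`), which is why a rare, badly fit parameter can hide behind a small contribution to the total objective.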
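CB's warning about mixing ANDs and ORs (slide 13) comes from SMARTS operator precedence: inside an atom expression, `!` (NOT) binds tightest, then `&` (high-precedence AND), then `,` (OR), and `;` (low-precedence AND) binds loosest. The toy evaluator below is a simplified, hypothetical sketch (not RDKit or any real SMARTS parser) that represents an atom as the set of primitives true for it and shows how swapping `;` for `&` changes the meaning of the same-looking pattern.

```python
from __future__ import annotations


def matches(expr: str, atom: set[str]) -> bool:
    """Evaluate a simplified SMARTS atomic expression against one atom.

    `atom` is the set of primitives that hold for the atom, e.g. {"#6", "H0"}.
    Operators from lowest to highest precedence: ';' (low-precedence AND),
    ',' (OR), '&' (high-precedence AND), '!' (NOT). Splitting on the
    lowest-precedence operator first makes it bind loosest. This toy version
    does not handle implicit AND (juxtaposed primitives such as "#6X4").
    """
    if ";" in expr:
        return all(matches(part, atom) for part in expr.split(";"))
    if "," in expr:
        return any(matches(part, atom) for part in expr.split(","))
    if "&" in expr:
        return all(matches(part, atom) for part in expr.split("&"))
    if expr.startswith("!"):
        return not matches(expr[1:], atom)
    return expr in atom


carbon_no_h = {"#6", "H0"}  # e.g. a quaternary carbon

# Parsed as (#6 OR #7) AND H1: False, because the carbon has no hydrogens.
print(matches("#6,#7;H1", carbon_no_h))
# Parsed as #6 OR (#7 AND H1): True, because '&' binds tighter than ','.
print(matches("#6,#7&H1", carbon_no_h))
```

When patterns are generated programmatically by splitting and combining, emitting `;` between OR groups keeps each group intact; emitting `&` (or implicit AND) next to a `,` silently attaches the new condition to only one alternative.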
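Two points from the slide 15 discussion, a ring torsion whose k drifted to 50 kcal/mol "probably because the fit didn't have a regularizer" and CB's advice to fit on torsion scans rather than only optimized geometries, can be illustrated with one toy model. The sketch assumes a single-term torsion E(φ) = k(1 + cos(nφ − δ)) fitted by least squares with an L2 (ridge) penalty toward a prior k, which has a closed-form solution. The model, prior, penalty weight, and energies are hypothetical; the notes don't say which regularizer, if any, the actual fit would use.

```python
import math


def torsion_basis(phi_deg, periodicity=3, phase_deg=0.0):
    """1 + cos(n*phi - phase): the shape of a single-term torsion with k = 1."""
    return 1.0 + math.cos(math.radians(periodicity * phi_deg - phase_deg))


def fit_k(phis_deg, energies, prior_k=0.0, lam=0.0, periodicity=3, phase_deg=0.0):
    """Ridge-regularized least-squares fit of the torsion force constant k.

    Minimizes sum_i (k*b_i - E_i)^2 + lam*(k - prior_k)^2, whose closed-form
    solution is k = (sum_i b_i*E_i + lam*prior_k) / (sum_i b_i^2 + lam).
    """
    b = [torsion_basis(p, periodicity, phase_deg) for p in phis_deg]
    num = sum(bi * ei for bi, ei in zip(b, energies)) + lam * prior_k
    den = sum(bi * bi for bi in b) + lam
    return num / den


# Geometries near a minimum only: b_i is almost zero, so k is poorly determined.
near_min = [58.0, 62.0]
residuals = [0.3, 0.3]  # hypothetical leftover energies in kcal/mol
k_unreg = fit_k(near_min, residuals)                      # about 55
k_reg = fit_k(near_min, residuals, prior_k=1.0, lam=1.0)  # stays near the prior of 1.0

# A full 360-degree scan constrains k, so the same penalty barely moves it.
scan = [15.0 * i for i in range(24)]
scan_energies = [1.5 * torsion_basis(p) for p in scan]    # synthetic data with k = 1.5
k_scan = fit_k(scan, scan_energies, prior_k=1.0, lam=1.0) # about 1.49

print(k_unreg, k_reg, k_scan)
```

Near a minimum of 1 + cos(3φ) the basis value is close to zero, so a small unexplained residual is divided by an even smaller denominator and k blows up; the penalty term keeps the denominator away from zero, and scan data covering the barrier make the penalty nearly irrelevant.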