2020-03-19 Meeting notes

Date

19 Mar 2020

Participants

Goals

Discussion topics

Time	Item	Presenter
	Update on physical property data	SB and OM
10 min?	Designing 2nd generation QM training set update	JM and HJ
		VL

Note Taking

(1) Simon

Add link for Simon’s slides here

Slide 3

DLM – The OH hydrogen radius parameter is generally zero in other FFs. We are willing to consider changing it, but just remember the history.

SB – We let it change from zero in Parsley, but it stayed very small.

Slide 7

LPW – Was there a way to deal with high torsional barriers in acids and esters, and their effect on thermodynamic properties?

SB – We might add enhanced sampling to overcome those barriers, but we don’t do it yet.

SB – Relative contributions to the objective function from different properties – I set it so that the initial contributions from all the different property types are about equal in magnitude.
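A minimal sketch of the weighting SB describes, assuming the objective is a weighted sum of per-property-type contributions and that each weight is simply the reciprocal of that type's contribution at the initial parameters. The helper names and numbers are hypothetical and are not the options of any particular fitting tool.

```python
def normalize_weights(initial_contributions):
    """Return per-property-type weights such that every type contributes
    exactly 1.0 to the objective at the initial parameters.

    Hypothetical helper: assumes every initial contribution is positive.
    """
    weights = {}
    for name, contribution in initial_contributions.items():
        if contribution <= 0:
            raise ValueError(f"initial contribution for {name!r} must be positive")
        weights[name] = 1.0 / contribution
    return weights


def weighted_objective(contributions, weights):
    """Objective as a weighted sum of per-property-type contributions."""
    return sum(weights[name] * value for name, value in contributions.items())


# Hypothetical initial contributions (arbitrary units).
initial = {"density": 4.0, "enthalpy_of_mixing": 250.0, "hvap": 0.5}
weights = normalize_weights(initial)
print(weights)
print(weighted_objective(initial, weights))  # one unit per property type
```

After the first iteration the contributions drift apart again, so this only equalizes the starting point, which matches how SB phrased it.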

Slide 8

SB – Excess molar volume is SO NOISY that it screws up optimizations. We should NOT fit to excess molar volume.
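For reference, the standard definition of the excess molar volume of a binary mixture, written in terms of densities, shows why it is sensitive to noise. Here x_i are mole fractions, M_i molar masses, ρ the mixture density and ρ_i the pure-component densities. V^E is a small difference between large, nearly equal molar volumes, so small relative errors in the simulated densities become large relative errors in V^E. This is a general observation, not a diagnosis taken from the slides.

```latex
V^{E} = \frac{x_1 M_1 + x_2 M_2}{\rho} - \frac{x_1 M_1}{\rho_1} - \frac{x_2 M_2}{\rho_2}
```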

Summary

We should fit only against enthalpy of mixing, binary mass density, and possibly pure density. We should also expand beyond just alcohols and esters.

CIB – Using only alcohols and esters may bias our decision here. I can think of three categories of liquids:

Alcohols are a 1:1 mix of donors and acceptors; esters have no donor.
Dimethylamine and trimethylamine would show an analogous difference.
A liquid like dimethylamine is a mix of donors and acceptors, versus acetone, which is purely an acceptor. I wonder whether our dataset is biased, and whether we should use more diverse mixtures in terms of the fraction and characteristics of donors and acceptors.
Might also try liquids with no dipole, like pyrazine (where the two Ns are opposite each other in the ring). The molecules in that pure liquid have no dipole, so the liquid properties reflect vdW and quadrupole interactions. Then, mixing those with something like dimethylamine would probe an interesting set of liquid interactions.

SB – Completely agree

JRW – Do we know how we’ll operationally do the refit for this generation?

LPW – Probably the same as the last generation: valence → nonbond → valence cleanup.

JRW – How about “The valence team sends Simon a reasonable-looking valence-optimized FF on April 1 using incomplete data, so he can start test runs. The final QM calculations will be done April 20, and the final valence refit FF will be sent to Simon on April 25.”

(General) – We’ll decide on this later

LPW – How did the computational cost of these calculations break down?

SB – Pure properties were cheap. Mixtures were expensive, but all roughly the same cost as each other. Thankfully, they could share some parameter sets.

CIB – In the objective function, do some properties systematically pull parameters in a single direction? For example, does hvap always pull radii smaller?

SB – Good point. I’d like to look closer at the gradients of the properties. That would be really interesting.

CIB – The gradient approach worked well when I did BCCs.
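One way to look at CIB's question, sketched under assumptions: given the gradient of each property's objective contribution with respect to each parameter, report the direction each property pulls each parameter, and compare properties by the cosine similarity of their gradient vectors. The parameter names and gradient values below are hypothetical placeholders, not output from any real fit.

```python
import math


def pull_direction(gradient_component):
    """Direction a property pulls a parameter under a descent step.

    Descent moves against the gradient, so a positive gradient means the
    property pulls the parameter smaller.
    """
    if gradient_component > 0:
        return "decrease"
    if gradient_component < 0:
        return "increase"
    return "none"


def cosine_similarity(a, b):
    """Cosine similarity of two gradient vectors in the same parameter order.

    Near +1: the properties pull together; near -1: they fight each other.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical per-property gradients with respect to three vdW radii.
parameters = ["radius_H", "radius_C", "radius_O"]
gradients = {
    "hvap": [0.8, 0.3, 1.2],
    "density": [-0.5, 0.1, -0.9],
}
for prop, grad in gradients.items():
    print(prop, {p: pull_direction(g) for p, g in zip(parameters, grad)})
print(cosine_similarity(gradients["hvap"], gradients["density"]))
```

Repeating this at several points along the optimization would show whether a pull is systematic or only local to the starting parameters.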

(2) JM and HJ

Slide 2

DLM – To clarify: is the DBSCAN clustering based on fingerprint similarity?

JM – Yes

Slide 3

XL – How do you generate tautomeric states?

HJ – Fragmenter/CMILES functions

DLM – Can send code later

CIB – How will we determine whether these results are representative? What are the benchmarking plans?

LPW – We have plans for benchmarking, but they’re not totally operational yet.

LPW – Since a single central bond can host many unique torsion parameters, we want to ensure that our dataset doesn’t always put certain torsions together, since then they’ll always be fit together, and it will be unclear which contributions come from which torsions.
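LPW's concern can be checked mechanically once each dataset entry is labeled with the torsion parameters it exercises. A minimal sketch, assuming that labeling has already been done (it is not shown) and using hypothetical entry names and parameter IDs: any two parameters whose sets of occurrences are identical always appear together, so the fit cannot separate their contributions.

```python
from collections import defaultdict
from itertools import combinations


def always_together(labeled_entries):
    """Return pairs of torsion parameter IDs that occur in exactly the same
    dataset entries, i.e. pairs whose contributions cannot be separated.

    `labeled_entries` maps an entry name to the set of parameter IDs it uses.
    """
    occurrences = defaultdict(set)
    for entry, parameter_ids in labeled_entries.items():
        for pid in parameter_ids:
            occurrences[pid].add(entry)

    pairs = []
    for a, b in combinations(sorted(occurrences), 2):
        if occurrences[a] == occurrences[b]:
            pairs.append((a, b))
    return pairs


# Hypothetical labels: t17 and t18 are only ever seen together.
dataset = {
    "mol-1": {"t1", "t17", "t18"},
    "mol-2": {"t1", "t2"},
    "mol-3": {"t2", "t17", "t18"},
}
print(always_together(dataset))  # [('t17', 't18')]
```

Identical occurrence sets are the strictest case; a softer check (for example, one parameter never appearing without another) would catch partial confounding as well.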

Fingerprint type (slide 7)

CIB – MACCS keys are old, and I’m not a fan of them. Now there are ECFP, Tree, and Path fingerprints, which are really good (with all defaults). LINGO is interesting, but not particularly good in my opinion. Regarding DBSCAN, I’ve had a good experience using HDBSCAN, which only has “one knob”, so I’d recommend HDBSCAN if DBSCAN is being difficult. It’s also in scikit-learn.

HJ – When I tried MACCS, the clustering was a lot less dependent on epsilon.
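To make the DBSCAN-on-fingerprint-similarity setup concrete, here is a minimal pure-Python sketch of textbook DBSCAN over Tanimoto distance. It assumes each fingerprint is a set of "on" bit indices; the fingerprints are made up, and in practice they would come from a cheminformatics toolkit (not shown). `eps` is the distance cutoff HJ refers to, and `min_samples` is DBSCAN's density threshold. This is not necessarily the implementation used in the project.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def dbscan(fingerprints, eps, min_samples):
    """Textbook DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(fingerprints)

    def neighbors(i):
        # The neighborhood includes the point itself, as in scikit-learn.
        return [j for j in range(n)
                if tanimoto_distance(fingerprints[i], fingerprints[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Made-up fingerprints: two families plus one outlier.
fingerprints = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5},
    {10, 11, 12}, {10, 11, 13}, {10, 11, 12, 13},
    {20, 21},
]
print(dbscan(fingerprints, eps=0.3, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
print(dbscan(fingerprints, eps=0.1, min_samples=2))  # every point is noise
```

The second call shows the epsilon sensitivity discussed above: tightening `eps` from 0.3 to 0.1 turns every point into noise. HDBSCAN avoids choosing a single global `eps`, which is the "one knob" advantage CIB mentions.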

Slide 9

CIB – I’m really impressed by this. I interpret from this that there are real differences between Tree fingerprints and MACCS keys. I trust this sort of investigation more than my own generic intuition. Let’s talk offline in more depth.

LPW – I want to confirm that JM’s optimization datasets (the geometry optimizations) can also be submitted to QCA before the end of the month.

JM – Could have everything ready for submission by tomorrow.

Action items

Hyesu Jang and Christopher Bayly (and anyone else who's interested) will schedule a time to talk about fingerprint selection.
@victoria will present at Friday’s ff-release call.
Jessica Maat will submit geometry optimizations by the end of the month.
Jeffrey Wagner will coordinate with MolSSI to see if we can get QCA jobs done by April 15
Hyesu Jang will send Simon Boothroyd a draft of the valence refit in the first week of April