pros/cons of atoms-in-molecule charges, and preliminary data fitting to charges/dipoles | DC will post slides here (slide 1)
DM – How far do you expand the multipole series?
CA – Quadrupole; after that the error decreases a lot
JR – Anthony Stone and Sally Price go out to hexadecapole, but most people have settled on quadrupole. AS and SP want to predict crystal structures. We need to make sure that we're in a real perturbation-theory regime.
JR – I've always been concerned about H-bonds. When we do implicit solvent, we don't account for those. I wonder if an epsilon of 80 is really going to catch that.
DC – We have a publication with Miguel (York?) about that, so I'm kinda going against our own conclusions here.
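CA's point about truncating the expansion at quadrupole can be illustrated numerically. A minimal sketch (all charges, positions, and the evaluation point are invented; Gaussian-style units with no 4*pi*eps0): reconstruct the ESP at a distant point from monopole, dipole, and traceless quadrupole moments, and watch the error against the exact point-charge ESP shrink as each order is added.

```python
import numpy as np

# Toy charge distribution (hypothetical numbers), positions near the origin
q = np.array([0.5, -0.3, 0.1])               # net charge 0.3
r = np.array([[0.2, 0.1, 0.4],
              [-0.3, 0.2, -0.1],
              [0.1, -0.4, 0.2]])
R = np.array([1.0, 2.0, 6.0])                # distant evaluation point

# Exact ESP from the point charges
V_exact = np.sum(q / np.linalg.norm(R - r, axis=1))

# Multipole moments about the origin
Q = q.sum()                                   # monopole
p = (q[:, None] * r).sum(axis=0)              # dipole
r2 = np.sum(r * r, axis=1)
# Traceless quadrupole: Theta_ab = sum_i q_i (3 r_ia r_ib - r_i^2 delta_ab)
Theta = 3 * np.einsum("i,ia,ib->ab", q, r, r) - np.eye(3) * (q * r2).sum()

Rn = np.linalg.norm(R)
Rhat = R / Rn
V0 = Q / Rn                                   # monopole term
V1 = p @ Rhat / Rn**2                         # dipole term
V2 = 0.5 * Rhat @ Theta @ Rhat / Rn**3        # quadrupole term

errs = [abs(V_exact - V) for V in (V0, V0 + V1, V0 + V1 + V2)]
print(errs)  # error shrinks as orders are added
```

For a generic (non-symmetric) charge distribution and an evaluation point several molecular radii away, each added order cuts the error by roughly the ratio of charge extent to distance.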
Chat
CC – I have an unstable connection, so asking in chat. Does reconstruction of ESPs from multipoles work for charged molecules? The dipole depends on the choice of origin for nonzero charge.
DM – Yeah, the other answer to the "what about H-bonds" question (or see also the old asymmetry work I had done in the Dill lab relating to the asymmetric solvation response of water to charges of the same magnitude but opposite sign) would be something like the IPolQ work of Cerutti, Swope and Rice, which I think is what Julia alluded to when she said there's another approach we could talk about later.
CA – Hi Chapin, from our tests it does. The properties origin is kept consistent, always set at (0,0,0) (as per the Psi4 defaults).
DM – In general I felt IPolQ or similar was going to be hard to do for drug discovery applications because of the difficulty involved in parameterization, but for something like NAGL it could make more sense because you only have to go through the parameterization pain when making the training set, not every time you want to parameterize something.
DM – BUT … I also think what Danny is suggesting is worth exploring. I would want to see whether an IPolQ-type model beats the "simple implicit QM" approach.
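CC's caveat is easy to demonstrate: for a neutral charge set the dipole is origin-independent, but for a net charge Q, shifting the origin by d changes the dipole by Q*d, which is why CA notes the properties origin must be kept fixed. A toy sketch (charges and positions invented):

```python
import numpy as np

def dipole(q, r, origin=(0.0, 0.0, 0.0)):
    """Dipole moment of point charges q at positions r, taken about `origin`."""
    return (q[:, None] * (r - np.asarray(origin))).sum(axis=0)

r = np.array([[0.0, 0.0, 0.5],
              [0.0, 0.0, -0.5]])
shift = np.array([1.0, 0.0, 0.0])

neutral = np.array([0.4, -0.4])   # net charge 0
ion = np.array([0.4, -0.2])       # net charge +0.2

# Neutral: dipole does not depend on the origin (difference is zero)
print(dipole(neutral, r) - dipole(neutral, r, shift))
# Charged: moving the origin by `shift` changes the dipole by Q * shift
print(dipole(ion, r) - dipole(ion, r, shift))
```

So multipole-based ESP reconstruction for ions is well-defined only if every molecule's moments are computed and used with one agreed-upon origin.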
LM – (slide 2) Is this data published? This is super helpful.
PB – Is the Riniker model DASH?
JR – (slides w/ molecules) Same confs?
LM – Why didn't you use diffuse functions in your basis sets?
LM – I've seen a few ways to do implicit solvent. One is RESP2. S. Riniker uses an epsilon of 4 or 5 as an intermediate. Do you have experience on which is better?
JR – Which program are you using to run these calcs?
DC – Psi4
JR – I'm a little surprised that you can't run with diffuse functions. Could you send me one of the failing runs?
DC – Yes, will do
CA – One thing that kind of confirmed our suspicion was to scale the cavity area until it works. But we don't know what we're doing with those numbers.
JR – Yeah, there has been a lot of research on the sensitivity of the cavity values. …
DM – Yeah, to an extent you have to evaluate the sensitivity of implicit polarization models. So it's appealing that they're simple and cheap, but if we're looking at generating a training set, we aren't super constrained by "cheap". So maybe something more IPolQ-like could make sense here.
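For context on the cavity-area knob CA and JR discuss: in Psi4, implicit solvent goes through the PCMSolver library, and the input string passed to `psi4.pcm_helper()` exposes the average tesserae `Area` alongside the radii set and scaling. A sketch of such an input block (keyword names are PCMSolver's; the values are illustrative, not the group's actual settings):

```text
Units = Angstrom
Medium {
    SolverType = IEFPCM
    Solvent = Water
}
Cavity {
    Type = GePol
    RadiiSet = UFF
    Scaling = False
    Area = 0.3
    Mode = Implicit
}
```

`Area` (in Angstrom^2) controls how finely the cavity surface is tessellated; scaling it until an SCF converges, as CA describes, changes the discretization rather than the physical model, which is why the resulting numbers are hard to interpret.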
PB (chat) – Just throwing it out there, it would be better to check Bill's charge-sloshing example with the molecule BRI-00266 from the industry benchmark set, where ridiculously high dipole moments were observed compared to QM
BS – How well does this work for vsites? How do you decide where the sites go?
DC – When we were doing stuff with QUBEKit, we'd fit vsite position and charge. But OpenFF at the moment is fitting to ESPs?
DM – LW would know; I think it's ESPs and possibly multipoles.
DC – At this level, you'd construct the …
BS – Can you construct the charge density from the wavefunction and construct the vsites from that?
DC – I guess AIM methods do give a density, though we're not storing that at the moment.
BS – AMOEBA does something like this; could you comment on how this is different from that?
JH (chat) – An issue for the implicit solvent: https://github.com/psi4/psi4/issues/3135
JH – We found that it can converge sometimes but give very incorrect answers
LM – Thanks so much, that was super helpful!
DC – I don't think we should do a new big fit on HF; I think we can refit on what we've got in the coming weeks
JR – How many atoms in the dataset reflect hypervalency? E.g. S and P?
DC – Our training set is 50k mols.
JH – There's S and P in there, but they're the least represented. We're planning to expand the dataset as we find gaps.
JR – And you could do an analysis of how well represented they are
|
updating on some smee data | DC – The thought is to train bonded parameters directly to energies and forces, without stuff like torsion scans. This will let us use larger, more diverse SPICE-style datasets for training. JH will post slides here
DM (chat) – Yeah, if we can do full fits in a day that's great! As long as we can get them to be good enough, which is the key question. :)
DM (chat) – I would also want to look at performance on some of the fragment sets we have, like Roche; it's possible the industry benchmark is not really representative in some ways (especially for the larger molecules in the set).
DM (chat) – (The biaryl set is gonna be hard because these are partly a test of typing (and handling of conjugation).)
DC – BW, does anything spring to mind that you're doing in smee900 that we're not doing? And how expensive are the smee900 fits?
BW – It took a long time for me - 12 days? - and the main limiting factor was memory. LW mentioned you may have improved memory use. I also started from Sage 2.1, but I don't expect that accounts for much difference. The main thing I was going for with the training run was reproducing SB's data, especially using OE instead of AT.
DC – Are you training to forces as well?
BW – Yes. And you'd mentioned needing to fix a sign error in smee, and I didn't use the fixed version.
DC – Right, there was a problem with the sign, with force being the negative gradient of energy.
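The sign convention DC mentions (force = negative gradient of energy) is the kind of thing a quick finite-difference check catches. A minimal sketch on a single harmonic bond stretch (parameters invented, plain numpy rather than smee's actual PyTorch machinery):

```python
def bond_energy(x, k=500.0, x0=1.5):
    """Harmonic bond-stretch energy, E = 0.5 * k * (x - x0)^2 (toy parameters)."""
    return 0.5 * k * (x - x0) ** 2

def bond_force(x, k=500.0, x0=1.5):
    """Force is the NEGATIVE gradient of the energy: F = -dE/dx = -k (x - x0).
    Dropping this minus sign is exactly the class of bug being described."""
    return -k * (x - x0)

# Central finite difference of the energy should reproduce the analytic force
x, h = 1.7, 1e-5
fd = -(bond_energy(x + h) - bond_energy(x - h)) / (2 * h)
print(bond_force(x), fd)   # both ~ -100.0
```

When fitting to forces, the same check against numerical gradients of the potential is a cheap regression test for every term in the force field.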
LM – Did you compare geometries to those in SPICE? The energies look good on SPICE but the RMSDs are bad. I wonder if the …
JH – We fit to the geoms in SPICE, but SPICE doesn't have optimized geometries. That's kinda what we're doing with the torsiondrive benchmark; we have optimized geometries for that, but we're still underperforming relative to Sage. I don't think there's a serious theory difference between SPICE and the other training sets.
JR – Where do the SPICE confs come from?
JH – High-temperature MD, optimized to local minima.
JR – So the gradients aren't all 0?
JH – Right, there are some high-temperature geometries with large gradients too.
CC (chat) – I wonder if allowing 4 periodicity terms for each torsion makes the optimization problem too hard. This makes sense for bespokefit because you don't care about transferability. But for training transferable parameters, fewer torsion parameters (i.e. only the terms in Sage 2.X) might be better.
DC – It's a good point, but the same question applies re: smee900
JH – I think smee900 started from Sage 2.1, so the periodicities aren't expanded. I was hoping the data would tell us whether we're missing any important periodicities.
CC (chat) – Can smee do LASSO regression / L1 regularization? You could do a two-stage fit: in the first stage, give the fit all of the torsion terms and use L1; then drop the terms whose coefficients go to zero and do a second fit with L2.
DC – I'd recommend a deep dive into the differences between this run and smee900. We can have further meetings on that.
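CC's two-stage idea can be sketched on a linear least-squares toy (everything here is invented stand-in data, not smee's API): stage 1 runs LASSO via ISTA soft-thresholding over all candidate terms, stage 2 refits only the surviving terms with a small L2 ridge.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = A @ c_true + noise; most coefficients are zero
# (a stand-in for torsion amplitudes across many candidate periodicities)
n_samples, n_terms = 200, 10
A = rng.normal(size=(n_samples, n_terms))
c_true = np.zeros(n_terms)
c_true[[1, 4]] = [2.0, -1.5]
y = A @ c_true + 0.01 * rng.normal(size=n_samples)

# Stage 1: L1 (LASSO) by ISTA -- gradient step, then soft-threshold
lam = 0.2
step = n_samples / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
c = np.zeros(n_terms)
for _ in range(1000):
    g = A.T @ (A @ c - y) / n_samples
    c = c - step * g
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)

support = np.abs(c) > 1e-8          # drop terms L1 drove exactly to zero

# Stage 2: L2 (ridge) refit restricted to the surviving terms
As = A[:, support]
ridge = 1e-6
c2 = np.linalg.solve(As.T @ As + ridge * np.eye(As.shape[1]), As.T @ y)
print(np.flatnonzero(support), np.round(c2, 2))
```

The two-stage split addresses the usual LASSO complaint: L1 does the term selection but biases the surviving coefficients toward zero, and the L2 refit removes that bias.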
|