JW – Added AM1 rearrangement handling to the infrastructure roadmap. I know this is on CD’s backburner; I’ll probably get to it myself in a while.
CD – Any update on a possible project handling post-translational modifications?
JW – We’re building out the infrastructure that will be required to start looking into that. I’ll know more in a few weeks.
JW – I can’t make this meeting time next week
CD – Let’s make this every-other-weekly
JW – Agree, I will have schedule conflicts every other week at this time.
JW updated the event to be every-other-weekly
Todos from last time
Connor will see if there’s a way to expand RDKit conformer generation energy window to allow better candidate ELF conformers to be generated.
CD – In 100% of cases, RDKit and OE select the same conformer as being ELF1 from an OE-generated conformer ensemble
CD – In ~95% of cases, RDKit and OE select the same conformer as being ELF1 from an RDKit-generated conformer ensemble
(General) – This is probably due to our RDKit workflow using RMS threshold=0 and making nearly-identical conformers that differ at machine precision
CD – (Showing notebook)
CD – The only thing you have to do to get RDKit to generate way more conformers is to set the min RMSD to 0, and use max_conformers to control the number of outputs instead.
JW – Do we know whether these output conformers are substantially distinct?
CD – I suspect they are but haven’t verified. What you could do is generate more conformers than you want, and then prune them by diversity.
LW – I know that RDKit generates a large number of conformers internally and then prunes them, possibly using the provided RMSD, until the desired number of conformers is achieved.
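A minimal sketch of the "generate more conformers than you want, then prune by diversity" idea, written as a greedy filter over a pairwise RMSD matrix. This is not RDKit's internal pruning: `prune_by_rmsd` and the toy matrix are hypothetical, and in practice the RMSD values would have to come from an actual conformer comparison.

```python
def prune_by_rmsd(rmsd, threshold, max_conformers=None):
    """Greedily keep conformers at least `threshold` away from every kept one.

    `rmsd` is a symmetric square matrix (list of lists) of pairwise RMSDs.
    Conformers are assumed to be ordered from most to least preferred.
    """
    kept = []
    for i in range(len(rmsd)):
        # Keep conformer i only if it is distinct from all conformers kept so far.
        if all(rmsd[i][j] >= threshold for j in kept):
            kept.append(i)
        if max_conformers is not None and len(kept) >= max_conformers:
            break
    return kept


# Toy 4-conformer matrix (made-up values, in Angstrom):
# conformers 0 and 1 differ only at near machine precision.
toy_rmsd = [
    [0.0, 1e-6, 1.2, 2.0],
    [1e-6, 0.0, 1.2, 2.0],
    [1.2, 1.2, 0.0, 0.9],
    [2.0, 2.0, 0.9, 0.0],
]
kept = prune_by_rmsd(toy_rmsd, threshold=0.5)
```

With a threshold of 0 every conformer survives, including the near-duplicates, which matches the behavior suspected above for the RMS threshold=0 workflow.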
LW – Can we set an energy window in RDKit?
JW – I think we’d looked into this a little last time, and I don’t think we found a way to set this.
LW – I have some code that prunes conformers by an energy window that I specify (calculating energies using MMFF)
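A minimal sketch of energy-window pruning, assuming per-conformer energies (for example from MMFF) are already available as a list. This is not LW's code, which was not shown; `prune_by_energy_window` and the energy values are hypothetical.

```python
def prune_by_energy_window(energies, window):
    """Return indices of conformers whose energy is within `window` of the minimum.

    `energies` and `window` must share the same units (e.g. kcal/mol).
    """
    if not energies:
        return []
    lowest = min(energies)
    return [i for i, energy in enumerate(energies) if energy - lowest <= window]


# Made-up energies in kcal/mol for five conformers.
toy_energies = [12.5, 10.0, 25.0, 14.9, 15.1]
within_window = prune_by_energy_window(toy_energies, window=5.0)
```

Because the filter is applied after generation, the window can be widened or narrowed without regenerating the ensemble.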
CD – Shows plot of ELF1 comparison now that RDKit is generating way more conformers
JW – I look at the bottom plot in terms of “should we throw out the current random conformer selection and implement ELF1 immediately?”, and the data is making me think “no”
Re: plotting for information density: seaborn jointplot can make a scatter + histograms from a pandas dataframe
LW – What’s up with the extreme outliers in the ELF1 plot? E.g., one atom was given 0.3 in OE but -0.6 in RDKit.
JW – Maybe a chemical perception difference? I remember that in the “same conformer computed with no geometry optimization in both engines” comparison, most of the results were identical, but some molecules had slightly different outputs.
CD – I don’t think that explains it.
Connor will evaluate whether OE vs. antechamber charge differences correlate with RMSD
CD – I plotted the charge RMSE against the geometric RMSD
CD – I’m surprised that charge RMSE doesn’t correlate with geometric RMSD.
LW – Which RMSD method?
CD – RDKit.AllChem.GetConformerRMS with all default settings.
LW – Should use GetBestRMS instead, since that will check over symmetries
LW – Symmetrization problem? Analogously to how we do symmetry-aware RMSD, should we do symmetry-aware partial charge RMSE?
JW – Charge engines should already be symmetry-aware, so there shouldn’t be a need for multiple atom mappings. But some settings will disable this (like OpenEye’s nosymstopt)
LW – Are WBOs symmetrized by charge engines?
JW – I don’t know, but I doubt it
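A minimal sketch of what a symmetry-aware charge RMSE could look like: charges are averaged within each symmetry class before the two engines are compared. The helper names and charge values are hypothetical. The symmetry class labels are assumed to be given (one option might be RDKit's `Chem.CanonicalRankAtoms(mol, breakTies=False)`, which gives topological classes only).

```python
import math
from collections import defaultdict


def symmetrize(charges, symmetry_classes):
    """Replace each charge by the mean charge of its symmetry class."""
    groups = defaultdict(list)
    for charge, cls in zip(charges, symmetry_classes):
        groups[cls].append(charge)
    means = {cls: sum(values) / len(values) for cls, values in groups.items()}
    return [means[cls] for cls in symmetry_classes]


def charge_rmse(charges_a, charges_b):
    """Root-mean-square difference between two per-atom charge lists."""
    if len(charges_a) != len(charges_b):
        raise ValueError("charge lists must have the same length")
    total = sum((a - b) ** 2 for a, b in zip(charges_a, charges_b))
    return math.sqrt(total / len(charges_a))


# Made-up charges for a 4-atom fragment; atoms 1 and 2 are equivalent.
symmetry_classes = [0, 1, 1, 2]
engine_a = [0.30, -0.20, -0.10, 0.00]
engine_b = [0.30, -0.10, -0.20, 0.00]

raw_rmse = charge_rmse(engine_a, engine_b)
sym_rmse = charge_rmse(
    symmetrize(engine_a, symmetry_classes),
    symmetrize(engine_b, symmetry_classes),
)
```

In this toy case the raw RMSE is nonzero only because the two engines split the charge differently across equivalent atoms; after symmetrization the difference vanishes.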
Next steps
Why aren’t ELF1 charges agreeing?
Different conformer generation settings – See below
Are we doing valid RMSD calculations?
Use GetBestRMS to ensure that the geometric comparisons are valid
How do we get RDKit to make the same conformers as OE?
EmbedMultipleConfs' useRandomCoords kwarg?
Breadcrumbs:
Also maybe look into the difference between pruneRmsThresh=0 vs. pruneRmsThresh=-1
CD – How many conformers to generate?
LW – OE defaults to 400-500 for ELF10 calcs
JW – I’d say “as many as you can generate in 2-3 secs on CD’s computer for the average molecule in this dataset”
CD – 20 conformers takes several seconds on my machine
JW – Maybe it’d be better to say “conf gen could take up to 10-20% of the total charge assignment process”, which would include the AM1 optimization.
CD – This could be fun to try and figure out
LW – Right now as we’re doing the research on this topic, we shouldn’t prematurely constrain the number of conformers – We’re hoping to find the “ideal behavior” given infinite compute time, and then later on we’ll reduce the number of conformers to find a reasonable approximation for this ideal answer
JW – Agree.
CD – So, let’s say 400 for now
(General) – Agree
Find a way to correct for heavy-atom vs. all-atom RMSD settings (OE defaults to all atoms, RDKit defaults to heavy atoms)
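A minimal sketch of why the heavy-atom vs. all-atom setting matters, assuming the two conformers are already aligned and share the same atom order. The `rmsd` helper and the coordinates are hypothetical, and no alignment or symmetry handling is done here.

```python
import math


def rmsd(coords_a, coords_b, elements, heavy_only=False):
    """RMSD between two pre-aligned coordinate sets with identical atom order.

    If `heavy_only` is True, hydrogen atoms are excluded from the calculation.
    """
    pairs = [
        (a, b)
        for a, b, element in zip(coords_a, coords_b, elements)
        if not (heavy_only and element == "H")
    ]
    if not pairs:
        raise ValueError("no atoms selected")
    squared = sum(sum((x - y) ** 2 for x, y in zip(a, b)) for a, b in pairs)
    return math.sqrt(squared / len(pairs))


# Made-up fragment: heavy atoms coincide, hydrogens are mirrored in y.
elements = ["C", "O", "H", "H"]
conf_a = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (-0.5, 0.9, 0.0), (1.9, 0.9, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (-0.5, -0.9, 0.0), (1.9, -0.9, 0.0)]

heavy_rmsd = rmsd(conf_a, conf_b, elements, heavy_only=True)
all_rmsd = rmsd(conf_a, conf_b, elements, heavy_only=False)
```

The same pair of conformers looks identical by heavy-atom RMSD and clearly different by all-atom RMSD, so the two toolkits need to be put on the same atom selection before their RMSD values are compared.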