2021-10-14 AM1 Studies Meeting notes

Date

Oct 14, 2021

Participants

  • @Jeffrey Wagner

  • @Connor Davel

  • @Lily Wang

Discussion topics

Item

Notes

General updates

  • JW – Added AM1 rearrangement handling to the infrastructure roadmap. I know this is on CD’s back burner; I’ll probably get to it myself in a while.

  • CD – Any update on a possible project handling post-translational modifications?

    • JW – We’re building out the infrastructure that will be required to start looking into that. I’ll know more in a few weeks.

  • JW – I can’t make this meeting time next week.

    • CD – Let’s make this meeting every other week.

    • JW – Agreed, I will have schedule conflicts every other week at this time.

    • JW updated the event to recur every other week.

Todos from last time

  • Connor will see if there’s a way to expand the RDKit conformer generation energy window so that better candidate ELF conformers are generated.

  • CD – In 100% of cases, RDKit and OE select the same conformer as ELF1 from an OE-generated conformer ensemble.

  • CD – In ~95% of cases, RDKit and OE select the same conformer as ELF from an RDKit-generated conformer ensemble.

    • (General) – This is probably because our RDKit workflow uses an RMS pruning threshold of 0, which produces nearly identical conformers that differ only at machine precision.

  • CD – (Showing notebook)

    • CD – The only thing you have to do to get RDKit to generate way more conformers is to set the min RMSD to 0, and use max_conformers to control the number of outputs instead.

    • JW – Do we know whether these output conformers are substantially distinct?

    • CD – I suspect they are but haven’t verified. What you could do is generate more conformers than you want, and then prune them by diversity.

    • LW – I know that RDKit generates a large number of conformers internally and then prunes them, possibly using the provided RMSD, until the desired number of conformers is reached.

    • LW – Can we set an energy window in RDKit?

      • JW – I think we’d looked into this a little last time, and I don’t think we found a way to set this.

      • LW – I have some code that prunes conformers by an energy window that I specify (calculating energies using MMFF)

    • CD – Shows plot of ELF1 comparison now that RDKit is generating way more conformers

  • JW – I look at the bottom plot in terms of “should we throw out the current random conformer selection and implement ELF1 immediately?”, and the data is making me think “no”

  • Re: plotting for information density: seaborn’s jointplot can make a scatter plot with marginal histograms from a pandas DataFrame.

  • Connor will evaluate whether OE vs. antechamber charge differences correlate with RMSD

    • CD – I plotted the charge RMSE against the geometric RMSD

  • CD – I’m surprised that charge RMSE doesn’t correlate with geometric RMSD.

    • LW – Which RMSD method?

      • CD – RDKit’s AllChem.GetConformerRMS with all default settings.

      • LW – Should use GetBestRMS instead, since it considers symmetry-equivalent atom mappings.

  • LW – Symmetrization problem? Analogously to how we do symmetry-aware RMSD, should we do symmetry-aware partial charge RMSE?

    • JW – Charge engines should already be symmetry-aware, so there shouldn’t be a need for multiple atom mappings. But some settings disable this (like OpenEye’s nosymstopt).

    • LW – Are WBOs symmetrized by charge engines?

      • JW – I don’t know, but I doubt it
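LW mentioned pruning conformers by an MMFF energy window, but that code was not shown. A minimal sketch of the idea, assuming a plain RDKit workflow: embed with RMS pruning turned off (pruneRmsThresh=-1.0, RDKit’s own default for this keyword), MMFF94-minimize every conformer, and keep those within a chosen window of the lowest energy. The helper name conformers_in_energy_window, the 10 kcal/mol default window, and the seed are hypothetical illustration choices, not settings agreed on in the meeting. Note that minimization moves the geometries, so this is not a drop-in replacement for the current pipeline’s conformers.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def conformers_in_energy_window(smiles, n_confs=50, window_kcal=10.0, seed=42):
    """Hypothetical helper: embed an unpruned RDKit ensemble, MMFF94-minimize it,
    and return (mol, kept_conf_ids, energy_by_conf_id) for conformers within
    `window_kcal` kcal/mol of the lowest-energy conformer."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    # pruneRmsThresh=-1.0 turns off RMS-based pruning, so numConfs alone sets the count
    conf_ids = list(
        AllChem.EmbedMultipleConfs(
            mol, numConfs=n_confs, randomSeed=seed, pruneRmsThresh=-1.0
        )
    )
    # One (not_converged_flag, energy) tuple per conformer, in conformer order
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=2000)
    energy_by_id = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}
    e_min = min(energy_by_id.values())
    # Keep every conformer whose MMFF energy is within the window of the minimum
    kept = [cid for cid, e in energy_by_id.items() if e - e_min <= window_kcal]
    return mol, kept, energy_by_id


mol, kept, energies = conformers_in_energy_window("CCCCO", n_confs=20, window_kcal=5.0)
print(f"{len(kept)} of {len(energies)} conformers within 5 kcal/mol")
```

The window is applied after minimization on purpose: comparing raw embedded (unminimized) energies would mostly measure embedding noise rather than conformational preference.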
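For LW’s symmetry-aware charge RMSE question, one possible RDKit approach: enumerate the molecule’s automorphisms by substructure-matching it against itself with uniquify=False, then take the minimum charge RMSE over those atom permutations. Per JW’s point, this should only matter when the charge engine did not symmetrize (e.g. with symmetrization disabled). The function name and the CO2 charges below are made up for illustration and are not data from the study; explicit hydrogens must be present if the charge vectors include them.

```python
import numpy as np
from rdkit import Chem


def symmetry_aware_charge_rmse(mol, charges_a, charges_b):
    """Hypothetical helper returning (naive_rmse, best_rmse) between two
    per-atom charge vectors ordered like the atoms of `mol`.

    best_rmse is the minimum over all graph automorphisms of `mol`."""
    qa = np.asarray(charges_a, dtype=float)
    qb = np.asarray(charges_b, dtype=float)
    naive = float(np.sqrt(np.mean((qa - qb) ** 2)))
    # Each match m maps query atom i to target atom m[i]; with query == target
    # these are the symmetry-equivalent atom orderings (identity included).
    automorphisms = mol.GetSubstructMatches(mol, uniquify=False, maxMatches=10000)
    best = min(
        float(np.sqrt(np.mean((qa - qb[list(m)]) ** 2))) for m in automorphisms
    )
    return naive, best


co2 = Chem.MolFromSmiles("O=C=O")
# Illustrative charges that differ only by swapping the two oxygens
naive, best = symmetry_aware_charge_rmse(co2, [-0.3, 0.8, -0.5], [-0.5, 0.8, -0.3])
print(naive, best)
```

Here the naive RMSE is about 0.163 while the symmetry-aware RMSE is 0, because the two charge sets are identical up to relabeling the equivalent oxygens.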

Next steps

  • Why aren’t ELF1 charges agreeing?

    • Different conformer generation settings – See below

  • Are we doing valid RMSD calculations?

    • Use GetBestRMS to ensure that the geometric comparisons are valid

  • How do we get RDKit to make the same conformers as OE?

    • EmbedMultipleConfs' useRandomCoords kwarg?

      • Breadcrumbs:

      • Also maybe look into the difference between pruneRmsThresh=0 and -1

      • CD – How many conformers to generate?

        • LW – OE defaults to 400-500 for ELF10 calcs

        • JW – I’d say “as many as you can generate in 2-3 secs on CD’s computer for the average molecule in this dataset”

        • CD – 20 conformers takes several seconds on my machine

        • JW – Maybe it’d be better to say “conf gen could take up to 10-20% of the total charge assignment process”, which would include the AM1 optimization.

        • CD – This could be fun to try and figure out.

        • LW – Right now as we’re doing the research on this topic, we shouldn’t prematurely constrain the number of conformers – We’re hoping to find the “ideal behavior” given infinite compute time, and then later on we’ll reduce the number of conformers to find a reasonable approximation for this ideal answer

          • JW – Agree.

        • CD – So, let’s say 400 for now

          • (General) – Agree

    • Find a way to correct for heavy-atom vs. all-atom RMSD settings (OE defaults to all atoms, RDKit defaults to heavy atoms)
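To make the geometric comparisons valid and consistent (GetBestRMS instead of GetConformerRMS, and the same hydrogen convention on both sides), a sketch that computes conformer RMSD either over all atoms or over heavy atoms only. Assumptions: the helper names best_rms and naive_rms are hypothetical; this assumes a recent RDKit where GetBestRMS lives in rdMolAlign with prbId/refId keywords; both GetBestRMS and GetConformerRMS move coordinates while aligning, so the sketch works on copies; heavy-atom RMSD is obtained by stripping hydrogens with Chem.RemoveHs, which keeps the remaining atoms’ coordinates and conformer IDs.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign


def _working_copy(mol, heavy_atoms_only):
    # RemoveHs returns a new molecule that keeps heavy-atom coordinates;
    # Chem.Mol(mol) is a full copy including all conformers.
    return Chem.RemoveHs(mol) if heavy_atoms_only else Chem.Mol(mol)


def best_rms(mol, conf_a, conf_b, heavy_atoms_only=False):
    """Symmetry-aware RMSD: minimum over atom mappings from substructure matching."""
    probe = _working_copy(mol, heavy_atoms_only)  # GetBestRMS aligns the probe in place
    ref = _working_copy(mol, heavy_atoms_only)
    return rdMolAlign.GetBestRMS(probe, ref, prbId=conf_b, refId=conf_a)


def naive_rms(mol, conf_a, conf_b, heavy_atoms_only=False):
    """Identity-mapping RMSD, as returned by AllChem.GetConformerRMS."""
    work = _working_copy(mol, heavy_atoms_only)  # GetConformerRMS aligns in place
    return AllChem.GetConformerRMS(work, conf_a, conf_b)


mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=5, randomSeed=42))
for cid in conf_ids[1:]:
    for heavy in (False, True):
        print(cid, "heavy" if heavy else "all",
              naive_rms(mol, conf_ids[0], cid, heavy),
              best_rms(mol, conf_ids[0], cid, heavy))
```

Because GetBestRMS minimizes over symmetry-equivalent mappings (which include the identity), its value should never exceed the naive GetConformerRMS value for the same pair and atom set; a large gap flags cases where the naive RMSD was inflated by symmetry. With many equivalent hydrogens the number of mappings grows quickly, which is one practical argument for the heavy-atom setting.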

Action items

Decisions