Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents
stylenone

Literature background

Several groups have benchmarked the best level of QM theory for calculating electrostatic properties. Electron densities have also been benchmarked using the aug-cc-PVTZ basis, finding TPSSh and PBE0 performing well compared to CCSD/aug-cc-PVTZ.[22] Dipole moments have been benchmarked against CCSD(T) at the CBS limit, finding that most hybrid GGAs perform very well, with SOGGA11-X, PBE0, and wB97X-V all performing nearly equivalently and similarly to the hybrid mGGA PW6B95, and wB97X-D and wb97X-D3 performing only slightly worse, all using the aug-pc-4 basis[24]. ESPs have been benchmarked against the highly-accurate-for-dipoles[24] DSD-PBEP86-D3BJ/aug-cc-pV(Q + d)Z double-hybrid GGA and found that PBE0, B3LYP, and PW6B95 all perform similarly across a variety of basis sets, with the latter performing slightly better.[13] Regarding basis set, aug-cc-pV(T+d)Z performed almost as well as the much larger aug-cc-pV(Q + d)Z for a fraction of the cost.[13] It was noted that tight d functions (denoted by +d in the Dunning basis sets and a second P in the def2 basis sets second P in the def2 basis sets) are necessary for accurate electrostatics of S-containing molecules[13]; the aug-pc-3 and aug-pc-4 bases also perform well for S[25], but are much larger and not available in Psi4.

Charges used for large molecules and proteins are more sensitive to the charge method than small molecules[4,7,8,14]. This is hypothesized to be because there tend to be more buried atoms. RESP (with HF/6-31G*) achieves good accuracy for ligand simulations due to the overpolarization from the low level of theory, however, all the charges are equally over-polarized. In a real system, only the solvent-accessible atoms are sensitive to the presence of solvent, and the buried atoms should not typically be very polarized. This effect seems to be better captured by using a high level of QM theory combined with a solvent model with a low but non-zero dielectric, or an average of charges solvated with water and in gas phase. Additionally, it has been shown that HF/6-31G* does not uniformly overpolarize charges, and that its performance can be particularly unreliable for hydrocarbons and sulfur or phosphorous-containing molecules; B3LYP/aug-ccPVTZ in implicit benzene was shown to be much more reliable.[23] Presumably, the larger the molecule, the more this non-uniformity would affect the result as well.

Because we anticipate using NAGL2 on both ligands and proteins/large molecules, I believe it will be worth the computational cost to use a higher level of theory. Based on prior benchmarking, and considering both cost and how commonly used each method is, I think that wB97X-V, PBE0, B3LYP, or wB97X-D would all be good choices of functionals. PW6B95 could also be a good choice, though I would expect it to be more expensive and not provide much extra accuracy. For basis sets, both the aug-cc-pV(T+d)Z or def2-TZVPPD basis sets should work well. aug-cc-pV(T+d)Z has been benchmarked explicitly for these properties and particularly performs reasonably for S-containing molecules, however the correpsonding pseudopotentials would be needed to treat iodine. def2-TZVPPD is nearly equivalent and its effective core potentials have been extensively benchmarked, so I would overall recommend def2-TZVPPD. I would also suggest using implicit solvent. Using a dielectric around 4 or averaging vacuum and implicit solvent with a dielectric of 78 both seem to yield good results, without venturing into the complicated world of explicit solvation and iterative calculations.

Conclusion

Charge methods

[Note this is not a complete literature search, just a survey of some commonly used methods and their pros/cons]

RESP[16] is a partial charge assignment method that is based on the Merz-Kollman charge method, designed to assign charges that best represent the QM ESP of a molecule. Unlike the Merz-Kollman method, RESP restrains charges to stay close to 0, with the intention of preventing spuriously large charges on “buried” atoms. However, these buried atoms can still be assigned unphysical charges, due to the lack of information about the chemical environment provided by the ESP. Typically, for force field applications, RESP charges are calculated in gas phase at the HF/6-31G* level of theory. This low level of theory is chosen intentionally, as it seems to over-polarize gas phase charges by an amount that mimics solvation, leading to charges that can be appropriate to use in condensed phase simulations [16]. However, RESP charges can be computed at any level of theory, and in QM benchmarking/method development literature, they usually are not calculated at the HF/6-31G* level [6,7,18,19].

AM1-BCC (what we currently use) is a partial charge method based on charges from semi-empirical AM1 calculations, with empirical bond charge correction (BCC) terms fit to reproduce RESP charges with the HF/6-31G* level of theory. However, the BCCs are also fit to solvation free energies, leading to occasional large differences compared to RESP, including better performance at predicting the hydration free energies of tertiary amines and nitrates [14]. Both RESP and AM1-BCC charges are highly conformer-dependent[6], which is a challenge for force field applications. The ELF10 conformer-averaging method can be used to partially mitigate this. AM1-BCC charges are used frequently in force field applications, as they are much cheaper to compute than RESP charges and provide reasonable results.

Minimal Basis Iterative Stockholder (MBIS) charges have recently gained a lot of attention as a good alternative to RESP charges [6,7,8,9]. They are much less conformer-dependent than RESP charges, and typically perform well at reproducing QM ESPs, while performing better than RESP on other electrostatic properties[6,7]. They also typically perform better than RESP for buried atoms, though the spherical pro-atom densities don’t treat lone pairs well without the addition of virtual sites [personal communication with Danny Cole].

 Solvation

If one moves away from the HF/6-31G* level of theory, charges must be calculated considering solvation effects. For nonpolarizable force fields, it is desirable to use a solvent with a dielectric constant lower than that of water, as the cost of polarizing the molecule is neglected by the force field, leading to over-polarization[13]. Typically this will be accounted for by averaging charges calculated in water with gas-phase charges[4,13,14], or by selecting a solvent with a dielectric constant that is lower than that of water[15,22], . Solvated charges can be calculated using either explicit[1,2,3,4,9] or implicit solvent[14,15,22]. However, there can be inconsistencies when using solvated charges but fitting to gas-phase torsion and other data[4,5,15].

 Non-bonded terms

In general, when changing the charge assignment method for a force field, the nonbonded terms must also be re-fit in order to assess the accuracy, as the nonbonded terms are trained based on a specific charge model and are not necessarily appropriate for a new model.[9,13]

Performance of different charge models in simulation 

In protein-ligand simulations, typically the ligand will be assigned charges on-the-fly using AM1-BCC or another charge model. There have been several papers comparing the difference in performance when using RESP vs AM1-BCC charges for ligands, with mixed results that do not strongly support using one method over the other[14,17]. Additionally, results for methods with higher levels of theory such as RESP2[13] or IPolQ-Mod[14]  also were mixed and did not show substantial improvement with the higher level of theory. However, one study using MBIS charges (with B3LYP/def2-TZVP) and re-trained LJ parameters for only the ligand showed improvement over GAFF.[9]

Charges used for large molecules and proteins are more sensitive to the charge method[4,7,8,14]. This is hypothesized to be because there tend to be more buried atoms. RESP (with HF/6-31G*) achieves good accuracy for ligand simulations due to the overpolarization from the low level of theory, however, all the charges are equally over-polarized. In a real system, only the solvent-accessible atoms are sensitive to the presence of solvent, and the buried atoms should not typically be very polarized. This effect seems to be better captured by using a high level of QM theory combined with a solvent model with a low but non-zero dielectric, or an average of charges solvated with water and in gas phase. Additionally, it has been shown that HF/6-31G* does not uniformly overpolarize charges, and that its performance can be particularly unreliable for hydrocarbons and sulfur or phosphorous-containing molecules, with B3LYP/aug-ccPVTZ in implicit benzene was shown to be much more reliable.[23] Presumably, the larger the molecule, the more this non-uniformity would affect the result.

QM theory for electrostatic properties

Several groups have benchmarked the best level of QM theory for calculating electrostatic properties. Electron densities have been benchmarked using the aug-cc-PVTZ basis, finding TPSSh and PBE0 performing well compared to CCSD/aug-cc-PVTZ.[22] Dipole moments have been benchmarked against CCSD(T) at the complete basis set limit, finding that most hybrid GGAs perform very well, with SOGGA11-X, PBE0, and wB97X-V all performing nearly equivalently and similarly to the hybrid mGGA PW6B95, and wB97X-D and wb97X-D3 performing only slightly worse, all using the aug-pc-4 basis[24]. The second spatial cumulant of the electron density, which is related to the quadrupole and measures the spatial extent of the electron density, has also been benchmarked against CCSD(T) at the complete basis set limit, finding that SCAN0, B97-2, TPSSh, and PBE0 all performed equivalently, with wB97X-V and wB97X-D performing slightly worse but still well, again with the aug-pc-4 basis.[26] ESPs have been benchmarked against the highly-accurate-for-dipoles[24] DSD-PBEP86-D3BJ/aug-cc-pV(Q + d)Z double-hybrid GGA and found that PBE0, B3LYP, and PW6B95 all perform similarly across a variety of basis sets, with the latter performing slightly better.[13]

Regarding basis set, aug-cc-pV(T+d)Z performed almost as well as the much larger aug-cc-pV(Q + d)Z for a fraction of the cost.[13] It was noted that tight d functions (denoted by +d in the Dunning basis sets and a second P in the def2 basis sets second P in the def2 basis sets) are necessary for accurate electrostatics of S-containing molecules[13]; the aug-pc-3 and aug-pc-4 bases also perform well for S[25], but are much larger and not available in Psi4.

Discussion

Because we anticipate using NAGL2 on both ligands and proteins/large molecules, I believe it will be worth the computational cost to use a higher level of theory. Based on prior benchmarking, and considering both cost and how commonly used each method is, I think that wB97X-V, PBE0, B3LYP, or wB97X-D would all be good choices of functionals that balance accuracy and speed. For basis sets, both the aug-cc-pV(T+d)Z or def2-TZVPPD basis sets should work well. aug-cc-pV(T+d)Z has been benchmarked explicitly for these properties and particularly performs reasonably for S-containing molecules, however the correpsonding pseudopotentials would be needed to treat iodine. def2-TZVPPD is nearly equivalent and its effective core potentials have been extensively benchmarked, so I would overall recommend def2-TZVPPD. I would also suggest using implicit solvent. Using a dielectric around 4 or averaging vacuum and implicit solvent with a dielectric of 78 both seem to yield good results, without venturing into the complicated world of explicit solvation and iterative calculations.

Decision

We will do two sets of calculations, one in vacuum and one in PCM with epsilon = 78, and average the two following a RESP2-style approach. This was chosen for maximum flexibility, given that it provides similar results to PCM with epsilon = 4.

Initially we will sidestep the issue of charge methodology, opting to train only on QM observables like ESPs, dipoles, and quadrupoles. However, as a backup, we will calculate MBIS charges, and save the wavefunctions to allow calculation of RESP charges.

Based on these literature results, we selected several possible levels of theory: wb97X-V/def2-TZVPPD, PBE0/def2-TZVPPD, or wb97X-D/def2-TZVPPD. wb97X-V was eliminated as analytical gradients are not available in Psi4, and thus post-SCF analysis like polarizabilities cannot be conducted. PBE0/def2-TZVPPD was eliminated due to errors with Psi4’s DDX PCM code with I-containing molecules. wB97X-D/def2-TZVPPD was eliminated due to errors with Psi4’s DDX PCM code with sulfur- and phosphorus-containing functional groups. Ultimately, we will do geometry optimizations at the “default” B3LYP-D3BJ/DZVP level of theory, and single points with saved wavefunctions at the wB97X-D/def2-TZVPP level of theory.

TestingTesting

All of the analysis described below can be found in this Jupyter notebook.

PCM errors with diffuse functions

...

On the whole, the charge deviation due to solvent is much larger (~5x) than the deviation due to diffuse functions for non-anions, and the deviations are slightly larger (~2x) for anions.

References

...

(1) MacKerell, A. D. Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L. Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102 (18), 3586–3616. https://doi.org/10.1021/jp973084f .

...

(6) Verstraelen, T.; Pauwels, E.; De Proft, F.; Van Speybroeck, V.; Geerlings, P.; Waroquier, M. Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides. J. Chem. Theory Comput. 2012, 8 (2), 661–676. https://doi.org/10.1021/ct200512e .

(7) Verstraelen, T.; Vandenbrande, S.; Heidar-Zadeh, F.; Vanduyfhuys, L.; Van Speybroeck, V.; Waroquier, M.; Ayers, P. W. Minimal Basis Iterative Stockholder: Atoms in Molecules for Force-Field Development. J. Chem. Theory Comput. 2016, 12 (8), 3894–3912. https://doi.org/10.1021/acs.jctc.6b00456 .

...

(9) Macaya, L.; González, D.; Vöhringer-Martinez, E. Nonbonded Force Field Parameters from MBIS Partitioning of the Molecular Electron Density Improve Binding Affinity Predictions of the T4-Lysozyme Double Mutant. J. Chem. Inf. Model. 2024, 64 (8), 3269–3277. https://doi.org/10.1021/acs.jcim.3c01912 .

(10) Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.; Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput. 2019, 15 (3), 1863–1874. https://doi.org/10.1021/acs.jctc.8b01026 .

...

(12) Heidar-Zadeh, F.; Ayers, P. W.; Verstraelen, T.; Vinogradov, I.; Vöhringer-Martinez, E.; Bultinck, P. Information-Theoretic Approaches to Atoms-in-Molecules: Hirshfeld Family of Partitioning Schemes. J. Phys. Chem. A 2018, 122 (17), 4219–4245. https://doi.org/10.1021/acs.jpca.7b08966 .

(13) Schauperl, M.; Nerenberg, P. S.; Jang, H.; Wang, L.-P.; Bayly, C. I.; Mobley, D. L.; Gilson, M. K. Non-Bonded Force Field Model with Advanced Restrained Electrostatic Potential Charges (RESP2). Commun Chem 2020, 3 (1), 1–11. https://doi.org/10.1038/s42004-020-0291-4 .

(14) Muddana, H. S.; Sapra, N. V.; Fenley, A. T.; Gilson, M. K. The SAMPL4 Hydration Challenge: Evaluation of Partial Charge Sets with Explicit-Water Molecular Dynamics Simulations. J Comput Aided Mol Des 2014, 28(3), 277–287. https://doi.org/10.1007/s10822-014-9714-6 .

(15) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J.; Kollman, P. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. Journal of Computational Chemistry 2003, 24(16), 1999–2012. https://doi.org/10.1002/jcc.10349 .

(16) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. J. Phys. Chem. 1993, 97 (40), 10269–10280. https://doi.org/10.1021/j100142a004 .

(17) Huggins, D. J. Comparing the Performance of Different AMBER Protein Forcefields, Partial Charge Assignments, and Water Models for Absolute Binding Free Energy Calculations. J. Chem. Theory Comput.2022, 18 (4), 2616–2630. https://doi.org/10.1021/acs.jctc.1c01208 .

(18) Cho, M.; Sylvetsky, N.; Eshafi, S.; Santra, G.; Efremenko, I.; Martin, J. M. L. The Atomic Partial Charges Arboretum: Trying to See the Forest for the Trees. ChemPhysChem 2020, 21 (8), 688–696. https://doi.org/10.1002/cphc.202000040 .

(19) Manz, T. A. Apples to Apples Comparison of Standardized to Unstandardized Principal Component Analysis of Methods That Assign Partial Atomic Charges in Molecules. RSC Adv. 2022, 12 (49), 31617–31628. https://doi.org/10.1039/D2RA06349B .

(20) Lehner, M. T.; Katzberger, P.; Maeder, N.; Schiebroek, C. C. G.; Teetz, J.; Landrum, G. A.; Riniker, S. DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment. J. Chem. Inf. Model.2023, 63 (19), 6014–6028. https://doi.org/10.1021/acs.jcim.3c00800 .

(21) Sifain, A. E.; Lubbers, N.; Nebgen, B. T.; Smith, J. S.; Lokhov, A. Y.; Isayev, O.; Roitberg, A. E.; Barros, K.; Tretiak, S. Discovering a Transferable Charge Assignment Model Using Machine Learning. J. Phys. Chem. Lett. 2018, 9 (16), 4495–4501. https://doi.org/10.1021/acs.jpclett.8b01939 .

...

(24) Hait, D.; Head-Gordon, M. How Accurate Is Density Functional Theory at Predicting Dipole Moments? An Assessment Using a New Database of 200 Benchmark Values. J. Chem. Theory Comput. 2018, 14 (4), 1969–1981. https://doi.org/10.1021/acs.jctc.7b01252 .

(25) Denis, P. A. Basis Set Requirements for Sulfur Compounds in Density Functional Theory: A Comparison between Correlation-Consistent, Polarized-Consistent, and Pople-Type Basis Sets. J. Chem. Theory Comput.2005, 1 (5), 900–907-Consistent, Polarized-Consistent, and Pople-Type Basis Sets. J. Chem. Theory Comput.2005, 1 (5), 900–907. https://doi.org/10.1021/ct0500702 .

(26) Hait, D.; Liang, Y. H.; Head-Gordon, M. Too Big, Too Small, or Just Right? A Benchmark Assessment of Density Functional Theory for Predicting the Spatial Extent of the Electron Density of Small Chemical Systems. The Journal of Chemical Physics 2021, 154 (7), 074109. https://doi.org/10.10211063/ct05007025.0038694.