Selecting level of theory for underlying data

Literature background

Charge methods

[Note this is not a complete literature search, just a survey of some commonly used methods and their pros/cons]

RESP[16] is a partial charge assignment method that is based on the Merz-Kollman charge method, designed to assign charges that best represent the QM ESP of a molecule. Unlike the Merz-Kollman method, RESP restrains charges to stay close to 0, with the intention of preventing spuriously large charges on “buried” atoms. However, these buried atoms can still be assigned unphysical charges, due to the lack of information about the chemical environment provided by the ESP. Typically, for force field applications, RESP charges are calculated in gas phase at the HF/6-31G* level of theory. This low level of theory is chosen intentionally, as it seems to over-polarize gas phase charges by an amount that mimics solvation, leading to charges that can be appropriate to use in condensed phase simulations [16]. However, RESP charges can be computed at any level of theory, and in QM benchmarking/method development literature, they usually are not calculated at the HF/6-31G* level [6,7,18,19].

AM1-BCC (what we currently use) is a partial charge method based on charges from semi-empirical AM1 calculations, with empirical bond charge correction (BCC) terms fit to reproduce RESP charges with the HF/6-31G* level of theory. However, the BCCs are also fit to solvation free energies, leading to occasional large differences compared to RESP, including better performance at predicting the hydration free energies of tertiary amines and nitrates [14]. Both RESP and AM1-BCC charges are highly conformer-dependent[6], which is a challenge for force field applications. The ELF10 conformer-averaging method can be used to partially mitigate this. AM1-BCC charges are used frequently in force field applications, as they are much cheaper to compute than RESP charges and provide reasonable results.

Minimal Basis Iterative Stockholder (MBIS) charges have recently gained a lot of attention as a good alternative to RESP charges [6,7,8,9]. They are much less conformer-dependent than RESP charges, and typically perform well at reproducing QM ESPs, while performing better than RESP on other electrostatic properties[6,7]. They also typically perform better than RESP for buried atoms, though the spherical pro-atom densities don’t treat lone pairs well without the addition of virtual sites [personal communication with Danny Cole].

 Solvation

If one moves away from the HF/6-31G* level of theory, charges must be calculated considering solvation effects. For nonpolarizable force fields, it is desirable to use a solvent with a dielectric constant lower than that of water, as the cost of polarizing the molecule is neglected by the force field, leading to over-polarization[13]. Typically this will be accounted for by averaging charges calculated in water with gas-phase charges[4,13,14], or by selecting a solvent with a dielectric constant that is lower than that of water[15,22], . Solvated charges can be calculated using either explicit[1,2,3,4,9] or implicit solvent[14,15,22]. However, there can be inconsistencies when using solvated charges but fitting to gas-phase torsion and other data[4,5,15].

 Non-bonded terms

In general, when changing the charge assignment method for a force field, the nonbonded terms must also be re-fit in order to assess the accuracy, as the nonbonded terms are trained based on a specific charge model and are not necessarily appropriate for a new model.[9,13]

Performance of different charge models in simulation 

In protein-ligand simulations, typically the ligand will be assigned charges on-the-fly using AM1-BCC or another charge model. There have been several papers comparing the difference in performance when using RESP vs AM1-BCC charges for ligands, with mixed results that do not strongly support using one method over the other[14,17]. Additionally, results for methods with higher levels of theory such as RESP2[13] or IPolQ-Mod[14]  also were mixed and did not show substantial improvement with the higher level of theory. However, one study using MBIS charges (with B3LYP/def2-TZVP) and re-trained LJ parameters for only the ligand showed improvement over GAFF.[9]

Charges used for large molecules and proteins are more sensitive to the charge method[4,7,8,14]. This is hypothesized to be because there tend to be more buried atoms. RESP (with HF/6-31G*) achieves good accuracy for ligand simulations due to the overpolarization from the low level of theory, however, all the charges are equally over-polarized. In a real system, only the solvent-accessible atoms are sensitive to the presence of solvent, and the buried atoms should not typically be very polarized. This effect seems to be better captured by using a high level of QM theory combined with a solvent model with a low but non-zero dielectric, or an average of charges solvated with water and in gas phase. Additionally, it has been shown that HF/6-31G* does not uniformly overpolarize charges, and that its performance can be particularly unreliable for hydrocarbons and sulfur or phosphorous-containing molecules, with B3LYP/aug-ccPVTZ in implicit benzene was shown to be much more reliable.[23] Presumably, the larger the molecule, the more this non-uniformity would affect the result.

QM theory for electrostatic properties

Several groups have benchmarked the best level of QM theory for calculating electrostatic properties. Electron densities have been benchmarked using the aug-cc-PVTZ basis, finding TPSSh and PBE0 performing well compared to CCSD/aug-cc-PVTZ.[22] Dipole moments have been benchmarked against CCSD(T) at the complete basis set limit, finding that most hybrid GGAs perform very well, with SOGGA11-X, PBE0, and wB97X-V all performing nearly equivalently and similarly to the hybrid mGGA PW6B95, and wB97X-D and wb97X-D3 performing only slightly worse, all using the aug-pc-4 basis[24]. The second spatial cumulant of the electron density, which is related to the quadrupole and measures the spatial extent of the electron density, has also been benchmarked against CCSD(T) at the complete basis set limit, finding that SCAN0, B97-2, TPSSh, and PBE0 all performed equivalently, with wB97X-V and wB97X-D performing slightly worse but still well, again with the aug-pc-4 basis.[26] ESPs have been benchmarked against the highly-accurate-for-dipoles[24] DSD-PBEP86-D3BJ/aug-cc-pV(Q + d)Z double-hybrid GGA and found that PBE0, B3LYP, and PW6B95 all perform similarly across a variety of basis sets, with the latter performing slightly better.[13]

Regarding basis set, aug-cc-pV(T+d)Z performed almost as well as the much larger aug-cc-pV(Q + d)Z for a fraction of the cost.[13] It was noted that tight d functions (denoted by +d in the Dunning basis sets and a second P in the def2 basis sets second P in the def2 basis sets) are necessary for accurate electrostatics of S-containing molecules[13]; the aug-pc-3 and aug-pc-4 bases also perform well for S[25], but are much larger and not available in Psi4.

Discussion

Because we anticipate using NAGL2 on both ligands and proteins/large molecules, I believe it will be worth the computational cost to use a higher level of theory. Based on prior benchmarking, and considering both cost and how commonly used each method is, I think that wB97X-V, PBE0, B3LYP, or wB97X-D would all be good choices of functionals that balance accuracy and speed. For basis sets, both the aug-cc-pV(T+d)Z or def2-TZVPPD basis sets should work well. aug-cc-pV(T+d)Z has been benchmarked explicitly for these properties and particularly performs reasonably for S-containing molecules, however the correpsonding pseudopotentials would be needed to treat iodine. def2-TZVPPD is nearly equivalent and its effective core potentials have been extensively benchmarked, so I would overall recommend def2-TZVPPD. I would also suggest using implicit solvent. Using a dielectric around 4 or averaging vacuum and implicit solvent with a dielectric of 78 both seem to yield good results, without venturing into the complicated world of explicit solvation and iterative calculations.

Decision

We will do two sets of calculations, one in vacuum and one in PCM with epsilon = 78, and average the two following a RESP2-style approach. This was chosen for maximum flexibility, given that it provides similar results to PCM with epsilon = 4.

Initially we will sidestep the issue of charge methodology, opting to train only on QM observables like ESPs, dipoles, and quadrupoles. However, as a backup, we will calculate MBIS charges, and save the wavefunctions to allow calculation of RESP charges.

Based on these literature results, we selected several possible levels of theory: wb97X-V/def2-TZVPPD, PBE0/def2-TZVPPD, or wb97X-D/def2-TZVPPD. wb97X-V was eliminated as analytical gradients are not available in Psi4, and thus post-SCF analysis like polarizabilities cannot be conducted. PBE0/def2-TZVPPD was eliminated due to errors with Psi4’s DDX PCM code with I-containing molecules. wB97X-D/def2-TZVPPD was eliminated due to errors with Psi4’s DDX PCM code with sulfur- and phosphorus-containing functional groups. Ultimately, we will do geometry optimizations at the “default” B3LYP-D3BJ/DZVP level of theory, and single points with saved wavefunctions at the wB97X-D/def2-TZVPP level of theory.

Testing

All of the analysis described below can be found in this Jupyter notebook.

PCM errors with diffuse functions

Psi4 has two possible plug-ins for PCM: PCMSolver or pyDDX. The Cole group identified potential issues with both.

PCMSolver

PCMSolver, which has been in use for many years in Psi4, produces a fatal error for larger systems (noted here). It seems that this is due to the initial package being written with small molecules in mind, and the cavity mesh is not appropriate for larger systems. Since “larger” systems refers to ~10 heavy atoms, this disqualified PCMSolver.

PyDDX

PyDDX, on the other hand, has been documented to have issues with diffuse basis sets (noted here), which appear to arise due to the interaction between the diffuse wavefunction and the cavity. In some systems, the wavefunction either fails to converge, or converges to a spurious solution hundreds of Hartrees below the desired solution, and produces nonsensical charges as a result. In an initial test on 1009 molecules pulled from the ESP 50k and multi-Br datasets, about 1/3 of molecules either crashed or converged to an unpysical result (defined by the MBIS charge difference between the vacuum and PCM calculations being larger than 1 e) using PBE0/def2-TZVPPD with DDX PCM water (epsilon = 78), UFF radii, and a 1.1x scaling factor.

6 problematic molecules were identified to attempt to identify and solve this issue, using the PBE0/def2-TZVPPD level of theory. It doesn’t appear to be a universal problem with ddPCM as a theory, as PySCF’s ddPCM was able to converge calculations that had issues in Psi4, although PySCF is incompatible with our existing workflow. Additionally, it was solvable by increasing the cavity size to use UFF radii and a 1.2x scaling factor (instead of the 1.1x scaling factor typically used with UFF radii). Many different SCF convergence keywords were tried (every built-in initial guess, reading in the vacuum or def2-TZVPP wavefunctions as an initial guess, damping, etc), and in a few cases using a Huckel guess or the vacuum wavefunction helped, but not universally. These results are consistent with numerical issues due to the interaction between the diffuse basis set and the solvent cavity.

Finally, Trevor identified the keyword dft_bs_radius_alpha, which controls the spacial extent of the radial grid and is recommended for use with diffuse basis sets. Setting dft_bs_radius_alpha 5.0 led to all 6 initial test calculations converging to the same solution as the vacuum wavefunction, but was unable to converge additional test calculations involving iodine. 10 new problematic molecules that contain iodine were identified, and we were able to converge 7 of these calculations using wB97X-D/def2-TZVPPD, and an additional 2 by also using a Huckel initial guess.

We re-ran our test dataset of 1009 molecules using these settings (wB97X-D/def2-TZVPPD and dft_bs_radius_alpha 5.0, with DDX PCM water (epsilon = 78), UFF radii, and a 1.1x scaling factor), which led to only 56 molecules crashing or converging to an unphysical result (defined by the MBIS charge difference between the vacuum and PCM calculations being larger than 0.25 e). We deemed this an acceptable error rate in the abstract.

Error analysis

We then analyzed the distribution of errors to ensure that they were randomly distributed and not overrepresented in a particular type of chemistry. Results are shown below. “Good DDX” refers to molecules where there were no errors and the MBIS charges gave physical results (with no atom showing differences of > 0.25 e between the vacuum and PCM calculations). “DDX problems” refers to molecules where the PCM calculation either crashed or gave unphysical MBIS charges (> 0.25e difference between the PCM and vacuum results). Because the sub-set of data where there were DDX problems was very small, we have presented both the actual distributions, as well as scaled distributions. Where a plot is marked with “scaled,” the data sub-sets (“Good DDX” and “DDX problems”) have been scaled to ensure that they are on the same order of magnitude as the overall dataset. This was accomplished by first calculating the fraction of the total dataset contained in the sub-set, then tiling the dataset so it is extended by that amount:

x_good = N_good/N_total # 0.94 x_problems = N_problems/N_total # 0.06 molecular_weights_good_scaled = np.tile(molecular_weights_good,int(1/x_good)) molecular_weights_problem_scaled = np.tile(molecular_weights_problem,int(1/x_problems))

In the scaled plots, if the sub-set “DDX problems” bin exceeds the original dataset’s bin height, that indicates that the molecules in that bin were over-represented in the errored calculations compared to their representation in the overall dataset.

mw_dist.png
Breaking down the distribution of DDX errors by molecular weight. In the scaled plot (top right), we can see that DDX errors are disproportionately affecting large molecules with molecular weight over 200 Da. This is not surprising given the hypothesis that the errors are partially due to grid issues.
atom_dist.png
Breaking down the distribution of DDX errors by element. In the scaled plot (top right), we can see that the DDX errors are disproportionately affecting large and diffuse atoms like I, P, and S. Again, this is not surprising given the hypothesis that the errors are due to the interaction between the diffuse wavefunction and the solvent cavity.

This analysis shows that DDX errors are disproportionately affecting large molecules, large atoms, and anions (which have diffuse wavefunctions). This is not surprising, given the hypothesis that the errors are caused by interaction between the diffuse wavefunction and the solvent cavity, perhaps due to insufficient grid in this area. While certain types of molecules did have more errors than others, nothing was fully eliminated from the dataset, e.g. there are still many large molecules, large atoms, and (single/double) anions that were successfully computed.

We then used CheckMol to identify specific functional groups where errors were overrepresented. Because there were many functional groups in the dataset, below are shown only functional groups where >10% of the molecules containing that functional group had errors.

Unfortunately, the DDX errors eliminate large swaths of sulfur- and phosphorus- containing functional groups. 45% of sulfinic acids and sulfinic acid derivatives, 50% of sulfuric acid derivatives, 63% of sulfonic acid derivatives, and 83% of sulfonamides had errors. Additionally, 50% of phosphonic acids and 30% of phosphonic acid derivatives had errors. Other categories were eliminated entirely, though these were largely functional groups that were only present in one molecule, making the performance difficult to generalize. Since sulfur- and phosphorus- containing functional groups are one of the main targets for improvement with NAGL2, we believe this is a disqualifying result for the use of diffuse functions with Psi4’s DDX implementation of PCM.

Anion errors without diffuse functions

The primary motivation for using diffuse functions is that they are important for accurately describing the electrostatic properties of anions, which typically have very diffuse wavefunctions. Here, we assess the errors associated with choosing a basis set without diffuse functions. We compare wb97X-D/def2-TZVPPD with wb97X-D/def2-TZVPP using Psi4’s DDX PCM plugin with PCM water (epsilon = 78). We only compare molecules where the PCM calculation with def2-TZVPPD converged to a physical result (the “Good DDX” subset shown above).

Effect of diffuse function on MBIS charges

First, we examine the effect of diffuse functions on MBIS charges, comparing the effect on non-anions and anions. Below we show the distribution of RMS deviations between MBIS charges calculated with def2-TZVPP vs def2-TZVPPD, both solvated and in vacuum. Each data point shows the RMSD in atomic MBIS charges over all atoms in the molecule. Overall, for non-anions, the RMS charge difference between def2-TZVPP and def2-TZVPPD is small (median 0.0046 e for PCM and 0.0029 e for vacuum), though it is slightly more pronounced in PCM than in vacuum. For anions, on the other hand, the overall difference is much larger (median 0.016 e for PCM and 0.0125 e for vacuum), and there are a few outliers for which the difference is quite large, > 0.05 e.

 

PCM (non-anion)

Vacuum (non-anion)

PCM (anion)

Vacuum (anion)

 

PCM (non-anion)

Vacuum (non-anion)

PCM (anion)

Vacuum (anion)

Mean RMSD (TZVPP vs TZVPPD)

0.0049 e

0.0031 e

0.0226 e

0.0196 e

Median RMSD (TZVPP vs TZVPPD)

0.0046 e

0.0029 e

0.0160 e

0.0125 e

Only mean and median are shown for simplicity of comparison, however, the min, max, and standard deviation follow the same trend.

Effect of solvation on MBIS charges

Next, we examine the effect of solvent on MBIS charges, comparing the effect on diffuse and non-diffuse basis sets. Below we show the distribution of RMS deviations between MBIS charges calculated with PCM vs vacuum, comparing the results for def2-TZVPP and def2-TZVPPD. The effect of solvent on MBIS charges does not seem to be strongly basis set dependent, with very similar distributions and statistics arising for both def2-TZVPP and def2-TZVPPD (mean/median RMSD ~ 0.02 e for non-anions and ~ 0.03 e for anions). Interestingly the average effect on anions and non-anions is very similar, however the minimum RMSD is much higher for anions (0.017 e) than non-anions (0.002 e), which makes sense.

 

 

def2-TZVPPD (non-anion)

def2-TZVPP (non-anion)

def2-TZVPPD (anion)

def2-TZVPP (anion)

 

def2-TZVPPD (non-anion)

def2-TZVPP (non-anion)

def2-TZVPPD (anion)

def2-TZVPP (anion)

Mean RMSD (vacuum vs PCM)

0.0236

0.0216

0.0324

0.0297

Median RMSD (vacuum vs PCM)

0.0226

0.0206

0.0308

0.0278

Min RMSD (vacuum vs PCM)

0.0024

0.0021

0.0176

0.0171

On the whole, the charge deviation due to solvent is much larger (~5x) than the deviation due to diffuse functions for non-anions, and the deviations are slightly larger (~2x) for anions.

 

References

(1) MacKerell, A. D. Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L. Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102 (18), 3586–3616. https://doi.org/10.1021/jp973084f .

(2) Wang, X.; Wang, Y.; Guo, M.; Wang, X.; Li, Y.; Zhang, J. Z. H. Assessment of an Electrostatic Energy-Based Charge Model for Modeling the Electrostatic Interactions in Water Solvent. J. Chem. Theory Comput.2023, 19 (18), 6294–6312. https://doi.org/10.1021/acs.jctc.3c00467 .

(3) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015,11 (8), 3696–3713. https://doi.org/10.1021/acs.jctc.5b00255 .

(4) Cerutti, D. S.; Rice, J. E.; Swope, W. C.; Case, D. A. Derivation of Fixed Partial Charges for Amino Acids Accommodating a Specific Water Model and Implicit Polarization. J. Phys. Chem. B 2013, 117 (8), 2328–2338. https://doi.org/10.1021/jp311851r .

(5) Cerutti, D. S.; Swope, W. C.; Rice, J. E.; Case, D. A. Ff14ipq: A Self-Consistent Force Field for Condensed-Phase Simulations of Proteins. J. Chem. Theory Comput. 2014, 10 (10), 4515–4534. https://doi.org/10.1021/ct500643c .

(6) Verstraelen, T.; Pauwels, E.; De Proft, F.; Van Speybroeck, V.; Geerlings, P.; Waroquier, M. Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides. J. Chem. Theory Comput. 2012, 8 (2), 661–676. https://doi.org/10.1021/ct200512e .

(7) Verstraelen, T.; Vandenbrande, S.; Heidar-Zadeh, F.; Vanduyfhuys, L.; Van Speybroeck, V.; Waroquier, M.; Ayers, P. W. Minimal Basis Iterative Stockholder: Atoms in Molecules for Force-Field Development. J. Chem. Theory Comput. 2016, 12 (8), 3894–3912. https://doi.org/10.1021/acs.jctc.6b00456 .

(8) González, D.; Macaya, L.; Castillo-Orellana, C.; Verstraelen, T.; Vogt-Geisse, S.; Vöhringer-Martinez, E. Nonbonded Force Field Parameters from Minimal Basis Iterative Stockholder Partitioning of the Molecular Electron Density Improve CB7 Host–Guest Affinity Predictions. J. Chem. Inf. Model. 2022, 62 (17), 4162–4174. https://doi.org/10.1021/acs.jcim.2c00316 .

(9) Macaya, L.; González, D.; Vöhringer-Martinez, E. Nonbonded Force Field Parameters from MBIS Partitioning of the Molecular Electron Density Improve Binding Affinity Predictions of the T4-Lysozyme Double Mutant. J. Chem. Inf. Model. 2024, 64 (8), 3269–3277. https://doi.org/10.1021/acs.jcim.3c01912 .

(10) Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.; Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput. 2019, 15 (3), 1863–1874. https://doi.org/10.1021/acs.jctc.8b01026 .

(11) Vanommeslaeghe, K.; Raman, E. P.; MacKerell, A. D. Jr. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model. 2012, 52(12), 3155–3168. https://doi.org/10.1021/ci3003649 .

(12) Heidar-Zadeh, F.; Ayers, P. W.; Verstraelen, T.; Vinogradov, I.; Vöhringer-Martinez, E.; Bultinck, P. Information-Theoretic Approaches to Atoms-in-Molecules: Hirshfeld Family of Partitioning Schemes. J. Phys. Chem. A 2018, 122 (17), 4219–4245. https://doi.org/10.1021/acs.jpca.7b08966 .

(13) Schauperl, M.; Nerenberg, P. S.; Jang, H.; Wang, L.-P.; Bayly, C. I.; Mobley, D. L.; Gilson, M. K. Non-Bonded Force Field Model with Advanced Restrained Electrostatic Potential Charges (RESP2). Commun Chem 2020, 3 (1), 1–11. https://doi.org/10.1038/s42004-020-0291-4 .

(14) Muddana, H. S.; Sapra, N. V.; Fenley, A. T.; Gilson, M. K. The SAMPL4 Hydration Challenge: Evaluation of Partial Charge Sets with Explicit-Water Molecular Dynamics Simulations. J Comput Aided Mol Des 2014, 28(3), 277–287. https://doi.org/10.1007/s10822-014-9714-6 .

(15) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J.; Kollman, P. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. Journal of Computational Chemistry 2003, 24(16), 1999–2012. https://doi.org/10.1002/jcc.10349 .

(16) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. J. Phys. Chem. 1993, 97 (40), 10269–10280. https://doi.org/10.1021/j100142a004 .

(17) Huggins, D. J. Comparing the Performance of Different AMBER Protein Forcefields, Partial Charge Assignments, and Water Models for Absolute Binding Free Energy Calculations. J. Chem. Theory Comput.2022, 18 (4), 2616–2630. https://doi.org/10.1021/acs.jctc.1c01208 .

(18) Cho, M.; Sylvetsky, N.; Eshafi, S.; Santra, G.; Efremenko, I.; Martin, J. M. L. The Atomic Partial Charges Arboretum: Trying to See the Forest for the Trees. ChemPhysChem 2020, 21 (8), 688–696. https://doi.org/10.1002/cphc.202000040 .

(19) Manz, T. A. Apples to Apples Comparison of Standardized to Unstandardized Principal Component Analysis of Methods That Assign Partial Atomic Charges in Molecules. RSC Adv. 2022, 12 (49), 31617–31628. https://doi.org/10.1039/D2RA06349B .

(20) Lehner, M. T.; Katzberger, P.; Maeder, N.; Schiebroek, C. C. G.; Teetz, J.; Landrum, G. A.; Riniker, S. DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment. J. Chem. Inf. Model.2023, 63 (19), 6014–6028. https://doi.org/10.1021/acs.jcim.3c00800 .

(21) Sifain, A. E.; Lubbers, N.; Nebgen, B. T.; Smith, J. S.; Lokhov, A. Y.; Isayev, O.; Roitberg, A. E.; Barros, K.; Tretiak, S. Discovering a Transferable Charge Assignment Model Using Machine Learning. J. Phys. Chem. Lett. 2018, 9 (16), 4495–4501. https://doi.org/10.1021/acs.jpclett.8b01939 .

(22) Bleiziffer, P.; Schaller, K.; Riniker, S. Machine Learning of Partial Charges Derived from High-Quality Quantum-Mechanical Calculations. J. Chem. Inf. Model. 2018, 58 (3), 579–590. https://doi.org/10.1021/acs.jcim.7b00663 .

(23) Zhou, A.; Schauperl, M.; Nerenberg, P. S. Benchmarking Electronic Structure Methods for Accurate Fixed-Charge Electrostatic Models. J. Chem. Inf. Model. 2020, 60 (1), 249–258. https://doi.org/10.1021/acs.jcim.9b00962 .

(24) Hait, D.; Head-Gordon, M. How Accurate Is Density Functional Theory at Predicting Dipole Moments? An Assessment Using a New Database of 200 Benchmark Values. J. Chem. Theory Comput. 2018, 14 (4), 1969–1981. https://doi.org/10.1021/acs.jctc.7b01252 .

(25) Denis, P. A. Basis Set Requirements for Sulfur Compounds in Density Functional Theory: A Comparison between Correlation-Consistent, Polarized-Consistent, and Pople-Type Basis Sets. J. Chem. Theory Comput.2005, 1 (5), 900–907. https://doi.org/10.1021/ct0500702 .

(26) Hait, D.; Liang, Y. H.; Head-Gordon, M. Too Big, Too Small, or Just Right? A Benchmark Assessment of Density Functional Theory for Predicting the Spatial Extent of the Electron Density of Small Chemical Systems. The Journal of Chemical Physics 2021, 154 (7), 074109. https://doi.org/10.1063/5.0038694.