XFF dataset | @Lily Wang | Seeing warnings in generating dataset – anything to worry about? Not sure where they’re coming from – not qcsubmit?
LW – Max size 314 Da. As many as 30 confs for some mols. One option if they don’t want us sharing their confs is to regenerate confs from SMILES. But I’d prefer to use theirs, since LM and I looked at geometries and some of them aren’t great. So I’d prefer to use the xtalpi geoms. JW, any thoughts on workflow purity?
JW – No strong preference. If they’re not sharing their data in the first place then workflow purity isn’t on us.
LW – BW, do you know whether there are issues mixing method/basis and QM program?
BW – When we’d publish papers, we’d try to use a consistent method/basis and program/version. If we’re reoptimizing then I wouldn’t worry at all.
JW – Do I recall their calcs had constraints?
LW – Not their opts. Their torsiondrives would have, but all I have for those are output mol2s. What I ended up doing with their TD was using the minimum conformer and using our infrastructure to scan around the selected dihedral. So we’ll probably end up with different final values for the torsiondrives.
LW – When generating optimizationdataset, I ran into some warnings and I wasn’t sure where they came from:
Charges neutralized
Proton(s) added/removed
JW – Could be InChI handling? Unexplained warnings/errors printed to STDOUT sometimes come from there.
LW – This happens when adding mols to optimizationdataset. The CMILES look identical before/after, so I’m not concerned about connectivity changes.
LW – TRIPOS mol2 files with custom tags.
JW – MOL2 processing is famously cursed. RDKit reads one flavor and silently mangles the other. OE reads another flavor and fails loudly if it can’t read safely. So you might look closely at something like carboxylates to make sure bond orders and formal charges come across right.
BW – Some issues with rdkit reading SDF and sanitizing.
|
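A minimal sketch of the carboxylate check JW suggests, written after the meeting and not part of the actual workflow. It uses only the Python standard library, so it doesn't depend on how RDKit or OE interpret the file. It reads just the standard `@<TRIPOS>ATOM` and `@<TRIPOS>BOND` records, skips every other section (including custom tags), and reports how each carboxylate-like carbon is bonded to its two terminal oxygens. The names `parse_mol2`, `carboxylate_report` and `EXAMPLE_MOL2` are hypothetical, and the acetate coordinates and partial charges in the example are made up for illustration.

```python
from collections import defaultdict

# Hypothetical acetate record; coordinates and partial charges are illustrative only.
EXAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
acetate
7 6 1
SMALL
USER_CHARGES
@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 C.3 1 ACT -0.2000
2 C2 1.500 0.000 0.000 C.2 1 ACT 0.9000
3 O1 2.100 1.100 0.000 O.co2 1 ACT -0.8000
4 O2 2.100 -1.100 0.000 O.co2 1 ACT -0.8000
5 H1 -0.400 1.000 0.000 H 1 ACT 0.0000
6 H2 -0.400 -0.500 0.900 H 1 ACT 0.0000
7 H3 -0.400 -0.500 -0.900 H 1 ACT 0.0000
@<TRIPOS>BOND
1 1 2 1
2 2 3 ar
3 2 4 ar
4 1 5 1
5 1 6 1
6 1 7 1
"""


def parse_mol2(text):
    """Return (atoms, bonds) from the ATOM and BOND records of a mol2 string."""
    atoms = {}  # atom_id -> (name, sybyl_type, partial_charge)
    bonds = []  # (atom_id_1, atom_id_2, bond_type)
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@<TRIPOS>"):
            # Any section other than ATOM/BOND (e.g. custom tags) is skipped below.
            section = line[len("@<TRIPOS>"):].upper()
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            charge = float(fields[8]) if len(fields) >= 9 else 0.0
            atoms[int(fields[0])] = (fields[1], fields[5], charge)
        elif section == "BOND" and len(fields) >= 4:
            bonds.append((int(fields[1]), int(fields[2]), fields[3]))
    return atoms, bonds


def carboxylate_report(text):
    """List (carbon name, bond types to its two terminal O, summed O charge)."""
    atoms, bonds = parse_mol2(text)
    neighbours = defaultdict(list)  # atom_id -> [(other_id, bond_type)]
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))
    report = []
    for atom_id, (name, sybyl, _) in atoms.items():
        if sybyl.split(".")[0] != "C":
            continue
        # Terminal oxygens: oxygen atoms whose only bond is to this carbon.
        terminal_o = [
            (other, order)
            for other, order in neighbours[atom_id]
            if other in atoms
            and atoms[other][1].split(".")[0] == "O"
            and len(neighbours[other]) == 1
        ]
        if len(terminal_o) == 2:
            orders = tuple(sorted(order for _, order in terminal_o))
            o_charge = round(sum(atoms[other][2] for other, _ in terminal_o), 4)
            report.append((name, orders, o_charge))
    return report


print(carboxylate_report(EXAMPLE_MOL2))
```

For the example above this reports the carboxylate carbon `C2` with two `ar` bonds. A file that writes the same group as one `2` and one `1` bond would show `('1', '2')` instead. Comparing this report against the bond orders and formal charges the toolkit produces for the same molecule is one way to spot a silently mangled carboxylate.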