2020-08-11 QCFractal User Meeting notes

Date

Aug 11, 2020

Participants

  • @Jeffrey Wagner

  • @David Dotson

  • @Joshua Horton

  • @David Cerutti (Deactivated)

Discussion topics

Item

Notes

Updates from MolSSI

  • BP is teaching at MolSSI this week and is unable to join

Queue/Manager status

  • JH just submitted fragmented ligands from benchmarking set (JACS dataset, fragmented). Mix of torsiondrives and MM scans. Exciting new errors (some involving psi4 version queries). Also seems to be something up with automation.

  • DD – Still working on incorrectly-registered incompletes.

User questions

  • Fragmenter status and plans

CMILES for disaccharide set

  • JW – Need CMILES to make sure we’re not misinterpreting the molecules we submit to QCA.

  • JH – I was able to make CMILES from PDB. All bond orders were 1.

  • DC – Original structures from GMML. They don’t recommend writing topologies. Writes PDB and OFF formats. Takes string of saccharide identifiers as input. Generally catches errors/bad inputs.

  • DC – Can output PDB, OFF, and topology (.top or .prmtop)

  • Options:

    • Assume all bond orders are 1, use PDB + OE to interpret

      • Downside is that this would only work for current set, we’d be back to this discussion again if we tried to do bond order >1, charged atoms, or mols with S or P

      • During this meeting: OE can infer bond orders

      • Downside – Hard requirement on OE.

    • Make antechamber+tleap write mol2 and use format converters to reach sdf

      • Upside is that they must be doing bond order perception during small molecule parameterization

      • Upside – Fully open source pathway (ambertools, openbabel, rdkit)

      • During this meeting: This pathway successfully generates bond orders

    • Get unordered SMILES for mols and match up with PDB to get SDF/CMILES

 

Path 1 (OE reading PDBs):

python -c "from openforcefield.topology import Molecule; import sys; molecule = Molecule.from_file(sys.argv[1]); print(molecule.to_string()) "

 

Path 2:

antechamber -fi pdb -i glu.pdb -fo mol2 -o test.mol2
antechamber -i 1_ac.mol2 -fi mol2 -o 1_ac_sy.mol2 -fo mol2 -at sybyl -dr no
obabel -imol2 1_ac_sy.mol2 -osdf -O new.sdf
python -c "from openforcefield.topology import Molecule; import sys; molecule = Molecule.from_file(sys.argv[1]); print(molecule.to_smiles()) " new.sdf
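The file names in the Path 2 commands as recorded do not chain (the first step writes test.mol2, the second reads 1_ac.mol2). A minimal sketch of building the same three conversion steps so that each reads what the previous one wrote. The function names and the intermediate file naming scheme are hypothetical; only flags that appear in the commands above are used, and `antechamber` and `obabel` are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def path2_commands(pdb_path):
    """Build the Path 2 command lines (PDB -> mol2 -> sybyl mol2 -> SDF).

    Intermediate names are derived from the PDB stem; this naming scheme is
    an assumption for illustration, not the one used in the meeting.
    """
    stem = Path(pdb_path).stem
    raw_mol2 = f"{stem}_ac.mol2"
    sybyl_mol2 = f"{stem}_ac_sy.mol2"
    sdf = f"{stem}.sdf"
    commands = [
        ["antechamber", "-fi", "pdb", "-i", str(pdb_path),
         "-fo", "mol2", "-o", raw_mol2],
        ["antechamber", "-fi", "mol2", "-i", raw_mol2,
         "-fo", "mol2", "-o", sybyl_mol2, "-at", "sybyl", "-dr", "no"],
        ["obabel", "-imol2", sybyl_mol2, "-osdf", "-O", sdf],
    ]
    return commands, sdf


def run_path2(pdb_path):
    """Run the three steps in order, stopping at the first failure."""
    commands, sdf = path2_commands(pdb_path)
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return sdf
```

Calling `run_path2("glu.pdb")` would leave `glu.sdf` in the working directory, which can then be passed to the `Molecule.from_file` one-liner.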

 

On a structure with a carbonyl and a sulfate, OE interprets the PDB correctly, while antechamber loses the bond orders and the charge on the sulfate (the carbonyl is preserved in both outputs):

(off-dev) jeffreywagner@JW-MBP$ diff out_ac out_pdb
1c1
< [H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS([O])([O])[O])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]
---
> [H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS(=O)(=O)[O-])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]
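The sulfate difference in the diff above can be spotted without reading the full strings. A rough sketch that counts explicit multiple bonds and charged bracket atoms in a SMILES string; the function name is hypothetical, and this is a text-level check only, not a substitute for parsing the molecule with a cheminformatics toolkit.

```python
import re


def smiles_text_summary(smiles):
    """Count explicit multiple bonds and charged bracket atoms in a SMILES.

    Crude string counting: '=' and '#' are bond symbols in SMILES, and a
    bracket atom containing '+' or '-' carries a formal charge.
    """
    charged = re.findall(r"\[[^\]]*[+-][^\]]*\]", smiles)
    return {
        "double_bonds": smiles.count("="),
        "triple_bonds": smiles.count("#"),
        "charged_atoms": len(charged),
    }


# Sulfate fragments copied from the two diff lines above.
antechamber_sulfate = "OS([O])([O])[O]"
oe_sulfate = "OS(=O)(=O)[O-]"

print(smiles_text_summary(antechamber_sulfate))
print(smiles_text_summary(oe_sulfate))
```

Running the same summary on the two full SMILES strings would flag any pair whose counts differ, which is enough to decide which molecules need a closer look.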

 

 

Protein dataset CMILES

JH – After the call, I found that the CMILES in the initial protein dataset optimizations are incorrect, because the input mol2 files had all bond orders set to 1. The optimizations themselves are still correct, however, because the net charge of the molecule was still 0. The v2.0 dataset will fix all CMILES strings and fully complete the dataset.
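A quick way to catch this kind of input before submission is to tally the bond-type column of each mol2 file. A minimal sketch, assuming standard Tripos mol2 records (`bond_id origin_atom target_atom bond_type`); the function names are hypothetical.

```python
from collections import Counter


def mol2_bond_types(mol2_text):
    """Tally the bond-type column of the @<TRIPOS>BOND section."""
    counts = Counter()
    in_bonds = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Track whether we are inside the bond section.
            in_bonds = stripped == "@<TRIPOS>BOND"
            continue
        fields = stripped.split()
        # Bond records: bond_id origin_atom target_atom bond_type [status]
        if in_bonds and len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def all_single_bonds(mol2_text):
    """True if the file has bonds and every one of them is typed '1'."""
    counts = mol2_bond_types(mol2_text)
    return bool(counts) and set(counts) == {"1"}
```

A file for which `all_single_bonds` returns True is not necessarily wrong (a saturated molecule is all single bonds), so a hit is a prompt to check the structure rather than proof of an error.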

Action items

@David Cerutti (Deactivated) will convert current JSON to Bohr
@David Cerutti (Deactivated) will tar+gz up final saccharide submission files when he updates github submission branch, then notify Horton and Dotson
Once above is complete, @Joshua Horton will take PDBs from disaccharide submission and use OE to make corresponding CMILES and SDF before submission
Once above is complete, @David Dotson will submit first batch of disaccharide set.
@Joshua Horton will add notebooks to pull down protein / saccharide results into respective submission directories, and point Cerutti to them
@David Cerutti (Deactivated) will make sure that protein optimizations have completed correctly, using notebooks posted by Horton. If so, he’ll notify Horton+Dotson.
If above is acceptable, @Joshua Horton and @David Dotson will submit the rest of the protein optimization as a “version 2” of the dataset, as well as under the DZVP basis set
We’ll wait on submitting ESP calcs until we get a green light from Hyesu that they’re being computed correctly.
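For the JSON-to-Bohr conversion in the first action item, a minimal sketch under stated assumptions: the current JSON is assumed to hold coordinates in angstrom as a flat list under a `geometry` key (the QCSchema convention); the actual layout of the submission files is not recorded in these notes, and the function names are hypothetical.

```python
import json

# CODATA 2018 value of the Bohr radius in angstrom.
BOHR_IN_ANGSTROM = 0.529177210903


def angstrom_to_bohr(values):
    """Convert a flat list of coordinates from angstrom to Bohr."""
    return [v / BOHR_IN_ANGSTROM for v in values]


def convert_geometry(molecule_json):
    """Return a JSON string with 'geometry' converted to Bohr.

    Assumes the input 'geometry' is a flat list in angstrom; all other
    keys are passed through unchanged.
    """
    record = json.loads(molecule_json)
    record["geometry"] = angstrom_to_bohr(record["geometry"])
    return json.dumps(record)
```

Converting twice would silently produce wrong geometries, so it may be worth recording the unit alongside the data once the files are updated.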

Decisions