Participants

Goals

Discussion topics

Item

Notes

Updates from MolSSI

  • BP is teaching at MolSSI this week and is unable to join

Queue/Manager status

  • JH just submitted fragmented ligands from benchmarking set (JACS dataset, fragmented). Mix of torsiondrives and MM scans. Exciting new errors (some involving psi4 version queries). Also seems to be something up with automation.

  • DD – Still working on incorrectly-registered incompletes.

User questions

  • Fragmenter status and plans

CMILES for disaccharide set

  • JW – Need CMILES to make sure we’re not misinterpreting the molecules we submit to QCA.

  • JH – I was able to make CMILES from PDB. All bond orders were 1.

  • DC – Original structures from GMML. They don’t recommend writing topologies. Writes PDB and OFF formats. Takes string of saccharide identifiers as input. Generally catches errors/bad inputs.

  • DC – Can output PDB, OFF, and topology (.top or .prmtop)

  • Options:

    • Assume all bond orders are 1, use PDB + OE to interpret

      • Downside is that this would only work for current set, we’d be back to this discussion again if we tried to do bond order >1, charged atoms, or mols with S or P

      • During this meeting: OE can infer bond orders

      • Downside – Hard requirement on OE.

    • Make antechamber+tleap write mol2 and use format converters to reach sdf

      • Upside is that they must be doing bond order perception during small molecule parameterization

      • Upside – Fully open source pathway (ambertools, openbabel, rdkit)

      • GitHub link: https://github.com/openforcefield/amber-ff-porting/blob/master/GenerateDipeptides.sh#L136

      • During this meeting: This pathway successfully generates bond orders

    • Get unordered SMILES for mols and match up with PDB to get SDF/CMILES

Path 1 (OE reading PDBs):

Code Block
python -c "from openforcefield.topology import Molecule; import sys; molecule = Molecule.from_file(sys.argv[1]); print(molecule.to_smiles())" glu.pdb

Path 2 (antechamber + Open Babel to SDF):

Code Block
antechamber -fi pdb -i glu.pdb -fo mol2 -o 1_ac.mol2
antechamber -i 1_ac.mol2 -fi mol2 -o 1_ac_sy.mol2 -fo mol2 -at sybyl -dr no
obabel -imol2 1_ac_sy.mol2 -osdf -O new.sdf
python -c "from openforcefield.topology import Molecule; import sys; molecule = Molecule.from_file(sys.argv[1]); print(molecule.to_smiles()) " new.sdf
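
The Path 2 commands could also be chained from Python so that each step's output file is exactly the next step's input. This is only a sketch: the function names and the intermediate file names derived from the PDB stem are assumptions for illustration, and only the flags already shown above are used.

```python
from pathlib import Path
import shutil
import subprocess


def build_path2_commands(pdb_path):
    """Return the three Path 2 commands with consistent intermediate files.

    File names are derived from the PDB stem (an assumption, not a convention
    from the meeting); the flags are the ones used in the notes above.
    """
    stem = Path(pdb_path).stem
    ac_mol2 = f"{stem}_ac.mol2"        # antechamber output with default atom types
    sybyl_mol2 = f"{stem}_ac_sy.mol2"  # same molecule with sybyl atom types
    sdf = f"{stem}.sdf"                # final SDF written by Open Babel
    return [
        ["antechamber", "-fi", "pdb", "-i", str(pdb_path),
         "-fo", "mol2", "-o", ac_mol2],
        ["antechamber", "-i", ac_mol2, "-fi", "mol2", "-o", sybyl_mol2,
         "-fo", "mol2", "-at", "sybyl", "-dr", "no"],
        ["obabel", "-imol2", sybyl_mol2, "-osdf", "-O", sdf],
    ]


def run_path2(pdb_path):
    """Run the commands in order, stopping at the first failure."""
    commands = build_path2_commands(pdb_path)
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return commands[-1][-1]  # path of the SDF that was written


if __name__ == "__main__":
    # Print the commands without running them.
    for cmd in build_path2_commands("glu.pdb"):
        print(" ".join(cmd))
```

Building the command list separately from running it makes it easy to inspect the file names before anything touches disk.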

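On a structure with a carbonyl and a sulfate, OE interprets the PDB correctly, while antechamber loses the bond orders and the charge on the sulfate. In the diff below, out_ac is the antechamber output and out_pdb is the OE/PDB output.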

Code Block
(off-dev) jeffreywagner@JW-MBP$ diff out_ac out_pdb 
1c1
< [H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS([O])([O])[O])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]
---
> [H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS(=O)(=O)[O-])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]
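
A quick text-level check can flag this kind of problem before submission. The sketch below is a heuristic, not a replacement for a cheminformatics toolkit: it counts bracketed O/N/S/P atoms with no charge and no hydrogens (which appear when bond orders are lost), counts explicit double bonds, and sums the formal charges written in the SMILES. The function names are hypothetical; the two strings are the ones from the diff above.

```python
import re

# Bracket atoms such as [O]: a heteroatom with no charge and no H count.
BARE_HETEROATOM = re.compile(r"\[(?:O|N|S|P)\]")
# Contents of any bracket atom, e.g. "C@@", "O-", "NH3+".
BRACKET = re.compile(r"\[([^\]]+)\]")
# Trailing charge inside a bracket atom: "-", "++", "+2", optionally ":class".
CHARGE = re.compile(r"(\++|-+)(\d*)(?::\d+)?$")


def net_formal_charge(smiles):
    """Sum the formal charges written explicitly in bracket atoms."""
    total = 0
    for content in BRACKET.findall(smiles):
        match = CHARGE.search(content)
        if match is None:
            continue
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        total += sign * (int(digits) if digits else len(signs))
    return total


def summarize(smiles):
    """Return simple counts that hint at lost bond orders or charges."""
    return {
        "bare_heteroatoms": len(BARE_HETEROATOM.findall(smiles)),
        "double_bonds": smiles.count("="),
        "net_charge": net_formal_charge(smiles),
    }


OUT_AC = "[H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS([O])([O])[O])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]"
OUT_PDB = "[H][C@]1([C@@]([C@](O[C@@]([C@]1([H])OS(=O)(=O)[O-])([H])O[C@@]2([C@]([C@@]([C@](O[C@]2([H])C([H])([H])O[H])([H])OC([H])([H])[H])([H])OC(=O)C([H])([H])[H])([H])O[H])[H])([H])C([H])([H])O[H])([H])O[H])O[H]"

if __name__ == "__main__":
    print("antechamber:", summarize(OUT_AC))
    print("OE from PDB:", summarize(OUT_PDB))
```

A nonzero `bare_heteroatoms` count, or a net charge that disagrees with the charge used in the QC input, is a sign that the string should be inspected by hand.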

Protein dataset CMILES

JH – After the call I found that the CMILES in the initial protein dataset optimizations are incorrect, because the input mol2 files had all bond orders set to 1. The optimizations themselves are still correct, however, because the net charge of each molecule was still 0. The v2.0 dataset will fix all CMILES strings and fully complete the dataset.

Action items

  • David Cerutti will convert current JSON to Bohr
  • David Cerutti will tar+gz up final saccharide submission files when he updates the GitHub submission branch, then notify Horton and Dotson
  • Once above is complete, Joshua Horton will take PDBs from disaccharide submission and use OE to make corresponding CMILES and SDF before submission
  • Once above is complete, David Dotson will submit first batch of disaccharide set.
  • Joshua Horton will add notebooks to pull down protein / saccharide results into respective submission directories, and point Cerutti to them
  • David Cerutti will make sure that protein optimizations have completed correctly, using notebooks posted by Horton. If so, he’ll notify Horton+Dotson.
  • If above is acceptable, Joshua Horton and David Dotson will submit the rest of the protein optimizations as a “version 2” of the dataset, and also with the DZVP basis set
  • We’ll wait on submitting ESP calcs until we get a green light from Hyesu that they’re being computed correctly.
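
For the JSON-to-Bohr item, the conversion itself is a single division. The sketch below assumes the current JSON stores coordinates in Angstrom as a flat list under a `geometry` key; both the source unit and the key name are assumptions, since the notes do not specify them. The conversion factor is the CODATA 2018 Bohr radius in Angstrom.

```python
import json

# CODATA 2018 value of the Bohr radius expressed in Angstrom.
ANGSTROM_PER_BOHR = 0.529177210903


def angstrom_to_bohr(coords):
    """Convert a flat list of coordinates from Angstrom to Bohr."""
    return [x / ANGSTROM_PER_BOHR for x in coords]


def convert_record(record, key="geometry"):
    """Return a copy of a JSON-like dict with `key` converted to Bohr.

    The key name is hypothetical; adjust it to the actual file layout.
    """
    converted = dict(record)
    converted[key] = angstrom_to_bohr(record[key])
    return converted


if __name__ == "__main__":
    # Toy two-atom record, for illustration only.
    record = json.loads(
        '{"symbols": ["H", "H"], "geometry": [0, 0, 0, 0, 0, 0.74]}'
    )
    print(json.dumps(convert_record(record)))
```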

Decisions