Call for Public Compounds

The call for public compounds below was sent to industry partners by @Gary Tresadern on 2020.11.05


Eight groups are interested in assessing public sets of compounds at the QCarchive. Bayer already have a set deposited that can be used and do not wish to run extra calculations at this time.

Therefore, the seven other partners shown in the table below can begin selecting their compounds. Here are some points for the selections:

  • Number of compounds: After discussion with David Dotson and Josh Horton, we agreed that ~10K total is feasible to start, so let's aim for 1500 compounds from each partner.

  • File format: Please provide compounds in a 3D SD file with hydrogens added.

  • IDs: Please add a unique, non-confidential ID for each of your compounds. Assign it as the title of each molecule in the SD file and also as a field named ID.

  • Selection criteria: We encourage different selection criteria, but please apply some filters: the current guideline is a maximum of 30-35 heavy atoms, and please avoid unusual atom types and extreme outliers in phys-chem properties(?)…

  • Conformers: Conformers will be generated at QCarchive using RDKit with the existing protocol, up to 10 conformers per molecule. If you want to provide specific conformers for a molecule, please include them in your 3D SD file; the protocol will keep those and generate more, if needed, until reaching the maximum of 10 for the compound.

  • Ionisation: David Dotson and Josh Horton are looking for an open-source ionisation tool (currently they use OEtoolkit). If a tool is found, this step can be done automatically at the QCarchive. We await a final decision here; however, finding one seems unlikely. Therefore, if you would like to assess charged compounds, you can ionise some or all of your selection with your preferred method. If you do this, please add a field “CHARGE” to the SD file that captures the formal charge.

  • QM Basis set: Not relevant for our selections, but an evaluation is underway to choose a suitable basis set for charged compounds.
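Before sending a set, the ID, heavy-atom and CHARGE points above can be checked with a short script. The following is only a minimal sketch using the Python standard library: the function names and the choice of 35 as the heavy-atom limit are illustrative assumptions, not part of the QCarchive protocol. It assumes V2000 mol blocks with whitespace-separated atom lines, and it does not verify that coordinates are 3D or that hydrogens are present. Repeated IDs are not flagged, since several conformers of one compound may share an ID.

```python
import re

# Assumption: use the upper end of the "30-35 heavy atoms" guideline.
MAX_HEAVY_ATOMS = 35


def split_records(sdf_text):
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def data_fields(lines):
    """Return {field name: value} for the '> <NAME>' data items of one record."""
    fields, i = {}, 0
    while i < len(lines):
        match = re.match(r">.*<([^>]+)>", lines[i])
        i += 1
        if match:
            value = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                value.append(lines[i].strip())
                i += 1
            fields[match.group(1)] = "\n".join(value)
    return fields


def heavy_atom_count(lines):
    """Count non-hydrogen atoms in a V2000 atom block; None if unreadable."""
    try:
        n_atoms = int(lines[3][:3])  # counts line is the 4th line of the record
        symbols = [lines[4 + k].split()[3] for k in range(n_atoms)]
    except (IndexError, ValueError):
        return None
    return sum(1 for s in symbols if s not in ("H", "D", "T"))


def check_sdf(sdf_text):
    """Return a list of human-readable problems found in the SD file text."""
    problems = []
    for n, rec in enumerate(split_records(sdf_text), start=1):
        title = rec[0].strip() if rec else ""
        fields = data_fields(rec)
        if not title:
            problems.append(f"record {n}: title line is empty")
        if fields.get("ID") != title:
            problems.append(f"record {n}: ID field does not match the title")
        heavy = heavy_atom_count(rec)
        if heavy is None:
            problems.append(f"record {n}: could not read a V2000 atom block")
        elif heavy > MAX_HEAVY_ATOMS:
            problems.append(f"record {n}: {heavy} heavy atoms exceeds {MAX_HEAVY_ATOMS}")
        if "CHARGE" in fields:
            try:
                int(fields["CHARGE"])  # formal charge should be an integer
            except ValueError:
                problems.append(f"record {n}: CHARGE is not an integer")
    return problems
```

Calling check_sdf on the text of an SD file returns an empty list when no problems are found, so it can be run as a quick pre-submission check alongside whatever cheminformatics toolkit was used to prepare the set.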

Please get underway with your selections and clearance requests if needed. We hope to start pushing sets to the QCarchive within the coming weeks.

The QCarchive protocol will be adapted slightly to match exactly what we will run internally on our confidential compounds. These public sets will act as test cases for David D, Josh and David H to build and test the protocol.

 

Public QC Archive Set

| Contact | Partner | Public Cmpds to QCarchive? | Sum of unique sets | Comments |
|---|---|---|---|---|
| Gary Tresadern | Janssen | 1 | 1 | |
| Arjun Narayanan | Vertex | 1 | 2 | Clearance tbc |
| Bai Xue | Xtalpi | 1 | 3 | 50 compounds so far |
| Cristina Schindler | Merck KGaA | 1 | 4 | |
| Daniel Kuhn | Merck KGaA | 1 | 4 | |
| Thomas Fox | Boehringer | 1 | 5 | |
| Aniket Magarkar | Boehringer | 1 | 5 | |
| Xavier Lucas | Roche | 1 | 6 | |
| Alberto Gobbi | Genentech | 1 | 7 | 648 PDB ligands, summary document provided |
| William Swope | Genentech | 1 | 7 | |
| Katharina Mayer | Bayer | 0 | 7 | Currently, no need for additional calculations, use the existing set (https://github.com/openforcefield/qca-dataset-submission/search?q=bayer) |