Call for Public Compounds
The call for public compounds below was sent to industry partners by @Gary Tresadern on 2020.11.05
Eight groups are interested in assessing public sets of their compounds at the QCArchive. Bayer already have a set deposited that can be used and do not wish to run extra calculations at this time.
Therefore, the seven partners shown in the table below can begin selecting compounds. Here are some points for the selections:
Number of compounds: After discussion with David Dotson and Josh Horton, we agreed that ~10K compounds in total is feasible to start; let's aim for 1500 compounds from each partner.
File format: Please provide compounds in a 3D SD file with hydrogens added.
IDs: Please add a unique, non-confidential ID for each of your compounds. Assign it as the title of each molecule in the SD file and also as a field named ID.
Selection criteria: We encourage different selection criteria, but please apply some filters. The current guideline is fewer than 30-35 heavy atoms; also avoid unusual atom types and, possibly, extreme outliers in physicochemical properties.
Conformers: Conformers will be generated at the QCArchive using RDKit with the existing protocol, up to 10 conformers per molecule. If you want to provide specific conformers for a molecule, please include them in your 3D SD file; the protocol will keep those and generate more, if needed, until reaching the maximum of 10 for the compound.
Ionisation: David Dotson and Josh Horton are looking for an open-source ionisation tool (they currently use the OpenEye toolkit). If a tool is found, this step can be done automatically at the QCArchive. We await a final decision here, but finding one seems unlikely. Therefore, if you would like to assess charged compounds, you can ionise some or all of your selection with your preferred method. If you do this, please add a field “CHARGE” in the SD file that captures the formal charge.
QM Basis set: Not relevant for our selections, but an evaluation is underway to choose a suitable basis set for charged compounds.
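The file format, ID and CHARGE points above can be checked before a set is submitted. The following is a minimal sketch in plain Python, not part of the QCArchive protocol: the names `parse_sdf_records` and `check_records`, the `PUB-000x` IDs and the one-atom molblock are all hypothetical, and it assumes each ID and CHARGE value sits on a single line after its field header.

```python
import re

# Data-field headers in an SD file look like ">  <ID>".
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def _parse_record(lines):
    """Return (title, fields) for the lines of one SD record."""
    title = lines[0].strip() if lines else ""
    fields = {}
    i = 0
    while i < len(lines):
        match = FIELD_HEADER.match(lines[i])
        i += 1
        if match:
            # The value runs until the next blank line.
            value = []
            while i < len(lines) and lines[i].strip():
                value.append(lines[i].strip())
                i += 1
            fields[match.group(1)] = "\n".join(value)
    return title, fields


def parse_sdf_records(sdf_text):
    """Split SD file text on '$$$$' delimiters into (title, fields) records."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # last record without '$$$$'
        records.append(_parse_record(current))
    return records


def check_records(records):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for n, (title, fields) in enumerate(records, start=1):
        compound_id = fields.get("ID")
        if not compound_id:
            problems.append(f"record {n}: missing ID field")
        elif title != compound_id:
            problems.append(
                f"record {n}: title {title!r} does not match ID {compound_id!r}"
            )
        charge = fields.get("CHARGE")  # optional; only for ionised compounds
        if charge is not None:
            try:
                int(charge)
            except ValueError:
                problems.append(f"record {n}: CHARGE {charge!r} is not an integer")
    return problems


# Hypothetical one-atom molblock body; real entries carry full 3D
# coordinates with hydrogens added.
_MOLBLOCK = (
    "\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)


def make_record(title, fields):
    """Build one SD record for the example below."""
    data = "".join(f">  <{name}>\n{value}\n\n" for name, value in fields.items())
    return f"{title}{_MOLBLOCK}{data}$$$$\n"


EXAMPLE_SDF = (
    make_record("PUB-0001", {"ID": "PUB-0001", "CHARGE": "0"})
    + make_record("PUB-0002", {"ID": "PUB-0003"})  # title and ID disagree
    + make_record("PUB-0003", {"ID": "PUB-0003", "CHARGE": "1+"})  # bad CHARGE
)

for problem in check_records(parse_sdf_records(EXAMPLE_SDF)):
    print(problem)
```

The sketch deliberately does not flag repeated IDs, on the assumption that several records sharing one ID are partner-supplied conformers of the same compound.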
Please get underway with your selections and clearance requests if needed. We hope to start pushing sets to the QCarchive within the coming weeks.
The QCArchive protocol will be adapted slightly to match exactly what we will run internally on our confidential compounds. These public sets will act as test cases for David D, Josh and David H to build and test the protocol.
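The numbers given in the points above (seven partners, about 1500 compounds each, at most 10 conformers per molecule) imply a rough upper bound on the workload. As a back-of-envelope sketch in Python: `conformers_to_generate` is a hypothetical helper, not a function of the QCArchive protocol, and the call does not say what happens when a partner supplies more than 10 conformers, so the sketch simply assumes none are generated in that case.

```python
MAX_CONFORMERS = 10  # per-molecule cap stated in the call


def conformers_to_generate(n_provided, max_conformers=MAX_CONFORMERS):
    """Extra conformers needed to top a molecule up to the cap.

    Assumption: partner-supplied conformers are kept as they are, and
    nothing is generated once the cap is already met or exceeded.
    """
    if n_provided < 0:
        raise ValueError("n_provided must be non-negative")
    return max(0, max_conformers - n_provided)


partners = 7
compounds_per_partner = 1500

# 7 * 1500 = 10500, in line with the ~10K total agreed as feasible.
total_compounds = partners * compounds_per_partner

# Worst case: every molecule reaches the 10-conformer cap.
max_total_conformers = total_compounds * MAX_CONFORMERS

print(total_compounds, max_total_conformers)
print(conformers_to_generate(3))  # three supplied, seven more to generate
```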
Public QC Archive Set
Contact | Company | Public Cmpds to QCArchive? | Sum of unique sets | Comments |
Gary Tresadern | Janssen | 1 | 1 | |
Arjun Narayanan | Vertex | 1 | 2 | Clearance tbc |
Bai Xue | Xtalpi | 1 | 3 | 50 compounds so far |
Cristina Schindler | Merck KGaA | 1 | 4 | |
Daniel Kuhn | Merck KGaA | 1 | 4 | |
Thomas Fox | Boehringer | 1 | 5 | |
Aniket Magarkar | Boehringer | 1 | 5 | |
Xavier Lucas | Roche | 1 | 6 | |
Alberto Gobbi | Genentech | 1 | 7 | 648 PDB ligands, summary document provided |
William Swope | Genentech | 1 | 7 | |
Katharina Mayer | Bayer | 0 | 7 | Currently, no need for additional calculations, use the existing set (https://github.com/openforcefield/qca-dataset-submission/search?q=bayer) |