Protein-Ligand Benchmarks

Code

 

 

Data

(Currently on Google Drive at https://drive.google.com/drive/folders/1A8ncO30eaS1vE1czGl3lwaIkjXgMyc4G?usp=sharing ; should be released on GitHub.)

Content

  • CB: valid benchmark set:

    • what’s the protein? → identity (EC number)

    • what are the ligands? → smiles

    • what are the activities?

    • SB: state information (temperature, pressure, ion concentration, …)

    • everything else is an interpretation (methods, FF, poses, charges, …)

  • Stages

    1. above data

    2. + structures (PDB + poses)

    3. + partial charges, FF parameters

    4. method, method parameters

  • Ligands:

    • Structure as sdf file

      • coordinates

      • partial charges?

      • activity?

      • reference?

    • Charges (CB: if you want to evaluate e.g. another pose, we want to keep the charges constant)

  • Protein:

    • Structure as pdb file (protein.pdb + all other crystal molecules in <find_a_name>.pdb)

      • generated with gromacs gmx pdb2gmx

      • <find_a_name> : ‘water+other’, ‘water+cofactors’, ‘other’?

  • Hybrid:

    • hybrid struct based on ligand A

    • hybrid struct based on ligand B

  • Problems with current version:

    • partial charges

    • boxes (dimensions, number of mols)

    • position of waters and ions

  • Split up data into essential input (coordinates in sdf + pdb) and results (detailed topologies (exact FF version and files), MD engine version, …)?
    Essential input: a file that cannot be generated from the other files in the release data (unless generating it takes very long or too many resources)

  • Experimental data → OpenFF-wide centralized generalized storage?

  • conda env export / gromacs exact version
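The stage 1 data listed above (protein identity, ligand SMILES, activities, state information) could be captured in a small data model. The following is only a minimal sketch: the class and field names (`Activity`, `Ligand`, `BenchmarkSet`, `protein_ec`, `state`) are assumptions made for illustration and are not part of any released code, and the values in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Activity:
    """One experimental measurement for a ligand."""
    value: float
    unit: str                        # unit vocabulary is an assumption
    error: Optional[float] = None    # experimental uncertainty, if available
    reference: Optional[str] = None  # DOI of the source publication


@dataclass(frozen=True)
class Ligand:
    name: str
    smiles: str
    activities: Tuple[Activity, ...] = ()


@dataclass
class BenchmarkSet:
    """Stage 1 only: identity, ligands, activities, state information."""
    protein_ec: str                                       # protein identity (EC number)
    ligands: List[Ligand] = field(default_factory=list)
    state: Dict[str, float] = field(default_factory=dict)  # temperature, pressure, ...

    def is_valid(self) -> bool:
        """True if the protein is identified and every ligand has a SMILES
        string and at least one activity."""
        return (
            bool(self.protein_ec)
            and bool(self.ligands)
            and all(lig.smiles and lig.activities for lig in self.ligands)
        )


example = BenchmarkSet(
    protein_ec="0.0.0.0",  # placeholder, not a real EC number
    ligands=[
        Ligand(
            name="lig_17124-1",
            smiles="CCOc1c(c(cc(n1)NC(=O)Cc2cc(c(cc2OC)Br)OC)N)C#N",
            activities=(Activity(value=1.0, unit="nM"),),  # placeholder measurement
        )
    ],
)
```

Keeping structures, charges, force field parameters and method settings out of this model mirrors the point that everything beyond stage 1 is an interpretation; later stages could reference such a record rather than extend it.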

Directory structure

Current directory structure

 


├── <date>_<target_name_1>
│   ├── 00_data
│   │   ├── edges.yml
│   │   ├── ligands.yml
│   │   └── target.yml
│   ├── 01_protein
│   │   ├── crd
│   │   │   └── protein.pdb
│   │   └── top
│   │       └── amber99sb-star-ildn-mut.ff
│   │           ├── topol.itp
│   │           └── topol.top
│   ├── 02_ligands
│   │   ├── lig_<name_1>
│   │   │   ├── crd
│   │   │   │   └── lig_<name_1>.sdf
│   │   │   └── top
│   │   │       └── openff-1.0.0.offxml
│   │   │           ├── fflig_<name_1>.itp
│   │   │           ├── lig_<name_1>.itp
│   │   │           ├── lig_<name_1>.top
│   │   │           └── posre_lig_<name_1>.itp
│   │   ├── lig_<name_2>
│   │   …..
│   └── 03_hybrid
│       ├── edge_<name_1>_<name_2>
│       │   └── water
│       │       ├── crd
│       │       │   ├── mergedA.pdb
│       │       │   ├── mergedB.pdb
│       │       │   ├── pairs.dat
│       │       │   └── score.dat
│       │       └── top
│       │           └── openff-1.0.0.offxml
│       │               ├── ffmerged.itp
│       │               ├── ffMOL.itp
│       │               └── merged.itp
│       ….
├── <date>_<target_name_2>
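
A layout like the one in the tree could be checked automatically before a release. The sketch below is an illustration only: the function names `missing_entries` and `ligands_without_sdf` are hypothetical, and the list of required paths is an assumption read off the tree rather than an agreed specification.

```python
from pathlib import Path

# Entries each target folder is assumed to contain, read off the tree above.
REQUIRED = (
    "00_data/edges.yml",
    "00_data/ligands.yml",
    "00_data/target.yml",
    "01_protein/crd/protein.pdb",
    "02_ligands",
    "03_hybrid",
)


def missing_entries(target_dir):
    """Return the required relative paths that are absent from one target folder."""
    root = Path(target_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


def ligands_without_sdf(target_dir):
    """Return names of ligand folders that lack crd/<folder_name>.sdf."""
    lig_root = Path(target_dir) / "02_ligands"
    if not lig_root.is_dir():
        return []
    return sorted(
        d.name
        for d in lig_root.iterdir()
        if d.is_dir() and not (d / "crd" / f"{d.name}.sdf").is_file()
    )
```

Calling `missing_entries("<date>_<target_name_1>")` on a complete target folder would return an empty list; topology folders are deliberately not checked here, since they depend on the force field version used.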

 

  • parent folder for target <target_name_1>

    • metadata

      • information about edges / perturbations

      • information about ligands

      • information about target

    • protein data

    • ligand data

      • ligand <name_1>

        • coordinate

          • SDF coordinate file

        • topology

  • ligands.yml:

    • currently in

      • name (identifier)

      • smiles: CCOc1c(c(cc(n1)NC(=O)Cc2cc(c(cc2OC)Br)OC)N)C#N

      • (outdated: relative path to the docked sdf file, e.g. 03_docked/lig_17124-1/lig_17124-1.sdf)

      • measurement

        • activity with error (if available) and unit

        • reference (doi)

        • comment

    • maybe add

      • author (person who transcribed it/added it to the database)

      • data version
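
The fields listed above could be checked per ligand entry once the YAML file has been loaded into a dictionary. This is a minimal sketch under assumptions: the top-level keys `name`, `smiles` and `measurement` follow the list, while the measurement sub-keys (`value`, `error`, `unit`, `doi`, `comment`) and the function name `check_ligand_entry` are hypothetical. The measurement values in the example are placeholders, not real data.

```python
REQUIRED_KEYS = ("name", "smiles", "measurement")
REQUIRED_MEASUREMENT_KEYS = ("value", "unit", "doi")   # assumed key names
OPTIONAL_MEASUREMENT_KEYS = ("error", "comment")       # assumed key names


def check_ligand_entry(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    measurement = entry.get("measurement")
    if isinstance(measurement, dict):
        problems += [
            f"missing measurement key: {key}"
            for key in REQUIRED_MEASUREMENT_KEYS
            if key not in measurement
        ]
        known = set(REQUIRED_MEASUREMENT_KEYS) | set(OPTIONAL_MEASUREMENT_KEYS)
        problems += [
            f"unknown measurement key: {key}"
            for key in sorted(set(measurement) - known)
        ]
    elif measurement is not None:
        problems.append("measurement must be a mapping")
    return problems


example = {
    "name": "lig_17124-1",
    "smiles": "CCOc1c(c(cc(n1)NC(=O)Cc2cc(c(cc2OC)Br)OC)N)C#N",
    "measurement": {
        "value": 0.0,                  # placeholder, not a real measurement
        "unit": "nM",                  # placeholder unit
        "doi": "10.0000/placeholder",  # placeholder, not a real DOI
    },
}
print(check_ligand_entry(example))  # prints []
```

If the "maybe add" fields (author, data version) are adopted, they could be appended to `REQUIRED_KEYS` without changing the rest of the check.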

 
