...
CB: valid benchmark set:
what’s the protein? → identity (EC number)
what are the ligands? → SMILES
what are the activities?
SB: state information (temperature, pressure, ion concentration, …)
everything else is an interpretation (methods, FF, poses, charges, …)
Stages
above data
+ structures (PDB + poses)
+ partial charges, FF parameters
method, method parameters
Ligands:
Structure as SDF file
coordinates
partial charges?
activity?
reference?
Charges (CB: if you want to evaluate e.g. another pose, we want to keep the charges constant)
Protein:
Structure as PDB file (protein.pdb + all other crystal molecules in <find_a_name>.pdb)
generated with GROMACS
gmx pdb2gmx
<find_a_name> : ‘water+other’, ‘water+cofactors’, ‘other’?
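A minimal sketch of how the pdb2gmx step could be scripted so that the exact invocation can be stored next to the generated files. The force field and water model names, the output file names and the helper `build_pdb2gmx_command` are assumptions for illustration; none of them were decided in these notes. The script only builds and prints the command, it does not run GROMACS.

```python
import shlex


def build_pdb2gmx_command(pdb_in="protein.pdb", ff="amber99sb-ildn", water="tip3p"):
    """Return the argument list for a gmx pdb2gmx call (nothing is executed)."""
    return [
        "gmx", "pdb2gmx",
        "-f", pdb_in,                    # input structure
        "-o", "protein_processed.pdb",   # hypothetical output structure name
        "-p", "topol.top",               # hypothetical output topology name
        "-ff", ff,                       # assumed force field, not fixed in the notes
        "-water", water,                 # assumed water model, not fixed in the notes
    ]


command = build_pdb2gmx_command()
# Print the command line so it can be saved together with the release data.
print(shlex.join(command))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later and to write to a log unchanged.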
Hybrid:
hybrid struct based on ligand A
hybrid struct based on ligand B
Problems with current version:
partial charges
boxes (dimensions, number of molecules)
position of waters and ions
Split up data into essential input (coordinates in SDF + PDB) and results (detailed topologies (exact FF version and files), MD engine version, …)?
Essential input: a file which cannot be generated from the other files in the release data (except if generating it takes very long or too many resources)
Experimental data → OpenFF-wide centralized generalized storage?
conda env export / exact GROMACS version
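A minimal sketch of how the environment information could be captured alongside the results. It assumes `conda` and `gmx` are on the PATH; the helper names `capture` and `collect_provenance` and the dictionary keys are hypothetical. A tool that is not installed is recorded as `None` instead of raising an error.

```python
import shutil
import subprocess


def capture(command):
    """Run a command and return its stdout, or None if the tool is not installed."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


def collect_provenance():
    """Collect the conda environment export and the GROMACS version output."""
    return {
        "conda_env": capture(["conda", "env", "export"]),
        "gromacs_version": capture(["gmx", "--version"]),
    }


provenance = collect_provenance()
for key, value in provenance.items():
    print(key, "captured" if value is not None else "not available")
```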
Directory structure
Current directory structure:
├── <date>_<target_name_name>1>
ligand.yml:
currently in
name (identifier)
smiles: CCOc1c(c(cc(n1)NC(=O)Cc2cc(c(cc2OC)Br)OC)N)C#N
(outdated: relative path to sdf file docked: 03_docked/lig_17124-1/lig_17124-1.sdf)
measurement
activity with error (if available) and unit
reference (doi)
comment
maybe add
author (person who transcribed it/added it to the database)
data version
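A minimal sketch of the ligand.yml fields above as a typed record, to make the required and optional entries explicit. The class names, the nesting of reference and comment under the measurement, and the `missing_fields` helper are assumptions. The activity value, error, unit and DOI are placeholders, not real measurements; only the SMILES string and the ligand name come from these notes.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Measurement:
    value: float                   # activity
    unit: str                      # unit of the activity
    error: Optional[float] = None  # only if available
    doi: Optional[str] = None      # reference
    comment: str = ""


@dataclass
class Ligand:
    name: str                           # identifier
    smiles: str
    measurement: Measurement
    author: Optional[str] = None        # "maybe add": person who transcribed the entry
    data_version: Optional[str] = None  # "maybe add"


def missing_fields(ligand):
    """Return the names of required fields that are empty."""
    required = {
        "name": ligand.name,
        "smiles": ligand.smiles,
        "measurement.unit": ligand.measurement.unit,
    }
    return [key for key, value in required.items() if not value]


ligand = Ligand(
    name="lig_17124-1",
    smiles="CCOc1c(c(cc(n1)NC(=O)Cc2cc(c(cc2OC)Br)OC)N)C#N",
    # Placeholder numbers and DOI, for illustration only.
    measurement=Measurement(value=1.0, unit="nM", error=0.1, doi="10.0000/placeholder"),
)
record = asdict(ligand)  # nested dict, ready to be dumped with a YAML library
print(record)
print(missing_fields(ligand))
```

The nested dict returned by `asdict` could be written out with any YAML library; none is imported here to keep the sketch free of extra dependencies.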