Code
(GitHub link macro)
Data
Currently on Google Drive (https://drive.google.com/drive/folders/1A8ncO30eaS1vE1czGl3lwaIkjXgMyc4G?usp=sharing); should be released on GitHub.
Content
CB: a valid benchmark set answers:
What is the protein? → identity (EC number)
What are the ligands? → SMILES
What are the activities?
SB: state information (temperature, pressure, ion concentration, …)
Everything else is an interpretation (methods, force field, poses, charges, …).
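The minimal content listed above could be captured in a small record type. This is only a sketch: the class names `Ligand` and `BenchmarkTarget`, the field names, and the `missing_fields` helper are hypothetical and not part of any existing release format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ligand:
    name: str                              # hypothetical ligand identifier
    smiles: str                            # chemical identity of the ligand
    activity: Optional[float] = None       # measured activity, if known
    activity_unit: Optional[str] = None    # unit of the activity value


@dataclass
class BenchmarkTarget:
    protein_id: str                        # protein identity, e.g. an EC number
    ligands: List[Ligand] = field(default_factory=list)
    # State information: temperature, pressure, ion concentration, ...
    state: Dict[str, float] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """List the essential items that are still missing."""
        missing = []
        if not self.protein_id:
            missing.append("protein_id")
        if not self.ligands:
            missing.append("ligands")
        for lig in self.ligands:
            if lig.activity is None:
                missing.append(f"activity:{lig.name}")
        if not self.state:
            missing.append("state")
        return missing
```

Everything classed as interpretation (methods, force field, poses, charges) is deliberately left out of this record, so it can be layered on in the later stages.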
Stages
above data
+ structures (PDB + poses)
+ partial charges, FF parameters
+ method, method parameters
Ligands:
Structure as SDF file
coordinates
partial charges?
activity?
reference?
Charges (CB: when evaluating e.g. another pose, the charges should be kept constant)
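Whether partial charges, activity and reference end up in the SDF file is still open above. If they do, they would most naturally live in SDF data items (`> <name>` blocks). The sketch below reads those items with plain Python and reports which ones are absent. The tag names `partial_charges`, `activity` and `reference` are assumptions for illustration, not an agreed convention.

```python
def read_sdf_properties(text):
    """Return one dict of data items per SDF record (molblocks are skipped)."""
    records = []
    props, key = {}, None
    for line in text.splitlines():
        if line.startswith("$$$$"):            # record separator
            records.append(props)
            props, key = {}, None
        elif line.startswith(">") and "<" in line:
            # Data header line, e.g. "> <activity>"
            key = line[line.index("<") + 1 : line.rindex(">")]
            props[key] = []
        elif key is not None:
            if line.strip() == "":             # blank line ends the data item
                key = None
            else:
                props[key].append(line)
    return [{k: "\n".join(v) for k, v in rec.items()} for rec in records]


# Hypothetical tag names; adjust to whatever the release settles on.
REQUIRED_TAGS = ("partial_charges", "activity", "reference")


def missing_tags(record, required=REQUIRED_TAGS):
    """Names of required data items that a record does not carry."""
    return [tag for tag in required if tag not in record]
```

Keeping the charges as a data item next to the coordinates would make it possible to swap the pose while leaving the charges untouched.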
Protein:
Structure as PDB file (protein.pdb + all other crystal molecules in <find_a_name>.pdb)
generated with GROMACS (gmx pdb2gmx)
<find_a_name> : ‘water+other’, ‘water+cofactors’, ‘other’?
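The split into protein.pdb and a second file for the remaining crystal molecules can be sketched with plain Python by looking at the PDB record name. This is a rough illustration, not the procedure actually used: it assumes everything in `ATOM` records belongs to the protein and everything in `HETATM` records (waters, ions, cofactors) goes to the second file, which misplaces modified residues stored as `HETATM`.

```python
def split_pdb(lines):
    """Split PDB lines into (protein_lines, other_lines) by record name."""
    protein, other = [], []
    for line in lines:
        record = line[:6].strip()      # record name sits in columns 1-6
        if record in ("ATOM", "TER"):
            protein.append(line)
        elif record == "HETATM":
            other.append(line)
        # Header, CONECT, END and similar records are dropped in this sketch.
    return protein, other


def write_split(pdb_path, protein_path, other_path):
    """Read one PDB file and write the two parts to separate files."""
    with open(pdb_path) as handle:
        protein, other = split_pdb(handle.read().splitlines())
    with open(protein_path, "w") as handle:
        handle.write("\n".join(protein + ["END"]) + "\n")
    with open(other_path, "w") as handle:
        handle.write("\n".join(other + ["END"]) + "\n")
```

The output file names are passed in by the caller, since the name of the second file is still undecided.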
Hybrid:
hybrid struct based on ligand A
hybrid struct based on ligand B
Problems with current version:
partial charges
boxes (dimensions, number of mols)
position of waters and ions
Split up data into essential input (coordinates in SDF + PDB) and results (detailed topologies (exact force field version and files), MD engine version, …)?
Essential input: a file which cannot be generated from the other files in the release data (except if generating it takes very long or too many resources).
Experimental data → OpenFF-wide centralized generalized storage?
conda env export / exact GROMACS version
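Recording the environment and engine version alongside the results could look like the sketch below. It assumes `conda` and `gmx` are on the PATH and simply stores `None` when a tool is not found. The function names and the output file name `provenance.json` are hypothetical.

```python
import json
import shutil
import subprocess


def run_or_none(cmd):
    """Return the stdout of a command, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout if result.returncode == 0 else None


def collect_provenance():
    """Gather the software versions needed to reproduce the results."""
    return {
        "conda_env": run_or_none(["conda", "env", "export"]),
        "gromacs_version": run_or_none(["gmx", "-version"]),
    }


def write_provenance(path="provenance.json"):
    """Store the provenance record next to the result files."""
    with open(path, "w") as handle:
        json.dump(collect_provenance(), handle, indent=2)
```

Storing the raw command output avoids having to parse version strings and keeps the record usable even if the tools change their output format.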
Directory structure
Current directory structure | |
---|---|
├── <date>_<target_name_1> |
|
...