PL Benchmarks on F@H - Design Document
This document is separated into Discovery and Proposal sections. Discovery is an inventory of the available resources and components for any proposed system. Proposal is our proposal for the system we will build and deploy given these resources and components.
Discovery
We have the following resources and components at our disposal.
repository of existing protein-ligand systems; well-organized and curated by @David Hahn
questions:
what is the `iridium_classifier`?
how do we currently generate GROMACS parameters from OpenFF force fields? via `parmed` (a sketch of this route follows below)
Do we have or want a `conda-forge` package for it?
we will likely start with this repo as our source of systems, and as the host for any Github Actions-based automation for managing benchmark execution; we could also use a separate repo for automation
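For reference, the `parmed` route from an OpenFF force field to GROMACS input files looks roughly like the sketch below; the file names, force field version, and exact toolkit calls are illustrative assumptions rather than a description of what the benchmark repo currently does.

```python
# Rough sketch of generating GROMACS parameters from an OpenFF force field via
# parmed; file names and force field version are illustrative assumptions.
from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
import parmed

# parametrize a ligand with a SMIRNOFF/OpenFF force field
ligand = Molecule.from_file("ligand.sdf")
off_topology = Topology.from_molecules([ligand])
forcefield = ForceField("openff-1.3.0.offxml")
omm_system = forcefield.create_openmm_system(off_topology)

# round-trip through ParmEd and write GROMACS topology/coordinate files
structure = parmed.openmm.load_topology(
    off_topology.to_openmm(), omm_system, xyz=ligand.conformers[0]
)
structure.save("ligand.top", overwrite=True)
structure.save("ligand.gro", overwrite=True)
```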
@David Hahn's fork of `pmx`, implementing required changes to the codebase and including workflow scripts for benchmark execution
Protocol for execution here: pmx Binding Free Energy Workflow | Protocol
Could use these components as the basis for initial automation for generating GROMACS systems fed to Folding@Home; will likely want them to live outside of the `pmx` fork, and we will want to migrate the `pmx` fork to the `openforcefield` Github org
`pmx` does not appear to have a regular release cycle or sustained development (and the source repo appears to be Python 2-only), so although it may be our current approach for GROMACS use, other options should be considered before pursuing it further
we will utilize `perses` for setup of protein-ligand systems executed with OpenMM
will want to produce a protocol using `perses` that is consistent with our protocol for GROMACS
`perses` is actively developed, though not yet stable; many moving parts yet
We want to automate running the standard benchmarks (the Schrödinger JACS set, etc.) for new releases of `perses`.
Python-based wrapper to the GROMACS command-line tool; unclear whether this would be a sufficient interface, but it does receive somewhat regular development and is Python 3-compatible
May not work with GROMACS 2020/2021; not listed as tested against these versions
Open-source Orion cubes (e.g. `openmm_orion`)
Proposed by @John Chodera: may make sense to utilize (open source) workflow components from Orion.
Need to verify license on these if we choose to use any
AWS
FAH work server hosting
API hosting for triggering events
result data hosting via S3
Github
preferred submission and provenance/versioning mechanism for new benchmark systems
automation can kick off activities on AWS (a sketch of this pattern follows below), or conversely the repository could be watched by AWS resources
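As an example of the first pattern, a Github Actions job could run a small boto3 script that triggers work on AWS, e.g. by invoking a Lambda function that dispatches a benchmark run; the function name and payload fields here are hypothetical placeholders, not existing resources.

```python
# Sketch of Github Actions-side automation kicking off work on AWS by invoking
# a Lambda function; the function name and payload fields are hypothetical.
import json

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

payload = {
    "event": "new-benchmark-systems",  # hypothetical event type
    "repo": "openforcefield/protein-ligand-benchmark",
    "ref": "refs/heads/master",
}

response = lambda_client.invoke(
    FunctionName="fah-benchmark-dispatcher",  # hypothetical function name
    InvocationType="Event",                   # asynchronous invocation
    Payload=json.dumps(payload).encode(),
)
print(response["StatusCode"])
```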
Proposal
See the Architecture Diagram below for the relationships between these components.
This system features four necessary components:
(1) A server presenting a REST API that can be called directly via HTTPS requests, or via a Python client.
(2) A Folding@Home work server, featuring three services:
a work unit generator, which creates inputs to the Folding@Home work server
for GROMACS input, uses the `pmx` workflow
for OpenMM input, uses `perses`
the Folding@Home work server service itself, which interacts with Folding@Home volunteer hosts, assignment servers, etc.
a result gatherer, which crawls results deposited by the work server from volunteer hosts, then packages and ships them to S3 results storage (see the sketch after this component list)
(3) An AWS S3 bucket hosting benchmarking results in a versioned, evolvable schema.
(4) A web application for browsing and visualizing benchmark results.
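To illustrate how the result gatherer under the work server (2) and the results bucket (3) fit together, the gatherer might package each completed work unit and upload it under a key that encodes the dimensions of the originating request; the bucket name and key layout below are assumptions for illustration, not the final schema.

```python
# Sketch of the result gatherer packaging a completed work unit and shipping it
# to S3; the bucket name and key layout are assumptions, not the final schema.
import tarfile
from pathlib import Path

import boto3

RESULTS_BUCKET = "openff-fah-benchmark-results"  # hypothetical bucket name

def ship_result(result_dir: Path, protein: str, ligand: str, forcefield: str,
                engine_version: str, result_version: str) -> str:
    """Package a completed work unit directory and upload it to S3."""
    archive = result_dir.with_suffix(".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(result_dir, arcname=result_dir.name)

    # key layout encoding protein x ligand x forcefield x engine-version x
    # result-version (illustrative only; the real schema is versioned/evolvable)
    key = (f"results/{protein}/{ligand}/{forcefield}/"
           f"{engine_version}/{result_version}/{archive.name}")
    boto3.client("s3").upload_file(str(archive), RESULTS_BUCKET, key)
    return key
```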
This system allows authorized users to submit protein × ligand × forcefield × engine-version × result-version requests to the server. The server will check these requests against the existing results; requests that do not have existing results will be forwarded to the work server for execution.
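A request of this shape, submitted either over raw HTTPS or through the Python client, might look like the following; the host, route, and field names are placeholders rather than the final API.

```python
# Sketch of submitting a benchmark request to the REST API server (1); the
# host, route, and field names are placeholders, not the final API.
import requests

API_URL = "https://benchmark.openforcefield.org/api"  # placeholder host

request_spec = {
    "protein": "tyk2",          # example system names only
    "ligand": "lig_ejm_31",
    "forcefield": "openff-1.3.0",
    "engine_version": "gromacs-2021",
    "result_version": "1",
}

# submit the request; the server checks it against existing results and only
# forwards it to the work server if no matching result exists
response = requests.post(f"{API_URL}/requests", json=request_spec, timeout=30)
response.raise_for_status()
print(response.json())
```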
Users can interact with results in a number of ways:
Queries against the server’s REST API (1), either directly via HTTPS or via a Python client interface.
rate limits are applied to large data pulls to avoid excessive egress costs and abuse
Direct data download via AWS S3 (3).
authenticated, requester-pays access, since unrestricted downloads could be abused at cost to OpenFF (see the sketch after this list)
Visualization and subset download via the web app (4).
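For direct download from the results bucket (component 3), a requester-pays request with boto3 might look like the sketch below; the bucket name and key follow the illustrative layout above and are not final.

```python
# Sketch of a direct, requester-pays download from the results bucket (3);
# the bucket name and key follow the illustrative layout above.
import boto3

s3 = boto3.client("s3")
s3.download_file(
    Bucket="openff-fah-benchmark-results",  # hypothetical bucket name
    Key="results/tyk2/lig_ejm_31/openff-1.3.0/gromacs-2021/1/result.tar.gz",
    Filename="result.tar.gz",
    ExtraArgs={"RequestPayer": "requester"},  # requester pays the egress cost
)
```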