Technical overview
David Dotson
Overall architecture - High level details
JC – Advantages of colocation would be that the Chodera lab could cover hosting expenses for OpenFF (both technical and personnel).
JC – I’d like to share as much as possible with OpenFE: molecule transformations, network mapping, etc.
JC – Can make data available on AWS for re-download, for people to download at their own discretion.
DD – Could save data on AWS S3. Different tiers of speeds/schedules; could be accessed by a public URI.
DD – Interested in getting output spec information from OpenFE.
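(Illustrative sketch, not from the meeting itself: if result data sits in a public S3 bucket, a client could pull it over plain HTTPS at its own discretion. The bucket name, key layout, and file format below are hypothetical placeholders, not an agreed output spec.)

```python
# Hypothetical example: fetch a results artifact from a public S3 URI.
# Public objects need no AWS credentials; a plain HTTPS GET is enough.
import requests

RESULTS_URI = (
    "https://example-fah-alchemy-results.s3.amazonaws.com/"
    "networks/network-0/results.json"  # placeholder path, not a real bucket
)

response = requests.get(RESULTS_URI, timeout=60)
response.raise_for_status()
results = response.json()
print(f"downloaded {len(results)} result records")
```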
JC – What’s missing here is information about how inputs are specified.
JW – How do we avoid versioning hell on the stuff inside the green box?
JC – The first thing is to standardize what goes in and out of the green box. Could freeze an env.
JW – How do you do this when there aren’t any automation details for free energy calculations in the green box?
JC – What you get otherwise is a person doing all the work of generating inputs, which manifests as a slow and expensive feedback loop. We want to make iterations fast and cheap; this is scalability. Being able to get information about failures is key for FF development, drug discovery, and advancing the free energy calculation infrastructure.
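(Illustrative sketch of the "freeze an env" idea above: record the exact package versions active inside the green box so a calculation can be re-run later against the same stack. The package list is a guess for illustration, not a decided spec.)

```python
# Write a small manifest of installed package versions for reproducibility.
import json
from importlib import metadata

PACKAGES = ["openmm", "openff-toolkit", "openmmforcefields", "numpy"]  # illustrative

manifest = {}
for name in PACKAGES:
    try:
        manifest[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        manifest[name] = None  # not installed in this environment

with open("environment-manifest.json", "w") as handle:
    json.dump(manifest, handle, indent=2)
```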
JC – Would like us to come up with inputs that are a starting point; then we can iterate on them to build out downstream components.
Current plans/functionality
JC – Ideally, in 3-6 months we are in a position where we have a functional green box that supports one engine.
JW – Would like to see free energy benchmarking results; want things to be reproducible and isolatable (network planner, system generator, etc.). Need to be able to keep everything constant except one component, and to run locally with only the suspected problem inputs.
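(Illustrative sketch of keeping everything constant except one component: a hypothetical, frozen config object where only the network planner is swapped for a debugging run. The class and field names are invented here and are not the fah-alchemy API.)

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GreenBoxConfig:
    # All names/values below are placeholders for illustration only.
    network_planner: str = "minimal-spanning-tree"
    system_generator: str = "openmm-system-generator"
    engine: str = "openmm"
    engine_version: str = "8.0"

baseline = GreenBoxConfig()
# Identical to the baseline except for the one component under suspicion,
# so a failure can be attributed to that component alone.
variant = replace(baseline, network_planner="radial")
```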
JW – Do we get to specify which version of Gromacs / OpenMM we want to run on the client machine?
JC – To some extent, yes; within a few recent releases we can specify core versions.
JC – Public benchmarking data will be the only way for OpenFE to identify failures; we won’t be able to operate on proprietary data.
JW – How do we want visibility into errors? How do we delete or invalidate previous errors? Are we planning on having the ability to delete?
JC – We’ve definitely needed to trash datasets/results before.
JC – It’s become helpful to be able to pause or delete individual runs/edges.
JC – An adaptive supervisory process is possible later on that can add or delete edges in a network graph and propagate those changes into the work server.
JW – Generally agree. This could become complex, so I’d be flexible on this.
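(Illustrative sketch of the adaptive-supervisor idea: edit edges in the network graph and propagate each change to the work server. The networkx calls are real; the submit/cancel functions are placeholders for whatever API the work server ends up exposing.)

```python
import networkx as nx

def submit_edge_to_work_server(edge):
    # Placeholder: would create compute work for the new transformation.
    print(f"submitted {edge}")

def cancel_edge_on_work_server(edge):
    # Placeholder: would pause/retire work for the removed transformation.
    print(f"cancelled {edge}")

def apply_changes(graph, add=(), remove=()):
    """Mutate the network and push each change to the work server."""
    for edge in add:
        graph.add_edge(*edge)
        submit_edge_to_work_server(edge)
    for edge in remove:
        graph.remove_edge(*edge)
        cancel_edge_on_work_server(edge)

network = nx.DiGraph()
network.add_edge("ligand_A", "ligand_B")
network.add_edge("ligand_A", "ligand_C")

# The supervisor decides to add one edge and retire another.
apply_changes(network, add=[("ligand_B", "ligand_C")], remove=[("ligand_A", "ligand_C")])
```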
Collect use cases/user stories
DD – Let’s populate the issue tracker of the following repo: issue tracker
JS – How black-box/agnostic would this be?
JW – I think JS may have multiple individual requests - things like “make the inputs agnostic enough to take host and guest systems” and “allow people to use pAPRika as a workflow component”.
JS – Could this run ForceBalance optimizations internally?
JW – I don’t think so, but this could be a backend for optimizations if we engineer it right.
DD – Understanding your use case, as well as the involvement of ForceBalance in it, will allow us to ensure that we include the information you need in the outputs to use it productively.
JC – I’d talked to RGowers about shared object models/APIs, and we’ll keep in touch about that.
Identify functionality gaps
Use cases eventually need to be written, in the fah-alchemy GH repo issue tracker.
There will be an issue template: aspirational API, edge cases (must support / should support / shouldn’t support).