Updates

- JW: been working on adding the capability to extend a dataset at the validate step; not complete yet. Will keep pushing next, sprinkling tests over the various steps. validate and generate-conformers will both get a --add flag to allow the addition path.
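The addition path described above amounts to validating new molecules against an existing dataset without renumbering the entries already in it. A minimal sketch of that logic, assuming an integer-index scheme; the function name and data shapes are illustrative, not the actual CLI internals:

```python
def extend_validated_set(existing: dict, new_smiles: list) -> dict:
    """Add newly validated molecules to an existing dataset without
    renumbering entries that are already present (hypothetical sketch)."""
    # continue indexing from the current maximum key
    next_idx = max(existing, default=-1) + 1
    combined = dict(existing)
    for smiles in new_smiles:
        if smiles in combined.values():
            continue  # skip molecules already in the validated set
        combined[next_idx] = smiles
        next_idx += 1
    return combined
```

The key property is that existing indices are stable, so results already computed for the original entries remain valid after the addition.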
- DH: worked on components from last week's call; removed the plotly dependency.
- DD: need to add the analysis components to the production env now that the analysis scripts are more robust; they can work with incomplete data (QM results in, but perhaps incomplete MM results), and there is a consistent interface for each report subcommand. The report plots command does need --method to be the filename; not the biggest deal for now, still time to refine the plotting code.
- JW: if some QM molecules don't converge, how do we want to handle this in analysis?
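Tolerating incomplete data essentially means pairing on the QM side and leaving gaps where MM results have not landed yet. A hedged sketch of that pairing logic; the names and data shapes are hypothetical, not the actual analysis code:

```python
def pair_qm_mm(qm_results: dict, mm_results: dict) -> dict:
    """Pair QM energies with MM energies where available; molecules whose
    MM result is still pending are kept with a None placeholder so a
    report can still be generated from partial data (sketch only)."""
    return {mol: (qm, mm_results.get(mol)) for mol, qm in qm_results.items()}
```

Downstream plotting would then filter out the None entries, while counts of pending MM results remain available for reporting.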
- JH: changed QCSubmit to follow the namespace migration; should fix geometric; added in dlc, reset=True, indexing. The smirnoff99Frosst change will be made in the toolkit; best place for it to go (#816).
- DD: if we did switch to openff-toolkit, where would changes be needed?
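The dlc and reset=True settings mentioned above correspond to geomeTRIC driver keywords. A sketch of how such an optimization spec might look as plain data; the "program" entry is an assumption, and the exact keyword names should be checked against the QCSubmit and geomeTRIC versions in use:

```python
# Sketch of an optimization spec carrying the keywords discussed above;
# keyword names follow geomeTRIC's driver options ("coordsys", "reset"),
# but verify against your installed versions before relying on them.
optimization_spec = {
    "procedure": "geometric",
    "keywords": {
        "coordsys": "dlc",  # delocalized internal coordinates
        "reset": True,      # reset internal coordinates when they fail
        "program": "psi4",  # underlying gradient engine (assumption)
    },
}
```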
- DD: worked on execution approaches; removed snowflake from execute and made execute-from-server work similarly; both now also return the full JSON results from qcengine.compute_procedure for introspection.
- DH: I have an old QM dataset that doesn't use dlc, reset=True; how can I keep it from being computed?
- DD: no real mechanism for this yet.
- JH: can use compute tags; QCSubmit sets openff as the tag by default on datasets.
- DD: will add setting the openff compute tag to the manager instructions; will also add a set-tag subcommand that allows e.g. defunct to be set on a dataset, which will keep the manager(s) from computing it.
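The compute-tag mechanism works because a manager only pulls tasks whose tag matches its own configuration, so retagging a dataset (e.g. to defunct) effectively parks it. A toy sketch of that selection logic, not QCFractal's actual implementation:

```python
def tasks_for_manager(tasks: list, manager_tag: str) -> list:
    """Return only the tasks whose compute tag matches the manager's
    configured tag (toy model of queue-manager task selection)."""
    return [task for task in tasks if task["tag"] == manager_tag]
```

With a manager configured for the "openff" tag, tasks retagged "defunct" are simply never selected; no deletion is required.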
- DH: on the cluster filesystem timeouts: the admins can't do much without hardware/architecture changes; will try doing the conda install on a local filesystem within the option B submission script.
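A rough sketch of what that submission-script change might look like; this is a config fragment, and the scheduler directives, local scratch path, and package name are all placeholders for whatever the deployment document specifies:

```shell
#!/bin/bash
#BSUB -J benchmark-burnin   # scheduler directives are site-specific (placeholder)

# Install the env on node-local storage instead of the shared filesystem,
# to sidestep the shared-filesystem timeouts described above.
LOCAL_ENV=/tmp/$USER/openff-env   # node-local scratch path (assumption)
mkdir -p "$(dirname "$LOCAL_ENV")"
conda create -y -p "$LOCAL_ENV" -c conda-forge openff-benchmark  # package name per deployment doc (placeholder)
conda activate "$LOCAL_ENV"
```

The trade-off is a per-job install cost in exchange for avoiding shared-filesystem metadata traffic during execution.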
Upcoming project itinerary

On 1/22 we will present the following approach; 1/22 will mark the kick-off of the "OpenFF Geometry Optimization Benchmark - Season 1".

Starting on 1/25, we will begin deployment and testing of the approach at each partner site. Each partner will be provided with the deployment document as well as the protocol. They will also be given a collection of SDF files as a test/burn-in set. The purpose of this is as follows:
- the test set is a set of open molecules; problem cases can be freely shared in the benchmarks-support channel
- it gives partners a quick way to evaluate the available execution approaches on their resources
- the results are throwaway, so mistakes are easy to correct
- the results can be fully shared, including walltimes; this data can be used to estimate how long each partner's full set will take
- partners that succeed in executing the test set can proceed to running their production set, but no earlier than 2/1
- the test set also gives partners a point of comparison for their input SDFs for their production set, in case there are validation issues
- if they used a server for their test set, they will need to explicitly wipe their server afterward
By 2/1 we hope to have some partners ready to launch their production workloads:
- each partner will be assigned a company ID to use for the workflow
- partners can add molecules to datasets after beginning the workflow, using the approach outlined in the protocol doc for addition (still in progress for validation and conformer generation)
- they can immediately share the coverage-report portion if they like, via benchmarks-partners
- as QM results complete, they will periodically execute MM and run these datasets through the downstream analysis; preliminary result CSVs or plots can be shared via benchmarks-partners
- problem cases can be troubleshot via benchmarks-partners, with a Zoom call if necessary (care taken to avoid showing the contents of SDF files); tracebacks can be shared
By 3/1 we hope some partners will have completed most or all of their desired dataset, up to 10,000 optimizations:
- a drop site for final result CSVs will be designated
- the CSVs are designed to be practically concatenatable, so all partner datasets can be combined for joint analysis
- a cutoff date for execution has not been determined; we will need to assess the rate of completion from each partner's production-run metrics
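"Practically concatenatable" here just means the per-partner CSVs share a header row and can be stacked for combined analysis. A stdlib sketch of such a merge; the column names in the usage example are illustrative, not the actual result schema:

```python
import csv
import io

def concatenate_csvs(csv_texts: list) -> str:
    """Stack per-partner result CSVs that share a header row, keeping
    the header once and refusing mismatched schemas (sketch only)."""
    header, rows = None, []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        first = next(reader)
        if header is None:
            header = first
            rows.append(header)
        elif first != header:
            raise ValueError("partner CSV headers do not match")
        rows.extend(reader)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

For example, concatenating "id,energy\n1,-0.5\n" and "id,energy\n2,-0.7\n" yields one CSV with a single header and both data rows.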