A user story is usually written from the user's perspective and follows the format: “As [a user persona], I want [to perform this action] so that [I can accomplish this goal].”
A scientist needs to grab an (input) data set, which may be reused across several development iterations
A reviewer wants to re-run a benchmark as reported in a paper
MT – As a YAMMBS developer, I want to re-run an analysis on an existing data set and compare the results to previous results
MT – As a person curious about force fields, I want a web interface that compares physical properties of different force fields
LW – As someone researching FFs, I want to compare different force field fitting experiments that have been computed with the same versions and restrictions
LM – I benchmarked all versions of Sage and Parsley and now everyone in the future can use my results to compare to their new analyses without needing to generate a bunch of new data.
BW – I’d love “CI for force fields” – like, when I get a FF out of ForceBalance, I’d like a job to be dispatched and to receive summary graphics about whether it’s good or not.
LM – I want to use my SQLite store in other analyses, e.g. to drill down into problem molecules. Then I could just upload my notebook, with the assumption that other people can grab the weighty SQLite store
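A minimal sketch of the kind of drill-down this would enable, using only the standard library; the table and column names are hypothetical, not YAMMBS’s actual schema:

```python
import sqlite3

# Hypothetical schema: "rmsd_results" and its columns are illustrative only
connection = sqlite3.connect("benchmark.sqlite")

# Pull the ten molecules with the worst RMSD vs. QM for follow-up analysis
rows = connection.execute(
    "SELECT molecule_id, mapped_smiles, rmsd "
    "FROM rmsd_results ORDER BY rmsd DESC LIMIT 10"
).fetchall()

for molecule_id, smiles, rmsd in rows:
    print(f"{molecule_id}\t{rmsd:.3f}\t{smiles}")

connection.close()
```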
LW – I want to easily add datasets to my benchmark comparisons, so that if I initially benchmarked on the OpenFF industry dataset I could append a new dataset and get more summary graphics like Brent mentioned
BW – Could be good to have serializable dataset models, where e.g. we could dump out a subset of just our molecules of interest to JSON/CSV
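A minimal sketch of what serializable dataset models could look like, here with Pydantic; the class and field names are illustrative, not YAMMBS’s actual API:

```python
from pydantic import BaseModel


# Hypothetical models -- not YAMMBS's actual classes
class MoleculeRecord(BaseModel):
    mapped_smiles: str
    inchi_key: str


class Dataset(BaseModel):
    records: list[MoleculeRecord]

    def subset(self, inchi_keys: set[str]) -> "Dataset":
        """Return a new Dataset containing only the molecules of interest."""
        return Dataset(records=[r for r in self.records if r.inchi_key in inchi_keys])


dataset = Dataset(
    records=[
        MoleculeRecord(
            mapped_smiles="[C:1]([H:2])([H:3])([H:4])[H:5]",  # methane
            inchi_key="VNWKTOKETHGBQD-UHFFFAOYSA-N",
        ),
    ]
)

# Dump just the subset of interest to JSON; CSV would follow the same pattern
print(dataset.subset({"VNWKTOKETHGBQD-UHFFFAOYSA-N"}).model_dump_json(indent=2))
```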
Have standard formats/inputs with a rolling database of known “bad” entries that are filtered out each time.
BW – This could be handled by maintaining a standardized input dataset.
LW – Kinda, but it’d be good to have the full dataset in there ahead of time so folks can see what we’re filtering out.
MT – I could see two approaches here: either a filtering layer in YAMMBS, or an evolving dataset maintained outside YAMMBS.
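A minimal sketch of the first approach (a filtering layer), assuming a version-controlled JSON file of known-bad entry IDs; the file name and record fields are made up:

```python
import json


def load_bad_entries(path: str = "bad_entries.json") -> set[str]:
    """Load the rolling, version-controlled list of entry IDs known to be bad."""
    with open(path) as file:
        return set(json.load(file))


def filter_dataset(records: list[dict], bad_entries: set[str]) -> list[dict]:
    """Drop known-bad records; the full dataset stays visible upstream."""
    return [record for record in records if record["id"] not in bad_entries]
```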
JW – Compatibility of YAMMBS with datasets will change over time – the story is something like: I want to load an old dataset with a new version of YAMMBS, and I get either a clear error message or instructions on how to deal with it
LW – As a maintainer of YAMMBS, I could migrate databases to a new format and tag them on Zenodo so they can be loaded by compatible YAMMBS versions
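A minimal sketch of the error-or-instructions behavior, assuming each stored dataset carries a schema version number; all names here are made up:

```python
SCHEMA_VERSION = 2  # the schema this YAMMBS release reads and writes


class IncompatibleDatasetError(Exception):
    pass


def check_compatibility(stored_version: int) -> None:
    """Raise an informative error instead of failing mysteriously on load."""
    if stored_version == SCHEMA_VERSION:
        return
    if stored_version < SCHEMA_VERSION:
        raise IncompatibleDatasetError(
            f"Dataset uses schema v{stored_version}; this YAMMBS expects "
            f"v{SCHEMA_VERSION}. Look for a migrated copy on Zenodo, or pin "
            "an older YAMMBS release that supports this schema."
        )
    raise IncompatibleDatasetError(
        f"Dataset schema v{stored_version} is newer than this YAMMBS "
        f"(v{SCHEMA_VERSION}); upgrade YAMMBS to load it."
    )
```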
MT – How well do we need to scale to large datasets?
JW – Would it be reasonable to say “we should be able to handle 100x our largest benchmark, manipulate it on a MacBook, and fit within the Zenodo/Figshare limit”?
LW – I’d like the ability to load and concatenate datasets from multiple zenodo uploads.
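A minimal sketch of that concatenation, here with pandas; the record IDs and file names are placeholders:

```python
import pandas as pd

# Placeholder Zenodo record IDs and file names
urls = [
    "https://zenodo.org/records/1234567/files/benchmark-a.csv",
    "https://zenodo.org/records/7654321/files/benchmark-b.csv",
]

# pandas reads directly from URLs; concat stacks the rows into one table
combined = pd.concat([pd.read_csv(url) for url in urls], ignore_index=True)
print(combined.shape)
```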
MT – What use cases might external users come to us with? E.g. our collaborators, or ourselves in the future
JW – As an external user, I'd want to pull up a list of all benchmarking runs and search through them to find a benchmark that’s useful to me.
LW – As an external user, I want to run a benchmark like what OpenFF does (and possibly store it on OpenFF infrastructure)
JW – as an external user, I want to look at one observation but not pull down the entire dataset (I personally would reject this user story unless it’s trivial in our implementation, but I’m adding it here to record that we’re discussing it)
JW – Guessing the science team will want to upload on the scale of 10–100x/year. It would be nice to make uploads as easy as possible (e.g. a CLI tool that automates the upload and fills in the description/Zenodo metadata)
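A minimal sketch of such a tool against Zenodo’s deposition REST API (https://developers.zenodo.org); metadata fields and error handling are pared down, and ZENODO_TOKEN is an assumed environment variable:

```python
import os
import sys

import requests

API = "https://zenodo.org/api/deposit/depositions"
params = {"access_token": os.environ["ZENODO_TOKEN"]}


def upload(path: str, title: str, description: str) -> str:
    # 1. Create an empty deposition
    deposition = requests.post(API, params=params, json={}).json()

    # 2. Upload the file into the deposition's bucket
    bucket = deposition["links"]["bucket"]
    with open(path, "rb") as file:
        requests.put(f"{bucket}/{os.path.basename(path)}", params=params, data=file)

    # 3. Fill in the metadata that would otherwise be typed by hand
    metadata = {
        "metadata": {
            "title": title,
            "description": description,
            "upload_type": "dataset",
            "creators": [{"name": "Open Force Field Initiative"}],
        }
    }
    requests.put(f"{API}/{deposition['id']}", params=params, json=metadata)

    # 4. Publish (not reversible on Zenodo) and return the record URL
    requests.post(f"{API}/{deposition['id']}/actions/publish", params=params)
    return deposition["links"]["html"]


if __name__ == "__main__":
    print(upload(sys.argv[1], sys.argv[2], sys.argv[3]))
```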
LW – Does Zenodo allow deletion?
JW – We should be able to change the URL where data is fetched from (like, if we’re fetching datasets by name and not URL) in case our provider (Figshare/Zenodo) goes out of business.
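A minimal sketch of that indirection: fetch datasets by stable name through a small, updatable registry, so swapping providers only means updating the registry, not user code. The registry URL and dataset names are placeholders:

```python
import json
import urllib.request

# Placeholder location for a name -> URL registry maintained by OpenFF
REGISTRY_URL = "https://example.org/yammbs-dataset-registry.json"


def resolve(name: str) -> str:
    """Map a stable dataset name to wherever the bytes currently live."""
    with urllib.request.urlopen(REGISTRY_URL) as response:
        registry = json.load(response)
    # e.g. registry == {"industry-benchmark-v1": "https://zenodo.org/records/..."}
    return registry[name]
```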