User questions
JH – With the off-by-one problem in the protein dataset, can we get rid of those datasets?
BP – In a future release, we'll add a mechanism to mark these as defunct/invalid. This would make them impossible to pull down, and/or effectively hide them from queries.
JW – Could we change a dataset's name?
BP – Not possible yet, but that would be a nice feature.
TG – In my dataset standards, I'm going to require a changelog in the metadata, which could store important context/information like this.
(General) – We could either continually increment new dataset numbers, OR keep each dataset separate and have one "meta" dataset for each series that points to the latest version.
DD – Both of these seem possible; it'd be nice if QCA became opinionated about which way to do it.
BP – I'm open to considering having versions of datasets be supported in QCF.
Decision: Our dataset naming scheme should not assume any changes from QCF.
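The "meta dataset per series" option could be sketched roughly as follows. This is a hypothetical illustration only; `SeriesPointer` and its fields are invented for this sketch and are not part of the QCF/QCA API:

```python
# Hypothetical sketch: one "meta" record per dataset series that points
# at the latest versioned dataset name. Not actual QCF/QCA API.
from dataclasses import dataclass, field


@dataclass
class SeriesPointer:
    series: str                                   # series name
    versions: list = field(default_factory=list)  # ordered dataset names

    def add_version(self, dataset_name: str) -> None:
        """Register a newly submitted versioned dataset for this series."""
        self.versions.append(dataset_name)

    @property
    def latest(self) -> str:
        """Name of the most recent dataset in the series."""
        return self.versions[-1]


pointer = SeriesPointer("OpenFF Protein Fragments")
pointer.add_version("OpenFF Protein Fragments v1.0")
pointer.add_version("OpenFF Protein Fragments v1.1")
print(pointer.latest)  # -> OpenFF Protein Fragments v1.1
```

Under this scheme each submission remains a separate, immutable dataset; only the pointer record changes, which avoids assuming any renaming/versioning support from QCF.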
JM – DD, which of my datasets are you preparing for submission?
DD – Based on feedback from JM and Dominic Rufa, we've gotten a lot of user feedback that we'll be incorporating into the infrastructure.
TG – I'd like to have managers report which specific task ID they are pushing back to the server on completion/failure.
DD/JH – From the client, you can delete a collection.
JH – Not sure if this is at the database level or the client view level?
BP – It'll be at the client view level.
JH – Is there an SQL command I can use to delete the dataset?
BP – I doubt it.
BP – Something to possibly discuss at the Friday QC submission meeting: the optimization objects have a way to delete some part of their output.
JH – Do we still need to store BOTH WBOs and Mayer indices?
DD – JM recently asked for both, and other users are likely using them too.
TG – IIRC, the compute and storage requirements to store both are small.
Decision: We will continue storing both.
PB – Do all datasets have WBOs?
JH – Galileo hosts a QCA instance. How can we preserve the database when the hosting Docker container gets shut down?
JW – We could either pull the database file out before the container shuts down, or we could save the state of the container. IIRC, we can't export the running image, but we may have a way to store artifacts.
DD – Do they have other protocols that require persistence? Or something that dumps data out to Amazon S3?
JH – If we know where the database file goes, we could specifically export the database.
BP – There's a database dump option that will produce a file.
DD – We could make the final step in the workflow trigger this file output, but you'd need to be able to initialize a new container with that file.
TG – You could, at the end of the Docker run, export the database, store it somewhere, and inject it at the beginning of the subsequent run.
(General) – We want a way to have a persistent database for testing bespoke workflow/qcsubmit on Galileo.
Option 1: Make a pre-populated image and host it on the OpenFF Docker Hub, but this will start to take up a lot of space.
Option 2: Make Galileo images dump out the database file on shutdown (SIGKILL/SIGTERM), and find a way to inject it into images when they start.
JW – Could include the database file in the folder that gets submitted.
DD – Would need a conditional at startup that searches for a database dump and incorporates it if present.
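Option 2 plus the startup conditional could be sketched as a container entrypoint like the one below. This is a sketch under stated assumptions, not the actual Galileo/QCFractal setup: the dump path is hypothetical, the `pg_dump`/`pg_restore` calls (commented out) assume a PostgreSQL backend, and SIGKILL cannot be trapped, so only a graceful SIGTERM shutdown can trigger the dump:

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint: restore a database dump if one was shipped in
# with the submitted folder, and dump the database again on SIGTERM so
# the file can be injected into the next container run.
DUMP_FILE="${DUMP_FILE:-/artifacts/qca_db.dump}"   # assumed path

restore_if_present() {
    # Startup conditional: incorporate a prior dump if one exists.
    if [ -f "$DUMP_FILE" ]; then
        echo "restoring database from $DUMP_FILE"
        # pg_restore --dbname=qcarchive "$DUMP_FILE"   # assumed backend
    else
        echo "no dump found; starting with an empty database"
    fi
}

dump_on_shutdown() {
    # SIGTERM handler: export the database before the container exits.
    # (SIGKILL cannot be trapped, so a hard kill still loses state.)
    echo "SIGTERM received; dumping database to $DUMP_FILE"
    # pg_dump --format=custom qcarchive > "$DUMP_FILE"  # assumed backend
    exit 0
}

trap dump_on_shutdown TERM
restore_if_present
# exec qcfractal-server start ...   # hand off to the real server here
```

The restore-at-startup branch is what makes Option 2 work end to end: without it, the dump produced at shutdown has nowhere to go on the next run.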
(General) – Could Galileo host compute for the industry benchmarking project?
DD – Would pharma companies allow in-house molecules to go to an external compute provider like Galileo?
JW – Galileo was also asking about industry security requirements. Next Galileo meeting is in 9 days (next Thursday).
DD – I can talk to Roche and Janssen about this, since we're already in contact.
(General) – Industry doesn't know about Galileo yet.