...
DR – Shows slides, legacy of NIST TRC. ThermoData Engine drives data discovery and formatting. Main infra is an Oracle database. ThermoML is expensive to collect and maintain, so it was “hard to get” – XML format, uncertainties not released. The tarball of this is available and free, but hard to access. Working on making it available via JSON-LD, which is embedded in webpages and searchable via datasetsearch.research.google.com. Looking at adding an API for searching using Cordra, which offers a REST API and user authentication. Currently defining JSON schema for datasets.
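A minimal sketch of the JSON-LD idea DR describes, assuming the schema.org Dataset vocabulary that Google Dataset Search indexes. The field values, the URL, and the helper names (dataset_jsonld, as_script_tag) are placeholders for illustration; the actual NIST JSON schema is still being defined and may look different.

```python
import json


def dataset_jsonld(name, description, citation, url):
    """Build a schema.org Dataset description as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "citation": citation,  # keeps the original source attached to the record
        "url": url,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dict in the script tag used to embed it in a webpage."""
    body = json.dumps(doc, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All values below are placeholders, not real archive entries.
example = dataset_jsonld(
    name="Placeholder dataset name",
    description="Placeholder description of the measured properties.",
    citation="Placeholder citation for the original journal article.",
    url="https://example.org/datasets/placeholder",
)

print(as_script_tag(example))
```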
...
Looking to have API prototype flying in June
DGS – Cordra seems cool, never used it before, but it seems handy.
...
SB – We would prefer to use the REST API. Currently we download the tarball and it’s a pain to search using Python.
CM – What kinds of searches do you do?
SB – Looking for properties in certain ranges. E.g. densities around reasonable pressures, temperatures. The same compound having several measurements.
JC – It’s frequently useful to do population analysis on the results we COULD pull. So, getting results count without getting whole records.
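A rough sketch of the two query styles SB and JC describe: range filtering on a property, and getting counts without pulling whole records. The record layout, field names, and values are all made up for illustration; they are not the ThermoML schema or the planned Cordra API, and a real service would do this server-side.

```python
from collections import Counter

# Hypothetical, already-parsed records; compounds and conditions are invented.
RECORDS = [
    {"compound": "compound_A", "property": "density", "temperature_K": 298.15, "pressure_kPa": 101.3},
    {"compound": "compound_A", "property": "density", "temperature_K": 298.15, "pressure_kPa": 101.3},
    {"compound": "compound_A", "property": "density", "temperature_K": 450.0, "pressure_kPa": 101.3},
    {"compound": "compound_B", "property": "density", "temperature_K": 300.0, "pressure_kPa": 5000.0},
    {"compound": "compound_B", "property": "viscosity", "temperature_K": 298.15, "pressure_kPa": 101.3},
    {"compound": "compound_B", "property": "density", "temperature_K": 310.0, "pressure_kPa": 100.0},
]


def matches(rec, prop, t_range, p_range):
    """True if the record is the requested property within both ranges (inclusive)."""
    return (
        rec["property"] == prop
        and t_range[0] <= rec["temperature_K"] <= t_range[1]
        and p_range[0] <= rec["pressure_kPa"] <= p_range[1]
    )


def count_matches(records, prop, t_range, p_range):
    """Return only the number of hits, not the records themselves."""
    return sum(1 for r in records if matches(r, prop, t_range, p_range))


def counts_per_compound(records, prop, t_range, p_range):
    """Population view: how many matching measurements each compound has."""
    return Counter(r["compound"] for r in records if matches(r, prop, t_range, p_range))


total = count_matches(RECORDS, "density", (280.0, 320.0), (90.0, 110.0))
per_compound = counts_per_compound(RECORDS, "density", (280.0, 320.0), (90.0, 110.0))
# Compounds with several measurements under the same filter
repeated = sorted(c for c, n in per_compound.items() if n > 1)

print(total, dict(per_compound), repeated)
```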
CM – Re: making this data public – We want to make sure that users cite the ORIGINAL data, which is why the citation tag goes everywhere. Lifecycle hooks are a method to keep citations attached, even as people strip and segment data.
...
CM – Currently we’re making about 30% of our data publicly available – only data from journals, which have citation records.
...
CM – When possible, we include InChI that has specific stereochemistry. We leave it undefined when we’re unable to determine the stereochemistry.
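For anyone consuming these identifiers, a small sketch of how to tell whether an InChI string carries stereochemistry at all. It relies only on the InChI layer prefixes (/b for double-bond, /t, /m, /s for tetrahedral stereo) and plain string handling; the function name is hypothetical, and this says nothing about how TRC decides when stereochemistry can be assigned. The two example strings are L-alanine with stereo layers and the same connectivity without them.

```python
STEREO_PREFIXES = {"b", "t", "m", "s"}


def stereo_layers(inchi):
    """Return the set of stereo layer prefixes present in an InChI string."""
    parts = inchi.split("/")
    # parts[0] is the version ("InChI=1S"), parts[1] is the formula; skip both.
    return {p[0] for p in parts[2:] if p and p[0] in STEREO_PREFIXES}


with_stereo = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
without_stereo = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

print(sorted(stereo_layers(with_stereo)))
print(sorted(stereo_layers(without_stereo)))
```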
JW – Protonation state?
CM – We basically don’t know the protonation state any better than the experimentalists, so we don’t provide it.
...
JC – We need to talk to advisory board about convincing NIH to support this sort of effort.
CM – NIST received some AI funding this year, and we may be able to pitch the ThermoML archive as a resource for the AI community to get some of this funding. OpenFF is a sort of machine learning, and we would benefit by showing that you’re using this.
...
MS – What other ways can we show support for ThermoML? Letters of support? Direct funding?
CM – We should talk about transferring money separately. But internal and external publicity is really good for us, so I can draft a writeup for an internal publication here and send it to you for review. You should imagine that this might go on the front page of NIST and/or in C&EN.
...