QCArchive status | BP – Freed up a ton of space: removed lots of wavefunctions and moved the remaining wavefunctions to spinning disk. The SSD is now down to 52% disk utilization. The datasets themselves (as in collections) are still present and visible from the client, but the associated records are gone.
BP – It would be good to adopt a policy of asking for wavefunctions sparingly.
BP – We’d been looking at funding sources for a new server, but due to confusion about grant timing we didn’t submit in time.
BP – Still looking into a new server, since as a policy we want to keep plenty of free space and not hit >90% utilization.
|
Refactor status | BP – How do you guys want the refactor to be put in front of you?
JW – Is there room to have both the new and old DB in place at the same time?
BP – Not on the prod server, no.
BP – Could put it on the “molssi10” host.
DD – If we want to use the new host for the deployment, maybe we could move production QCA to “molssi10”, then http://qcarchive.molssi.org could redirect to that, and the migrated DB could live on the “real” server and be accessed through a different address.
BP – That could work. The current DB has /v1 in all the API calls, whereas the new one will have /v2, so I could use that to determine redirects.
DD – We’ll want to coordinate a little bit on when we’d want to switch over writes from one to the other.
DD – In the meantime, is there a test host that maybe doesn’t have all the data, but that folks on this call could hammer on for an hour or two? Do the folks on this call have time for that? We’d like to smooth out any pain points before it becomes the only choice.
SB – I haven’t closely followed the refactor, but could you give a high-level summary of it? So I understand what to test out.
BP – Basically, “everything has changed”. So that’s many of the function names, and many aspects of dataset navigation were made more intuitive. But the concepts are the same.
SB – So, for example, metadata was getting put into a big JSON blob for whole datasets. Is that getting broken up more? There were also some pain points in downloading large datasets. Happy to test out how those are changed in the refactor.
BP – Absolutely. The DB no longer depends on large JSON blobs. It’s now broken out into tables and structured like a much more traditional relational database. As you interact with a dataset in qcportal, it talks to the DB more often but in smaller pieces; no passing around fat JSON blobs.
DD – So calling compute now no longer requires shunting the entire dataset back and forth between client and server. Now it all happens on the server.
SB – So could we have access to a test server with some of the datasets (especially the chunkier ones that have been hard to pull down), perhaps via a partial migration? Then we could try submitting some fat sets.
BP – The migration script is really slow; we could make a few new datasets and that might be better.
SB – Either migrating or making a new dataset is OK by me. I just want to be able to test large access operations.
BP – You can also invalidate records and cancel records, so the control surface for failures or bad calculations is richer.
SB – I know there have been chats about internalizing error cycling; is that included?
BP – I do have the place in the code that would allow for automating restarts of failed calculations. Also implemented compute history, where we hold on to previous errors; that’s necessary for restart logic that needs to see failure counts and types of failures.
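(Sketch for reference: a minimal, hedged example of what this richer record-level control surface might look like from the client side. The method names below are assumptions based on this discussion, not confirmed API; the server address is a placeholder.)

```python
# Hypothetical sketch of record-level management in the refactored qcportal.
# Method names/signatures are assumptions based on the meeting discussion.
from qcportal import PortalClient

client = PortalClient("https://qcarchive.example.org")  # placeholder address

# Cancel records that are still waiting/running, or invalidate completed ones
# that turned out to be bad calculations.
client.cancel_records([101, 102])
client.invalidate_records([205])

# Restart errored records; the stored compute history (previous errors,
# failure counts) is what automated error cycling would need to inspect.
client.reset_records([307, 308])

rec = client.get_records([307])[0]
print(rec.status)
```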
TG – Will we be able to query what kinds of tasks feature which compute tags?
BP – Not yet; that will come when we implement more permissions. E.g., we don’t want someone from outside to be able to assign their dataset the openff tag and take over your compute.
TG – For me, I’m interested in separating out resource requirements based on tag, and implementing some sort of logic for managers to know which tag they should be on.
BP – Tasks are now pretty hidden; everything goes through records. Restarts are done on records, cancels are done on records. If you pull down a record you can inspect the task.
TG – And the tasks still have a baseresult/baserecord ID?
DD – WRT “result” vs “record” and other name changes, BP and I discussed which terms we use currently, and how to standardize them in a meaningful way.
BP – ... No more “collections”, only “datasets” now. All have “specifications”, “entries”, and “records”.
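(Sketch for reference: a minimal example of how the dataset/specification/entry/record nomenclature might look when navigating a refactored dataset in qcportal. Attribute and method names are assumptions based on the discussion; the dataset name and server address are placeholders.)

```python
# Sketch of navigating a refactored dataset: a dataset holds specifications,
# entries, and records. Names here are assumptions, not confirmed API.
from qcportal import PortalClient

client = PortalClient("https://qcarchive.example.org")  # placeholder address

ds = client.get_dataset("optimization", "Example Optimization Dataset")

print(ds.specification_names)   # the "specifications" (method/basis/program)
print(ds.entry_names[:5])       # the "entries" (input molecules)

# A record is the calculation attached to one (entry, specification) pair.
rec = ds.get_record(ds.entry_names[0], ds.specification_names[0])
print(rec.status)

# Submission happens server-side; no shuttling the whole dataset to the client.
ds.submit()
```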
SB – A use case that we might like is the ability to “cherry pick” the records in one or more datasets into a new dataset, to use it as a record of the data that was actually used. So we could, at the end, take a bunch of records from different datasets and have a final “here’s what we actually used for this process”.
TG – I kind of do this already; formal support would be great.
DD – BP, we could look into the type of backend that would support this user story. SB and TG, could you provide more details on specifically what you’d like here?
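(Sketch for reference: this is not an existing feature; the following only illustrates the requested workflow under assumed method names and placeholder IDs, roughly how one might hand-roll the cherry-picking today.)

```python
# Hypothetical sketch of "cherry picking" records into a new provenance dataset.
# NOT current functionality; method names and IDs are illustrative assumptions.
from qcportal import PortalClient

client = PortalClient("https://qcarchive.example.org")  # placeholder address

# IDs of records actually used downstream, gathered from several datasets.
used_record_ids = [111, 222, 333]  # illustrative values

records = client.get_records(used_record_ids)  # assumed to be singlepoint records

# Create a new dataset documenting exactly what was used.
final_ds = client.add_dataset("singlepoint", "Project X - records actually used")

for rec in records:
    # Assumed add_entry signature; real signatures vary by dataset type.
    final_ds.add_entry(name=f"record-{rec.id}", molecule=rec.molecule)
```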
|
Infrastructure needs/advances | DD – New optimization schema with SCF properties evaluated at the optimized geometry: one extra step of doing a single point at the final geometry, at acceptable cost. Any thoughts?
SB – This is something I’ve been interested in doing before. E.g., with my recent dataset where I’m computing wavefunctions, I only want wavefunctions and Wiberg bond orders for the last step. So I’d like to be able to do something like “optimize, then do a special step at the end”.
BP – This seems like more of a QCEngine thing.
DD – I could take this on: basically having QCEngine perform an optimization, then dropping the information from the intermediate steps. Would this require any change in QCF?
BP – That’s pretty much how the protocols work now. The OptimizationResult pydantic model takes info from the whole trajectory, then knows whether it should keep the info from all the steps, or just the final step/just a few steps. Could …, or add a mode where it only gets stored at the last step, or have QCF/QCEngine know which per-step data can be dropped.
DD – Are wavefunctions made anyway at each step by psi4?
DD – Are there other things that should be kept?
SB – Wiberg bond orders, Löwdin indices, MBIS charges.
TG – There are some datasets where MBIS charges fail, and I have to tiptoe around the successes there.
BP – I did lots of expensive operations in grad school, so I feel this.
DD + SB – Would be good to have native support for “use a cheap method for an optimization, then once that’s converged, use an expensive method”.
BP – That seems like it would fit most neatly in QCEngine. So like a process where it does an expensive single point at the final step. It would fit in QCEngine better than QCFractal.
DD – I think we have enough information to proceed with a PR to QCEngine. I’ll be the primary driver but will loop in SB and JH for feedback. Importantly, I’ll look into whether it can be handled entirely in QCEngine, or if it will need special support in QCF. (See the sketch after this item.)
SB – Let me know if I can help out. This would be a huge win for bespokefit. And if it can fit into existing data models, that would be great.
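(Sketch for reference: one way the “cheap optimization, then an expensive single point with extra properties at the final geometry” idea can be wired up today as two separate QCEngine calls, before any native support exists. The molecule, methods, and bases are placeholders, not what any dataset actually uses.)

```python
# Sketch: run a geometry optimization, then a single point at the final
# geometry with a more expensive model. Methods/bases are placeholders.
import qcengine
from qcelemental.models import AtomicInput, Molecule, OptimizationInput
from qcelemental.models.procedures import QCInputSpecification

# Placeholder water geometry (angstrom).
mol = Molecule.from_data(
    """
    O  0.000000  0.000000  0.117790
    H  0.000000  0.755453 -0.471161
    H  0.000000 -0.755453 -0.471161
    """
)

opt_input = OptimizationInput(
    initial_molecule=mol,
    input_specification=QCInputSpecification(
        driver="gradient",
        model={"method": "b3lyp", "basis": "def2-svp"},  # "cheap" level
    ),
    keywords={"program": "psi4"},
)

opt_result = qcengine.compute_procedure(opt_input, "geometric", raise_error=True)

# Expensive single point, with the extra properties we actually want,
# evaluated only at the optimized geometry.
sp_input = AtomicInput(
    molecule=opt_result.final_molecule,
    driver="energy",
    model={"method": "wb97m-v", "basis": "def2-tzvp"},  # "expensive" level
    keywords={"scf_properties": ["wiberg_lowdin_indices", "mbis_charges"]},
)

sp_result = qcengine.compute(sp_input, "psi4", raise_error=True)
print(sp_result.return_result)
```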
New default policy for optimization trajectory storage?
DD – How often do we really use intermediate geometries/properties in an optimization trajectory? Or should we switch our default over to just storing the first and the last?
TG – THuefner and someone from the Chodera lab use intermediate geometries/properties from optimization trajectories.
SB – Yes, JC and Yuanqing, and some folks from the Cole lab experimented with using intermediate geometries.
JH – Yeah, plenty of experimentation going on there.
TG – I found that the last ~50% of optimization trajectories aren’t very informative.
DD – Maybe we should change the default in QCSubmit from “full” to “first and last”, and let people specify “full” if they really want the whole trajectory.
TG – My personal opinion is not to call it “default”, since the spec is used BOTH during submission and during retrieval, and this would break the meaning of “default” in retrieval operations.
DD – I’ll need to check where in the process this “default” term is used, to determine whether TG’s issue would affect existing calls.
TG – It wouldn’t be a huge deal, but it’d be good to be deliberate.
DD – I’ll make a PR on QCSubmit to propose this change and we can continue the discussion there. (Can also handle other suggested changes there.)
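(Sketch for reference: the “full” vs “first and last” storage options discussed here appear to map onto QCElemental’s optimization trajectory protocol. A minimal sketch, assuming the standard enum values; double-check against the installed QCElemental version.)

```python
# Sketch of the trajectory-storage protocol a QCSubmit default would map onto.
# Enum values assumed from QCElemental's OptimizationProtocols.
from qcelemental.models.procedures import OptimizationProtocols

keep_everything = OptimizationProtocols(trajectory="all")
keep_endpoints = OptimizationProtocols(trajectory="initial_and_final")
keep_final_only = OptimizationProtocols(trajectory="final")

print(keep_everything.trajectory, keep_endpoints.trajectory, keep_final_only.trajectory)
```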
New QCEngine release with QCEngine#351
DD – I’ll push for a release with this change, and will coordinate with LBurns in case there are other things that need to get in.
SB – That’s great. If the release is out by the end of the week then we can incorporate it into bespokefit; if not, it’s not a huge deal.
SB – h5py bug on QCFractal.
JW – I’ll try to take this on, but no promises about how far I can progress it, since it’s in someone else’s feedstock.
SB – There’s a workaround where we can pin h5py, but I’d rather have it become an optional dep.
BP – The upcoming version doesn’t have a hard dep on hdf5.
|
Throughput status | |
New submissions | |
User questions | |
Science support needs | |