2022-12-06 QC Meeting notes

 Date

Dec 6, 2022

 Participants

  • @Pavan Behara

  • @Jeffrey Wagner

  • @David Dotson

  • @Benjamin Pritchard

 Discussion topics


Updates from MolSSI

  • BP – Nothing much. We sent in the proposal and should hear back fairly soon (maybe next month)

  • BP – I’m validating the next branch, and finding that there’s a problem when there are a lot of internal workers connected (not compute workers, but something else)

  • BP – Also some issue with the write-ahead log on the production server. It’s filling up the disk faster than I expected. Now there’s something like 500 GB of postgres logs coming from 5 GB of actual added content.

    • DD – Is there a way to compact/prune the logs?

    • BP – I should be keeping them for backups/as diffs. I think what’s happening is, if you make a change, the write-ahead log doesn’t just store the change, it stores the entire page. And since our database is really fragmented, it has to save a bunch of pages.

    • DD – I wonder why this problem isn’t seen in other postgres databases.

    • BP – I’m not sure. The solutions are things called “full vacuum” (VACUUM FULL) or “repack” (pg_repack), basically defragmenting the database. So I’m going to keep working on this, but we had some weirdness recently where the production server disk filled up. (See the sketch after this list.)

  • DD – How’s capacity looking?

    • BP – Server’s at 77% capacity right now, about 1.2 TB remaining.

  • DD – Could you send us minimum specs for a server?

    • BP – We were working with Dell to spec it out, and ended up asking for $90k in the grant to buy 64 cores, 0.5 TB RAM, and 180 TB SSD (+ backups for RAID6). Probably a hardware RAID controller.

    • DD – Could you send the full spec sheet?

    • BP – Sure

  • JW – I’ll push the governing board to go for the March equipment supplement, so you may get looped into that if they choose to go forward with it, BP.

    • BP – Gotcha, we’ll at least be able to reuse the materials from last year.
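
For reference, a minimal sketch of checking WAL growth and table fragmentation on a PostgreSQL server, assuming psycopg2 and a superuser (or pg_monitor) connection; the DSN and the table name are placeholders, not values from the production server:

```python
# Minimal sketch: inspect WAL size and table fragmentation with psycopg2.
# The DSN and table name are placeholders, not production values.
import psycopg2

conn = psycopg2.connect("dbname=qcarchive user=postgres")
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    # Total WAL currently on disk (pg_ls_waldir requires PostgreSQL 10+
    # and superuser/pg_monitor privileges).
    cur.execute("SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();")
    print("WAL on disk:", cur.fetchone()[0])

    # Dead-tuple counts are a rough fragmentation signal. After each
    # checkpoint, the first change to a page is logged as a full-page
    # image, so rows scattered across many pages inflate the WAL.
    cur.execute("""
        SELECT relname, n_live_tup, n_dead_tup
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10;
    """)
    for name, live, dead in cur.fetchall():
        print(f"{name}: {live} live / {dead} dead")

    # Rewrite (defragment) one table; note this takes an exclusive lock.
    cur.execute("VACUUM FULL some_fragmented_table;")
```

Because VACUUM FULL holds an exclusive lock on the table while rewriting it, pg_repack (which does an equivalent rewrite online) is often preferred on a live server.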

Infrastructure advances



Throughput status

OpenFF PEPCONF OptimizationDataset v1.0

  • 2892 → 3074 → 3673 (see the status sketch after this list)

    • DD – Looks like we’re not hitting many errors here, so that’s good. And I’m just running preemptible workers on Lilac. Should I change anything about the compute setup?

    • PB – What we have seems to be working right now. Though more workers could be nice.

    • DD – Currently asking for up to 250 workers at a time, actually running 87. So I don’t think I can bump up numbers. Should I add non-preemptible workers? These are pretty high-memory but I haven’t gotten any complaints.

    • PB – Would we be spending any sort of limited resource if we do non-preemptible?

    • DD – Nope. I’ll add some.

    • JW – Could PRP provide workers with this much memory?

    • DD – Maybe

    • JW – Let’s just add the Lilac workers for now.
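
As a reference for pulling these counts, a minimal sketch against the pre-0.50 QCPortal API (FractalClient() defaults to the public QCArchive server; the “default” spec name is an assumption, not taken from the notes):

```python
# Minimal sketch: status counts for the PEPCONF dataset via QCPortal
# (pre-0.50 API). The "default" spec name is an assumption.
import qcportal as ptl

client = ptl.FractalClient()  # defaults to the public QCArchive server
ds = client.get_collection(
    "OptimizationDataset", "OpenFF PEPCONF OptimizationDataset v1.0"
)

# DataFrame of record counts per status (e.g. COMPLETE / INCOMPLETE / ERROR)
print(ds.status(["default"]))
```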

User questions/issues

 

Science needs

 

 Action items

 Decisions