2022-04-26 QC Meeting notes

 Date

Apr 26, 2022

 Participants

  • @Pavan Behara

  • @David Dotson

  • @Chapin Cavender

  • @BenPritchard

 Goals

 Discussion topics

Item

Notes

Updates from MolSSI

  • BP: The intermediate server went down over the weekend. It seems to be behaving well now; possibly a power-related issue.

    • DD: We discussed hardware solutions before. Any updates on that with respect to the networking-related issues?

    • BP: For security reasons, the main server is not exposed to the outside world.

    • DD: For the next hardware setup, will some ports be exposed to the internet?

    • BP: That’s the plan. Right now we’re forwarding requests to a cloud service hosted on campus.

    • BP: No issues other than this intermittent power problem.

    • BP: We’re fine on storage space now, nothing major that requires attention.

Infrastructure needs/advances

  • DD: Updates to qcsubmit to make it compatible with qcf-next

    • DD: Met with BP last Friday for a working session and made major updates to qcsubmit’s submission code and its interface to the Fractal server. It’s a work in progress; we will do another session to move this forward.

    • BP: Yeah, a lot of workarounds, like chunking submissions, are obsolete now. There is no need for shortcuts with qcf-next.

Throughput status

  • OpenFF Protein Capped 1-mer Sidechains v1.2 - 42/46 TDs

    • 1579 new optimizations

    • CC: One of the three is taking a long time, and we are seeing them show up as complete now.

  • SPICE PubChem Set 2 Single Points Dataset v1.2: 121428, up from 121383; almost complete, around 100 remaining.

    • Suggest keeping the status quo of frequent error cycling.

  • SPICE PubChem Set 3 Single Points Dataset v1.2: 69397, up from 49181

    • About 20K calcs in one week. Trevor mentioned he would scale up the deployment on UCI resources (the number of workers is currently low because Matt and Jeff need compute for tk testing).

    • Are there any deployment/availability issues with Lilac/PRP, or are they already at max usage?

      • DD: PRP pod availability is volatile, and given our memory requirements this is the most we can get out of PRP. As for Lilac, the admins asked us to scale down our array jobs because their monitoring tool was choking.
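
The weekly progress quoted above can be checked with simple arithmetic. This is a minimal sketch, assuming that "X from Y" in these notes means the current completed count X compared with the previous snapshot Y; the helper name weekly_delta is hypothetical and is not part of any QCFractal or qcsubmit API.

```python
# Completed-record counts copied from the notes as (previous, current).
# Assumption: "X from Y" means X is the current count and Y the previous one.
counts = {
    "SPICE PubChem Set 2 Single Points Dataset v1.2": (121383, 121428),
    "SPICE PubChem Set 3 Single Points Dataset v1.2": (49181, 69397),
}


def weekly_delta(previous: int, current: int) -> int:
    """Number of records completed between two status snapshots."""
    return current - previous


# Map each dataset name to the number of records completed since the last snapshot.
deltas = {name: weekly_delta(prev, cur) for name, (prev, cur) in counts.items()}

for name, delta in deltas.items():
    print(f"{name}: +{delta} completed since the previous snapshot")
```

Under that reading, the Set 3 difference is consistent with the "about 20K calcs in one week" figure in the notes.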

User questions/issues

  • CC: Can we alter the maxiter specification on ongoing submissions?

    • CC: I might have seen a different error but cannot reproduce it; I am experimenting with geomeTRIC convergence parameters. It is hard to capture output at intermediate steps from qcengine.

    • DD: Yeah, qcengine doesn’t usually give you the tmp files; you may have to make some modifications to get it to write out intermediate information.

    • CC: These jobs still fail with a larger number of iterations.

    • DD: You still have useful information from the current incomplete scans, right?

    • CC: Yep.

    • DD: I’m thinking of doing a set of random starting points for each of these capped 1-mer scans.

    • CC: I am generating one starting point for one grid point.

    • DD: Given that we may hit a small percentage of failures, maybe we can create a few clones of the same scan with different starting points.

    • CC: I don’t think this would be the preferred path, since there are a lot of grid points and compute would be a bottleneck, but if a task is failing repeatedly we can use your approach.

    • DD: Sounds good, so maybe we can apply this to the remaining four failures? And if everything still fails, is that a problem?

    • CC: Hmm, if everything fails it is still useful data.

    • DD: If only some of them fail, then it might be down to the choice of grid points or other issues with geomeTRIC.

    • CC: Yeah, I will prepare a subset with your approach. Also, a coarser grid might be another way: downsample to 30 deg grid spacing instead of 15 deg.

    • DD: Yeah, having a couple of approaches would be better.

  • Data retrieval for SPICE sets

    • DD: PE’s solution seems elegant for now.

    • BP: We’re having discussions at MolSSI as well. If we have an HDF5 dump, then it should ideally contain everything computed with the dataset. Custom exports to HDF5 would mostly be done on the user side.

    • DD: Maybe we can create a pathway to make it much easier.

    • BP: We already have functionality to dump to compressed SQLite.

    • DD: I think QCF should have an export strategy or recommendation for users post data generation.

    • BP: Export to

      • a different program as input

      • local cache

      • Zenodo, generating a DOI

    • BP: We will have a postdoc (future hire) to do this work.

    • BP: The client won’t time out as quickly as before. Streaming results could be another future option.

    • DD: Yeah, I will respond there about a broader solution to this. For now, what PE has seems good.

    • BP: Agree, I can run it server side and it will be fast enough.

  • PB: QCF-next portal down?

    • client = ptl.PortalClient() gives a 502 Bad Gateway error

    • BP: Yep, down.
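
The coarser-grid idea from the capped 1-mer discussion above (30 deg spacing instead of 15 deg) can be sketched as follows. This is only an illustration under stated assumptions: the scan is taken to be two-dimensional and the grid to run from -180 to 165 degrees, neither of which is stated in these notes, and torsion_grid and downsample are hypothetical helper names, not part of qcsubmit, QCFractal, or geomeTRIC.

```python
from itertools import product


def torsion_grid(spacing_deg: int, n_dims: int = 2):
    """All grid points for an n_dims-dimensional scan over [-180, 180) degrees.

    Assumption: a regular grid starting at -180; n_dims=2 is illustrative only.
    """
    angles = range(-180, 180, spacing_deg)
    return list(product(angles, repeat=n_dims))


def downsample(grid, new_spacing_deg: int):
    """Keep only the points whose angles all lie on the coarser grid."""
    return [
        point
        for point in grid
        if all(angle % new_spacing_deg == 0 for angle in point)
    ]


fine = torsion_grid(15)        # 15 deg spacing
coarse = downsample(fine, 30)  # subset lying on the 30 deg grid

print(f"{len(fine)} grid points at 15 deg, {len(coarse)} at 30 deg")
```

For a two-dimensional scan, doubling the spacing cuts the number of grid points by a factor of four, which is the compute saving that makes the coarser grid attractive as a fallback for repeatedly failing scans.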

Science support needs

 

 Action items

 Decisions