2021-06-04 QCA Submission meeting notes

Participants

  • @Trevor Gokey

  • @Pavan Behara

  • Ben Pritchard

  • @David Dotson

Goals

  • User questions/issues, new submissions

  • Science support needs

  • Infrastructure needs / advances

Discussion topics

Item

Presenter

Notes

Incomplete #173

Pavan

  • TorsionDrive dataset #173 reports complete, but doesn’t appear to be complete at all. At least two distinct issues:

    • ds.status(status='COMPLETE') gives a different answer than ds.status().

    • Also, the GHA automation appears to declare complete on the wrong dataset in #173

Production env updated to new psi4 with iodine fixes

Pavan

  • JH: do we want a standard suite of tests for psi4?

    • DD: can set up CI against psi4-containing prod environments

      • JH: could use openff-qcsubmit’s own tests, run those

        • could also do single-point calculations on individual molecules: energies, gradients, hessians

      • DD: can I get some molecules to toss into those point calculations?

  • PB: having newer psi4 with iodine support may be useful for Sage refit; definitely needed to re-run Genentech iodine-containing mols

Aniline 2D impropers #197

Josh

  • JH: #197 should be ready to go

    • will be first 2D scan in production; have done testing locally

    • DD: will review this today and aim to merge

Timeout issues

Trevor

  • TG: I find that timeouts when pulling Hessians happen pretty consistently; what’s more, after one happens, I can’t connect for some time

    • Only way it’s not flaky is if I pull one at a time

    • DD: would this be in nginx?

    • BP: around when was this last?

    • TG: Sunday

    • JH: is this pretty reproducible? I can’t say I’ve hit this myself

    • TG: using 10 threads at a time

    • BP: oh, that would be too much. We’re only doing 8 processes for serving traffic, so you’re probably exhausting resources on the Fractal server

      • if you’re asking for 10, but it can only serve 8 at a time, 2 will be waiting; because Hessians are larger than e.g. molecules, each request takes longer to serve, so the wait is more likely to exceed the timeout and timeouts happen more often

  • BP: public QCA runs on a 16-core machine; we allocate 8 for reads on the QCA server, 3 for manager communication, 4(?) for the database; nginx runs there as well
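
A minimal sketch of the mitigation discussed above: keep the number of concurrent requests below the 8 serving processes and ask for more results per request. This is an illustration under assumptions, not the client's API. `fetch_batch` is a hypothetical stand-in for whatever call actually pulls the Hessians, and `batch_size=25` and `max_workers=4` are placeholder values, not tested recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(ids: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split ids into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def pull_in_batches(
    ids: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], List[R]],  # hypothetical fetch function
    batch_size: int = 25,   # illustrative value only
    max_workers: int = 4,   # kept below the 8 serving processes mentioned above
) -> List[R]:
    """Fetch results for all ids with at most `max_workers` requests in flight."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields batch results in the same order as the input batches
        for batch_result in pool.map(fetch_batch, chunked(ids, batch_size)):
            results.extend(batch_result)
    return results
```

Fewer threads with larger batches means fewer requests queue behind the serving processes; the right balance would need to be found by experiment.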

Error cycling as a policy implementation on Collections

David

  • DD: brought up by Simon as an idea for advancing error cycling on QCFractal

    • being able to apply an error cycling policy on a per-collection basis; the policy would specify how to error cycle different error types, whether to cycle at all, how many cycles, etc.

    • DD: will draft an issue capturing these ideas, basic theory of operation, etc. on QCFractal
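
To make the idea concrete, a rough sketch of what a per-collection policy object might look like. Nothing here exists in QCFractal: the class names, the error type strings, and the cycle counts are all hypothetical, chosen only to show the three knobs mentioned above (per-error-type handling, whether to cycle at all, and how many cycles).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ErrorRule:
    """How to treat one error type (hypothetical structure)."""
    restart: bool = True   # whether to cycle this error type at all
    max_cycles: int = 3    # how many restarts to allow before giving up


@dataclass
class ErrorCyclingPolicy:
    """Policy that could be attached to a single collection (hypothetical)."""
    rules: Dict[str, ErrorRule] = field(default_factory=dict)
    # Error types without an explicit rule are not restarted by default.
    default: ErrorRule = field(
        default_factory=lambda: ErrorRule(restart=False, max_cycles=0)
    )

    def should_restart(self, error_type: str, cycles_so_far: int) -> bool:
        """Return True if a record with this error should be restarted again."""
        rule = self.rules.get(error_type, self.default)
        return rule.restart and cycles_so_far < rule.max_cycles


# Example policy; the error type names are made up for illustration.
policy = ErrorCyclingPolicy(
    rules={
        "random_error": ErrorRule(restart=True, max_cycles=5),
        "convergence_error": ErrorRule(restart=False, max_cycles=0),
    }
)
```

With this policy, `policy.should_restart("random_error", 2)` allows another cycle, while convergence errors and unlisted error types are left alone.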

Upping iterations on live calculations

Pavan

  • PB: is there a way to increase the number of convergence iterations on existing optimizations / point calculations?

    • DD: there isn’t, unfortunately; we have done some hacks in the past where we start up a manager with modified max iterations baked in, but this isn’t something we can scale well

    • iterations is a parameter that generally doesn’t change the character of the calculation, so we’ve had discussions as to whether this can be made mutable

    • BP: this is a topic we’ve hit before, and we don’t have a good solution; it gets tricky because the other parameters you might want to change to get convergence are tolerances, and these can change the nature of the calculation

    • DD: with the changes coming in the Server and Client this may become more possible to implement, even if care should be taken in use

Action items

@David Dotson will investigate #173 dual issues (see notes for detailed findings)
@David Dotson will create PR for CI against psi4-containing prod environments, run openff-qcsubmit tests; update psi4 in prod envs to newest available Linux version
@David Dotson will review and merge #197
@Trevor Gokey will try reducing thread count and increasing per-request result count for Hessian pulls; will report any issues, or whether he finds a sweet spot in performance without timeouts
@David Dotson will articulate automatic error cycling as a policy attached to Collections as a new feature in QCFractal
@David Dotson will articulate mutability of convergence iterations as an issue on QCFractal in the context of upcoming Server and Client functionality

Decisions
