
Item

Presenter

Notes

PCM-based implicit solvent

Simon

  • PCM appears to be working on the COH submission

  • JH: Also first dataset storing wavefunctions/eigenvalues, so another first

  • SB: storage and retrieval working just fine!

  • DD: would be worth showing this off at next show-and-tell; I’ll find out from Jeff

Submissions


  • SB: COH is about 50% complete

    • don’t have error cycling in place for basic DataSets yet; will get today

  • PB: Genentech optimization; working on first submission. Only 20% of the dataset would be submitted in this first run

    • is this acceptable?

    • Yes, we’ll proceed with the smaller, 127 molecule (20%) subset for the first submission

    • DD: feel free to reach out to me when desired; we’ll re-roll the PR off of master (DD messed up long-lived branches with squash merges)

  • PB: protomers/tautomers

    • JH: there are fewer tasks than conformers, because the QCF index is not case sensitive and some of the SMILES clash when reduced to lowercase

      • Do we have another solution? Do we drop the use of SMILES for the index?

      • TG: For torsiondrives, this is still useful.

      • JH: Still want to be able to group molecules that are just peer conformers

    • JH: change how we index molecules, just do molecule-0, conformer-0, basically avoid SMILES for OptimizationDatasets, Basic DataSets; keep SMILES as index on TorsionDrives

    • TG: May still run into issues on this with TorsionDrives, but like this because we tag the driven torsion

    • DD+BP: Could also go with removing the lowercase-casting on indices; would be almost a trivial change, and non-destructive for database access (we’ll pursue this)

      • Issue raised:

        https://github.com/MolSSI/QCFractal/issues/650

  • DD: PEPCONF

    • We’re getting some user pressure; why is it proceeding slowly?

    • Decide on a rebalancing of priorities for datasets:

      • reduce priority to low for some optimization sets

    • TG: Many of these molecules will take a lot of memory (> 50 GiB)

    • DD: Perhaps time to scale up all our nodes to a minimum amount of memory for QM jobs

      • Do we know if there are ways to reduce the memory usage of Optimizations?

      • BP: Psi4 can write to disk if needed when memory gets constrained

      • DD: I will reduce the memory offered to the manager to below the constraints given to each worker; may trigger writing to local storage

        • also increase the total memory of each replica to 64GiB

        • Could also scale the CPUs to 32, perhaps even 64

    • We’ll increase the priority of PEPCONF to high

    • TG: will reduce number of workers deployed, see if this reduces pre-emption frequency

  • Phenyl Dataset - will start to starve others

    • DD: I’ll touch base with Jessica, find out timeline needs for Phenyl set
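The index clash raised under protomers/tautomers can be reproduced without any QCFractal code. The following is a minimal sketch, assuming only that index keys are cast to lowercase before storage, as described in the notes. The helper names `lowercase_collisions` and `positional_index` are hypothetical and are not part of QCFractal, and the combined `molecule-N-conformer-M` key format is one assumed reading of the "molecule-0, conformer-0" proposal.

```python
from collections import defaultdict


def lowercase_collisions(keys):
    """Group index keys that become identical once cast to lowercase."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.lower()].append(key)
    # Keep only lowercased keys shared by more than one distinct original key.
    return {low: orig for low, orig in groups.items() if len(set(orig)) > 1}


def positional_index(molecule_number, conformer_number):
    """Hypothetical SMILES-free key (assumed format, not a QCFractal convention)."""
    return f"molecule-{molecule_number}-conformer-{conformer_number}"


# Cyclohexane and benzene differ only by letter case in SMILES.
smiles = ["C1CCCCC1", "c1ccccc1", "CCO"]
print(lowercase_collisions(smiles))  # {'c1ccccc1': ['C1CCCCC1', 'c1ccccc1']}
print(positional_index(0, 0))        # molecule-0-conformer-0
```

Running a check like this over a dataset's index before submission would show how many entries a case-insensitive index would silently merge.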

Strategies for user timelines, expectations

David

  • JH: I think we can be faster in merging datasets now, especially with STANDARDS coming into place

  • DD: we’re already defaulting to ‘high’ priority for fitting datasets, more discretionary for others

  • JH: Some of the datasets were from PI pressure to get things running; could be re-tagged to ‘low’ priority

  • DD: compute tags are an avenue for controlling flows, but dangerous if we park tasks in a compute tag for which we have no managers
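The risk DD describes, tasks parked under a compute tag that no manager serves, can be checked with a simple comparison. The following is a minimal sketch in plain Python, assuming the tags of waiting tasks and the tags claimed by each active manager have already been collected by some other means. The function name `unserved_tags` and the tag names are hypothetical, and no QCFractal API is used.

```python
def unserved_tags(waiting_task_tags, manager_tags):
    """Count waiting tasks per tag for tags that no active manager serves."""
    # Union of every tag claimed by at least one manager.
    served = set()
    for tags in manager_tags:
        served.update(tags)

    counts = {}
    for tag in waiting_task_tags:
        if tag not in served:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Hypothetical example: one task sits in a tag that has no manager.
waiting = ["openff", "openff", "pepconf-highmem", "openff"]
managers = [["openff"], ["openff", "default"]]
print(unserved_tags(waiting, managers))  # {'pepconf-highmem': 1}
```

An empty result means every waiting task has at least one manager that could pick it up.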

Dataset index

Josh

  • Probably good to merge; can’t find the script used to generate

  • DD: we can merge and manually curate for now, add automation later

Error Cycling

David

  • TG: Restarts of SCF convergence and optimization convergence errors appear to clear often enough; we probably don’t want to exclude these

    • High memory use by Psi4 can be dealt with through better configuration of workers (setting the memory available to less than the memory allocated on the node)

  • DD: We’ll close for now; can chew on more ways to utilize compute tags for routing, how we want to filter error cycling
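The worker configuration idea above (offer the manager less memory than the node or container actually allocates) comes down to simple arithmetic. The following is a minimal sketch; the function name `memory_per_task_gib`, the 8 GiB headroom, and the tasks-per-worker values are illustrative assumptions and were not decided in the meeting. Only the 64 GiB replica size comes from the notes.

```python
def memory_per_task_gib(container_limit_gib, tasks_per_worker, headroom_gib=8):
    """Memory to offer each QM task, keeping total usage below the container limit.

    headroom_gib is an assumed safety margin, not a value from the notes.
    """
    if tasks_per_worker < 1:
        raise ValueError("tasks_per_worker must be at least 1")
    if not 0 < headroom_gib < container_limit_gib:
        raise ValueError("headroom must be positive and below the container limit")
    # The manager advertises less than the container limit, split across tasks.
    return (container_limit_gib - headroom_gib) / tasks_per_worker


# A 64 GiB replica running one task at a time, with 8 GiB held back.
print(memory_per_task_gib(64, 1))  # 56.0
# The same replica running two concurrent tasks.
print(memory_per_task_gib(64, 2))  # 28.0
```

Keeping the advertised value below the hard limit gives the QM program room to spill to disk before the container is killed for exceeding its allocation.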

Enforced C1 symmetry

Josh

  • C1 symmetry is coming in Psi4, old datasets where we didn’t do this will still work

    • if a method requires a specific symmetry, Psi4 will set it itself

Action items

  •  David Dotson will get next show-and-tell date from Jeffrey Wagner, relay to group for PCM, wavefunction demonstration
  •  David Dotson will add in error cycling for basic DataSets to lifecycle
  •  Pavan Behara will proceed with Genentech dataset, with initial submission only including smaller molecules (~20% of the full dataset); reach out to David Dotson for help fixing the branch/PR when ready
  •  Trevor Gokey will experiment with reducing the number of workers deployed on pre-emptible queues, see if this positively impacts pre-empt frequency; potentially reach out to admins for assistance
  •  David Dotson will re-work PRP deployment of QM workers with manager limits below those given to the container; use fewer CPUs, more memory per replica, more replicas
  •  David Dotson will touch base with Jessica Maat on timeline needs for Phenyl set; assess priority of other sets relative to it
  •  David Dotson will review and merge the index on qca-dataset-submission; create issue for automated curation
