Item

Presenter

Notes

PCM-based implicit solvent

Simon

PCM appears to be working on the COH submission
JH: Also first dataset storing wavefunctions/eigenvalues, so another first
SB: storage and retrieval working just fine!
DD: would be worth showing this off at next show-and-tell; I’ll find out from Jeff

Submissions

SB: COH is about 50% complete
- don’t have error cycling in place for basic DataSets yet; will get today
PB: Genentech optimization; working on first submission. Only 20% of the dataset would be submitted in this first run
- is this acceptable?
- Yes, we’ll proceed with the smaller, 127 molecule (20%) subset for the first submission
- DD: feel free to reach out to me when desired; we’ll re-roll the PR off of master (DD messed up long-lived branches with squash merges)
PB: protomers/tautomers
- JH: there are fewer tasks than conformers because the QCF index is not case sensitive, so some of the SMILES clash when reduced to lowercase
  - Do we have another solution? Do we drop the use of SMILES for the index?
  - TG: For torsiondrives, this is still useful.
  - JH: Still want to be able to group molecules that are just peer conformers
- JH: change how we index molecules, just do molecule-0, conformer-0, basically avoid SMILES for OptimizationDatasets, Basic DataSets; keep SMILES as index on TorsionDrives
- TG: May still run into issues on this with TorsionDrives, but like this because we tag the driven torsion
- DD+BP: Could also go with removing the lowercase-casting on indices; would be almost a trivial change, and non-destructive for database access (we’ll pursue this)
  - Issue raised: https://github.com/MolSSI/QCFractal/issues/650
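
A minimal sketch of the index clash discussed above, assuming only that index strings are lowercased before comparison. The helper name `find_case_collisions` and the example SMILES are hypothetical and not taken from the protomers/tautomers dataset or from QCFractal itself.

```python
from collections import defaultdict


def find_case_collisions(indices):
    """Group index strings that become identical once lowercased."""
    groups = defaultdict(list)
    for index in indices:
        groups[index.lower()].append(index)
    # Keep only lowercase keys shared by more than one distinct index.
    return {
        key: sorted(set(names))
        for key, names in groups.items()
        if len(set(names)) > 1
    }


# Hypothetical example: cyclohexane (aliphatic, uppercase C) and
# benzene (aromatic, lowercase c) differ only by case in SMILES.
smiles = ["C1CCCCC1", "c1ccccc1", "CCO"]
collisions = find_case_collisions(smiles)
print(collisions)
```

Running a check like this before submission would show which entries collapse into a single index, whichever fix (counter-based indices or case-sensitive indices) is adopted.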
DD: PEPCONF
- We’re getting some user pressure; why is it proceeding slowly?
- Decide on a rebalancing of priorities for datasets:
  - reduce priority to low for some optimization sets
- TG: Many of these molecules will take a lot of memory (> 50GiB)
- DD: Perhaps time to scale up all our nodes to a minimum amount of memory for QM jobs
  - Do we know if there are ways to reduce the memory usage of Optimizations?
  - BP: Psi4 can write to disk if needed when memory gets constrained
  - DD: I will reduce the memory offered to the manager to below the constraints given to each worker; may trigger writing to local storage
    - also increase the total memory of each replica to 64GiB
    - Could also scale the CPUs to 32, perhaps even 64
- We’ll increase the priority of PEPCONF to high
- TG: will reduce number of workers deployed, see if this reduces pre-emption frequency
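
The memory headroom idea from DD's points above can be sketched as follows. This is only an illustration: the function name `manager_memory_gib` and the headroom fraction are assumptions, and only the 64GiB replica size comes from the notes.

```python
def manager_memory_gib(container_limit_gib, headroom_fraction=0.1):
    """Memory to offer the manager, kept below the container limit.

    headroom_fraction is a hypothetical tuning knob, not a value
    agreed on in the meeting.
    """
    if not 0 < headroom_fraction < 1:
        raise ValueError("headroom_fraction must be between 0 and 1")
    return container_limit_gib * (1 - headroom_fraction)


# 64 GiB replica from the notes; 12.5% headroom is an assumed example.
advertised = manager_memory_gib(64, headroom_fraction=0.125)
print(advertised)
```

Keeping the offered memory below the container limit leaves room for overhead, so a job that hits its limit may spill to local storage rather than having the container killed.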
Phenyl Dataset - will start to starve others
- DD: I’ll touch base with Jessica, find out timeline needs for Phenyl set

Strategies for user timelines, expectations

David

JH: I think we can be faster in merging datasets now, especially with STANDARDS coming into place
DD: we’re already defaulting to ‘high’ priority for fitting datasets, more discretionary for others
JH: Some of the datasets were from PI pressure to get things running; could be re-tagged to ‘low’ priority
DD: compute tags are an avenue for controlling flows, but dangerous if we park tasks in a compute tag for which we have no managers
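
One way to guard against the parked-task risk DD raises is a periodic check for compute tags that have waiting tasks but no manager serving them. The sketch below uses plain Python data; the function `find_orphaned_tags`, the tag names, and the manager names are all hypothetical, and no QCFractal API is assumed.

```python
def find_orphaned_tags(queued_task_tags, manager_tags):
    """Return compute tags with waiting tasks but no manager serving them."""
    served = set()
    for tags in manager_tags.values():
        served.update(tags)
    return sorted(set(queued_task_tags) - served)


# Hypothetical snapshot: one tag per queued task, and the tags each
# manager is configured to pull.
queued = ["openff", "pepconf", "openff"]
managers = {"manager-a": ["openff"], "manager-b": ["openff", "phenyl"]}
orphaned = find_orphaned_tags(queued, managers)
print(orphaned)
```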

Dataset index

Josh

Probably good to merge; can’t find the script used to generate it
DD: we can merge and manually curate for now, add automation later

Error Cycling

David

TG: Restarts of SCF convergence and optimization convergence errors appear to clear often enough; probably don’t want to exclude these
- High memory for Psi4 can be dealt with through better configuration of workers (setting memory available to less than memory allocated on the node)
DD: We’ll close for now; can chew on more ways to utilize compute tags for routing, how we want to filter error cycling
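
As a starting point for the filtering question, here is a rough sketch of an error-cycling filter that restarts everything except explicitly excluded error types. The record layout, the function `select_for_restart`, and the excluded pattern are hypothetical; per TG's comment, SCF and optimization convergence errors are not excluded.

```python
def select_for_restart(errored_records, excluded_patterns=()):
    """Return ids of errored records whose message matches no excluded pattern."""
    restart_ids = []
    for record in errored_records:
        message = record["error"].lower()
        if any(pattern.lower() in message for pattern in excluded_patterns):
            continue  # skip errors we have decided not to cycle
        restart_ids.append(record["id"])
    return restart_ids


# Hypothetical errored records; only the last one matches an exclusion.
records = [
    {"id": 1, "error": "SCF convergence failure"},
    {"id": 2, "error": "Optimization convergence failure"},
    {"id": 3, "error": "Unsupported element in input"},
]
to_restart = select_for_restart(records, excluded_patterns=["unsupported element"])
print(to_restart)
```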

Enforced C1 symmetry

Josh

C1 symmetry is coming in Psi4; old datasets where we didn’t do this will still work
- if a method requires a specific symmetry, Psi4 will set it itself

Action items

David Dotson will get next show-and-tell date from Jeffrey Wagner, relay to group for PCM, wavefunction demonstration
David Dotson will add in error cycling for basic DataSets to lifecycle
Pavan Behara will proceed with Genentech dataset, with initial submission only including smaller molecules (~20% of the full dataset); reach out to David Dotson for help fixing the branch/PR when ready
Trevor Gokey will experiment with reducing the number of workers deployed on pre-emptible queues, see if this positively impacts pre-empt frequency; potentially reach out to admins for assistance
David Dotson will re-work PRP deployment of QM workers with manager limits below those given to the container; use fewer CPUs, more memory per replica, more replicas
David Dotson will touch base with Jessica Maat on timeline needs for Phenyl set; assess priority of other sets relative to it
David Dotson will review and merge the index on qca-dataset-submission; create issue for automated curation


Version	Old Version 5	New Version 6
Changes made by	David Dotson	David Dotson
Saved on	Nov 20, 2020	Nov 20, 2020
