Do we need to change the index system we use for molecule submissions?
More targeted error cycling; what else do we need in the report for decision-making?
Upcoming science support
Enforced C1 symmetry in Psi4 is almost ready
Larger advances
Automated FF coverage gap identification, torsion prioritization, submission generation
Benchmarking (dashboard, etc.)
Discussion topics
Item
Presenter
Notes
PCM-based implicit solvent
Simon
PCM appears to be working on the COH submission
JH: This is also the first dataset storing wavefunctions/eigenvalues, so another first
SB: storage and retrieval working just fine!
DD: worth showing this off at the next show-and-tell; I’ll find out the date from Jeff
Submissions
SB: COH is about 50% complete
don’t have error cycling in place for basic DataSets yet; will add it today
PB: Genentech optimization set; working on the first submission. Only 20% of the dataset would be submitted in this first run
Is this acceptable?
Yes, we’ll proceed with the smaller, 127 molecule (20%) subset for the first submission
DD: feel free to reach out to me when desired; we’ll re-roll the PR off of master (DD messed up long-lived branches with squash merges)
PB: protomers/tautomers
JH: there are fewer tasks than conformers, because the QCF index is not case-sensitive and some of the SMILES clash when reduced to lowercase
Do we have another solution? Do we drop the use of SMILES for the index?
TG: For TorsionDrives, SMILES indices are still useful.
JH: We still want to be able to group entries that are conformers of the same molecule
JH: Change how we index molecules: use simple names like molecule-0, conformer-0, i.e. avoid SMILES for OptimizationDatasets and basic DataSets, but keep SMILES as the index for TorsionDrives
TG: We may still run into this issue with TorsionDrives, but I like SMILES there because the index tags the driven torsion
DD+BP: Alternatively, remove the lowercase-casting of indices; this would be an almost trivial change, and non-destructive for database access (we’ll pursue this)
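The clash JH describes is easy to reproduce: in SMILES, lowercase atom symbols mean aromatic atoms, so case carries chemical meaning. The sketch below uses benzene and cyclohexane as an illustrative pair (not molecules from the submission) to show two distinct SMILES collapsing onto one key under lowercase-casting but not under a case-preserving index. `find_index_collisions` and `positional_keys` are hypothetical helpers, and the `molecule-i-conformer-j` naming is only an illustration of JH's alternative scheme, not an agreed format.

```python
from collections import defaultdict


def find_index_collisions(smiles_list, casefold=True):
    """Group input SMILES by the index key they would receive.

    casefold=True mimics an index that lower-cases its keys;
    casefold=False mimics a case-preserving index.
    Returns only keys shared by two or more distinct SMILES.
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        key = smi.lower() if casefold else smi
        groups[key].append(smi)
    return {key: smis for key, smis in groups.items() if len(set(smis)) > 1}


def positional_keys(conformer_counts):
    """Hypothetical 'molecule-i-conformer-j' keys that avoid SMILES entirely.

    conformer_counts[i] is the number of conformers for molecule i, so
    conformers of the same molecule still share a 'molecule-i' prefix.
    """
    return [
        f"molecule-{i}-conformer-{j}"
        for i, n_conformers in enumerate(conformer_counts)
        for j in range(n_conformers)
    ]


# Benzene (aromatic, lowercase atoms) vs. cyclohexane (aliphatic, uppercase atoms)
smiles = ["c1ccccc1", "C1CCCCC1", "CCO"]
print(find_index_collisions(smiles))                  # {'c1ccccc1': ['c1ccccc1', 'C1CCCCC1']}
print(find_index_collisions(smiles, casefold=False))  # {}
print(positional_keys([2, 1]))
```

Dropping the lowercase-casting (the option being pursued) corresponds to `casefold=False`, which keeps SMILES-based grouping intact; the positional scheme avoids the problem by not using SMILES at all.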
Issue raised:
DD: PEPCONF
We’re getting some user pressure; why is it proceeding slowly?
Decide on a rebalancing of priorities for datasets:
reduce priority to low for some optimization sets
TG: Many of these molecules will need a lot of memory (> 50 GiB)
DD: Perhaps it’s time to guarantee a minimum amount of memory on all our nodes for QM jobs
Do we know if there are ways to reduce the memory usage of Optimizations?
BP: Psi4 can write to disk if needed when memory gets constrained
DD: I will reduce the memory offered to the manager to below the constraints given to each worker; may trigger writing to local storage
also increase the total memory of each replica to 64 GiB
Could also scale the CPUs to 32, perhaps even 64
We’ll increase the priority of PEPCONF to high
TG: will reduce the number of workers deployed and see if this reduces pre-emption frequency
Phenyl Dataset - will start to starve others
DD: I’ll touch base with Jessica, find out timeline needs for Phenyl set
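The memory plan above (offer the manager less than the container limit, 64 GiB replicas, molecules needing > 50 GiB) comes down to simple arithmetic. The sketch below assumes a hypothetical 10% headroom between the container's hard limit and what is offered to the manager, and that advertised memory is split evenly across concurrently running tasks; neither the headroom value nor the helper names come from the actual deployment.

```python
def advertised_memory_gib(container_limit_gib, headroom_fraction=0.1):
    """Memory to offer the manager, kept below the container's hard limit.

    headroom_fraction is an illustrative assumption, not a tuned value.
    """
    if not 0 < headroom_fraction < 1:
        raise ValueError("headroom_fraction must be between 0 and 1")
    return container_limit_gib * (1 - headroom_fraction)


def memory_per_task_gib(advertised_gib, total_cores, cores_per_task):
    """Split the advertised memory evenly across concurrently running tasks."""
    n_tasks = total_cores // cores_per_task
    if n_tasks < 1:
        raise ValueError("cores_per_task exceeds total_cores")
    return advertised_gib / n_tasks


# A 64 GiB replica with 10% headroom, running one 32-core task at a time
offered = advertised_memory_gib(64)
per_task = memory_per_task_gib(offered, total_cores=32, cores_per_task=32)
print(f"{offered:.1f} GiB offered, {per_task:.1f} GiB per task")  # 57.6 GiB offered, 57.6 GiB per task
```

Under these assumptions one task per replica gets enough for the > 50 GiB molecules, while splitting the same replica into four 8-core tasks would leave about 14.4 GiB each. Keeping the offered figure below the hard limit is what gives Psi4 room to fall back to disk rather than having the container killed.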
Strategies for user timelines, expectations
David
JH: I think we can be faster in merging datasets now, especially with STANDARDS coming into place
DD: we’re already defaulting to ‘high’ priority for fitting datasets, more discretionary for others
JH: Some of the datasets were from PI pressure to get things running; could be re-tagged to ‘low’ priority
DD: compute tags are an avenue for controlling flows, but dangerous if we park tasks in a compute tag for which we have no managers
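The danger DD raises, tasks parked in a compute tag no manager is listening on, can be checked with a set comparison. The sketch below assumes we already have a list of compute tags for waiting tasks and a list of tags served by active managers; how those lists are fetched from QCFractal is left out, and the tag names are hypothetical.

```python
from collections import Counter


def orphaned_tags(task_tags, manager_tags):
    """Count waiting tasks whose compute tag no active manager is serving."""
    served = set(manager_tags)
    waiting = Counter(task_tags)
    return {tag: n for tag, n in waiting.items() if tag not in served}


# Hypothetical tag names for illustration
tasks = ["openff", "openff", "openff-highmem", "openff-highmem", "openff-gpu"]
managers = ["openff", "openff-highmem"]
print(orphaned_tags(tasks, managers))  # {'openff-gpu': 1}
```

A check like this could run alongside any re-tagging, so that moving a dataset to a new tag never silently stalls it.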
Dataset index
Josh
Probably good to merge, though we can’t find the script used to generate it
DD: we can merge and manually curate for now, add automation later
Error Cycling
David
TG: Restarts after SCF convergence and optimization convergence failures appear to clear often enough; we probably don’t want to exclude these from error cycling
High memory for psi4 can be dealt with through better configuration of workers (setting memory available to less than memory allocated on the node)
DD: We’ll close for now; can chew on more ways to utilize compute tags for routing, how we want to filter error cycling
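One way to frame "how we want to filter error cycling" is a small classifier: keep restarting convergence failures (per TG), but stop for memory failures, which are better fixed through worker configuration. The sketch below is hypothetical throughout: the regex patterns are illustrative rather than actual QCFractal or Psi4 error text, and the restart cap of 3 is an assumed value.

```python
import re

# Hypothetical patterns for illustration; real QCFractal/Psi4 error text may differ.
RESTARTABLE = [
    re.compile(r"SCF.*converge", re.IGNORECASE),
    re.compile(r"optimi[sz]ation.*converge", re.IGNORECASE),
]
# Failures better fixed by worker configuration than by blind restarts
NEEDS_CONFIG_CHANGE = [
    re.compile(r"memory", re.IGNORECASE),
]


def should_restart(error_message, n_restarts, max_restarts=3):
    """Decide whether an errored task should be cycled back to the queue."""
    if n_restarts >= max_restarts:
        return False
    if any(p.search(error_message) for p in NEEDS_CONFIG_CHANGE):
        return False
    return any(p.search(error_message) for p in RESTARTABLE)


print(should_restart("SCF did not converge", n_restarts=0))                 # True
print(should_restart("Out of memory while building JK", n_restarts=0))      # False
print(should_restart("SCF did not converge", n_restarts=3))                 # False
```

Unrecognized errors default to no restart here, which keeps them visible in the report for a human decision instead of cycling them indefinitely.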
Enforced C1 symmetry
Josh
Enforced C1 symmetry is coming in Psi4; old datasets that did not enforce it will still work
If a method requires a specific symmetry, Psi4 will set it itself
Action items
David Dotson will get the next show-and-tell date from Jeffrey Wagner and relay it to the group for the PCM and wavefunction demonstration
David Dotson will add in error cycling for basic DataSets to lifecycle
Pavan Behara will proceed with Genentech dataset, with initial submission only including smaller molecules (~20% of the full dataset); reach out to David Dotson for help fixing the branch/PR when ready
Trevor Gokey will experiment with reducing the number of workers deployed on pre-emptible queues, see if this positively impacts pre-empt frequency; potentially reach out to admins for assistance
David Dotson will re-work PRP deployment of QM workers with manager limits below those given to the container; use fewer CPUs, more memory per replica, more replicas
David Dotson will touch base with Jessica Maat on timeline needs for Phenyl set; assess priority of other sets relative to it
David Dotson will review and merge the index on qca-dataset-submission, and create an issue for automated curation
Decisions