2020-03-04 QCA Submission Meeting notes

Date

Mar 4, 2020

Participants

  • @Jeffrey Wagner

  • @Joshua Horton

  • @Jessica Maat

  • @Hyesu Jang

Discussion topics

Time

Item

Presenter

Notes

 

Dataset filtering

 

  • Maat and Jang need to write code that generates the training data for the next generation of fitting

  • HJ – This will be based on taking a large dataset of compounds, clustering them, and picking the molecules that provide the most coverage

  • JM – Open to a lot of different options for how to do this

  • JH – Filtering could become a component in the submission workflow

  • JM – This dataset would be used for the Sage fit

  • JW – Are we approaching a limit on how much data we can feed into ForceBalance?

    • JW + HJ – Let’s assume no upper limit for now

  • HJ – We will use the Roche set, the coverage set, and the new set
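
A minimal sketch of the clustering-and-picking idea HJ describes. It assumes clustering has already been done and that each molecule's parameter coverage is known. The function name and both input dictionaries are hypothetical and are not an existing OpenFF API. Picking exactly one molecule per cluster is also a simplifying assumption. Within each cluster, the function greedily chooses the molecule that adds the most parameters not yet covered.

```python
def pick_representatives(clusters, coverage):
    """Pick one molecule per cluster, preferring molecules that add new coverage.

    clusters: hypothetical dict mapping a cluster id to a list of molecule ids,
              e.g. the output of a fingerprint-based clustering step.
    coverage: hypothetical dict mapping a molecule id to the set of force-field
              parameter ids it exercises.
    Returns the picked molecule ids and the set of parameters they cover.
    """
    covered = set()
    picked = []
    # Visit clusters in a fixed order so the selection is reproducible.
    for cluster_id in sorted(clusters):
        # Prefer the most new parameters, then the most parameters overall,
        # then the alphabetically first molecule id.
        best = min(
            clusters[cluster_id],
            key=lambda mol: (-len(coverage[mol] - covered), -len(coverage[mol]), mol),
        )
        picked.append(best)
        covered |= coverage[best]
    return picked, covered


clusters = {"c1": ["mol_a", "mol_b"], "c2": ["mol_c", "mol_d"]}
coverage = {
    "mol_a": {"t1", "t2"},
    "mol_b": {"t1"},
    "mol_c": {"t1", "t2"},
    "mol_d": {"t3"},
}
picked, covered = pick_representatives(clusters, coverage)
print(picked)  # ['mol_a', 'mol_d']
```

Here mol_d wins its cluster over mol_c even though mol_c exercises more parameters overall, because mol_c adds nothing that mol_a has not already covered.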

What will the dataset look like?

  • We have 200 torsion terms in our FF and want 5 scans for each torsion, so 200 × 5 = 1000 torsiondrives
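
A sketch of the dataset-size arithmetic. It assumes a hypothetical mapping from each torsion term to the candidate scans that exercise it. The function caps each term at 5 scans and reports any terms that fall short. One scan can exercise several terms, so 1000 is an upper bound on the number of distinct torsiondrives rather than an exact count.

```python
N_TORSION_TERMS = 200  # torsion terms in our FF, as stated in the notes
SCANS_PER_TERM = 5     # target number of scans per torsion term


def plan_torsiondrives(candidates, scans_per_term=SCANS_PER_TERM):
    """Assign up to `scans_per_term` candidate scans to each torsion term.

    candidates: hypothetical dict mapping a torsion parameter id (e.g. "t1")
    to a list of candidate scan ids that exercise it, in order of preference.
    Returns (plan, short, distinct):
      plan: selected scans per term
      short: terms with fewer than `scans_per_term` candidates
      distinct: set of unique scans to submit (a scan may serve several terms)
    """
    plan = {}
    short = []
    distinct = set()
    for term in sorted(candidates):
        chosen = candidates[term][:scans_per_term]
        plan[term] = chosen
        distinct.update(chosen)
        if len(chosen) < scans_per_term:
            short.append(term)
    return plan, short, distinct


# Upper bound on dataset size when every term gets its full quota
# and no scan is shared between terms.
max_torsiondrives = N_TORSION_TERMS * SCANS_PER_TERM
print(max_torsiondrives)  # 1000

plan, short, distinct = plan_torsiondrives(
    {"t1": ["s1", "s2", "s3", "s4", "s5", "s6"], "t2": ["s1", "s7"]}
)
print(short, len(distinct))  # ['t2'] 6
```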

 



QCArchive submission





 

Timeline

 

HJ – Running an optimization takes ~1 day, and sometimes it has to be run 5 times because of problems with the data.

JW – Assuming 2 days per torsion, multiply by 1000 torsions (2000 torsion-days of compute in total).
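
A rough sketch of the timeline arithmetic. The 1000 torsions, ~2 days per torsion, ~1 day per optimization and 5 repeats all come from the notes. The number of torsiondrives running at once is not given, so `n_concurrent=100` below is a hypothetical value for illustration. The model also assumes every job takes the same time.

```python
import math

TORSIONS = 1000          # torsiondrives planned, from the notes
DAYS_PER_TORSION = 2     # JW's assumption
DAYS_PER_FIT = 1         # HJ: one optimization takes ~1 day
FIT_REPEATS = 5          # HJ: sometimes needs to be run 5 times


def wall_clock_days(n_jobs, days_per_job, n_concurrent):
    """Rough wall-clock estimate for jobs run in equal-length batches.

    Assumes every job takes the same time and that exactly `n_concurrent`
    jobs run at once; the notes do not give the real pool size.
    """
    batches = math.ceil(n_jobs / n_concurrent)
    return batches * days_per_job


print(TORSIONS * DAYS_PER_TORSION)  # 2000 torsion-days of compute
# n_concurrent=100 is a hypothetical value, for illustration only.
print(wall_clock_days(TORSIONS, DAYS_PER_TORSION, n_concurrent=100))  # 20
print(DAYS_PER_FIT * FIT_REPEATS)  # 5 days of fitting if 5 runs are needed
```

Under this model the wall-clock time depends mainly on how many torsiondrives the compute pool can run concurrently.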

Action items

@Jeffrey Wagner will tell @John Chodera NOT to submit the 50k dataset, or to submit it at LOW PRIORITY. We will need the bandwidth for this submission.
@Jessica Maat and @Hyesu Jang will coordinate to work together on this in the coming weeks. Target date for submission is March 20th.
@Joshua Horton will make a pre-submission checklist (bond orders requested, CMILES attached).

Submission checklist

Ensure all submissions have CMILES attached; the most important identifier is the mapped SMILES with explicit hydrogens.
Ensure the Wiberg bond order (WBO) is requested for all submissions. To do this, include the flag wiberg_lowdin_indices in the SCF properties list.
If any calculations from another collection are to be redone, re-use the old input (coordinates, atom ordering, etc.). The calculation then does not run again: the database simply creates new references to the old results, which helps keep the cost of the calculations down.
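
The checklist lends itself to an automated pre-submission check. What follows is a minimal sketch over plain dictionaries, not the QCSubmit or QCPortal data model. The entry layout, `input_hash` and `check_submission` are hypothetical, and the cmiles key name is an assumption to verify against actual cmiles output. The hash only mimics locally the idea behind the last item: an input copied verbatim from the old collection matches, while reordered atoms or changed coordinates do not. The actual reuse of old results happens in the database.

```python
import hashlib

# Assumed cmiles key for the mapped, explicit-hydrogen SMILES; verify against your cmiles output.
MAPPED_SMILES_KEY = "canonical_isomeric_explicit_hydrogen_mapped_smiles"
# Flag named in the checklist, placed in the SCF properties list.
WBO_FLAG = "wiberg_lowdin_indices"


def input_hash(symbols, geometry, decimals=6):
    """Hypothetical helper: hash atom ordering plus rounded flat coordinates.

    A verbatim copy of an old input hashes identically, while reordering atoms
    or moving them changes the hash.
    """
    coords = ",".join(f"{x:.{decimals}f}" for x in geometry)
    payload = ";".join(symbols) + "|" + coords
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_submission(entries, scf_properties, previous_hashes=frozenset()):
    """Return a list of checklist problems (an empty list means all checks pass).

    entries: hypothetical dict of entry name -> {"cmiles": dict, "symbols": list,
             "geometry": flat list of floats, "redo": bool}.
    previous_hashes: input hashes from the collection being redone.
    """
    problems = []
    if WBO_FLAG not in scf_properties:
        problems.append(f"scf_properties is missing {WBO_FLAG}")
    for name, entry in entries.items():
        if not entry.get("cmiles", {}).get(MAPPED_SMILES_KEY):
            problems.append(f"{name}: no mapped explicit-hydrogen SMILES in cmiles")
        if entry.get("redo"):
            if input_hash(entry["symbols"], entry["geometry"]) not in previous_hashes:
                problems.append(f"{name}: redo input differs from the old collection")
    return problems


old_symbols = ["O", "H", "H"]
old_geometry = [0.0, 0.0, 0.0, 0.0, 0.0, 1.8, 1.7, 0.0, -0.5]
previous = {input_hash(old_symbols, old_geometry)}
entries = {
    "water_redo": {
        "cmiles": {MAPPED_SMILES_KEY: "[H:2][O:1][H:3]"},
        "symbols": old_symbols,
        "geometry": old_geometry,
        "redo": True,
    },
}
print(check_submission(entries, [WBO_FLAG], previous))  # []
```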

Decisions

  1. Jeff approves this being a pile of spaghetti code, given the time constraints