2024-06-11 BW/LW Check-in

Participants

  • @Brent Westbrook (Unlicensed)

  • @Lily Wang

Goals

  •  

Discussion topics

Item

Presenter

Notes

General updates and discussion on projects

 

  • Torsion splitting in 2.2

    • Still where it was before OMSF workshop

    • Torsion shapes next steps?

      • Comparing functional form to QM data

    • (still here, waiting on below)

  • Fragment dataset curation code

    • Trying to get all torsion parameters (~65 total) covered

    • Found a couple of issues with the database: fragments didn’t match any parameters, so went back to matching against full molecules

    • Have a set of SMILES, and torsions within those SMILES, that match ~half of the parameters not previously covered (see next point)

    • Current stage: for parameters that didn’t match any molecules in ChEMBL, will make molecules by hand and possibly expand with fragments from the database (~28)

      • Anticipate ~1 day

      • LW: if this takes more than 20-30 min, there might not be any good molecules, so possibly skip

      • BW: have gone through this already and trimmed out a few parameters that I couldn’t match

    • LW: are you currently matching one molecule per torsion or multiple?

      • BW: currently 1, except in a few cases where the molecules it was matching looked quite different

    • BW: also checked molecules for charging issues, and running into OMEGA conformer generation issues (e.g. [S3+])

    • ChEMBL database is ~12 GB (includes a table of just fragments and a table of molecules; fragments take up most of the space)

      • InChI keys are computed for molecules

    • How long does it take to query for parameters?

      • 1-1.5 hours

      • ~2 million molecules or so

    • eMolecules has offered two datasets:

      • 50 M “eco-lite” set

      • full 3.9 trillion set

    • BW: the database has a number of fields, e.g. elements, to make querying faster. It also handles duplicates, so if there are lots, we can save space, but 3.9 T is a big jump

    • BW: took 24+ hours to fragment + store ChEMBL itself.

      • LW: does this include throwing out invalid molecules e.g. with radicals?

      • BW: no, staying in RDKit for this, so everything accepted

    • LW: planning to ask advisory board for ideas on expanding the benchmark dataset

    • BW: would it be worth pulling this out into a package for others to use? Sent Lexie some stuff but maybe we should pull it out

      • #ff-fitting?

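As a rough illustration of the database points above (an element field to speed up queries, and duplicate handling keyed on InChI keys), here is a minimal sketch. The notes don't describe the actual schema or storage backend, so everything here is an assumption: SQLite as the store, the `molecules` table layout, the `ELEMENT_BITS` mapping, and the helper names are all hypothetical, and the keys are placeholders rather than real InChI keys.

```python
import sqlite3

# Hypothetical element-to-bit mapping; a real table would cover more elements.
ELEMENT_BITS = {"C": 1, "N": 2, "O": 4, "S": 8, "P": 16, "Cl": 32}


def element_mask(elements):
    """Pack a collection of element symbols into an integer bitmask."""
    mask = 0
    for symbol in elements:
        mask |= ELEMENT_BITS[symbol]
    return mask


def create_store(conn):
    # The primary key on the InChI key is what collapses duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        "inchikey TEXT PRIMARY KEY, "
        "smiles TEXT NOT NULL, "
        "elements INTEGER NOT NULL)"
    )


def add_molecule(conn, inchikey, smiles, elements):
    """Insert a molecule; return True if it was new, False if a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO molecules VALUES (?, ?, ?)",
        (inchikey, smiles, element_mask(elements)),
    )
    return cur.rowcount == 1


def candidates_with(conn, required_elements):
    """Return SMILES of stored molecules containing all required elements."""
    mask = element_mask(required_elements)
    rows = conn.execute(
        "SELECT smiles FROM molecules WHERE (elements & ?) = ? ORDER BY smiles",
        (mask, mask),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_store(conn)
# Keys below are placeholders, not real InChI keys.
added = [
    add_molecule(conn, "KEY-ETHANOL", "CCO", {"C", "O"}),
    add_molecule(conn, "KEY-ETHANOL", "OCC", {"C", "O"}),  # same key, skipped
    add_molecule(conn, "KEY-DMSO", "CS(C)=O", {"C", "S", "O"}),
]
sulfur_hits = candidates_with(conn, {"S", "O"})
```

The cheap integer prefilter narrows the candidates before any expensive substructure matching is run, which is presumably the motivation for the element field; whether that would scale to the eMolecules sets is an open question.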

  • LW – let’s plan projects and effort allocation over the next year

  • Overall budget: 40 weeks

  • 20% infrastructure: 8 weeks

  • 80% science: 32 weeks

** infrastructure
probably best equipped to work on

  • yammbs

  • qcsubmit

  • possibly some RDKit stuff in the toolkit
    I've also read a lot of ForceBalance code but not implemented any features

    • LW: maybe focus FB effort on alternatives

  • torsion multiplicity project

    • Estimated time:

      • Iteration: TD set, Opt set?, fit, and benchmark: ~3 weeks expected

      • 6 weeks

  • PDB fraction
    I've been interested in trying to quantify which of our parameters are good,
    which I think relates pretty closely to this

  • besmarts

    • 4 weeks

  • FB replacement + alternative functional forms

    • 4 weeks

  • dataset curation (~7 weeks)

    • Develop better benchmarking dataset with wider chemical coverage

      • 3 weeks

    • Work with interested parties to make sure their areas of chemistry are covered

      • 1 week

    • Fragment database

      • 3 weeks

  • standardized benchmarking
    like I mentioned in the meeting the other day, I'm really interested in a
    setup where we basically push a button and it tells us if a new force field
    is good or not

    • LW: science or technical side? e.g. automation vs developing new benchmarks?

    • BW: more technical – e.g. passing a force field through “CI” and getting back results

    • LW: meaning, on the roadmap:

      • YAMMBS

      • developing new benchmarks, e.g. dimer benchmarking, Chapin’s NMR benchmarks, condensed phase properties, solvation free energies

      • BW: interested in both science and technical

      • Science side: 3 weeks

  • Infrastructure: 8 weeks

  • Torsion multiplicity: 6 weeks

  • PDB fraction: 8 weeks

  • besmarts: 4 weeks

  • smee/alternative functional forms: 4 weeks

  • dataset curation: 7 weeks

  • standardized benchmarking: 3 weeks

  • Plans for next week?

    • datasets until end of the week

    • fitting + benchmarking next week
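As a quick sanity check on the effort allocation listed above, the per-project weeks can be summed against the 40-week budget and the 20%/80% infrastructure/science split. The numbers are taken from these notes; the variable names are only illustrative.

```python
TOTAL_WEEKS = 40
INFRASTRUCTURE_SHARE = 0.20

# Per-project allocation in weeks, as listed in the summary above.
allocation_weeks = {
    "infrastructure": 8,
    "torsion multiplicity": 6,
    "PDB fraction": 8,
    "besmarts": 4,
    "smee/alternative functional forms": 4,
    "dataset curation": 7,
    "standardized benchmarking": 3,
}

# Sub-items quoted for dataset curation (benchmark set, outreach, fragment database).
dataset_curation_items = [3, 1, 3]

infrastructure_weeks = round(TOTAL_WEEKS * INFRASTRUCTURE_SHARE)
science_weeks = TOTAL_WEEKS - infrastructure_weeks
allocated_science = sum(
    weeks for name, weeks in allocation_weeks.items() if name != "infrastructure"
)
unallocated = TOTAL_WEEKS - sum(allocation_weeks.values())

print(f"infrastructure: {infrastructure_weeks} weeks, science: {science_weeks} weeks")
print(f"science allocated: {allocated_science} weeks, unallocated: {unallocated} weeks")
```

With the figures as recorded, the science projects fill the 32-week science share exactly and nothing is left unallocated.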


Action items

Decisions

 
