DH – Worked on benchmark datasets. Contributing to the Lim paper by adding OPLS3e and new OpenFF versions. Could use some support/recommendations for versioning protein-ligand benchmark datasets.
JRG – OpenMM Conda-forge migration. Working with Anthony Scopatz. Kickoff meeting this coming Friday. This overlaps with some other work that Scopatz is doing, and may mean that the GPU CI infrastructure will be built with our needs in mind. Looking at OpenMM on conda-forge by October.
SB – Mostly writing papers. Close to a first pass at Recharge support implemented in ForceBalance. It separates normal parameters from “mathematical parameters”. Need input from Lee-Ping. Need to package Recharge – would like to build it on conda-forge, but it requires psi4 and openforcefield.
SB – Can I do “alpha” or “prerelease” versions? (e.g. 0.0.1a3)
JRG – Yes, just use a consistent, semantically/lexically-ordered scheme. Conda-forge may only accept certain patterns. It may be possible to add it to the build string if they don’t like your naming pattern. Could ask in their gitter.im room.
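(For reference – not discussed in the meeting – PEP 440-style pre-release tags sort before the final release, so a consistent “a”/“b”/“rc” scheme stays ordered; a minimal check using the packaging library:)

```python
# Illustration only: PEP 440 pre-release tags sort before the final release,
# so a consistent "a"/"b"/"rc" scheme stays lexically/semantically ordered.
from packaging.version import Version

versions = [Version(v) for v in ["0.0.1a3", "0.0.1b1", "0.0.1rc1", "0.0.1"]]
assert versions == sorted(versions)  # a3 < b1 < rc1 < final release
print([str(v) for v in sorted(versions)])
```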
MT – Started working on the conformer generation CLI tool. Most test cases work; some ambiguity about molecule interpretation and output file naming. Needs feedback. Made an openforcefield.cli module, which currently only has the Python file. Basing this script off the conformer energy one. Have the choice to either run the script top-to-bottom with lots of comments, or refactor it into functions; I think the latter is more readable. Not sure where those functions should live (in the toolkit vs. somewhere else).
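(A minimal sketch of the “refactor into functions” option MT describes – function, flag, and file names here are hypothetical, not the actual PR:)

```python
# Hypothetical sketch of a thin CLI over importable functions; names and
# defaults are illustrative, not the actual openforcefield.cli code.
import argparse

from openforcefield.topology import Molecule


def generate_conformers(input_path, n_conformers=10):
    """Load molecule(s) from a file and generate conformers for each."""
    molecules = Molecule.from_file(input_path, allow_undefined_stereo=True)
    if isinstance(molecules, Molecule):
        # from_file returns a single Molecule for single-molecule files
        molecules = [molecules]
    for molecule in molecules:
        molecule.generate_conformers(n_conformers=n_conformers)
    return molecules


def main():
    parser = argparse.ArgumentParser(
        description="Generate conformers for molecules in a file."
    )
    parser.add_argument("input_path")
    parser.add_argument("-n", "--n-conformers", type=int, default=10)
    parser.add_argument("-o", "--output", default="conformers.sdf")
    args = parser.parse_args()

    molecules = generate_conformers(args.input_path, n_conformers=args.n_conformers)
    # Simplified output handling; multi-molecule/multi-conformer naming is one
    # of the open questions in the actual PR.
    molecules[0].to_file(args.output, file_format="sdf")


if __name__ == "__main__":
    main()
```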
JW – I think all the running code should stay in the script, since it’s also a de facto source of documentation and education.
SB – Think we shouldn’t try to teach people to interact with the code this way, since it’ll be hard for pharma people to debug. Having scripts call refactored functions in the toolkit itself would be better. People who want to modify the CLI could access an equivalent notebook.
MT – (Showed current progress)
JW – Can we automatically render the Jupyter notebook to a script? Or ensure they’re the same?
MT – nbconvert should do this, but the output might be mangled.
SB – CLI can be feature-rich “black box”. The notebook SHOULDN’T be feature-rich. These can be explicitly decoupled. The notebook could be in examples
DH – Could import script functions into notebook
JW – I wouldn’t want the notebook to import from the CLI, since that hides code.
SB – Agree that notebook shouldn’t import from script.
MT – If they have different functionality, we should keep the notebook and CLI far apart.
SB – Comments in the CLI and notebook should mention each other's existence, and point to each other.
JW – To keep PR finish-able, I’m happy to approve it with a script only. But I’d definitely want to add a notebook shortly.
DD – Entrypoint?
JW – I don’t want an entrypoint, since that’ll obfuscate where the code is coming from
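(For context, a console_scripts entry point would look roughly like the setup.py fragment below – the command and module names are hypothetical:)

```python
# Hypothetical setup.py fragment: a console_scripts entry point would expose
# the CLI as an `openff-generate-conformers` command, which is what hides the
# underlying script path from the user.
from setuptools import find_packages, setup

setup(
    name="openforcefield",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "openff-generate-conformers = openforcefield.cli.generate_conformers:main",
        ]
    },
)
```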
SB – If we advertise both the CLI and the notebook, then folks who should be exposed to source code will see the notebook.
(Dedicated CLI repo?)
JW – Would be nice, since it’ll be easier to do rapid releases
DD + SB – Separate repo would be great. Some questions about non-Toolkit CLI tools. Then they can be batch-downloaded using git clone.
DD – CLI could be accessed as openff command1 command2 …
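(A rough sketch of that subcommand pattern using argparse – the subcommand names are placeholders, not agreed-upon tools:)

```python
# Hypothetical sketch of an `openff <subcommand>` style CLI using argparse
# subparsers; subcommand names are placeholders only.
import argparse


def main():
    parser = argparse.ArgumentParser(prog="openff")
    subparsers = parser.add_subparsers(dest="command", required=True)

    conf = subparsers.add_parser(
        "generate-conformers", help="Generate conformers for input molecules"
    )
    conf.add_argument("input_path")

    args = parser.parse_args()
    if args.command == "generate-conformers":
        print(f"Would generate conformers for {args.input_path}")


if __name__ == "__main__":
    main()
```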
SB – Parallel CLI and notebooks will be hard, since it’ll be easy for them to go out of sync
(Do pharma people want to see inside scripts?)
(General) – No
People who want to see inside functionality can look at simpler notebooks.
(GUI tools?)
JW – I’d like for us to stay focused on being a toolkit provider, not a platform provider (per Karmen’s categorization system, from Cresset/Mackey)
(General) – We don’t want to do GUIs
JW – In the long run, we’ll want to move to a dedicated CLI repo, where each tool has a parallel, simplified example. For this PR, I’m OK to merge it into toolkit with no notebook in the interest of making it finish-able.
MT – The OpenFF build matrix always installed RDKit, which could have led to silent OE failures, where OE failed but tests used RDKit instead. Working on a PR to make that stricter. Working on some other PRs as well. The benchmarking dashboard will be revived by Trevor, who actually wants to work on benchmarking and has enough knowledge to do it.
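(One way to make the OE/RDKit split explicit in tests – a sketch under assumed toolkit API, not the actual PR: request the OpenEye wrapper directly and skip cleanly when it is unavailable:)

```python
# Sketch (not the actual PR): request OpenEye explicitly so a test cannot
# silently fall back to RDKit, and skip when OpenEye is not installed/licensed.
import pytest

from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import OpenEyeToolkitWrapper

requires_openeye = pytest.mark.skipif(
    not OpenEyeToolkitWrapper.is_available(),
    reason="OpenEye toolkits are not installed or licensed",
)


@requires_openeye
def test_conformer_generation_with_openeye():
    molecule = Molecule.from_smiles("CCO")
    # Passing the wrapper directly forces OpenEye; no RDKit fallback here.
    molecule.generate_conformers(
        n_conformers=1, toolkit_registry=OpenEyeToolkitWrapper()
    )
    assert molecule.n_conformers >= 1
```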
JW – He can take infrastructure responsibility for this, but let’s make sure he has an out for data responsibility. If he’s hesitant about becoming the benchmarking data owner, let’s advertise the post to other people (though, with data ownership comes publications)
MT – This also ties in with delineating my responsibilities around potentially being a “dataset czar”, regarding how much I drive policy vs. do specific data generation/ownership.
DD – MM data on QCA makes BP nervous, since it can quickly balloon in size and scope, and overwhelm QCA space and bandwidth
JW – I see it as a possibility that we’d host a separate QCA for MM data in the future. But we could easily have 10-20 MM operations for each final QM molecule, with different minimization schemes and whatnot.
DD – It’ll be hard to quickly iterate on MM schemes, since we cross organization boundaries and have release timeline delays.
MT – If we’re going to run into this problem eventually, then our plan A should be to start migrating to our own infrastructure.
DD – We will eventually need our own dataset storage, period, whether it’s QCA-like or something else.
SB – Agree. Cloud hosting isn’t horribly expensive. Heroku could get us a persistent database for just a few hundred dollars per year.
DD – Egress costs could get nasty
DD – QCA lifecycle automation, removing human intervention. Quick refactor to handle PRs with multiple datasets. Had an issue with card management in situations where one PR includes many datasets. Still seeing issues with persistent incompletes – jobs appear to be successfully completed and returned to QCA, so it looks like there’s leaky logic in QCA for setting job completeness. We need more QCA datasets.
DD – BP is considering a major refactor of QCF. Shouldn’t affect the rest of the QC* ecosystem. There is currently a bit of technical/design iteration debt. Notes on this are in the 2020-07-31 QCA Submission Meeting notes.
DD – Will try to get to PE work this week. Will focus on getting it run locally, and then moving toward pAPRika branch work.
JW – Dataset work. Will present dataset provenance/design thoughts at next all-hands.