2021-09-13 Core Developers meeting notes

Participants

@Simon Boothroyd
@Iván Pulido
@Matt Thompson
@David Dotson
@Lily Wang
@Jeffrey Wagner

Discussion topics

Item	Notes
General announcements	JW – Will no longer do sprint planning on Mondays. JW will meet with MT and DD separately so as not to put time pressure on this timeslot.
Updates	SB
  Worked on SMIRNOFFEE – Takes an Interchange object and evaluates energies in a differentiable way.
  Worked on Descent – A framework for evaluating loss functions using PyTorch. Can help e.g. Pavan with fitting experiments without being blocked by ForceBalance releases.
    Works for loss functions that don’t require minimization. To include minimizations, we will need to implement finite difference gradients.
    It would be worth considering exactly how to implement minimization, since some methods would be much more expensive than others.
IP
  Started work in Chodera lab. Working on getting familiarized with Perses and fixing some bugs. Currently not as modular+tested as it could be.
  There’s a big demand for residue perception – Counting residues and knowing which atoms belong to what. JC is interested in a specific set of features; it would be good to touch base with him on how that compares to the current API.
  Added some caps to the residue library. Used CCD, but they weren’t in aa_variants.cif, so I had to identify and collect them from other parts of the whole database.
MT
  More OpenMM 7.6 fallout
    Moving forward, everything will assume the `openmm` namespace and make reasonable efforts to fall back to `simtk`.
    Eventually, any use of the `simtk` namespace will slowly break. Will resist the urge to support it long-term.
    `openff-units` now has an `openmm` module; the `simtk` module is deprecated.
    IP – What are your recommendations for other packages for dealing with the OpenMM transition?
    MT – Pinning to OMM 7.5 is an acceptable “stop the bleeding” fix.
    MT – The next better solution is `try: import openmm`, falling back to `except ImportError: import simtk.openmm`.
    MT – Long-term, if there’s no requirement to support old versions of openmm, can pin to `openmm>=7.6`.
    MT – I think there’s a bigger strategic lesson here - unmaintained packages require extra care and may require more extensive pinning or active involvement in maintenance.
    IP – Agree.
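The namespace fallback MT recommends can be sketched as below. This is a minimal illustration, not code from any OpenFF package: the helper names `import_first_available` and `load_openmm` are hypothetical, and the sketch assumes that OpenMM 7.6+ is importable as `openmm` while older releases are only importable as `simtk.openmm`.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that is installed."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as error:
            # Remember why this candidate failed so the final message is useful.
            errors.append(f"{name}: {error}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


def load_openmm():
    """Prefer the new `openmm` namespace, fall back to the old `simtk.openmm` one."""
    return import_first_available(("openmm", "simtk.openmm"))
```

A package would call `openmm = load_openmm()` once at import time and use the returned module everywhere, so the fallback logic lives in a single place and can be deleted once old OpenMM versions are no longer supported.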
    JW – When OpenFF went from `import openforcefield` to `import openff.toolkit`, one helpful thing was to change the whole package name.
  Conformer generation failures are now properly handled by the toolkit
    Custom exception, behaves as the intended “try each toolkit, only fail if all toolkits fail” approach.
    IP – Is there a way to tell whether you use one toolkit or another?
    MT – If you do `Molecule.generate_conformers`, this previously would only try the first available toolkit, and if that failed you’d get an error. Now it will try ALL possible toolkits, and if they ALL fail, you’ll get a big combined error message.
    JW – To only use a particular toolkit, instead of `Molecule.generate_conformers`, do `RDKitToolkitWrapper.generate_conformers(offmol)`.
    SB – Currently the OpenEye wrapper first tries to generate with strict stereo, then if that fails, it turns that off.
    Maybe here? - SB – It may be better to document at a high level when and why stereochemistry is required, and how the user can customize their workflows/the strictness of OpenFF molecule operations.
    (General) What all are the concerns here?
      Do toolkits have significantly different behavior WRT conformer generation?
      Is the current documentation/feedback appropriate for conf gen with undefined stereo?
  Matrix representations can now be mapped “back” into Interchange
  flake8 PR up for the toolkit
  ~80% done with a prototype of the toolkit that does not depend on OpenMM. Most things were straightforward to convert, but FF and parameter objects are very OpenMM-centric.
DD
  Partner benchmark is completed – Lorenzo is now in charge, will keep prompting partners to run final calcs and return results. So season 1 is in his hands and I’m supporting his efforts.
  Now have torsiondrives merged into QCEngine – Thanks SB! – Lori Burns is aiming to make a release later this week.
    Will wrap this into openff-gopt, the package for general optimizations and torsiondrives using our project defaults.
    openff-gopt will simplify benchmarking greatly during refactor.
    SB – JHorton in the new QCSubmit has some new API points for going between QCSubmit and QCEngine. Is there overlap with anticipated functionality for gopt?
    DD – In OpenFF-benchmark, there are entrypoints for doing 1) molecule optimizations (using compute procedure on QCEngine), which consume SDFs from the file system and are triggered using the CLI, and 2) torsiondrives using a similar structure via the CLI. We found that people like this functionality outside of the context of benchmarking. So gopt is an effort to split these out into a standalone tool. Does this seem like it’s worth having a standalone package?
    SB – It seems like this could be a CLI entrypoint into QCEngine.
    DD – There’s a small existing CLI to qcengine, but it expects json blobs, and there’s a big gap for users wrt “how do you make the json blob?”
    SB – May be good to put what we can into a maintained package. So maybe gopt could be a strawman for a CLI proposal in QCEngine.
    DD – Using gopt as a prototyping ground would make sense, with the intent to migrate into QCEngine.
    JW – I’m doubtful that MolSSI wants to read SDFs and write CMILES, so it may be a hard sell to ask them to integrate all of these deps. So I agree that gopt will be a good proving ground, but maybe we should target QCSubmit as a likely destination. (Though it’s possible that MolSSI will embrace the tools to generate CMILES.)
    JW – There’s a joint grant that we just got with MolSSI – I’d like to keep our hands off of it unless we have clear goals for what 1-2 engineer years could be used for.
    DD – Could implement standards v3 in QCArchive.
    DD – Could also do error cycling on-server in QCArchive.
  On QCArchive, I’m working on finishing old datasets - Lots of work with BPritchard on technical side, clearing old tasks and other improvements. This will clear the way to completing the industry dataset and the Pavan, Cavender, and Eastman datasets.
  Will be spending half to two-thirds of my OpenFF time on PLBenchmarks on F@H for the next few months.
LW
  Between going to the Grand Canyon and meetings, only 2 days of work, most of it ANU-related.
  Q: running openff-evaluator without creating a server. Tried running on ANU servers but got tangled up in protocol/connection issues. Is this possible in principle? Due to resource limits, it’d help if we can manually do bits of the workflow.
    SB – Depends exactly what you’re trying to do. In principle can run without a server. Same with removing calculation backends. We should have a separate call.
    DD – This is similar to what we needed to do for benchmarking – Lots of people didn’t want to run a QCArchive server, and instead wanted a ton of directories and queue submission scripts.
    SB – Looking to run solvation energies?
    LW – Yes
    SB – Can take physical prop dataset, loop over all props, make a cluster submission for each (a `run.py` and a data `json` describing the job to run), then dispatch 1 property calc/GPU.
  Q: any ideas for using Python without conda or a container / minimizing conda files? We get a limited number of files (not file SIZE, but number of files).
    DD – One solution I’ve seen is to have each worker install conda in scratch space.
    LW – Nodes don’t have internet access
    SB – Building minimal venv outside of conda https://pythonspeed.com/articles/conda-docker-image-size/
    JW – Could build whole env on head node, then zip the `miniconda3/` folder and have each job unzip this locally. Also I could show how to make a single-file installer (though this still would need to install and may still hit the file bottleneck). Also could try installing the `openff-toolkit-base` package, though this isn’t super minimal in terms of size.
  This week: computing HF/6-31G* surfaces of conformers to correlate to physical properties (maybe – post-chat with Owen Madin)
    SB – I saw you pinging OM on his Confluence page. I think I actually wrote that page and OM published it.
    Feel free to join our chat tomorrow at 9 AM Pacific to discuss this.
JW
  Bespokefit workshop assistance (JHorton did a great job!)
  Working on biopolymer infrastructure - HierarchyElements and Schemes.
    MT – Status of TypedMolecule?
    JW – Not yet started. It’s next on my list after residue/hierarchy iterators.
  Coming next in toolkit – Connectivity rearrangement detection and constrained AM1 charges.
  Research into whether we get better consistency between toolkits with AM1 ELF1
    SB – JC had posted another library for doing AM1 charges.
    JW – I’m not sure that the restrained minimizations will provide significant enough benefit to be worth implementing in the short term.
    SB – What’s the threshold for “small” differences? Not just absolute value change, but rather effect on hydration free energies or other properties.
    LW – I’m kinda planning to look into how much differences in partial charges turn into differences in simulated property outcomes.
    SB – Reminds me of the Mobley lab study showing that an arbitrary choice of carboxylic acid protonation leads to large changes in outcome.
    LW – I recall that study, but the study was done on a small set and the outcome wasn’t systematic.
  Been somewhat overwhelmed/sidetracked lately - Will be trying to do all meetings/project planning on Weds+Thurs
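The “try each toolkit, only fail if all toolkits fail” behaviour MT describes for conformer generation can be illustrated with a generic sketch. This is not the toolkit's actual implementation: `AllBackendsFailedError`, `call_first_successful`, and the two stand-in backends are hypothetical names introduced only for this example. Returning the name of the backend that succeeded is one possible way to answer IP's question about which toolkit was used.

```python
class AllBackendsFailedError(Exception):
    """Raised only when every backend failed; carries each backend's error."""

    def __init__(self, failures):
        self.failures = failures  # list of (backend_name, exception) pairs
        details = "\n".join(f"  {name}: {error!r}" for name, error in failures)
        super().__init__(f"all {len(failures)} backends failed:\n{details}")


def call_first_successful(backends, *args, **kwargs):
    """Try each (name, callable) pair in order; return (name, result) of the first success."""
    failures = []
    for name, func in backends:
        try:
            return name, func(*args, **kwargs)
        except Exception as error:  # a real registry would catch narrower errors
            failures.append((name, error))
    # Only reached when no backend succeeded: report every failure together.
    raise AllBackendsFailedError(failures)


# Hypothetical stand-ins for two cheminformatics toolkits.
def broken_backend(smiles):
    raise ValueError(f"cannot embed {smiles}")


def working_backend(smiles):
    return [f"conformer of {smiles}"]


used, conformers = call_first_successful(
    [("broken", broken_backend), ("working", working_backend)], "CCO"
)
```

Here the first backend raises, the second succeeds, and the caller never sees an error. The combined exception is raised only when the list is exhausted.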
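SB's suggestion to LW (loop over all properties, write a `run.py` and a data `json` per job, then dispatch one calculation per GPU) could look roughly like the sketch below. It assumes each property can be described by a plain dictionary. The `write_jobs` helper, the `job.json` file name, the directory naming, and the placeholder `run.py` body are all hypothetical, and no openff-evaluator API is used.

```python
import json
from pathlib import Path

# Placeholder job script: a real one would run the calculation described in job.json.
RUN_SCRIPT = '''\
import json

with open("job.json") as handle:
    job = json.load(handle)

print("would compute", job["type"], "for", job["id"])
'''


def write_jobs(properties, root):
    """Create one directory per property containing `job.json` and `run.py`."""
    job_dirs = []
    for index, prop in enumerate(properties):
        job_dir = Path(root) / f"job-{index:04d}"
        job_dir.mkdir(parents=True, exist_ok=True)
        (job_dir / "job.json").write_text(json.dumps(prop, indent=2))
        (job_dir / "run.py").write_text(RUN_SCRIPT)
        job_dirs.append(job_dir)
    return job_dirs
```

Each returned directory can then be handed to the cluster scheduler as a separate submission. This layout creates only two files per job, which may matter on systems that limit the number of files rather than their size.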
