2021-11-15 Core Developers meeting notes

Participants

@David Dotson
@Pavan Behara
@Matt Thompson
@Lily Wang
@Chapin Cavender
@Jeffrey Wagner

Discussion topics

Item	Notes


General updates

JW – Be aware that Thanksgiving is next week – Most US-based employees will be off Thursday and Friday.
MT – I’ll be taking the rest of my vacation for the year and will be unavailable for a lot of December
JW – Same, people should check the calendar if they need me after Dec 15
DD – Same

CC
- Working on following up with stragglers on LiveCOMS review
- Tried to submit dipeptide dataset on QCFractal on Fri, botched my conda environment, I think I’ve got it fixed so I’ll try again this afternoon
- Working on getting library charges for protonation states of amino acids. Having trouble with OpenEye. It’s probably from confs that have hbonds on carboxylic acid groups of side chains.
  - JW – Ran into the same thing last week with bespokefit. I’d like to have a way to make it fall back to a nicer behavior if all the carboxylic acid confs are bad.
  - CC – Yeah, I’m trying to think of a manual fix or some other way to handle this.
  - LW – I used rdkit when I did this and didn’t have a problem. I basically used Simon’s ELF code in the toolkit for this.
  - JW – I’ll keep thinking about a general solution for this – Will contact you if I have a fix.
MT
- General PR wrangling
  - Non-biopolymer non-ParmEd example in toolkit. Found a really odd amber error that causes a memory access error when the number of residues in a prmtop is mis-set.
  - Docs improvements – Users guide, changelog, other fixes.
  - Upstream beta/RC tests. Testing against OpenEye and OpenMM RCs. Already caught an upcoming breaking change in OpenMM.
  - Bringing units package to parity with Evaluator units. Some edge cases, e.g. OpenMM and SI disagree on what a “dalton” is. Polished and made a new release.
- Feedback from protein-ligand example. Mostly requests for improvements in system building experience. Think we’ll need Topology.from_pdb to load multi-protein and water files.
- MolSSI MMIC proposal. MolSSI recommended that we take the importers and exporters out of interchange and make them MMIC components. I declined this offer.
  - CC – Could you provide details?
  - MT – The project isn’t well adopted and doesn’t have a clear path forward. Getting this tied in with an uncertain effort would be a huge liability for us.
  - JW – Agree. QCFractal appears to be moving forward despite MolSSI, not because of it. I’m worried that MMIC would go the same way.
  - DD – Agree that it’s a big liability to tie ourselves in with something that’s externally developed. If they gain more adoption in the future we can always revisit this.
- Started moving stack to M1-compatible builds. Released mdtraj with M1 fixes. We’re still waiting on AmberTools and its dependency chain (parmed is done, packmol seems unmaintained, netcdf-fortran is messy). None of this is super blocking – Rosetta can emulate support for existing packages.
DD
- QCArchive
  - changes in openff-qcsubmit to support Chapin's dipeptide work
  - updating prod envs today
  - scheduled time with Willa on Wednesday to get set up with QCFractal
  - error cycling fix for basic dataset, but error cycling now broken again. Will work on this shortly.
  - submitted two datasets from Pavan in support of SPICE
    - reviewing remaining submissions from Pavan today
- Protein-Ligand Benchmarks
  - began process for obtaining cloud compute, recognize value to OpenFF of utilizing John's resources for development pathway
  - meeting with him today to discuss what's needed to get resources from AWS programs
    - utilizing Covid as the conduit for funding there
    - might take a while for OpenFF/OMSF to secure its own resource streams here
  - DD – Shared infrastructure for hosting work server and storage - Could be big synergies by sharing with JChodera
  - JW – I can bring this up in project planning and get budget approved.
  - DD – Already discussing this in #foldingathome
  - SB – DHahn may be able to help with cost/file size estimate. Could you share the rough numbers once we start estimating?
  - DD – It can vary widely. I’ll also ask John about the costs of his existing F@H infrastructure on #foldingathome
- Partner Benchmark
  - shipped retrospective survey; have feedback from 3 partners
  - Lorenzo is now spearheading benchmarking, and will lead Season 2 when it happens; transitioned regular meeting to every 2 weeks, pair-meeting with him on the off-weeks
  - all partners except BRI have delivered Sage results for manuscript
LW
- Worked through nonbonded issues and debugged my dataset with Simon; Simon has updated nonbonded.
  - Had to debug dataset, some enthalpies of mixing were really pure properties.
- Spent the entire week trying to work directly with evaluator and ForceBalance, as pinning openmm=7.5.1 is not friendly to updated packages
- Evaluator seems to hang after a few (~3?) hours (no update of progress logs; no increase in file sizes; top shows Python processes with <20% or <10% CPU usage; ps aux shows some generic multiprocessing processes). It stopped at iteration 0 of FB; if properties have been computed, I haven’t found where the output is saved. Tried changing a few settings, none of which seemed to help:
  - Using DaskLocalBackend (which worked fine for training a force field to 1 property for 15 iterations), with varying numbers of G/CPUs requested
  - Using DaskSlurmBackend, with varying numbers of processes requested
  - Tend to quit jobs after ~24 hours of no apparent progress, so burning a lot of Mobley $
- SB – I’d check which Dask version you’re using – They’ve had a lot of breaking changes. It’d be good to compare this to the version used for Sage. One known issue is “work stealing”, where jobs don’t get distributed for no clear reason. There are also cases where a worker has died and the server doesn’t know to redistribute the jobs. You can check this by looking at the dask dashboard. If you can get this to the point where the jobs are hanging, we can have a quick call and I can show you my process for debugging this.
  - LW – It’s hanging now, so let’s chat right after this meeting.
  - Installed versions: dask 2.30.0 (py_0, conda-forge); dask-core 2.30.0 (py_0, conda-forge); dask-jobqueue 0.7.3 (pyhd8ed1ab_0, conda-forge)
- PsiRESP 0.2 is pip installable, although RDKit and Psi4 are not. It’s been refactored to use the MolSSI QC stack. Waiting to see if/when QCFractal will patch the issues I’ve raised so I can release 0.2.1 without my own patches – will procrastinate conda recipe until then. Calculates RESP and RESP2 charges.
- Previously: CZI meeting very interesting, lots of tips on managing a research project and on being good science citizens (mostly focused on community building). GSoC 2022 will be very different from previous years: it is no longer limited to students but open to all new contributors, with a longer time frame for more flexible hours.
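
For the Dask version comparison SB suggested above, a minimal sketch using only the Python standard library is given here. The helper names (installed_versions, version_mismatches) are hypothetical, and the reference dictionary simply reuses the versions listed in these notes; the versions used for Sage are not recorded here and would need to be filled in. dask-core is left out because it is a conda package name rather than a Python distribution name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(current, reference):
    """Return {name: (current, reference)} for packages whose versions differ."""
    return {
        name: (current.get(name), expected)
        for name, expected in reference.items()
        if current.get(name) != expected
    }


# Placeholder reference: the versions noted above for the hanging environment.
# Substitute the versions from the environment you want to compare against.
reference = {"dask": "2.30.0", "dask-jobqueue": "0.7.3"}
print(version_mismatches(installed_versions(reference), reference))
```

An empty dictionary means the current environment matches the reference; any entry shows the installed version next to the expected one.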
PB
- Submitted the openmm datasets, took longer than expected.
  - JW – Thanks for doing this, I know it’s not really your job but it could provide a lot of value.
  - PB – Thanks. JHorton has been doing more work than me.
  - SB – Do we have plans for the experiments that we could do with this data? Like alternate targets?
  - PB – We had talked about this but didn’t have firm plans. I can plan to do fitting studies using this, though I’m concerned about the different method/basis.
  - SB – It’d be really neat to see what we can do with this data, and how the fitting looks. There could be some really high-value studies that we could do.
  - PB – When we talk about “forcebalance replacements”, are we thinking of going entirely through OpenFF stack, or updating forcebalance and our plugins?
  - SB – The former. Completely replacing forcebalance with something built around pytorch.
- Some follow up work related to wbo.
- Tried looking into torsionnet from pfizer, they released scripts to train a neural net but not the trained model.
JW
- Worked with Danielle Bergazin on bespoke fitting for polymer hosts. Proposed a fix for trans-COOH in fragmenter, where we fall back to single-conformer WBO if ELF10 WBOs fail.
  - SB – I responded to that issue. I think we should manually rotate the carboxylic acids in the input conformers if they have that problem. I did this in RDKitWrapper so we can probably lift code from there.
  - JW – There’s some weird OE API stuff where we can access the ELF10 auto-correcter when we do charge calcs, but not when we do WBO calcs.
  - SB – It’s probably best in the long run to have the “fix carboxylic acids” logic in the toolkit.
- Started some work on speeding up biopolymer refactor. Didn’t make a satisfactory amount of progress so I may be quiet today and tomorrow so I can focus on it.
- Mostly tech support
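
The fallback JW proposed for fragmenter (try ELF10 WBOs first, use a single-conformer WBO if that fails) could look roughly like the sketch below. Everything in it is hypothetical: WBOCalculationError, wbo_with_fallback, and the two stub calculators stand in for whatever the toolkit and fragmenter actually expose, and no real OpenFF or OpenEye call is shown.

```python
class WBOCalculationError(RuntimeError):
    """Hypothetical error raised when a WBO calculation cannot complete."""


def wbo_with_fallback(molecule, elf10_wbo, single_conformer_wbo):
    """Try the ELF10 calculator first, then fall back to a single conformer.

    Returns (bond_orders, method_used) so callers can record which path ran.
    """
    try:
        return elf10_wbo(molecule), "elf10"
    except WBOCalculationError:
        return single_conformer_wbo(molecule), "single-conformer"


def failing_elf10(molecule):
    # Stub: mimics the case where every conformer is rejected.
    raise WBOCalculationError(f"all ELF10 conformers rejected for {molecule}")


def stub_single_conformer(molecule):
    # Stub: returns a made-up bond order keyed by atom index pair.
    return {(0, 1): 1.0}


orders, method = wbo_with_fallback("CC(=O)O", failing_elf10, stub_single_conformer)
print(method, orders)
```

Returning the method name alongside the bond orders keeps the fallback visible in downstream logs, which matters if single-conformer and ELF10 values are later compared.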
SB
- Tidying up infrastructure - Evaluator stuff (upgraded to support OMM 7.6, but not backwards-compatible; leads to some YANK issues; then made a new release)
- Fix for thermoml changes.
- Tidying up nonbonded – Instead of exclusively using REST API, now can also run locally.
- Bespokefit is moving forward - I’m working with JHorton to get it on conda-forge. Also working on multi-stage fits (like electrostatics, then vdW, then other stuff). Also working on where in the schemas people can set bespoke terms to generate.
- Wrote my own YANK substitute (AbSolv) since it’s been unstable for a while. Also adds support for things like vsites and custom nonbonded. DCole group is working with it and providing feedback. If other folks want to give it a shot I’d love more feedback. It can also do nonequilibrium free energy calcs.
- Working on “can a GCN understand resonance forms?”. It seems like, in the original vcharge paper, you can average the resonance forms (with some method for determining which resonance forms are reasonable). I’m experimenting with this in the nagl package.
