2023-02-28 Thompson/Wagner/Wang Check-in meeting notes

Participants

  • @Jeffrey Wagner

  • @Matt Thompson

  • @Lily Wang

Discussion topics


  • Wanted to think a bit about architecture before this meeting but ran out of time, so we could discuss that, or discuss nagl.

  • MT – I tried doing partial charges for a host system - I gave up with the ToolkitAM1BCCHandler and instead did ChargeIncrementModelHandler with formal charges. So I’m curious about the plans for loading nagl models.

    • LW – Re: Nagl #26 - I was writing a big response, but I'm still working on that.

    • LW – You should be able to use the models quite soon. Right now they’re stored via pickling. They’re about 10MB a pop. Out of curiosity, why are FFs in a different repo?

    • JW – It’s just to keep the release cycles from getting mixed up. We could have worked around this but that’s the choice we made at the time.

    • MT – I think it’s fundamentally superior to have microrepos. It’s not something that could just “be worked around” - It’s better in principle to do microrepos.

    • JW – Not relevant here, but I’d kinda be in favor of a giant monorepo. It seems like we’d get rid of all the pinning complexity and we could trust that green tests really are green. And Google uses a monorepo.

    • MT – LW, what are your thoughts about models with vs. without the code?

    • LW – …

    • MT – Choice of pickling is interesting, though it does seem more compressed than a yaml or something.

    • LW – I’m doing pickle because the pytorch default is to pickle. JM was using load_from_checkpoint in his docs work - This seemed slightly better because it contained more info. But it’s 32MB. Or I was thinking about making our own standard of yaml+model weights.

    • MT – Is the checkpoint file somewhat standardized in the pytorch world?

      • LW – Yes

    • MT – This reminds me of GROMACS/MD engine checkpoint files, which aren’t necessarily enough to reproduce.

    • LW – I’d think about doing hyperparameters in yaml and the weights in pickle.

    • MT – It could make sense to make a .nagl format, though then we’d need to define our own format.

    • LW – With pickling mode, they’d need to use nagl to read the weights anyway…

    • MT – How would the size of model files compare to pytorch, mkl, etc? Seems like they’d be a lot smaller than just the deps.

    • LW – Agree. We can reduce size of deps, but it’s hard to get rid of pytorch.

    • JW – Does DGL have a serialization format?

    • LW – 1) No and 2) I wanted to get rid of DGL as a dep for inference and 3) ONNX is explicitly not compatible with DGL.

    • MT – Some perspective from how FFs are loaded: It’s bad that the FF loading machinery recursively checks directories until it finds the first thing with the right name. Also a question - If the model repo size gets big, will old models start getting dropped?

    • LW – I don’t plan on dropping old models.

    • MT – Where can I find an example model file?

      • LW – Nagl PR #26 - This is very pre-production but it’s there.

      • MT – Numerical issues, not error issues, right?

      • LW – Yes, it works pretty well for normal small mols + proteins.

    • MT – Great. I think I can move forward with this then!

    • MT – Will nagl ever do things other than charges?

      • LW – Maybe in the future. The model/architecture may need to change if the mapping becomes one-model-computes-many-parameters.
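As a rough sketch of LW’s suggestion above (hyperparameters in yaml, weights in pickle), the split could look something like this. The function names, file names, and keys here are all hypothetical, not the real nagl API; a real implementation would presumably use torch.save for a state_dict and a proper YAML library rather than the stdlib-only stand-ins below:

```python
import pickle
from pathlib import Path


def save_model(dirpath, hyperparams, weights):
    """Hypothetical split-format save: human-readable hyperparameters,
    pickled weights. Flat `key: value` lines are valid YAML, so a simple
    dump suffices for this sketch."""
    d = Path(dirpath)
    d.mkdir(parents=True, exist_ok=True)
    lines = [f"{k}: {v}" for k, v in hyperparams.items()]
    (d / "hyperparameters.yaml").write_text("\n".join(lines) + "\n")
    with open(d / "weights.pkl", "wb") as f:
        pickle.dump(weights, f)


def load_model(dirpath):
    """Read the two files back. A real loader would parse YAML types
    properly; this sketch returns hyperparameter values as strings."""
    d = Path(dirpath)
    hyperparams = {}
    for line in (d / "hyperparameters.yaml").read_text().splitlines():
        k, _, v = line.partition(": ")
        hyperparams[k] = v
    with open(d / "weights.pkl", "rb") as f:
        weights = pickle.load(f)
    return hyperparams, weights
```

One upside of this layout, echoing the GROMACS-checkpoint concern above, is that the hyperparameters stay inspectable without unpickling anything.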

  • JW – For making models available, we could:

    • Have an ever-growing conda package

    • Store the actual models on zenodo and have the inference package dynamically reach out to zenodo to grab requested models

    • The previous option, but with a higher-speed hot storage somewhere like openforcefield.org that is checked first for the models, before zenodo gets pinged.

    • MT – What if we published two versions of the models package: one with just the latest model, and one with all the models?

    • LW – …

    • JW – …

    • MT – How about we start with “all models in one package” and then figure something out once storage becomes an issue?

      • LW + JW – This sounds like a good plan.
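The “hot storage first, zenodo fallback” option above could be sketched roughly as below. Everything here is illustrative: the URLs, cache layout, and function name are made up, and the zenodo record number is deliberately a placeholder:

```python
import urllib.request
from pathlib import Path

# Hypothetical mirrors; neither URL is real.
HOT_BASE = "https://openforcefield.org/models"        # assumed fast storage
ZENODO_BASE = "https://zenodo.org/record/XXXX/files"  # assumed archive


def get_model_path(name, cache_dir="~/.cache/nagl-models"):
    """Return a local path for the named model, downloading it on first
    use: try the fast mirror, then fall back to the archive."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / name
    if target.exists():  # already downloaded: reuse the cached copy
        return target
    for base in (HOT_BASE, ZENODO_BASE):
        try:
            urllib.request.urlretrieve(f"{base}/{name}", str(target))
            return target
        except OSError:
            continue  # mirror unreachable; try the next one
    raise FileNotFoundError(f"model {name!r} not found on any server")
```

Starting with “all models in one package”, as agreed above, this fetch layer could be added later without changing the calling code, since callers only ever see a local path.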

  • LW – What does the name of the model file look like?

  • (General discussion about how to reference models in FFs, reproducibility, citability, etc)

Action items

Decisions