Handling units across OpenFF infrastructure

Common pain points

  1. Not knowing what units are used

    1. Can often infer (e.g. positions being nm vs angstrom) but this is horrific for reproducibility

    2. If given k = 400, what units - kcal/mol/angstrom^2 or kJ/mol/nm^2? For angles, degree or rad?

    3. A single error can ruin a workflow (or sometimes crash spaceships)

  2. Errors can propagate through inaccurate numbers, stringing together conversion factors, etc.

    1. e.g. hardcoding rounded conversion factors (RAD_TO_DEG = 180/3.141)

    2. Similar case demonstrated in Fig 3 of Shirts et al.

  3. Serialization - often not possible at all, and difficult to do round-trip and in a general way
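
The rounding problem in pain point 2 can be made concrete with a small sketch. It uses only the standard library and compares the truncated factor from the example above with one built from math.pi; the variable names are illustrative, not taken from any OpenFF code.

```python
import math

# Hardcoded factor built from a truncated value of pi (the example above)
RAD_TO_DEG_ROUNDED = 180 / 3.141
# Factor built from the double-precision value of pi
RAD_TO_DEG_EXACT = 180 / math.pi

angle_rad = math.pi  # a straight angle, i.e. 180 degrees

deg_rounded = angle_rad * RAD_TO_DEG_ROUNDED
deg_exact = angle_rad * RAD_TO_DEG_EXACT

# Relative error introduced purely by the truncated conversion factor
relative_error = abs(deg_rounded - deg_exact) / deg_exact

print(f"rounded factor: {deg_rounded:.5f} deg")
print(f"exact factor:   {deg_exact:.5f} deg")
print(f"relative error: {relative_error:.2e}")
```

The relative error is a little under 2 parts in 10,000, which is about 0.03 degrees on a straight angle. It is present in every value that passes through the factor, and it compounds if several such factors are chained.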

Existing approaches

1. Implicitly define units, just carry numbers

  • See ParmEd, most simulation engines

  • Expect conversions to be done (and done accurately) at the interfaces in and out

  • In QCArchive, everything is in atomic units and stored as bare numbers

  • Conversions are handled by keeping track of necessary conversion factors, which can introduce errors

2. Let users do what they want but carry one set of units internally

  • OpenMM

  • In the backend, every quantity has a single unit (nm, kJ/mol, ps, etc.)

  • 1.0 nanometers -> (nanometers and kJ/mol)

3. All units all the time

  • Build on top of a standardized unit package (pint is the popular one, some competition from unyt and others)

  • Everything has units all the time, stored internally and exposed to the user

  • Some interfaces will be between libraries using (slightly) different registries

OpenFF Unit Specification

Necessary features

  1. Serialization - both to and from

    1. pint natively does this well

  2. Plays well with existing infrastructure

    1. Strong preference for not breaking API if at all possible

    2. For all interfaces between two libraries using pint, the same unit registry must be used

    3. Clean interface with OpenMM units

      1. Dimensionless quantities may be tricky because of existing OpenMM issues

    4. Evaluator already uses pint

  3. Be able to update registry when a “better” one is released, but also switch into older ones for reproducibility

    1. Distinct from Amber’s (and probably others') approach of not updating at all

Issues

  1. How to ensure different OpenFF libraries use the same unit registry?

    1. Implicitly defining a registry allows things to fall out of sync

      1. Today qcel.constants.ureg gets us CODATA2018, in a month or year it may not (and in 5 years it shouldn’t); components in an ecosystem may not all update at once

      2. Need to log the registry for provenance

    2. Explicitly defining a registry makes updating things cumbersome (need to be done all at once)

  2. Are there any interfaces that wouldn’t play nicely with unit-tagged values?

    1. e.g. if some function expects a float but may get 0.15 * u.nanometer

 

Barrage of links:

Pint - the most actively used and developed Python units package, already implemented in QCElemental, openff-evaluator, and a few hundred other packages

unyt - an alternative units package that plays better with NumPy (see their JOSS paper)

openff-evaluator refactor to pint, with many instructive bits buried in the git diff

defining default units in openff-evaluator classes, an approach which could maybe be used for other classes more generally

Open GitHub Issues:

  • OpenMM units don’t play well with dimensionless quantities

  • Original “let’s use pint” discussion

  • Long discussion, mostly unrelated to units, about how to specify units in XML files

  • Old issue with the theme “let’s be explicit about units, and enforce units” (even though this was about OpenMM units)

  • A case in which explicit and implicit units may necessarily be mixed