Handling units across OpenFF infrastructure
Common pain points
Not knowing what units are used
Can often infer (e.g. positions being in nm vs. angstrom), but this is horrific for reproducibility
If given k = 400, what units - kcal/angstrom or kJ/nm? For angles, degree or rad?
A single error can ruin a workflow (or sometimes crash spaceships)
Errors can propagate through inaccurate numbers, stringing together conversion factors, etc.
e.g. hardcoded conversion factors with rounding (RAD_TO_DEG = 180/3.141)
Similar case demonstrated in Fig 3 of Shirts et al.
Serialization - often not possible at all, and difficult to do round-trip or in a general way
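A minimal sketch of the rounding problem above, using only the Python standard library. The variable names are illustrative; the rounded factor is the one quoted in these notes.

```python
import math

# Hardcoded factor with a truncated value of pi, as in the note above.
RAD_TO_DEG_ROUNDED = 180 / 3.141
# The same factor computed from the full double-precision value of pi.
RAD_TO_DEG_EXACT = 180 / math.pi

# Relative error carried by every value converted with the rounded factor.
relative_error = RAD_TO_DEG_ROUNDED / RAD_TO_DEG_EXACT - 1

# Converting to degrees with the rounded factor and back with an exact one
# (math.radians) does not return the starting angle.
angle_rad = math.pi / 2
round_trip = math.radians(angle_rad * RAD_TO_DEG_ROUNDED)
drift = round_trip - angle_rad

print(f"relative error of rounded factor: {relative_error:.2e}")
print(f"round-trip drift for pi/2 rad: {drift:.2e} rad")
```

The error looks small, but it is systematic and compounds when several such factors are strung together.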
Existing approaches
1. Implicitly define units, just carry numbers
See ParmEd, most simulation engines
Expect conversions to be done (and done accurately) at the interfaces in and out
In QCArchive, everything is in atomic units, which are stored as bare (unitless) numbers
Conversions are handled by keeping track of necessary conversion factors, which can introduce errors
2. Let users do what they want but carry one set of units internally
OpenMM
In the backend, everything is stored in a single unit system (nm, kJ/mol, ps, etc.)
1.0 nanometers -> (nanometers and kJ/mol)
3. All units all the time
Build on top of a standardized unit package (pint is the popular one, some competition from unyt and others)
Everything has units all the time, stored internally and exposed to the user
Some interfaces will be between libraries using (slightly) different registries
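A toy sketch of approach 2 (accept whatever units the user supplies, store one unit internally). This is not the OpenMM API; the class, the conversion table and all names are hypothetical and exist only to illustrate the pattern.

```python
# Hypothetical conversion table: factors that map a user-facing length unit
# onto the single internal unit (nanometers).
_TO_NANOMETERS = {"nanometer": 1.0, "angstrom": 0.1, "picometer": 0.001}


class InternalLength:
    """Toy container that accepts any known unit but stores nanometers."""

    def __init__(self, value, unit):
        if unit not in _TO_NANOMETERS:
            raise ValueError(f"unknown length unit: {unit!r}")
        # Conversion happens once, at the boundary.
        self._nm = value * _TO_NANOMETERS[unit]

    def value_in(self, unit):
        """Return the stored length expressed in the requested unit."""
        return self._nm / _TO_NANOMETERS[unit]


bond = InternalLength(1.09, "angstrom")
print(bond.value_in("nanometer"))
print(bond.value_in("angstrom"))
```

In approach 1 only the bare number 1.09 would be passed around; in approach 3 the unit would stay attached to the value everywhere instead of being dropped after the boundary.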
OpenFF Unit Specification
Necessary features
Serialization - both to and from
pint natively does this well
Plays well with existing infrastructure
Strong preference for not breaking API if at all possible
For all interfaces between two libraries using pint, the same unit registry must be used
Clean interface with OpenMM units
Dimensionless quantities may be tricky because of existing OpenMM issues
Evaluator already uses pint
Be able to update registry when a “better” one is released, but also switch into older ones for reproducibility
Distinct from Amber’s (and probably others') approach of not updating at all
Issues
How to ensure different OpenFF libraries use the same unit registry?
Implicitly defining a registry allows things to fall out of sync
Today qcel.constants.ureg gets us CODATA2018; in a month or a year it may not (and in 5 years it shouldn’t). Components in an ecosystem may not all update at once
Need to log the registry for provenance
Explicitly defining a registry makes updating things cumbersome (need to be done all at once)
Are there any interfaces that wouldn’t play nicely with unit-tagged values?
e.g. if some function expects a float but may get 0.15 * u.nanometer
Barrage of links:
Pint - the most actively used and developed Python units package, already implemented in QCElemental, openff-evaluator, and a few hundred other packages
unyt - an alternative units package that plays better with NumPy (see their JOSS paper)
openff-evaluator refactor to pint, with many instructive bits buried in the git diff
defining default units in openff-evaluator classes, an approach which could maybe be used for other classes more generally
Open GitHub Issues:
OpenMM units don’t play well with dimensionless quantities
Original “let’s use pint” discussion
Long discussion, mostly unrelated to units, about how to specify units in XML files
Old issue with the theme “let’s be explicit about units, and enforce units” (even though this was about OpenMM units)
A case in which explicit and implicit units may necessarily be mixed