Studying unit-bearing fields in pydantic (quantity of list vs. list of quantity vs. arrays)
Goals:
Selection of a single unit package
Shared practices and infrastructure for cases that the unit package doesn’t natively handle (e.g. quantity of array of array of quantity)
MT – Two approaches: either consider ALL OpenFF packages and come up with a design that can be shared everywhere, OR pick what works well for one package and adopt that.
Quantity of list, list of quantity, or quantity-wrapped array?
SB – quantity-wrapped array would be best
JW – Are there EVER cases where we want to support an iterable with heterogeneous units?
(General) – None that are essential
Decision – We’ll 1) support quantity-wrapped arrays (homogeneous units), and 2) anticipate that people will switch to a unitless representation for high-performance work, and provide friendly infrastructure for that.
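A minimal, dependency-free sketch of this decision: a list of quantities lets every element carry its own unit, a quantity-wrapped array stores one magnitude sequence with one shared unit, and high-performance code converts once to bare floats. `ArrayQuantity`, `from_quantity_list` and the `_NM_PER_UNIT` table are hypothetical stand-ins, not Pint; the `m_as` name only mirrors Pint's `Quantity.m_as` (magnitude expressed in a given unit), and a tuple stands in for a NumPy array.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical conversion table: nanometers per unit (stand-in for a unit registry).
_NM_PER_UNIT = {"nanometer": 1.0, "angstrom": 0.1}


@dataclass(frozen=True)
class ArrayQuantity:
    """Hypothetical quantity-wrapped array: one unit shared by every element."""

    magnitude: Tuple[float, ...]
    unit: str

    def m_as(self, unit: str) -> List[float]:
        """Unitless fast path: plain floats expressed in `unit`."""
        factor = _NM_PER_UNIT[self.unit] / _NM_PER_UNIT[unit]
        return [m * factor for m in self.magnitude]


def from_quantity_list(items: List[Tuple[float, str]], unit: str) -> ArrayQuantity:
    """Collapse a list of (value, unit) pairs into one homogeneous ArrayQuantity."""
    return ArrayQuantity(
        tuple(v * _NM_PER_UNIT[u] / _NM_PER_UNIT[unit] for v, u in items), unit
    )


# List of quantities: every element carries its own unit, so units can differ.
mixed = [(1.0, "nanometer"), (10.0, "angstrom")]

# Quantity-wrapped array (the supported form): homogeneous by construction.
positions = from_quantity_list(mixed, "nanometer")

# High-performance code strips units once, up front, then works on bare floats.
raw_angstrom = positions.m_as("angstrom")
```

The only place units are checked or converted is at the boundary (`from_quantity_list`, `m_as`), which is what makes the homogeneous form cheap to work with afterwards.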
What will our objects look like?
Internal structure?
Inherit from pint
SB – As long as everything in an object is pydantic-derived, __dict__ will work for us.
JW – It doesn’t seem like having our objects be pydantic-derived will PREVENT anyone from writing new code or developing quickly, but it will ENABLE them to quickly make serializable classes if they so choose
SB – Agree, as long as they don’t try to incorporate polymorphism or other pydantic no-nos
MT – Agree
API?
Decision: Our internal representation will always use quantity-wrapped arrays, and this is also the format of all returned values. Users/adjacent developers will be able to implement their own functionality using Pint naively, without needing to know about our internal details.
One potential issue with the proposal below is that the constructor initializes what is ACTUALLY a private field using the name of its public getter. We’ll want to settle on a path forward for this (and build a proof of concept) before we commit to this design pattern.
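A plain-Python sketch of the intended behaviour, independent of how Pydantic would support it: the constructor takes the public name (`positions`), stores the value in a private attribute (`_positions`), exposes it through a read-only property, and serializes it back under the public name so a round trip works. `Conformer`, `to_dict` and `from_dict` are hypothetical names. Pydantic does not treat underscore-prefixed names as ordinary fields, which is part of why a proof of concept is needed before adopting the pattern there.

```python
from typing import Any, Dict, List


class Conformer:
    """Hypothetical class: public name in __init__ and serialization, private storage."""

    def __init__(self, positions: List[float]) -> None:
        # Callers pass the public name; the value lands in a private attribute.
        self._positions = list(positions)

    @property
    def positions(self) -> List[float]:
        # Read-only public getter; returns a copy so callers can't mutate internals.
        return list(self._positions)

    def to_dict(self) -> Dict[str, Any]:
        # Serialize under the PUBLIC name so that Conformer(**d) round-trips.
        return {"positions": list(self._positions)}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Conformer":
        return cls(**data)


c = Conformer(positions=[0.0, 1.5])
restored = Conformer.from_dict(c.to_dict())
```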
Should the unit of a FloatQuantity be modifiable?
Pro: Allows faithful roundtrips of eg. force fields
Con: Substantially increases storage space and deserialization time
Should specifying explicit units on a type be optional? Or should “implicit unit” types and “explicit unit” types each be provided?
JW – Would propose having something like ImplicitUnitFloatQuantity and ExplicitUnitFloatQuantity, where the former does NOT allow the unit to be set by users and NEVER includes it in the serialized form, and the latter DOES allow the unit to be set by users and ALWAYS includes it in the serialized representation
(General) – JW’s design above is not necessary if a single type can represent a quantity that DEFAULTS to one unit but lets the user override that default. However, when such a quantity uses its default unit, it would need to serialize to a form that omits the unit entirely.
We need to try implementing this to see whether we can exclude a field from serialization if it’s at its default value, or if it’s an optional field that’s currently set to None.
MT – Some questions about the implementation of ArrayQuantity: Would its setter accept unitless arrays? Lists? ArrayQuantity objects? Pint Quantity objects?
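Pydantic's `.dict()` (v1) and `model_dump()` (v2) already accept `exclude_defaults=True` and `exclude_none=True`, so those are the first options worth testing. The dataclass sketch below only spells out the intended behaviour without depending on either Pydantic version; the `FloatQuantity` fields, the `to_compact_dict` helper and the default unit string are hypothetical.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class FloatQuantity:
    """Hypothetical single type: unit DEFAULTS to one value but can be overridden."""

    value: float
    unit: str = "kilocalorie / mole"
    note: Optional[str] = None


def to_compact_dict(obj: Any) -> Dict[str, Any]:
    """Serialize a dataclass, omitting fields at their default or set to None."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            continue  # optional field that is currently unset
        if f.default is not MISSING and value == f.default:
            continue  # at its default, so a reader can restore it implicitly
        out[f.name] = value
    return out


implicit = to_compact_dict(FloatQuantity(1.5))
explicit = to_compact_dict(FloatQuantity(1.5, unit="kilojoule / mole"))
```

With this behaviour, `implicit` carries no unit at all, `explicit` carries one only because it was overridden, and `FloatQuantity(**implicit)` restores the default unit, which is what would make separate implicit- and explicit-unit types unnecessary.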
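One possible answer to these questions, sketched without Pint: a property setter that normalizes every accepted input into a single quantity-wrapped array in the default unit. `Box`, `lengths`, `ArrayQuantity` and the `_NM_PER_UNIT` table are hypothetical. Treating bare lists as being in the default unit is an assumption, not a decision; rejecting them instead is equally valid. A branch for real Pint Quantity objects would go alongside the `ArrayQuantity` branch but is omitted to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple, Union

# Hypothetical conversion table: nanometers per unit.
_NM_PER_UNIT = {"nanometer": 1.0, "angstrom": 0.1}


@dataclass(frozen=True)
class ArrayQuantity:
    """Hypothetical stand-in for a quantity-wrapped array with one unit."""

    magnitude: Tuple[float, ...]
    unit: str


class Box:
    """Hypothetical holder whose setter normalizes several input types."""

    default_unit = "nanometer"

    def __init__(self, lengths: Union[ArrayQuantity, Sequence[float]]) -> None:
        self.lengths = lengths  # routed through the property setter below

    @property
    def lengths(self) -> ArrayQuantity:
        return self._lengths

    @lengths.setter
    def lengths(self, value: Any) -> None:
        if isinstance(value, ArrayQuantity):
            # Unit-bearing input: convert to the default unit.
            factor = _NM_PER_UNIT[value.unit] / _NM_PER_UNIT[self.default_unit]
            mags = tuple(m * factor for m in value.magnitude)
        elif isinstance(value, (list, tuple)):
            # Bare numbers: this sketch ASSUMES they are in the default unit.
            mags = tuple(float(m) for m in value)
        else:
            raise TypeError(f"cannot interpret {type(value).__name__} as lengths")
        self._lengths = ArrayQuantity(mags, self.default_unit)
```

Whatever the setter accepts, the getter always returns the same homogeneous form, consistent with the decision that quantity-wrapped arrays are the only internal and returned representation.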
CODATA handling?
Action items
Matt Thompson: Try wiring up an init kwarg for a private attribute that takes the attribute’s public name instead
Matt Thompson: See if a single type can handle implicit- and explicit-unit quantities (and serialize efficiently when the unit is the default)
Matt Thompson: Come up with different possible design patterns against which Simon’s pydantic-quantity strategy can be compared.