2020-12-04 Unit package/serialization working session

Date

Dec 4, 2020

Participants

  • @Jeffrey Wagner

  • @Matt Thompson

  • @Simon Boothroyd

Discussion topics

Item

Notes

Studying unit-bearing fields in pydantic (quantity of list vs. list of quantity vs. arrays)

  • Goals:

    • Selection of a single unit package

    • Shared practices and infrastructure for cases that the unit package doesn’t natively handle (e.g. quantity of array of array of quantity)

  • MT – Two approaches: either consider ALL OpenFF packages and come up with a design that can be shared everywhere, OR pick what works well for one package and adopt that.

  • Quantity of list, list of quantity, or quantity-wrapped array?

    • SB – quantity-wrapped array would be best

    • JW – Are there EVER cases where we want to support an iterable of heterogeneous units?

      • (General) – Nothing essential

  • Decision – We’ll 1) support quantity-wrapped arrays (homogeneous units), and 2) anticipate that people will go to a unitless representation when doing high-performance work, and provide friendly infrastructure for that.

What will our objects look like?

  • Internal structure?

    • Inherit from pint

    • SB – As long as everything in an object is pydantic-derived, then __dict__ will work for us.

      • JW – It doesn’t seem like having our objects be pydantic-derived will PREVENT anyone from writing new code or developing quickly, but it will ENABLE them to quickly make serializable classes if they so choose

        • SB – Agree, as long as they don’t try to incorporate polymorphism or other pydantic no-nos

      • MT – Agree

  • API?

    • Decision: Our internal representation will always use quantity-wrapped arrays, and this is the format for all returned values. Users/adjacent developers will be able to implement their own functionality using Pint naively, and won’t need to know about our internal details.

    • One potential issue with the proposal below is that the constructor is initializing what is ACTUALLY a private field using the name of a public getter for it. We’ll want to determine a path forward on this/create a proof of concept before we commit to this design pattern.

      • SB – Potential solution?

    • Should the unit of a FloatQuantity be modifiable?

      • Pro: Allows faithful round trips of e.g. force fields

      • Con: Increases storage space and deserialization time by a lot

    • Should specifying explicit units on a type be optional? Or should “implicit unit” types and “explicit unit” types each be provided?

      • JW – Would propose having something like ImplicitUnitFloatQuantity and ExplicitUnitFloatQuantity, where the former does NOT allow the unit to be set by users, and NEVER includes it in the serialized form, and the latter DOES allow the unit to be set by users, and ALWAYS includes it in the serialized representation

      • (General) – JW’s design above is not necessary if a single type can represent a quantity that DEFAULTS to one unit, but may have that default overridden by the user. But this representation, if using the default unit, would need to serialize to a form that completely excludes the unit if it’s set to the default value.

      • We need to try implementing this to see if we can exclude a field from being serialized if it’s at its default value, or if it’s an optional field that’s currently set to None.

      • MT – Some questions about implementation of ArrayQuantity – Would the setter accept unitless arrays? Lists? ArrayQuantity instances? Pint quantities?

  • CODATA handling?
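
The "single type with an overridable default unit" idea from the API discussion can be sketched without any library support. The example below deliberately uses a standard-library dataclass rather than pydantic, so it says nothing about whether pydantic can do this per field; LengthValue, DEFAULT_UNIT, to_dict and from_dict are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass

DEFAULT_UNIT = "angstrom"  # hypothetical default unit for this illustrative type


@dataclass
class LengthValue:
    """Hypothetical stand-in for a quantity type with an overridable default unit."""

    value: float
    unit: str = DEFAULT_UNIT

    def to_dict(self) -> dict:
        # Always store the magnitude; store the unit only if it is not the default.
        data = {"value": self.value}
        if self.unit != DEFAULT_UNIT:
            data["unit"] = self.unit
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "LengthValue":
        # A missing "unit" key means the default unit was in effect.
        return cls(value=data["value"], unit=data.get("unit", DEFAULT_UNIT))


implicit = LengthValue(1.5)                     # unit left at the default
explicit = LengthValue(0.15, unit="nanometer")  # unit overridden by the user

implicit_json = json.dumps(implicit.to_dict())  # '{"value": 1.5}'
explicit_json = json.dumps(explicit.to_dict())  # '{"value": 0.15, "unit": "nanometer"}'
```

Both forms round-trip through from_dict, and the default-unit case pays no storage cost for the unit. In pydantic, the export methods accept exclude_defaults and exclude_none arguments that may be the analogous switch, though whether they fit this design still needs the proof of concept described above.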

Action items

@Matt Thompson Try wiring up init-kwarg-for-private-attribute-that-instead-takes-its-public-name
@Matt Thompson See if a single type can handle implicit- and explicit-unit quantities (and serialize efficiently if the unit is the default)
@Matt Thompson Come up with different possible design patterns against which Simon’s pydantic-quantity strategy can be compared.

Potential Model Design

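import numpy
from pint import Quantity, UnitRegistry
from pydantic import BaseModel, Field

# Shared unit registry, used below as `unit.angstrom`, `unit.nanometer`, etc.
unit = UnitRegistry()

# NOTE: FloatQuantity and ArrayQuantity are the proposed unit-aware field
# types; they are not implemented yet, so this is a design sketch only.


class MyModel(BaseModel):

    # Stored privately as raw (unitless) values in the unit named in the type.
    _field_1: FloatQuantity["angstrom"] = Field(..., description="")
    _field_2: ArrayQuantity["kilojoule / mole"] = Field(..., description="")

    # Raw accessors: return the stored values without units.
    def field_1_raw(self) -> float:
        return self._field_1

    def field_2_raw(self) -> numpy.ndarray:
        return self._field_2

    # Public accessors: re-attach the units on the way out.
    @property
    def field_1(self) -> Quantity:
        return self._field_1 * unit.angstrom

    @property
    def field_2(self) -> Quantity:
        return self._field_2 * unit.kilojoule / unit.mole


# The constructor takes the PUBLIC names, although the values land in the
# private fields (the open issue noted in the API discussion).
my_model = MyModel(
    field_1=0.05 * unit.nanometer,
    field_2=numpy.zeros((1, 5)) * unit.kilojoule / unit.mole
)

# my_model.json() ->
#
# {
#     field_1: 0.5
#     field_2: [0, 0, 0, 0, 0]
# }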

Decisions