Item

Presenter

Notes

Big picture issues

Simon

  • The data model for output data is probably the messiest area right now; it is hard to standardize yet

    • performance is a problem as well, since many objects get shuttled around and replicated on dask workers

    • serialization/deserialization across the network is a bottleneck

    • provenance info is duplicated and shuttled around too, adding to the problem

    • we want to figure out a satisfying solution for provenance that meets our performance needs

pAPRika

Simon

  • Issue on TSCC with filesystem writes (#224)

    • Could address with retry wrappers around components at each layer (protocol, individual dask steps, etc.)

    • we already checkpoint aggressively, so retrying shouldn’t waste much work when compensating for filesystem issues

      • need to verify coverage of checkpointing

  • The pAPRika PR is getting unwieldy, and it currently has to be kept in sync with the mainline branch manually

    • may help to split the pAPRika components out into a separate repo that isn’t tightly coupled to evaluator

    • David Dotson will next touch base with Jeffry Setiadi on whether this solution works for him
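
The retry wrapping suggested above for the TSCC filesystem-write issue could look roughly like the following. This is only a sketch: the decorator name `retry_on_oserror`, its parameters, and the example function `write_checkpoint` are hypothetical and not taken from the evaluator code base, and catching `OSError` is an assumption about how the filesystem failures surface.

```python
import functools
import time


def retry_on_oserror(max_attempts=3, delay=0.1, backoff=2.0):
    """Retry the wrapped callable when it raises OSError (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except OSError:
                    # Give up and re-raise once the final attempt has failed.
                    if attempt == max_attempts:
                        raise
                    # Otherwise wait, then lengthen the wait for the next try.
                    time.sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator


@retry_on_oserror(max_attempts=3, delay=0.1)
def write_checkpoint(path, text):
    """Write text to path; stands in for any step that touches the filesystem."""
    with open(path, "w") as handle:
        handle.write(text)
```

The same decorator could in principle be applied at the protocol level or around individual dask steps; since checkpointing already limits how much work a failed step loses, a retry mostly repeats the write itself.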

Development workflow

Simon

  • NumPy-style docstrings

  • black / flake8 for formatting / linting

  • Try to keep PRs under 500 lines of changes; not always possible
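
As a point of reference for the docstring convention listed above, a NumPy-style docstring might look like this. The function `scale_values` is a made-up example, not something from the project.

```python
def scale_values(values, factor=1.0):
    """Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        The values to scale.
    factor : float, optional
        The multiplier applied to every value. Defaults to 1.0.

    Returns
    -------
    list of float
        The scaled values, in the same order as the input.

    Examples
    --------
    >>> scale_values([1.0, 2.0], factor=2.0)
    [2.0, 4.0]
    """
    # Build a new list so the caller's input is left unmodified.
    return [value * factor for value in values]
```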

Good first issue

Simon

  • AttrsXXX (#226) should go all in on pydantic. The issue should be adjusted accordingly

Action items

...