Discussion topics

Item	Presenter	Notes
Big picture issues	Simon	Data model for output data is probably the messiest place right now; hard to standardize these yet. Performance is a problem as well: many objects get shuttled around and replicated on dask workers; serialization/deserialization across the network is a bottleneck; provenance info is duplicated and shuttled around too, adding to the problem. We want to figure out a satisfying solution for provenance that meets our performance needs.
pAPRika	Simon	Issue on TSCC with filesystem writes (#224): could address with retry wrapping around components at each layer (protocol, individual dask steps, etc.). We already do aggressive checkpointing, so retries shouldn’t waste much work when compensating for filesystem issues; need to verify coverage of checkpointing. The pAPRika PR is getting unwieldy and currently has to be kept in sync with the mainline branch manually; it may help to split the paprika components out into a separate repo that isn’t highly coupled to evaluator. David Dotson will touch base with Jeffry Setiadi next on whether this solution works for him.
Development workflow	Simon	Numpy docstrings. black / flake8 for formatting / linting. Try to keep PRs under 500 lines of changes; not always possible.
Good first issue	Simon	AttrsXXX (#226): should go all in on pydantic. The issue should be adjusted accordingly.
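
For reference, the retry wrapping discussed under the pAPRika item could look roughly like the sketch below. It is not code from evaluator or pAPRika: the `retry` decorator, its parameters, and the `flaky_write` function are hypothetical names used only for illustration, and the filesystem failure is simulated rather than real. The assumption is that transient filesystem problems surface as `OSError`.

```python
import functools
import time


def retry(max_attempts=3, delay=0.0, exceptions=(OSError,)):
    """Hypothetical decorator: re-run a callable when it raises a transient error."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Give up and re-raise once the last attempt has failed.
                    if attempt == max_attempts:
                        raise
                    # Linear backoff: wait a little longer after each failure.
                    time.sleep(delay * attempt)

        return wrapper

    return decorator


# Simulated flaky step: fails twice with OSError, then succeeds.
calls = {"count": 0}


@retry(max_attempts=3, delay=0.0)
def flaky_write(path, text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("simulated transient filesystem error")
    return f"wrote {len(text)} characters to {path}"


message = flaky_write("checkpoint.json", "{}")
```

The same decorator could be applied at each layer (a protocol, an individual dask step). Because work is checkpointed, a retried step would only repeat what happened since the last checkpoint, which is why checkpoint coverage needs verifying first.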

David Dotson will engage with Jeffry Setiadi to support science from pAPRika
David Dotson will start making pydantic shifts in data model as individual PRs against issue #226
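
As a rough idea of what a pydantic-based data model buys, the sketch below declares typed fields that are validated on construction. The class and field names (`Provenance`, `PropertyResult`, `source`, and so on) are hypothetical and are not evaluator's actual classes; the `to_dict` helper is also an assumption, added only so the sketch runs on both pydantic v1 (`dict()`) and v2 (`model_dump()`).

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Provenance(BaseModel):
    # Hypothetical provenance record; field names are illustrative only.
    source: str
    version: Optional[str] = None


class PropertyResult(BaseModel):
    # Hypothetical output-data object with typed, validated fields.
    property_name: str
    value: float
    uncertainty: float
    provenance: Optional[Provenance] = None


def to_dict(model):
    """Dump a model to plain dicts on either pydantic v1 or v2."""
    dump = getattr(model, "model_dump", None)  # present in pydantic v2
    return dump() if dump is not None else model.dict()  # v1 fallback


result = PropertyResult(
    property_name="density",
    value=0.997,
    uncertainty=0.002,
    provenance=Provenance(source="simulation"),
)
as_plain_dict = to_dict(result)

# Invalid input is rejected at construction time instead of failing later.
try:
    PropertyResult(property_name="density", value="not a number", uncertainty=0.1)
    rejected = False
except ValidationError:
    rejected = True
```

Landing changes like this one class at a time fits the plan of making individual PRs against issue #226.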