2021-06-28 Core Developers meeting notes

Participants

  • @Andrew Dalke (Deactivated)

  • @Matt Thompson

  • @Simon Boothroyd

  • @Jeffrey Wagner

Discussion topics

Item

Notes

 

  • AD

    • Presented on toolkitwrapper/toolbox refactor.

      • Uploaded slides, updated for tomorrow’s presentation

    • Used coverage tool to reduce test runtime, in PR #999

      • Need to update to include instructions

      • Many tests overlapped heavily – e.g., to_bson adds very little functionality over to_dict, so testing it only requires one additional molecule.

      • Some tests need more review. For example, the total_charge test runs the same code path in the test as in the toolkit; instead, it should compare against known “correct” answers stored in the tests themselves.

    • All of Minidrugbank is loaded both as SDF and Mol2. This loading is unconditional. Instead we should make these files only get loaded when the tests need them.

    • We should separate tests of reading file formats from tests that only need molecules for testing internal functionality.

      • SB – Agree strongly – We should separate unit tests from integration tests.

    • I took a look at test_forcefield – it takes 5 minutes to run. One minute of this is spent reading the same configuration file, so I think we can reduce this runtime a lot, but this is beyond the scope of my current work.

    • SB – I really like the code coverage analysis tool. But might we fall into the trap of assuming that “coverage is correctness”? Like, would it be beneficial to generate “minimal” versus “integration” sets?

      • AD – Currently the coverage tool operates on line numbers, with exceptions being given negative line numbers. My thought is to have this allow us to only add meaningful test cases.

      • SB – I think I understand – the tool will provide an ensemble of molecules that we can pick from.

    • JW – I’m trying to think of a workflow for adding new functionality/modifying existing functionality and detecting edge cases.

      • AD – The root way to do this would be to check per-line coverage when adding/modifying features, not putting a large dataset through in CI.

      • SB – Agree, better design of the test suite should be able to remove the need to throw huge test sets through every time.

      • DD – Could have a “put a big dataset through” test suite that we run manually

      • AD – I haven’t needed this, but I’m the only developer on my project. With developers with a variety of skill levels, this sort of safety net may be helpful.

    • AD – The unit tests should aim to run in under a minute locally. When running just a few tests, the startup time is really significant, especially in test_forcefield.

      • MT – I’d be up to take a look at test_forcefield and improve runtime.

  • SB

    • Mostly working on QCSubmit –

      • Looking to get parity with results object,

      • get coverage reports,

      • ensure that different APIs touch the same code.

      • Worked on docs theming

      • Working with JH on unique SMARTS matching, for example getting the first and second torsion in n-butane. Currently thinking of using symmetry classes.

        • AD – SMARTS matchers can do deduplication based on “seen atoms”. But that’s more used for ring systems.

        • SB – The normal built-in uniqueness definitions are based on atom indices, but I want them to be based on chemical equivalence/graph symmetry. Most problems of this class are based on hydrogens, but sometimes it’s heavy atoms.

        • AD – If doing SMARTS matching (eg for fingerprint generation), we can use uniqueness checking to resolve this, but that may not be applicable to this problem. Additionally, it depends on whether you need to keep track of bonds.

    • Fragmenter is now on conda-forge. The discussion with maintainers goes far worse if we try to split a single feedstock into making multiple packages initially. So from now on we should

    • Worked on a prototype of a different, more granular way to break up cheminformatics toolkits

      • SB – Generally, I’d think of updating the toolkits as being more of a rewrite than a refactor

      • AD – When a molecule is created, does it know which registry it’s associated with?

        • SB – Not currently, but the featureregistry can be queried to figure this out

        • AD – I’m thinking of having an openff.api call to be able to query which functionality is available.
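
A rough sketch of the symmetry-class idea SB mentions for unique SMARTS matching, using n-butane torsions. Everything here is an assumption made for illustration: it is plain Python with no cheminformatics toolkit, the symmetry classes come from a simple neighbor-based color refinement (which approximates true graph symmetry and can fail to separate atoms in some highly regular graphs), and the function names are invented.

```python
def build_neighbors(n_atoms, bonds):
    """Adjacency lists for an undirected molecular graph."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def symmetry_classes(elements, bonds):
    """Assign each atom a class label by iterative refinement of element labels."""
    neighbors = build_neighbors(len(elements), bonds)
    ordered = sorted(set(elements))
    colors = [ordered.index(e) for e in elements]
    while True:
        # An atom's signature is its own class plus the classes of its neighbors.
        signatures = [
            (colors[i], tuple(sorted(colors[j] for j in neighbors[i])))
            for i in range(len(elements))
        ]
        relabel = {sig: k for k, sig in enumerate(sorted(set(signatures)))}
        new_colors = [relabel[sig] for sig in signatures]
        if len(set(new_colors)) == len(set(colors)):
            return new_colors  # partition stopped splitting
        colors = new_colors


def enumerate_torsions(n_atoms, bonds):
    """All a-b-c-d paths, stored once regardless of direction."""
    neighbors = build_neighbors(n_atoms, bonds)
    torsions = set()
    for x, y in bonds:
        for b, c in ((x, y), (y, x)):
            for a in neighbors[b]:
                for d in neighbors[c]:
                    if a != c and d != b and a != d:
                        t = (a, b, c, d)
                        torsions.add(min(t, t[::-1]))
    return sorted(torsions)


def unique_by_symmetry(torsions, classes):
    """Keep the first torsion seen for each direction-independent class pattern."""
    representatives = {}
    for t in torsions:
        key = tuple(classes[i] for i in t)
        key = min(key, key[::-1])
        representatives.setdefault(key, t)
    return representatives


# n-butane with explicit hydrogens: carbons 0-3, hydrogens 4-13.
elements = ["C"] * 4 + ["H"] * 10
bonds = [(0, 1), (1, 2), (2, 3),
         (0, 4), (0, 5), (0, 6),
         (1, 7), (1, 8),
         (2, 9), (2, 10),
         (3, 11), (3, 12), (3, 13)]

classes = symmetry_classes(elements, bonds)
torsions = enumerate_torsions(len(elements), bonds)
unique = unique_by_symmetry(torsions, classes)
```

Keying on symmetry classes rather than atom indices is what collapses matches that differ only by swapping equivalent hydrogens (or equivalent heavy atoms) into a single representative.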

  • MT

    • Been tying up loose ends and bugfixes in interchange in preparation for a 0.1.0 release.

      • Made an rc release this morning. I don’t think I’ll get feedback from other folks, so this will probably go ahead to a full release.

    • Worked on docs, will be working with Josh Mitchell to define a strategy for my docs in the short term, and syncing up on docs unification in the long run.

      • JW – I spoke with JM and he’s up to meet during his night/your morning next week.

    • Found some performance bottlenecks, for example importing interchange takes ~5 seconds. This is partially due to optional deps and slow OFF Toolkit imports. Not sure how much I can optimize/reduce those times.

    • Found a very tricky bug in Foyer having to do with nonbonded mixing rules and assumptions made in its interface with ParmEd. I’ll probably fix this myself.

    • Talked with Trevor, Simon, Jeff about virtualsites. I think I have enough direction on this to move forward.
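
A small sketch related to the import-time bottleneck MT describes above: one helper to time an import, and one function that defers an import until it is actually needed. The function names are invented for illustration, and the standard-library `json` module stands in for a heavy optional dependency; none of this is taken from the interchange code base.

```python
import importlib
import time


def time_import(module_name):
    """Seconds spent importing module_name in this process.

    Modules already cached in sys.modules return almost immediately, so
    meaningful numbers require a fresh interpreter.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def export_to_json(data):
    """Deferred import: the cost is paid on first call, not at package import."""
    import json  # stand-in for a slow optional dependency

    return json.dumps(data, sort_keys=True)
```

For a per-module breakdown, CPython's `python -X importtime -c "import <package>"` prints cumulative import times to stderr.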

  • DD

  • JW

    • Spoke with OpenFE candidates

    • Worked on roadmap update

    • openff-1.3.1 and 2.0.0-rc1 releases

      • JW will announce this release in #general

    • Partial charge normalization almost ready for review.

      • FF subdirectory

      • Other blockers to toolkit release?

        • MT – No blockers, a toolkit release is long overdue. In the future we shouldn’t ask for blockers and rather assume that we should be going ahead with more regular releases.

    • Organizing followup workshops

    • We’ll likely get some of IP’s time in the future, once he starts working in JC’s lab.





Action items

Decisions