2020-12-16 Benchmarking for Industry Partners - Development Meeting notes

 

Date

Dec 16, 2020

Participants

  • @Jeffrey Wagner

  • @David Hahn

  • @Joshua Horton

  • @David Dotson

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Identify and address project risks

Discussion topics

Item

Notes

Updates

  • DH

    • Started testing. Seems to work well.

    • Tested execution options A-C

    • Lim had a way to match the query-reference structures, which was used in some parts, but not in others. I’m using this throughout.

    • JW: Need to be careful about atom order; I’ve had luck with rdkit.Chem.rdMolAlign tools. Perhaps CalcBestRMS?

      • JW – I will switch to using CalcBestRMS

  • JH

    • Issue setting up SnowflakeHandler: it fails with an SQL error. Using the Snowflake class directly works fine.

      • Locally, on Mac, I can do both

      • On the Linux machine I SSH into, I can only use the Snowflake itself, not the SnowflakeHandler. The SnowflakeHandler error occurs on startup.

        • Full output below

        • Ubuntu 16.04 – Computer was brand new 2 years ago

      • psi4 package currently appears broken

      • DD – Are any major directories networked?

        • JH – As best as I can tell, it’s all local

      • JH: On SnowflakeHandler, no matter the number of workers, same result; Snowflake itself works fine across number of workers

        • DD: We’ll switch to Snowflake itself and see if this works for users generally; the handler is likely not necessary

        • DD: can try to reproduce on an Ubuntu 16.04 VM if we still have issues

  • JW

    • Progress bars

    • Better/broader error handling

    • Live output for conf gen

    • RMS ratchet for conf gen

      • Not sure what minimum cutoff should be – How will this affect downstream analysis? How bad are redundant post-QM conformers?

        • DH – For now let’s handle it naively (low cutoff/no deduplication), and in the future we can adjust this cutoff or do deduplication in a subsequent (post QM) stage.

        • DH – This is related to JH’s comments last week – It may be informative to monitor RMS change during QM optimization.

    • Redid docs

    • 0.8.2 release with stereo fixes

    • Warning squelching

      • Can/should I also do RDKit?

    • logging and name_assignment.csv

    • Next step:

      • Molecule labeling/parameter assignment

        • JH: started a branch for this component; this may be a good starting point for you

      • Add more error-handling tests for conf gen, ensure that parallel conf gen PR still handles tough cases gracefully

  • DD

    • Iterated on compute component with Swope and JW.

    • Made protocol document, sits on top of deployment doc

      • Offers 5 types of running:

        • single node, no server

        • multi node, no server/manager

        • (See doc for all supported configurations)

      • DD – Feedback on compute options?

        • DH – Could prioritize the 5 options – I’d deprioritize option A for example.

        • JW – Should completely reorder them – Less for users to read

        • DD – I’d put them in the order D, E, C, B, A

    • Added progress bar for execution steps

    • JW – I’ll look into RDKit warning squelching

    • DH – Could I have Alberto’s dataset? I think it may be similar to the PDB set.

      • DD – I’ll send those to you

    • JW – Option for Galileo compute

      • DD – Doubtful about this being successful, pharma folks probably couldn’t do this for data security reasons

      • JH – It’s been easy to use Galileo compute for bespoke workflow

      • DD – Would need to make guarantees about security of network traffic

    • Major concerns/blockers?

      • DD – Nervous about diversity of compute environments, though they’re mostly SLURM. 3 or 4 partners didn’t respond at all.

      • DD – Concerned about need for network-accessible server. Partners will need to contact IT and get this done.

        • JH – Could send out a test case of compute to have them try and run.

      • DD – Also concerned about managing expectations for what they can handle for a given amount of compute resources.

      • JW – This is a very complex operation. If a few pharma partners end up coming in late/reducing size of dataset/dropping out, that’s not the end of the world, and a reasonable person wouldn’t put all the blame on us. Especially if 7 other companies get it working successfully, the two/three that encounter trouble won’t see it as us being negligent.

      • DD – Could make a “predicted timer”, with an upper estimate on time required. This would help managing expectations.

      • JW - risks to project: after we do compute, could be expensive to try things again with a different approach; not really actionable, however

      • DH: Think we’ll have more information in early January

      • DD: I’ll reach out to Fox, Gobbi, since they expressed interest in giving feedback

        • JH: is there a list of things we should ask them to report back?

        • JW: like give them a set of specific molecules to try on their infrastructure?

          • JW – Will put together a small set to distribute for this purpose (5 heavy atoms, 10 heavy atoms, 20, 30, 40, 50, 60)

            • 60 is larger than we expect to handle, but it will be good to see what happens on really large mols

  • DH: below 500 molecular weight is considered small molecule

    • DH: largest in the Lim set is 84 atoms, probably around 40 heavy atoms

  • JW: Perhaps filter out in validate step?

    • Will include a molecule up to 60 heavy atoms in the test set; will help to set our cutoffs in validation

    • JW – Will add --max-heavy-atoms argument to validate component
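
The heavy-atom cutoff discussed above could look roughly like the sketch below. It assumes each molecule is just a list of element symbols; the function names and the cutoff value used in the example are hypothetical, and the real validate component would read atoms from its own molecule objects and take the cutoff from the proposed --max-heavy-atoms argument.

```python
from typing import Dict, List, Tuple


def count_heavy_atoms(elements: List[str]) -> int:
    """Count atoms that are not hydrogen."""
    return sum(1 for symbol in elements if symbol.upper() != "H")


def split_by_heavy_atoms(
    molecules: Dict[str, List[str]], max_heavy_atoms: int
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Partition molecules into (kept, rejected) by heavy-atom count."""
    kept: Dict[str, List[str]] = {}
    rejected: Dict[str, List[str]] = {}
    for name, elements in molecules.items():
        if count_heavy_atoms(elements) <= max_heavy_atoms:
            kept[name] = elements
        else:
            rejected[name] = elements
    return kept, rejected


# Illustrative input: ethanol (3 heavy atoms) and benzene (6 heavy atoms).
molecules = {
    "ethanol": ["C", "C", "O"] + ["H"] * 6,
    "benzene": ["C"] * 6 + ["H"] * 6,
}

# The cutoff of 5 is arbitrary and chosen only to split this toy input.
kept, rejected = split_by_heavy_atoms(molecules, max_heavy_atoms=5)
```

Returning the rejected molecules rather than silently dropping them would let the validate step report them alongside other failures.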

JH’s full SnowflakeHandler error output

 

Traceback (most recent call last):
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/bin/openff-benchmark", line 33, in <module>
    sys.exit(load_entry_point('openff-benchmark', 'console_scripts', 'openff-benchmark')())
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/b6056633/Documents/openff/benchmark_testing/openff-benchmark/openff/benchmark/cli.py", line 125, in execute
    server = FractalSnowflakeHandler(ncores=ncores)
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/qcfractal/snowflake.py", line 227, in __init__
    self.start()
  File "/home/b6056633/miniconda3/envs/openff-benchmark-optimization/lib/python3.7/site-packages/qcfractal/snowflake.py", line 361, in start
    f"stderr:\n{err.decode()}",
ConnectionRefusedError: [Errno Snowflake instance did not boot properly, try increasing the timeout.
stdout:
Starting a QCFractal server.
QCFractal server base folder: /tmp/tmpjcp1wfay
>>> Examining SSL Certificates...
Autogenerated SSL certificates, clients must use 'verify=False' when connecting.
>>> Logging to /tmp/tmpjcp1wfay/db_463ef162_1007_4078_804f_25aa7a870797
>>> Loglevel: INFO
>>> Checking the PostgreSQL connection...
>>> Initializing the QCFractal server...
Fatal during server startup:
SQLAlchemy Connection Error
(psycopg2.errors.AdminShutdown) terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
[SQL: CREATE INDEX ix_grid_optmization_program ON grid_optimization_procedure (program)]
(Background on this error at: )
Failed to start the server, shutting down.
] stderr:

Place mols in subdir?

  • DD: Can we put successful molecules from validation and conformer generation in their own subdirectory, like error_mols? Important for flexibility of the compute step downstream

    • JW: yes, can make this change

    • JW – Will split validate output into success_mols and error_mols, have generate_conformers default to look in 1-validate_and_assign/success_mols

    • JW – Will enforce 3-character, all uppercase groupname
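
A rough sketch of the two changes above, using only the standard library. The helper names are hypothetical, and "all uppercase" is assumed here to mean the ASCII letters A-Z; the notes do not say whether digits would be allowed.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict

# Assumption: a group name is exactly three ASCII uppercase letters.
GROUP_NAME_PATTERN = re.compile(r"[A-Z]{3}")


def validate_group_name(name: str) -> str:
    """Return the name unchanged if it is valid, otherwise raise ValueError."""
    if not GROUP_NAME_PATTERN.fullmatch(name):
        raise ValueError(
            f"Group name must be 3 uppercase characters, got {name!r}"
        )
    return name


def make_output_dirs(base_dir: Path) -> Dict[str, Path]:
    """Create success_mols and error_mols under base_dir and return both paths."""
    dirs = {}
    for label in ("success_mols", "error_mols"):
        path = base_dir / label
        path.mkdir(parents=True, exist_ok=True)
        dirs[label] = path
    return dirs


# Demonstrate the layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "1-validate_and_assign"
    dirs = make_output_dirs(base)
    created = sorted(p.name for p in base.iterdir())
```

With this layout, a later step such as generate_conformers could default its input to the success_mols path and ignore error_mols entirely.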


Conda package

  • DD: JW, can you add conda package building on the basis of a git tag?

    • pull GHA from toolkit

    • JW – Yes. I’ll try to get this done, but it’s a lower priority than validate/conf gen to-dos

    • DD: can wait till January

Action items

Decisions