
Discussion topics

Item

Notes

Updates from MolSSI

  • BP – The release went smoothly. The database was migrated to Postgres 13, and compression is now enabled. There should be fewer stale jobs. Everything should be back up.

    • TG – Can you say more about the stale jobs?

    • BP – Your stale jobs were caused by a size limit on transferring Hessians (some were around 50-100 MB); these can now be compressed by 70-80%.

  • BP – There’s now a workflow for unsticking tasks.

    • TG – I tried using it, but we have an odd issue where the task’s status doesn’t update, even after a restart. I haven’t seen a restart go through successfully.

    • BP – You can query the tasks for that base ID. The status should change from empty to INCOMPLETE.

  • DD – I’m currently error cycling the ligand submission. One INCOMPLETE record is holding things up. TG and I will meet to work out how to include this new reset pathway in our error-cycling logic.

    • BP – I think I have a good idea of what is causing this, and I’ll dig into the logs. Either two managers are getting into a race condition on the same task and both report back, overwriting each other, or a single manager is reporting back twice.

    • DD – Or a combination of both.

    • BP – I’ve added debug output to check that the manager returning the results is the manager that was assigned the task.
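The two failure modes BP describes (a second manager racing for the same task, or one manager reporting twice) can both be caught at the point where a result is accepted. The sketch below is hypothetical: the `Task` record, `accept_result`, and the status strings are illustrative assumptions, not QCFractal's schema or API. BP described adding debug output; this sketch goes one step further and rejects the mismatched report, which is an assumption rather than what was deployed.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("task_results")


@dataclass
class Task:
    """Hypothetical minimal task record (not QCFractal's real schema)."""
    task_id: int
    assigned_manager: Optional[str]
    status: str = "RUNNING"


def accept_result(task: Task, reporting_manager: str) -> bool:
    """Accept a result only once, and only from the assigned manager."""
    if task.assigned_manager != reporting_manager:
        # Case 1: a second manager raced for the same task.
        log.warning("task %d: result from %s, but assigned to %s",
                    task.task_id, reporting_manager, task.assigned_manager)
        return False
    if task.status != "RUNNING":
        # Case 2: the assigned manager is reporting back a second time.
        log.warning("task %d: duplicate report from %s",
                    task.task_id, reporting_manager)
        return False
    task.status = "COMPLETE"
    return True
```

With this check, a report from an unassigned manager leaves the task untouched, and a second report from the right manager is logged and ignored instead of overwriting the first.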

Manager/Queue status

  • DD – The queue is pretty empty. We expanded the biphenyl set over the weekend using UCI and PRP. One issue: it’s an old dataset, so it has no CMILES. I’m going to resubmit it with QCSubmit as a new dataset that includes CMILES.

    • TG – What actually happens if there’s no CMILES?

    • DD – The OpenMMHarness can take two paths: if CMILES are present, they are used in the MM jobs; otherwise, RDKit guesses the bonds from the geometry.

    • JH – With the RDKit path we can’t handle charged molecules, and it sometimes guesses the connectivity wrong.

    • JW – Would this become version 2 of the biphenyl set, or a differently named set?

    • DD – Unsure. It depends on the naming standards we settle on.

    • TG – How do we make sure the biphenyl resubmission doesn’t get deduplicated? Adding CMILES only affects the extras.

    • DD will translate the input molecules by a small amount so that the new set isn’t deduplicated against the existing molecules.

  • DD – I’m preparing the protein fragment torsiondrive set. I’m working out how to apply dihedral constraints and am leaning toward not applying them at all, since torsiondrive doesn’t consider them (the constraints will be applied to the optimization dataset instead).

  • DD – I worked with Dominic Rufa to run the MM ANI jobs. We need to pre-screen for element coverage. I expect many “not covered elements” errors as well as convergence errors. I’ve also seen a lot of problems with INCOMPLETEs in this set.

    • JH – I noticed that 4 torsiondrives succeeded in the last error cycling generation. Maybe keep resubmitting?

    • DD – The vast majority of errors are due to element coverage or convergence failures. Restarting wastes a lot of time because we don’t distinguish between error types yet.

    • TG – UCI has a lot of pre-emptible resources and somewhat unreliable memory, so my managers get interrupted a lot, which leaves these records INCOMPLETE.

    • DD – In the new set, I’ll filter ahead of time so that only supported elements are included. I could also increase the maximum number of optimization steps.

    • JH – It’s already at 300 steps, which is really high. I wouldn’t make it any higher.

    • DD – OK, I’ll submit only supported elements, and I’ll watch for TG’s error where memory usage grows without bound.

    • JH – I’ve also seen ANI struggle to converge in my bespoke workflow tests. If needed, we could relax the energy or force tolerances.

    • DD – Which keywords would I set to relax convergence criteria?

    • JH – I’ll send over an example of where I change the convergence criteria in geomeTRIC.

  • JH – I’d like to submit the XTB ligand set for compute, but it doesn’t work on Mac.

    • JW – I’ll walk you through running Linux via Docker after this call.
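Three of the steps above lend themselves to small pre-submission helpers: filtering the ANI set to supported elements, rigidly translating molecules so a resubmission isn't deduplicated, and not restarting records whose errors an unchanged resubmission can't fix. The sketch below is illustrative only. It assumes the model is ANI-2x (H, C, N, O, F, S, Cl); the error-message fragments are hypothetical and must be checked against real error output; and the shift size is an assumption that has to exceed whatever rounding the duplicate check applies. Treating convergence failures as not worth an unchanged restart is itself a judgment call, since JH saw some torsiondrives succeed on resubmission.

```python
import math

# Assumption: the ANI model is ANI-2x, which covers these seven elements.
# (ANI-1x and ANI-1ccx cover only H, C, N and O.)
ANI2X_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "S", "Cl"})

# Hypothetical error-message fragments; the real messages from the
# harnesses and geomeTRIC will differ and must be checked.
NOT_WORTH_RESTARTING = ("not covered", "failed to converge")


def ani_supported(symbols):
    """True if every element symbol in the molecule is covered by ANI-2x."""
    return set(symbols) <= ANI2X_ELEMENTS


def translate(geometry, shift=(0.01, 0.0, 0.0)):
    """Rigidly shift every atom by the same vector.

    Interatomic distances are unchanged, so the chemistry is the same, but
    the raw coordinates differ. The shift is in the geometry's own units
    (QCElemental stores bohr); its size is an assumption.
    """
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in geometry]


def pairwise_distances(geometry):
    """All interatomic distances, used to confirm the shift is rigid."""
    return [
        math.dist(a, b)
        for i, a in enumerate(geometry)
        for b in geometry[i + 1:]
    ]


def worth_restarting(error_message):
    """Skip restarts for errors an unchanged resubmission cannot fix."""
    msg = error_message.lower()
    return not any(fragment in msg for fragment in NOT_WORTH_RESTARTING)
```

One caveat on the translation approach: if molecules are re-centred or reoriented on input, a pure translation would be undone before the duplicate check, so it is worth confirming how the submitted geometry is stored.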

Previous to-dos

  • Joshua Horton will update the protein optimization dataset and resubmit it.

    • JH – It’s ready for submission; I just need to resolve the name/version.

    • Decision – We should increment the minor version number, making it 2.1.

    • TG – It would be nice to have QCSubmit require a changelog for changes like this.

    • JH – For now I’ll put it in the description, but it’d be good to integrate this as a requirement later.

  • DD – Would pharma companies allow in-house molecules to be sent to an external compute provider like Galileo?

    • JW – Basically, Janssen is strongly against this. We can keep the door open for other companies on a case-by-case basis.
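TG's suggestion under the protein optimization to-do, that QCSubmit require a changelog when a dataset is revised (as with the bump to 2.1), could be enforced with a check like the one below. Nothing in these notes says QCSubmit has such a check; the function names and the dict-based changelog are hypothetical, and the sketch only illustrates the idea.

```python
def bump_minor(version: str) -> str:
    """Increment the minor component of a 'major.minor' version string."""
    major, minor = version.split(".")
    return f"{major}.{int(minor) + 1}"


def check_changelog(new_version: str, changelog: dict) -> None:
    """Refuse a new dataset version that has no changelog entry.

    `changelog` (hypothetical) maps version strings to a description
    of what changed in that version.
    """
    entry = changelog.get(new_version, "").strip()
    if not entry:
        raise ValueError(f"no changelog entry for version {new_version}")
```

For example, `bump_minor("2.0")` gives `"2.1"`, and `check_changelog("2.1", {})` raises until a description of the change is recorded under `"2.1"`. Comparing the minor component as an integer keeps `"2.9"` bumping to `"2.10"` rather than anything string-based.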

User Questions

  • BP – Should OpenMM and TorchANI jobs be storing gradients?

    • (General) – Yes, but they aren’t stored in STDOUT.

  • JH – QCSubmit currently depends on basis-set-exchange. Can we get a release out on conda-forge?

    • BP – There are some technical issues with keeping it in sync with the website, but we could release the library.

    • JH – If you cut a PyPI release, that should trigger a conda-forge release as well.

Action items


Decisions