Dataset Updates
|
| JAC update: PR421 (Lexie’s small molecule dataset) finished except for 5 containing iodine, which isn’t supported for MBIS charges. Running PR422 (Lexie’s larger MW dataset); two didn’t finish before, I thought because of the connectivity errors. Lipid MAPS is running with a low number of pods to see if any complete. So far no restarts, but no completions either. There is still a Lipid Opts Benchmark: submit it, or wait for the QCF update? Are all new datasets my responsibility? LW: Yes, but maybe if we are going to update QCF it doesn’t make sense to. JAC: I’ll submit today, and if no progress is made by tomorrow I’ll shut down the deployments.
|
Kubernetes Python API | | JAC: I’ve written some nice scripts in notebooks for the Kubernetes API. If I have many deployments, I’d like to take some time to clean them up into a package so I can monitor the deployments more easily. How can I work this into Zenhub? LW: That sounds worthwhile, go ahead and make an epic with some tickets under the project “Improved Training Methods and Data”. |
DS3-CSD Update: | | From CSD there are 230550 structures; xyz2mol_tm successfully converts 217776. There are 51754 that match the constraints for our primary dataset, leaving 788 structures. This InChIKey thing is a problem, so I’m putting together a simple script to send to Magnus (xyz2mol_tm author) about this. Do we need inchi_keys? There are no bi- or trivalent metal complexes; this might be a motivation to get our own access to CSD. Looking at the “other” elements that aren’t included in our project scope, Co and Ru have a lot of structures. LW: I also see I (iodine) in there, that should probably be in our primary dataset. JAC: It wasn’t in the strategic doc, so I didn’t include it. Also, MBIS charges don’t work for atomic numbers over 36. LW: I didn’t realize; you might post on the bespoke channel and ask Danny about that. JAC: Since the Chodera lab requested the multipole moments from MBIS charges, I don’t think we should add that in. LW: That makes sense, let’s bring it up in our next meetings and make sure that’s not a priority for Genentech, or that the Chodera lab would be willing to relax their constraints. LW: What are the constraints of the xyz2mol_tm SMILES? Do we need CSD? JAC: They exclude group 1 and group 2, and we need Mg. LW: It sounds like we do need it.
|
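
For reference, the MBIS limitation raised above (no charges for atomic numbers over 36, which is why iodine-containing molecules drop out) can be checked before submitting a dataset. The following is only a minimal sketch: the function names, the small element table, and the input format (a mapping from molecule name to a list of element symbols) are all hypothetical and not taken from any existing script, and the cutoff of 36 is the value quoted in the discussion rather than something verified here.

```python
# Cutoff quoted in the meeting notes (assumption, not verified against the MBIS code).
MBIS_MAX_ATOMIC_NUMBER = 36

# Small illustrative lookup table; a real script would use a full periodic table.
ATOMIC_NUMBERS = {
    "H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "Mg": 12, "P": 15,
    "S": 16, "Cl": 17, "Co": 27, "Br": 35, "Ru": 44, "I": 53,
}


def unsupported_elements(symbols, max_z=MBIS_MAX_ATOMIC_NUMBER):
    """Return the sorted, de-duplicated symbols whose atomic number exceeds max_z.

    Raises KeyError for symbols missing from the illustrative table.
    """
    return sorted({s for s in symbols if ATOMIC_NUMBERS[s] > max_z})


def split_by_mbis_support(molecules):
    """Split {name: [element symbols]} into (supported, skipped) name lists."""
    supported, skipped = [], []
    for name, symbols in molecules.items():
        if unsupported_elements(symbols):
            skipped.append(name)
        else:
            supported.append(name)
    return supported, skipped
```

Calling `split_by_mbis_support` on a dataset before submission would give the list of molecules expected to fail the MBIS step, so they can be reported up front instead of showing up as unfinished records.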