Tried again to adapt the script to use a timeout so that a manual exclude list isn’t needed for larger datasets… that still isn’t working
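For reference, a minimal sketch of one way to do the timeout, assuming each dataset can be processed by a separate command. The names `run_with_timeout`, `process_all` and `build_cmd` are hypothetical and not taken from the actual script. Running each dataset in its own subprocess means a hung dataset is killed and recorded instead of needing to be listed by hand.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run one command; return "ok", "failed" or "timeout"."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def process_all(datasets, build_cmd, timeout_s):
    """Process every dataset; return {dataset name: status}.

    build_cmd is a hypothetical callable mapping a dataset name to the
    argument list that processes that dataset.
    """
    return {name: run_with_timeout(build_cmd(name), timeout_s) for name in datasets}


if __name__ == "__main__":
    # Placeholder command standing in for the real per-dataset script.
    demo_cmd = lambda name: [sys.executable, "-c", "pass"]
    print(process_all(["dataset-a", "dataset-b"], demo_cmd, timeout_s=60))
```

Datasets that come back as "timeout" form the exclude list automatically and can be retried later with a longer limit.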
DS2-COD
Finished draft of parsing scripts…but now not confident that the contents are relevant
LW: In the standup let’s move this back to the Science Backlog
DS3-CCD
Access update? Maybe skip it
LW: Still in progress, let’s focus on other tasks for now.
Paper only directs readers to tmQMg repo with ~75k “structures” (xyz files)
The xyz2mol_tm repo contained SMILES for tmQMg and ~220k structures of CSD.
We can:
Take xyz files and smiles for ~75k * 0.7 of tmQMg
LW: Leave this for now
JCl: I’ll leave this for working with Chris' dataset since it’s the same dataset
Use cleaned CSD SMILES to generate dataset of ~220k … recall that geometries are bad… can we use UFF to fix? (like Avogadro does)
LW: Let’s try it, do what Brent did and add that. UFF might make everything “the same”, though, so use an RMSD comparison of conformers generated and minimized by RDKit.
LW: Get the pipeline up and get something through it, then we can circle back to these.
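A rough sketch of the RMSD check LW suggested, to see whether UFF collapses conformers into “the same” geometry. Assumptions: the function name `pairwise_rmsd_before_after` is hypothetical, the SMILES in the usage line is a simple organic placeholder and not a CSD entry, and this is not necessarily what Brent did. RDKit's UFF may lack parameters for some metal centres, so the sketch checks for that first.

```python
from itertools import combinations

from rdkit import Chem
from rdkit.Chem import AllChem


def pairwise_rmsd_before_after(smiles, n_confs=5, seed=42, max_iters=500):
    """Return pairwise conformer RMSDs (before, after) UFF minimization."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)

    # Generate several starting geometries with RDKit's embedding.
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed))
    if len(conf_ids) < 2:
        raise ValueError("need at least two embedded conformers to compare")
    if not AllChem.UFFHasAllMoleculeParams(mol):
        raise ValueError("UFF has no parameters for at least one atom")

    def pairwise():
        # All-atom RMSD after rigid alignment; no symmetry handling.
        return [AllChem.GetConformerRMS(mol, i, j) for i, j in combinations(conf_ids, 2)]

    before = pairwise()
    AllChem.UFFOptimizeMoleculeConfs(mol, maxIters=max_iters)
    after = pairwise()
    return before, after


if __name__ == "__main__":
    before, after = pairwise_rmsd_before_after("CCCCO")  # placeholder molecule
    print(f"mean RMSD before UFF: {sum(before) / len(before):.2f}")
    print(f"mean RMSD after UFF:  {sum(after) / len(after):.2f}")
```

If the RMSDs after minimization all fall close to zero, UFF is likely flattening the conformers into one geometry, which is the concern raised above.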
Dataset Longevity
Discussed in “iteration planning”; I said it was a lower priority given the current issues. OK? I’ll continue chipping away at it if other tasks stall before I can request assistance
LW: Not an “MVP” but still a “Nice to Have”. We might bring this up as a topic at the onsite, so next sprint prepare a talk of 10 min or so.
NRP Issues
Still not resolved. Used the Python API to isolate it as a MolSSI issue; messaged Ben today and he’s looking into it
LW: Let me know if you need anything from me. It sounds like this is very disruptive.
Given QCSubmit datasets haven’t moved due to NRP issues, cancel meeting? Is Shirt student attending? Seems so.
LW: Post agenda items and let people know about our NRP issues. That will be a walk-through for Anika