2022-05-10 QC Meeting notes

 Date

May 10, 2022

 Participants

  • @Pavan Behara

  • @Jeffrey Wagner

  • @David Dotson

  • Ben Pritchard

  • @Chapin Cavender

  • @Joshua Horton

 Discussion topics

Item

Notes

Updates from MolSSI

  • BP – Still troubleshooting a hardware issue. Very strange: there have been small clusters of events where the server turns off and on. It seems to happen at times when the servers aren’t under load. But we’re running jobs pretty heavily right now and there haven’t been problems, so I think it may have been something going on in the building that nobody is fessing up to.

  • BP – Looking at sending in an application for funding, which could include hardware.

  • BP – In refactoring the code, I found that reactiondatasets have a manybody calculation inside of them. I’m not sure that we ever intended to support that but we do now, so I’m making sure this functionality is still accessible in the refactor.

    • doesn’t belong there; should be something more top-level

    • probably something OpenFF doesn’t need

    • CC – Does this support SAPT calculations?

      • BP – I haven’t thought about that… maybe

  • JW – Next release timing? We have a big API-breaking release coming and would align them if possible.

    • BP – Thinking it’s not too long. Reaction datasets are the last to go. But no hard date yet. There are some flexible things like docs that could come before or after release. I know DD was going to coordinate with me on some updates to OpenFF submission code (QCSubmit)

    • DD – I’ll schedule a working session with you later this week.

  • BP – We got some interviews set up for the postdoc position, I’ll let you know what comes of that.

Infrastructure needs/advances

 

Throughput status

  • OpenFF Protein Capped 1-mer Sidechains v1.3 - 44/61 TDs (3 * 5 new starting points added for the errored out TDs)

    • CC – There was one TD that was going very slow, but did eventually finish. Of the three that didn’t finish, I made a new version of the dataset (this one, 1.3) that attempted to use different starting coordinates (made a starting conformer for each grid point using sage). That didn’t work, so I made a new dataset where I did that same thing but with 30 degree spacing. I also did a dataset where starting configurations were generated using sage for random grid points.

    • JW – Any characteristics of these conformers that are failing to optimize?

    • CC – Mostly things with hbonds between backbone and sidechain. So like LYS and GLU.

    • JW – Any chance of proton transfer, or of the geometries being constrained too hard on a hydrogen bond?

      • CC – I don’t think so. The constrained dihedral doesn’t include any hydrogens, but it does include a nitrogen whose hydrogens are involved in hydrogen bonding.

    • CC – I tried running locally but didn’t know how to capture the geometry from QCEngine during the optimization.

      • DD – Would you have time to work with PB and myself to try this out? I’m pretty sure I’ve done this before

      • PB – IIRC, it was something like messy=True that made it store the temp files.

      • (DD, PB, and CC will meet Thursday afternoon Pacific)

    • PB – Do you use the strong internal hbond structures for fitting? In general we try to avoid using those for fitting.

      • CC – I agree, but the thing I’m wondering about is whether missing some grid points will make the rest of the torsiondrive invalid?

      • CC – I looked into this, and these torsiondrives do seem to have structures that are higher in energy relative to the minimum than those in the successful torsiondrives. So either there’s something up with these AAs, or we need to run more optimizations on each grid point to find more relaxed structures.

      • PB – Some part of QCArchive has an energy cutoff for torsion scans at 30 kcal/mol where it disregards structures above that. I don’t think structures above that energy are saved at all.

      • CC – I do see some results in my datasets that are 30 kcal/mol above the minimum. Would that cause a problem?

      • PB – I’m not sure whether that would get cleaned up between optimizations/grid points.

      • CC – For the lysine sidechain I have an example where the backbone dihedrals are in a beta sheet configuration, and the highest energy above the minimum was 15 kcal/mol. But the alpha helix backbone for LYS is 35 kcal/mol above the minimum.

  • SPICE sets: Around 34.2K jobs last week

    • SPICE PubChem Set 2 Single Points Dataset v1.2:

      • 100 consistent incomplete/stale jobs, not showing any error message

      • DD – BP, would you be able to take a peek at these incomplete jobs?

      • BP – Sure, could you send me the IDs?

      • (PB sent job ids)

      • BP – It looks like the managers are stuck

      • DD – Last week I restarted all the PRP and Lilac managers to try and debug this. Can you see which host this is running on?

      • BP – This is marked as “running” on lilac, but the manager is marked “inactive”. This shouldn’t happen, and it will keep the task from getting picked up. I may have a script that can solve this. It happens sometimes when a manager starts up and then shuts down quickly (like, it shuts down while a task is in transit from the main QCA).

      • (BP ran the stuck-job-restarter script, reports that it restarted 487 jobs)

    • SPICE PubChem Set 3 Single Points Dataset v1.2: 121806 from 95725 (only 200 jobs yet)

    • SPICE Pubchem Set 4 Single Points Dataset v1.2: 8162 from 0

  • TG restarted QC workers on UCI-hpc3 (25 workers: 8 cores/240 GB). PB has also been running a few workers since Friday (10 workers: 40/48 cores/180 GB). Throughput was around 7K jobs yesterday.
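For the torsiondrive energy question above (structures 30 kcal/mol or more above the minimum), a quick local check might look like the sketch below. It is only an illustration: the function names, the dictionary layout (grid point mapped to final energy in hartree), and the example energies are all made up here, and the 30 kcal/mol default is just the figure PB recalled, not a confirmed QCArchive setting.

```python
# Minimal sketch: flag torsion grid points whose energy lies above a cutoff
# relative to the scan minimum. All names and numbers are hypothetical.

HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard hartree -> kcal/mol factor


def relative_energies_kcal(final_energies):
    """Return each grid point's energy above the scan minimum, in kcal/mol.

    `final_energies` maps a grid point (tuple of dihedral angles in degrees)
    to a final energy in hartree.
    """
    minimum = min(final_energies.values())
    return {
        grid_point: (energy - minimum) * HARTREE_TO_KCAL_PER_MOL
        for grid_point, energy in final_energies.items()
    }


def points_above_cutoff(final_energies, cutoff_kcal=30.0):
    """Return the sorted grid points that lie more than `cutoff_kcal` above the minimum."""
    relative = relative_energies_kcal(final_energies)
    return sorted(gp for gp, rel in relative.items() if rel > cutoff_kcal)


# Made-up energies (hartree) for a three-point scan, for illustration only.
example_scan = {
    (-180,): -400.000,
    (-150,): -399.990,
    (-120,): -399.940,
}

flagged = points_above_cutoff(example_scan)
print(flagged)
```

Running the same check on the failed and the successful torsiondrives would give a direct comparison of how far each scan strays from its minimum.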

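The stuck SPICE jobs described above (task marked “running” while its manager is marked “inactive”) can be illustrated with a small sketch. This is not BP’s script and does not use the QCFractal API; the `Task` class, the status strings, and the manager names are hypothetical stand-ins for whatever the server actually stores.

```python
# Minimal sketch: find tasks that claim to be running on an inactive manager
# and return them to the queue. All names here are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: int
    status: str              # e.g. "waiting", "running", "complete"
    manager: Optional[str]   # name of the manager that claimed the task


def find_stuck_tasks(tasks: List[Task], manager_status: Dict[str, str]) -> List[Task]:
    """Return tasks marked running whose manager is marked inactive."""
    return [
        task
        for task in tasks
        if task.status == "running"
        and manager_status.get(task.manager) == "inactive"
    ]


def reset_stuck_tasks(tasks: List[Task], manager_status: Dict[str, str]) -> int:
    """Put stuck tasks back in the waiting state and return how many were reset."""
    stuck = find_stuck_tasks(tasks, manager_status)
    for task in stuck:
        task.status = "waiting"
        task.manager = None  # release the claim so another manager can pick it up
    return len(stuck)


# Made-up example data.
tasks = [
    Task(1, "running", "lilac-a"),
    Task(2, "running", "prp-b"),
    Task(3, "waiting", None),
    Task(4, "running", "lilac-a"),
]
manager_status = {"lilac-a": "inactive", "prp-b": "active"}

n_reset = reset_stuck_tasks(tasks, manager_status)
print(n_reset)
```

The point of the sketch is the consistency rule itself: a task can only legitimately be running if the manager holding it is still active.
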
User questions/issues

Science support needs

 

 Action items

 Decisions