2022-04-11 Chemical Perception meeting notes

Participants

  • @Trevor Gokey

  • @CaitlinBannan

  • @Tobias Huefner

  • @David Mobley

  • @Chapin Cavender

  • @Christopher Bayly

Goals

  •  

Discussion topics

Item

Presenter

Notes

Testing transferability of alkane parameters to Sage

@Trevor

  • CBy: So you’re iterating over subgroups of SMARTS; don’t you encounter invalid SMARTS?

  • TG: I think those are easy to catch. What do you mean by invalid, though?

  • CBy: On slide 6, the black boxes (the union) are a simplification of the grey populations above, but are we sure we get a valid SMARTS pattern if we build using only the black boxes?

  • TG: There are multiple ways to test for invalid SMARTS. Even though you query all possible patterns, you filter them and pick only candidates that actually partition the data, rather than candidates that are dead on arrival….

  • CBy: So a chemistry world is transformed into a fingerprint world. In the chemistry subworld, are you exploring beyond the chemistry specified above?
    So AA, AB, AC … is the known chemistry world (the section above), and the other section leads to chemistries you have not seen?

  • CBa: So the section above is based on the molecule set, right? How would we go off into unexplored space?

  • TG: Let’s pretend the bond parameter that matched [*]-[*], then ….

  • CBy: If you’re characterizing a larger space using your grey subspaces then you may go into unseen chemistries.

  • CBy: Let’s assume you have safeguards, …

  • TG: I only accept matches to my training molecule set.

  • DM: But also: If you were trying to learn something from, say, an alkane set that would generalize to broader chemistry, you might imagine that the SMARTS which would cover the alkane set are only a subset of those you would want to generalize to the broader set… So there’s chemistry you might be interested in that you haven’t seen yet, right?

  • CBa: Yes, David, my argument was that you need representation in your training set if you want to go somewhere new.

  • TG: I can extend this to more bits to make the parameters more distinguishable…

  • SLIDE 11

    • CBy: From random forests, you can use importance sampling. Instead of picking the top 20, can you use something similar and pick the important ones?
      So, going back to SLIDE 10, you could see which parameters are responsible for the best drop instead of taking the top 20.

    • TG: If we build an ML model then we can kind of do this, but it might be computationally tedious.

    • CBy: If you have time, you can look into importance sampling.

    • TG: I have been looking at AdaBoost since it is great for binary things…
      Also, right now I am doing only 1-bit searches, and learning ways to filter better and learn the surfaces.

  • CBy: So on SLIDE 12, your new parameter set has parameters that are far too general. Did you transfer them as is?

  • TG: Yeah, pretty much. Since these are at the top of the FF, they are low priority in the hierarchy, so in that sense they are safe to transfer.

  • CBy: There would be unintended consequences if we allow too many wildcards, even though we see good results now.

  • CBa: Likewise.

  • TG: The toxic side effects are already in Sage, with a1 being too general.

  • CBa: But that’s a #6X4, so all are single bonds.

  • CBy: Generality and uncertainty

  • DM: I think there are two issues here that are getting conflated:

    1. Issues where the central atom (or key atom) in an angle is now allowed to change, which is always very, very concerning.

    2. Issues where outside atoms are now allowed to change, which are not as concerning.
      So yeah the new a1-a6 are worrying and b1.a through b1.d are very concerning; again, these would pick up B-H or Si-H bonds in the right contexts, which we likely wouldn’t want/expect.

  • DM: At one point we did have an all-stars torsion at the top. But we don’t want that situation.

  • CBy: But a lot of the crazy parameters with lots of negations can be simplified into other patterns.

  • TG: Yeah, but then you have to specify that manually, since there are a lot of ways to write these.
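
A minimal sketch of the 1-bit search and candidate filtering discussed above, assuming each training environment matched by a parent parameter is encoded as an integer bit mask. The encoding, the function name `one_bit_splits`, and the bit values assigned to AA, AB, and AC are hypothetical and chosen only for illustration; this is not TG's actual implementation. A candidate bit is kept only if it splits the training population into two non-empty groups, which discards candidates that are dead on arrival and never reaches beyond the training set.

```python
from typing import Dict, List, Tuple


def one_bit_splits(
    population: Dict[str, int], n_bits: int
) -> List[Tuple[int, List[str], List[str]]]:
    """Return (bit, matched, unmatched) for every single bit that splits the population."""
    splits = []
    for bit in range(n_bits):
        mask = 1 << bit
        matched = sorted(name for name, fp in population.items() if fp & mask)
        unmatched = sorted(name for name, fp in population.items() if not fp & mask)
        # A bit that is set in every member, or in none, does not partition
        # the training population, so that candidate is discarded.
        if matched and unmatched:
            splits.append((bit, matched, unmatched))
    return splits


# Hypothetical fingerprints for three known environments (values are made up).
population = {
    "AA": 0b0011,
    "AB": 0b0101,
    "AC": 0b0001,
}

splits = one_bit_splits(population, n_bits=4)
for bit, matched, unmatched in splits:
    print(f"bit {bit}: {matched} | {unmatched}")
```

In this toy population, bit 0 is set everywhere and bit 3 nowhere, so neither is kept; only bits 1 and 2 produce a real partition. Extending the search to more bits, as TG suggests, would mean iterating over combinations of bits instead of single bits.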

 

 

 

Action items

Decisions