2020-09-11 Chemical Perception meeting notes

Date

Sep 11, 2020

Time	Item	Presenter	Notes

15-20 minutes

Gradient-informed type-splitting

@Josh Fass

CCB – If that were combined with Chemper-proposed patterns, you might get more reasonable guesses at types.
- CIB – Agree
DM – GB parameters are a nice toy system, but I’ll be more convinced by bond parameters, though I understand that single-atom parameters are easier since there are fewer degrees of freedom.
CIB – I like that this is a good proof of principle. It’s also good that the current direction will hit the “complexity explosion” and naturally need to integrate Chemper to constrain the space of possible SMIRKS elaborations.
DM – TG and I have been looking at separating the data into Gaussians and using molecules that fall in the non-overlapping parts of the Gaussians to develop types.
CCB – I think Chemper could handle problems at this scale.
CIB – Maybe we could test these proposed solutions by seeing if they could reproduce HJ’s work/decisions over the last year.
- CCB – Agree
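A minimal sketch of the selection step DM describes, assuming the values for one candidate type have already been fit by two 1-D Gaussian components. The component weights, means and widths, the 0.95 cutoff, and the molecule names are all hypothetical. A molecule is kept for typing only where one Gaussian clearly dominates; molecules in the overlap region are left out.

```python
import math

def log_gaussian(x, mu, sigma):
    """Log-density of a 1-D normal distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, components):
    """Posterior probability that x belongs to each (weight, mu, sigma) component."""
    logs = [math.log(w) + log_gaussian(x, mu, sigma) for w, mu, sigma in components]
    top = max(logs)  # subtract the max before exponentiating to avoid underflow
    scaled = [math.exp(v - top) for v in logs]
    total = sum(scaled)
    return [s / total for s in scaled]

def non_overlapping(values, components, cutoff=0.95):
    """Group molecules by component, keeping only those outside the overlap region.

    A molecule is kept when one component's posterior is at least `cutoff`;
    molecules where the Gaussians overlap are left out of every group.
    """
    groups = {k: [] for k in range(len(components))}
    for name, x in values.items():
        post = posteriors(x, components)
        best = max(range(len(post)), key=post.__getitem__)
        if post[best] >= cutoff:
            groups[best].append(name)
    return groups

# Hypothetical: two equal-weight components fit to one per-molecule parameter.
components = [(0.5, 1.0, 0.1), (0.5, 1.5, 0.1)]
values = {"mol_a": 0.95, "mol_b": 1.25, "mol_c": 1.55}
print(non_overlapping(values, components))  # → {0: ['mol_a'], 1: ['mol_c']}
```

Here mol_b sits exactly between the two means, so neither component reaches the cutoff and it is not used to seed either type.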

Slides:

15-20 minutes

Physics-based typing with Atoms-in-Molecules and Gaussian mixtures

@Tobias Huefner

CIB – I saw Bader talk a few times about his invention of the AIM approach. One interesting thing is the decomposition scheme. You’re trying to add value by using quantum descriptors instead of cheminformatics descriptors. We did the same sort of thing with WBOs (filling in a shortfall of cheminformatics representations using QM information). You’re also looking to bridge physics and chemistry using clustering schemes. Having tried this sort of thing before, I’ve found lots of pitfalls. Could we find a simple toy system for this? Maybe JF’s set?

TH – Agree. I was planning to start with LJ type distinctions, but I could do GB or charges, like JF.
JW – Should we share test sets and benchmarking infrastructure?
- JF – I agree that it would be efficient in the long run for us to standardize on common infrastructure and datasets.
- (General) – Yes
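TH proposed clustering atoms by QM-derived (AIM) descriptors with Gaussian mixtures. As a hedged sketch of that clustering step only, the code below fits a 1-D Gaussian mixture by expectation-maximization (EM) and labels each atom with its most probable component. The descriptor values are made up, and real AIM descriptors would likely be multi-dimensional; in practice a library implementation such as scikit-learn's `GaussianMixture` would normally be used instead of hand-written EM.

```python
import math

def fit_gmm_1d(xs, k=2, iters=200):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization (EM).

    Returns (weight, mean, std) tuples sorted by mean.
    """
    data = sorted(xs)
    n = len(data)
    mu0 = sum(data) / n
    # Start means at evenly spaced quantiles, all variances at the overall variance.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [sum((x - mu0) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point, computed in log space.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) - 0.5 * math.log(2.0 * math.pi * variances[j])
                    - 0.5 * (x - means[j]) ** 2 / variances[j] for j in range(k)]
            top = max(logs)
            e = [math.exp(v - top) for v in logs]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing to a point
    return sorted(zip(weights, means, (math.sqrt(v) for v in variances)), key=lambda c: c[1])

def assign(x, components):
    """Index of the component with the highest weighted density at x."""
    scores = [math.log(w) - math.log(s) - 0.5 * ((x - mu) / s) ** 2 for w, mu, s in components]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical per-atom descriptor values (e.g. AIM charges) for ten atoms sharing one type.
charges = [-0.42, 0.10, -0.40, 0.12, -0.41, 0.09, -0.39, 0.11, -0.43, 0.13]
components = fit_gmm_1d(charges)
print([assign(q, components) for q in charges])  # → [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Each component would then be a candidate for a separate type; the pitfalls CIB mentions (choosing k, overlapping clusters) are not addressed by this sketch.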

15-20 minutes

Atom typing using set theory

@Trevor Gokey

CIB – I like this direction. Three questions/comments:
- You are going to have to make a representation of bonds. One key advance that we want to bring into the FF is to move away from integer bond orders toward widespread use of WBOs. Floating-point values will be hard to represent in a bit vector.
  - TG – There’s a bit vector for the bond as well.
  - DM – We would plan for more of these to be replaced by WBOs.
  - CB – Is there a representation for “any” bond order?
  - TG – Yes – Bond bit vector “11111…”
- How will you handle an atom which is described by what it’s bound to? SMARTS can be recursive, which could increase complexity.
  - TG – This is unresolved. I’ll discuss it at the end.
- Bit vectors overdefine things that are mutually exclusive: an atom can be X1, X2, X3, or X4, but only one of them. Will this affect the representation? Is it unnecessarily complicated or high-dimensional?
  - TG – I can cap this by preprocessing the space I’m working on and removing any possible combinations that never appear. I can also make rules to prevent things like [...H4X3
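A hypothetical illustration of the bit-vector idea discussed above, not TG's code: each SMARTS-like primitive (connectivity X, hydrogen count H, bond order) is an integer bitmask where bit i means "value i is allowed", so "any" is all ones, union and intersection of patterns are bitwise OR and AND, and impossible combinations such as H4 on an X3 atom are pruned. The value ranges, field names, and the H ≤ X pruning rule are assumptions made for the sketch.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical value ranges; in each bitmask, bit i set means "universe[i] is allowed".
X_VALUES = (1, 2, 3, 4)      # total connectivity (SMARTS X)
H_VALUES = (0, 1, 2, 3, 4)   # attached hydrogen count (SMARTS H)
BOND_ORDERS = (1, 2, 3)      # integer bond orders only; WBOs would need another encoding

def mask(allowed, universe):
    """Bitmask with bit i set when universe[i] is in `allowed`."""
    return sum(1 << i for i, v in enumerate(universe) if v in allowed)

def values(m, universe):
    """Decode a bitmask back into the list of allowed values."""
    return [v for i, v in enumerate(universe) if (m >> i) & 1]

ANY_BOND = mask(BOND_ORDERS, BOND_ORDERS)  # all bits set (0b111): "any" bond order

@dataclass(frozen=True)
class AtomBits:
    x: int  # connectivity bitmask
    h: int  # hydrogen-count bitmask

    def union(self, other):
        return AtomBits(self.x | other.x, self.h | other.h)

    def intersection(self, other):
        return AtomBits(self.x & other.x, self.h & other.h)

    def is_empty(self):
        # No atom can match if any field allows no values at all.
        return self.x == 0 or self.h == 0

    def prune(self):
        """Clear X/H values with no valid partner (an atom cannot have more H than connections)."""
        ok = [(x, h) for x, h in product(values(self.x, X_VALUES), values(self.h, H_VALUES)) if h <= x]
        return AtomBits(mask({x for x, _ in ok}, X_VALUES), mask({h for _, h in ok}, H_VALUES))

all_h = mask(set(H_VALUES), H_VALUES)
x3 = AtomBits(mask({3}, X_VALUES), all_h).prune()
print(values(x3.h, H_VALUES))          # → [0, 1, 2, 3]  (H4X3 removed)
a = AtomBits(mask({3}, X_VALUES), mask({1}, H_VALUES))
b = AtomBits(mask({4}, X_VALUES), mask({1}, H_VALUES))
print(values(a.union(b).x, X_VALUES))  # → [3, 4]
print(a.intersection(b).is_empty())    # → True
print(values(ANY_BOND, BOND_ORDERS))   # → [1, 2, 3]
```

Because each field is stored separately, pruning can only clear values that have no valid partner at all; it cannot express joint constraints between fields, which is one face of the mutual-exclusivity concern raised above.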
CIB – How will you analyze the data? There are several methods to choose from. Within the bit-vector data, you could use a grouping scheme, like a random forest, to figure out what the members of a group have in common. A decision tree (DT) has feature selection built in: it shows what the recurring theme in the tree is.
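As a hedged stand-in for the tree-based analysis CIB suggests, the sketch below ranks each bit of a set of bit vectors by information gain with respect to a label. This is the entropy criterion an ID3-style decision tree uses to pick its root split. The bit vectors and the "A"/"B" labels (e.g. which proposed type an atom fell into) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(vectors, labels, bit):
    """Drop in label entropy from splitting the bit vectors on one bit."""
    on = [y for v, y in zip(vectors, labels) if (v >> bit) & 1]
    off = [y for v, y in zip(vectors, labels) if ((v >> bit) & 1) == 0]
    n = len(labels)
    remainder = len(on) / n * entropy(on) + len(off) / n * entropy(off)
    return entropy(labels) - remainder

def rank_bits(vectors, labels, n_bits):
    """Bit indices from most to least informative; the first is an entropy-based tree's root split."""
    gains = {b: information_gain(vectors, labels, b) for b in range(n_bits)}
    return sorted(gains, key=gains.get, reverse=True)

# Hypothetical 3-bit vectors for six atoms, labeled by the proposed type they fell into.
vectors = [0b101, 0b111, 0b100, 0b001, 0b011, 0b000]
labels = ["A", "A", "A", "B", "B", "B"]
print(rank_bits(vectors, labels, 3))  # → [2, 0, 1]
```

Bit 2 separates the two groups perfectly, so it is the "recurring theme" here. A random forest repeats this kind of split selection over many bootstrapped trees and averages the importances, which tends to be more robust on noisy data.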
Action items postponed to next week
- Will come up with a dataset for the toy test.