CB: Can we discuss the ML big picture? This is the 5th year of OpenFF, and ML models are coming up from John’s work. How do we differentiate the physics-based approach we have been using so far from a neural-net kind of approach that we may have to incorporate, or at least use, at some point?

DM: Hmm, I think we kind of know when an ML model would go off the rails: most probably when we ask it to predict things that are very unlike what it was trained on. So I would love to see what happens to a model like espaloma when we test it on chemistry it has not seen before, some functional group not seen before, hydrogen bonds, etc., and whether it holds up or crashes and burns compared to the physics-based model.

CB: I do agree with that. My analogy is with the early days of Amber, where there used to be a disclaimer of “no known bugs” and bugs were fixed as they were encountered; similarly, whenever something bad happens, ML people can train on the new data and fix it. So we should not be afraid of using the ML model because it may go off the rails, or because of its limited domain of applicability.

DM: To be more concrete: pick your training data and training procedure as you wish, tell me what the model is trained on, and then we can test it on things it was not trained on. If it breaks every time we test it, then either it has to learn something new or the model is not useful.

CB: Yeah, exactly. There could be another tactic, like Ajay Jain’s docking approach: take the data you have, hold out some chemistry, train the model on the remainder, and test it on the held-out part.
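(The held-out-chemistry test described above is essentially a leave-one-functional-group-out split. Below is a minimal sketch of how such a split could be done, assuming the dataset is a list of SMILES strings and RDKit is available; the amide SMARTS pattern and the function name are hypothetical choices for illustration, not part of any existing OpenFF or espaloma workflow.)

```python
from rdkit import Chem

# Hypothetical pattern for the chemistry to hold out (amides here);
# any functional-group SMARTS could be substituted.
HOLD_OUT_SMARTS = Chem.MolFromSmarts("C(=O)N")

def split_by_chemistry(smiles_list):
    """Split molecules into train/test sets: any molecule containing
    the held-out functional group goes only to the test set."""
    train, test = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable entries
        if mol.HasSubstructMatch(HOLD_OUT_SMARTS):
            test.append(smi)
        else:
            train.append(smi)
    return train, test

# Example: ethanol and acetic acid go to training; acetamide is held out.
train, test = split_by_chemistry(["CCO", "CC(=O)O", "CC(N)=O"])
print(train)  # ['CCO', 'CC(=O)O']
print(test)   # ['CC(N)=O']
```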