2022-09-06 Protein-ligand benchmarks meeting notes

Participants

  • @Diego Nolasco (Deactivated)

  • @Iván Pulido

  • Ben Ries

  • @David W.H. Swenson

  • @John Chodera

  • @Richard Gowers

  • @Mike Henry

  • @Jeffrey Wagner

  • Levi Naden

Goals

  • DD : update to milestones and deadlines - seeking approval

  • RG : clarification on long-term custodianship of fah-alchemy

  • DD : fah-alchemy - current board status

    • fah-alchemy : Phase 1 - MVP

    • 3 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking

    • gufe#36 nearly ready for merge (@David Dotson finishing touches, reviews received from @David W.H. Swenson, @Richard Gowers)

    • gufe#45 is in progress; gives protein tokenization/serialization (Ben Ries, @David W.H. Swenson, @Richard Gowers)

    • gufe#42 almost merge-able after #45 is in place (@David W.H. Swenson)

    • @David Dotson development effort is focused on Executor (service API), Scheduler (compute), and ResultServer (storage) for FAH

      • AlchemicalNetwork storage into ResultServer::neo4j works; working on full-fidelity roundtripping

      • turning attention to compute (dask+distributed cluster) this week

  • DS, BR : Protein serialization update (gufe#45):

  • IP : Nonequilibrium Cycling Protocol (perses#1066) update:

  • MH : ProtocolSettings taxonomy (gufe#37) update:

  • MB : protein-ligand-benchmark - 0.3.0 update

Discussion topics

Item

Notes

DD : update to milestones and deadlines - seeking approval

RG : clarification on long-term custodianship of fah-alchemy

  • JC – I’d figured that my lab would be the major custodian, with OpenFF also playing a role

  • RG – my understanding is that it would be constrained by what can be run through the Folding@Home cores, so it’s of limited use to us for developing new methods

  • JC – Can implement custom behavior in integrator - some room for custom logic in C++, but very limited and we can’t run untrusted python code.

  • RG – Am I correct in my understanding that if someone comes in with a new method, it would be of limited use to benchmark that immediately with F@H?

  • JC – Right, you could benchmark alternate approaches for ligand preparation and network definition, but there are some changes that wouldn’t be in scope.

  • RG – the thing about mappings is useful to point out; valuable to know what areas it could be useful for us in

  • JC – to be clear, would only consider OpenFE’s role as a tertiary level of support; first/second would be Chodera Lab and OpenFF

DD : fah-alchemy - current board status

  • DD – PLB #52 - I haven’t seen much movement on this in the past week. Do we have any updates?

    • IP – I’d asked MB for further processing - One partial blocker is whether we want ligands to be in separate files or joined. Also

    • JC – think it may be better to do consolidated ligands in an SDF file

    • JW – my big concern on that is that there really is no spec for what a multi-molecule SDF means; or at least, different tools interpret it very differently

      • separate files keeps them explicit as different ligands

      • https://docs.eyesopen.com/toolkits/cpp/oechemtk/oemol.html#dude-where-s-my-sd-data

    • JC – since we’re only planning to use one copy of a ligand per set, not sure that this is a huge risk right now; in the context of this repo, it may make sense to make the statement that these are multi-molecule SDFs, not merely multi-conformer

    • JW – should make it clear on insertion that we don’t support multi-conformer

    • RG – what is the issue with having multiple conformers?

    • JC – the point of us doing things in this context is that we want to provide a single conformer with our best guess as to what should be used; the understanding then is that our SDFs are multi-molecule, but not multi-conformer

    • JW – I’m fine with this approach; if we do want to support multi-conformer later, then we’ll have to re-engineer our approach

  • Decision - we can have multiple structures in a single SDF file, our infrastructure will assume this ALWAYS corresponds to multiple distinct molecules and NEVER more than one conformer of a molecule.

    • JW + JC + RG – Approve

  • 3 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking

  • gufe#36 nearly ready for merge (@David Dotson finishing touches, reviews received from @David W.H. Swenson, @Richard Gowers)

  • gufe#45 is in progress; gives protein tokenization/serialization (Ben Ries, @David W.H. Swenson, @Richard Gowers)

  • gufe#42 merge-able after #45 is in place (@David W.H. Swenson)

  • @David Dotson development effort is focused on Executor (service API), Scheduler (compute), and ResultServer (storage) for FAH

    • AlchemicalNetwork storage into ResultServer::neo4j works; working on full-fidelity roundtripping

    • turning attention to compute (dask+distributed cluster) this week
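The SDF decision recorded in this section (multiple records in one file always mean distinct molecules, never conformers) could be enforced at load time along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, records are split on the `$$$$` terminator line as JC suggested, and a repeated title line is used as a stand-in for "same molecule". A real check would more likely compare chemical graphs with a cheminformatics toolkit, and JW's reservation about splitting on the separator still applies.

```python
from collections import Counter

SDF_RECORD_END = "$$$$"  # terminator line that closes each record in an SD file


def split_sdf_records(sdf_text):
    """Split SDF text into one string per record, each ending with '$$$$'."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == SDF_RECORD_END:
            records.append("\n".join(current) + "\n")
            current = []
    if any(line.strip() for line in current):
        raise ValueError("trailing content without a '$$$$' terminator")
    return records


def assert_distinct_molecules(records):
    """Fail loudly if two records share a title line.

    Assumption: the title (first line of a record) identifies the molecule,
    so a repeated title is treated as a possible multi-conformer input.
    """
    titles = [record.splitlines()[0].strip() for record in records]
    duplicates = sorted(t for t, n in Counter(titles).items() if n > 1)
    if duplicates:
        raise ValueError(
            f"possible multi-conformer SDF; repeated titles: {duplicates}"
        )
    return titles
```

The same `split_sdf_records` helper could back the "split into separate files" script mentioned in the discussion, by writing each returned record to a file named after its title.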

DS, BR : Protein serialization update (gufe#45):

  • BR – GUFE #45 - Need from_pdb and to_pdb via OpenMM, and to_dict and from_dict by Wednesday. The features are in theory implemented and just need testing. So I anticipate that this will be done tomorrow evening in Europe.

  • JC – the protein PDB files in the repo make a great unit test, what are you looking at?

    • BR – Yes, those are good, also want to read in PDB and write out and compare them.

    • JC – You should be able to retrieve PDBs from the repo

    • BR – Already did it.

  • JW – are you able to test for correctness of bond orders, formal charges?

    • BR – Bond orders translate into formal charges, so if one of them is wrong then I should get a chemistry error.

    • JW – you said bond orders are translations of formal charges; in that case you would expect RDKit to be loud about this. Are you applying bond orders and implying formal charges from this?

    • JW – Now that we have our toolkit release cut, would be happy to work directly with you on a further iteration of this, but I agree that it’s important for you to reach your immediate deadline and can work with you on the next iteration in the future.

    • BR – yes, would take you up on this after our initial implementation is in place

  • DS – Agenda says #42 is “mergeable”, but I don’t think it is - I need to make additions to unit testing and will inform once that’s done.
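The read-in, write-out, and compare test discussed for gufe#45 follows the usual round-trip pattern. The sketch below illustrates that pattern only: `ToyProteinComponent` is a hypothetical stand-in, not the gufe class, and its fields are assumptions chosen to show bond orders surviving a `to_dict`/`from_dict` cycle through JSON. A real test would load the benchmark PDB files instead of a hand-built object.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProteinComponent:
    """Hypothetical stand-in for a serializable protein component."""

    name: str
    atoms: tuple  # (element, residue name, residue index) per atom
    bonds: tuple  # (atom index i, atom index j, bond order) per bond

    def to_dict(self):
        # Use lists so the result is directly JSON-serializable.
        return {
            "name": self.name,
            "atoms": [list(a) for a in self.atoms],
            "bonds": [list(b) for b in self.bonds],
        }

    @classmethod
    def from_dict(cls, data):
        return cls(
            name=data["name"],
            atoms=tuple(tuple(a) for a in data["atoms"]),
            bonds=tuple(tuple(b) for b in data["bonds"]),
        )


def roundtrips(component):
    """True if object -> dict -> JSON -> dict -> object gives an equal object."""
    payload = json.loads(json.dumps(component.to_dict()))
    return ToyProteinComponent.from_dict(payload) == component
```

Running such a check over every protein file in the benchmark repository, as JC suggested, would catch both present failures and later regressions.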

IP : Nonequilibrium Cycling Protocol (perses#1066) update:

  • IP – I’ve hit a blocker - I can reproduce protein mutation cases - but when I try to reproduce examples for protein-ligand I’m hitting problems. These pathways use different API points so I need to debug the code paths for the latter (getting NoneType errors, need to dive in and figure out where). I think it's a perses issue and will discuss this at perses devs meeting today.

    • already solved other issues I’ve seen with serialization, breaking changes in new OpenFF release

    • now hitting issues that appear to be perses issues; can hit that in our working session

  • IP – I don’t think I’ll be able to complete this by the end of this week; it will extend into next week

  • JW – can you clarify; it’s working for protein mutations, but not protein-ligand transformations?

    • IP – there are two different API points, and yes one of them works, but the other currently needs work

  • DD – Anything we can do to help?

    • IP – I’d like to check in with you after the Perses devs call (3 PM US Eastern) in case we need to coordinate further.

    • DD – Will do, we’ll talk offline to coordinate a time.

MH : ProtocolSettings taxonomy (gufe#37) update:

  • MH – I’ve spent some more dev cycles on this. Added unit support, but there are a few issues with using our base model on generics that have units. Working with MT on a unified approach to this. Also adding some convenience functions (like ensuring we don’t have negative temps in K). But basically implementing dimensionality checking (so lengths can be different units, but they all must be measurements of length).

  • DD – It’s useful to consider this an area that we’ll continuously iterate.

  • JW – are you getting all the Matt time you want? Really want to leverage him here

    • MH – yes I am, but if you want to give him your blessing that may help get some more time

    • JW – excellent, I’ll tell him to spend all the time he wants on this to free him up

  • LN – I also have resources we can contribute from MolSSI on pydantic usage; will reach out after call to see how best to contribute
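The dimensionality checking MH describes (any length unit is acceptable where a length is expected, and temperatures in kelvin must not be negative) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the `UNITS` table, the `Quantity` class, and `validate_settings` are hypothetical names, and the actual gufe#37 work is built on a pydantic-style base model with a proper units library rather than this hand-rolled table.

```python
from dataclasses import dataclass

# unit name -> (dimension, factor converting to that dimension's base unit)
UNITS = {
    "nanometer": ("length", 1.0),
    "angstrom": ("length", 0.1),
    "kelvin": ("temperature", 1.0),
    "picosecond": ("time", 1.0),
    "femtosecond": ("time", 0.001),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    @property
    def dimension(self):
        return UNITS[self.unit][0]

    def to_base(self):
        """Value expressed in the base unit of its dimension."""
        return self.value * UNITS[self.unit][1]


def validate_settings(schema, settings):
    """Check each setting against the dimension the schema expects."""
    for name, expected in schema.items():
        quantity = settings[name]
        if quantity.dimension != expected:
            raise ValueError(
                f"{name}: expected {expected}, got {quantity.dimension}"
            )
        if quantity.unit == "kelvin" and quantity.value < 0:
            raise ValueError(f"{name}: temperature in kelvin cannot be negative")
    return True
```

With this shape, a cutoff given in angstroms and one given in nanometers both pass, while a cutoff given in picoseconds is rejected before any simulation is set up.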

MB : protein-ligand-benchmark - 0.3.0 update

  • DD – MB unavailable for this meeting, I’ll try to get an update offline and update on GH.

  • IP – MB is moving forward with this - I’ve gotten a few questions. Some issues with Schrodinger’s inconsistent handling of peptides. Started using some tools from Bonvin lab and has had a good experience. Some complexity around xtal waters.

  • JC – From our end, we decided that waters and ions could be in PDB, but more complex small molecules (“spectator ligands”) would need to be explicitly defined outside of a PDB file.

    • IP – In an example I’ve been working on, the spectator ligand is in the PDB. There is probably an issue here with the source of chemical information for those ligands.

    • JC – I unfortunately won’t have time to work with MB this week

    • DD – I’ll reach out to MB and offer assistance.

 

  • JC – OpenFF PIs are going to brainstorm what we need to collect for the NIH renewal application. One big question is how we measure impact/usage of our tools. So if folks have ideas here I’d love to hear. Maybe a callback when a user loads/downloads a file. Current ideas include monitoring how many FE calcs we’ve run.

  • DD – Will be a little tough to measure this for fah-alchemy since job submission won’t be publicly available. But maybe metrics around community engagement with PLBenchmarks.

  • JC – Yeah, but it will be good to measure throughput/jobs processed on F@H.

  • DD – That makes sense. For fah-alchemy we won’t have unique users, but we could measure job throughput.

Transcripts without revisions

F@H interface meeting - September 06

 

@00:17 - David Dotson

Okay, I think we've got everyone. I'll drop links to the notes in Chat. Second as usual, Jeff, thank you for taking notes.

I feel bad asking every time, but you do a remarkable job. Is that something will do this week? Yeah.

Okay, got it. If ever you're not willing to do it, let me know. It's do.

 

@00:47 - Jeffrey Wagner

Thanks. Thank you.

 

@00:50 - David Dotson

Okay, so I got the agenda on the left. Before we jump into it, are there any items folks would like to add?

Okay. If not, we'll go ahead and jump in. So the first thing I'd like to get out of the way is that we, as in Diego, myself, and Jeff, updated the project page for the OpenFF-driven portion of this project, that is, this meeting, a few weeks ago.

I'm seeking approval on this. Basically, I'm looking for a thumbs up from each of our approvers. That is Jeff, John and Richard.

If we can get approval on this, then we can mark this as approved. Basically, we updated mainly the milestones, including the deadlines, to align with the path we've taken and where we're going as well.

So I'll give a couple of minutes unless I'm looking for thumbs up. Sorry, I'm scrolling through. I see a thumbs up from Richard.

 

@02:06 - John Chodera (he/him/his)

Would it be okay to review this first? I'm still having trouble clicking on the link. I could only edit the link for some reason because I'm in edit mode on the page for some okay.

 

@02:16 - David Dotson

Sorry.

 

@02:17 - John Chodera (he/him/his)

There you go. I was able to close the page, and then I called Jeez.

 

@02:21 - David Dotson

Got you. Yeah. So we updated the milestones as we kept the things we've already completed. And then we've changed deadlines to match kind of current state.

So notice these things are now in progress. It's a deadline that's in the future. So trying to hit each of these items by mid September.

That is coming up very shortly. That will be next week with the production environment stand up by October 1.

 

@02:53 - John Chodera (he/him/his)

Got it. Yeah. This is fine. It's ambitious, but it's fine with me if you're okay with it.

 

@02:57 - David Dotson

It is ambitious. Yeah, I'm fully aware of that. That's so. We're trying to still match the deadlines that we set.

 

@03:05 - John Chodera (he/him/his)

Okay, so we got a thumbs up from John Jeff.

 

@03:10 - Jeffrey Wagner

Yup. Good by me.

 

@03:11 - David Dotson

Okay, cool. Thank you. Can you record that in the notes?

 

@03:14 - Jeffrey Wagner

We've got approval.

 

@03:15 - David Dotson

We do. Thank you so much. Any questions on that? Okay. Richard, do you want to raise this issue?

 

@03:29 - Richard Gowers

Yeah, I think I heard some chat that we, OpenFE, might be expected to look after fah-alchemy in the long term.

And I wasn't aware of that. I just didn't know if that was the thing.

 

@03:48 - John Chodera (he/him/his)

Where did the chat or come from, out of curiosity?

 

@03:50 - Richard Gowers

You'd see? I'm fine.

 

@03:53 - John Chodera (he/him/his)

Oh, from I think this is just David being confused.

 

@03:56 - Richard Gowers

Yeah.

 

@03:57 - Mike Henry (he/him)

He didn't say David.

 

@03:59 - Richard Gowers

Yeah. Fair enough, but I didn't know that that was on the card.

 

@04:07 - John Chodera (he/him/his)

My expectation has always been that, since I am one of the Folding@home Consortium members, the primary responsibility for supporting it would be with me. The secondary responsibility, because of the need for running benchmarks, would be with Open Force Field, as long as that need exists for Open Force Field.

But certainly I have always planned to bear the primary fiscal and managerial responsibility.

 

@04:35 - Richard Gowers

Sure. I think David thought that it would be very useful for free energy going forward. And I understood that because you can't run arbitrary Python code, for obvious reasons, it's not very interesting for free energy for looking at new methods, because you can only kind of do what's sort of baked into the kernel of the existing Folding@home.

Right. And is that going to be like a long development cycle on that? For good reasons.

 

@05:01 - John Chodera (he/him/his)

There are limitations, right? Like, the execution fundamentally has to be something you can run on GROMACS, one of the recent releases, or run via OpenMM, and something you can encode as an integrator, unless you want to develop a new implementation of the, what they're calling it, a protocol.

Now, it's not impossible, but it does allow you to get to scale up. So, for example, in 2020, we spend $8 billion in cloud resources.

So it depends on whether you can find the personnel resources to match the amount of cloud resources you would otherwise burn.

 

@05:38 - Richard Gowers

Yeah. So for the OpenMM integrators, they would have to be baked in at the C level, though, right?

 

@05:44 - John Chodera (he/him/his)

You can just write it in Python. The custom integrator is flexible enough that custom anything can be deployed without changes.

So if you could implement something as a custom anything, custom forces, custom integrators, you don't need to do anything.

If you do need to encode something else, it would need to be baked in at the C level. Or we're working on ways to potentially compile the Python down to some library that you can execute, but that's still experimental at the moment.

But yeah, very limited. It's not a general-purpose computer you can run untrusted Python code on, right? There's an obvious reason for that.

 

@06:21 - Richard Gowers

Yeah. And that's fine, because it does what it does, and it does it well. That's sort of why we're not looking at it seriously in the long term.

Because if someone comes to us with a new method, we can't then sort of use folding at home as our way to benchmark.

Right. I'm sort of fairly correct in that. Right, that's Correct.

 

@06:38 - John Chodera (he/him/his)

You could benchmark things like preparation methods, for example, for posing ligands. You could benchmark things like ways in which you do network planning.

You could even do entire all to all transformations, for example. You could benchmark a whole various number of things that impact the accuracy, quality and scalability of energy calculations.

But It's it's. Takes more human time to be able to benchmark different methods that actually execute the simulation within a narrow scope.

Right. Things that are actually doing different sampling strategies. Whereas different alchemical methods could be easily experimented with chemical protocols.

 

@07:17 - Richard Gowers

Sure. Yeah. Okay. The point about mapping is interesting that I maybe should think more about because obviously it's kind of fun to try different transformations within a strict protocol.

Maybe I should think more about that.

 

@07:30 - John Chodera (he/him/his)

There are just certain big classes of things, like if I want to implement a different selfadjusted mixture sampling strategy or a different replica exchange strategy that require more human time to develop.

 

@07:44 - Richard Gowers

Okay, cool.

 

@07:46 - John Chodera (he/him/his)

Certainly we're thinking that no more than tertiary responsibility is on Open Free Energy, right? Because of the order of utility for these different stakeholders.

 

@07:57 - Richard Gowers

That's what I thought, but then I heard otherwise. And so just from Jeff, you guys have sort of planned for the long term, sort of benchmarking, kind of using this as your essentially force field continuous integration, right?

That's the plan for this?

 

@08:12 - Jeffrey Wagner

Yeah, that's the major role that we see playing at open force field.

 

@08:15 - Richard Gowers

All right, cool. I'm done.

 

@08:22 - Jeffrey Wagner

Cool. It sounds like maybe you have concerns. How does this information affect your plans?

 

@08:32 - Richard Gowers

What was the other outcome that you were if for some reason John said it was an all purpose Python thing, then obviously I would be more interested.

But I was just checking my assumptions on me not considering it. That interesting. And that's okay. I think just maybe David's a bit wrong.

About what? Holding a home. Because doing that's okay.

 

@08:53 - Jeffrey Wagner

Yeah, I hadn't been in communication with David about that topic. At all. So yeah, I think he probably just said the wrong thing to you.

That doesn't reflect the real plans.

 

@09:09 - David Dotson

Okay, cool. So, Richard, is your question answered on that front?

 

@09:14 - Richard Gowers

Yes, I'm happy.

 

@09:16 - David Dotson

Okay, cool. Thank you. Any other questions on this? All right, thank you. We'll move into current board status. I've added to the in-progress column on the MVP board anything we're currently dependent on, including some other repos, so protein-ligand-benchmark.

This is on most of these sites at the moment. I haven't seen much movement since we last spoke last week.

Does anybody have any insights into where that's at? I did reach out to those prior to this call.

 

@10:14 - John Chodera (he/him/his)

Iván, you've been working with her to try to run all of those systems through the benchmark pipeline, right?

 

@10:21 - Iván Pulido

Yeah.

 

@10:21 - John Chodera (he/him/his)

I know you've got CDK8 done, but were there changes for others that needed to be made?

 

@10:26 - Iván Pulido

Yeah, I asked her for, so I think this is related to another item in the agenda, but one of the discussions that we probably have to have again is whether we want the ligands to be in separate files or joined. This is something that we have to talk about, maybe.

I don't know if we want to talk about it right now, but other than that I know she's been trying.

To give us something that we can run through the Schrodinger's commonwealth Utilities. And I offered my help on that, but she said, oh, no, I basically already have it, but I don't have any more dates on that.

 

@11:21 - John Chodera (he/him/his)

The specific issue was that there was a discussion, I guess, a while ago, where I missed that. The suggestion was just to have a separate SDF file for each ligand in a separate directory.

And I think it would be much better for us going forward if we could instead have a consolidated SDF file for different ligands.

We do that reasonably well in most cases where you can just pull ligands out if you need to it's.

There are tools that require the ligands be split. We can just provide a simple, like, realign Python script that would do the splitting for you into different files that match the actual names, and it would save on literally it's.

Of different files checked into the repo, which does make a performance difference as well as a maintenance burden going forward.

 

@12:09 - Jeffrey Wagner

Yes, I think my big concern on that is that there is no spec for what an SDF means when it has multiple molecules in it.

Or maybe there is a spec, but it's not respected. There's the OpenEye documentation page "Dude, where's my SD data?", which is a special page dedicated to the confusion caused by putting multiple structures in a single SDF.

That's basically my big argument for why it's good to keep them separate and be very explicit. Because more or less I could see that someday someone is going to get confused and make a really bad mistake if they're using off-the-shelf tools to process these structures.

Eventually it's going to get loaded. As you know, if there's a structure with two identical ligands, at some point it's going to get put into a pipeline that assumes that there are two conformers of that ligand, and confusion will ensue.

So anyway, I'm flexible on this I'm happy to discuss, and I could be outvoted, but that's my big argument.

 

@13:29 - John Chodera (he/him/his)

Since we're only planning to include one copy, one conformer, for each ligand in the foreseeable future, leaving it up to the program to consider multi-molecule or multi-conformer simulations if they need to do that,

I don't think that's a huge risk right now. We could always again provide the tool that splits them out into different files, just simply splitting on the separator, should anyone need the flexibility of having separate molecules.

 

@14:04 - Jeffrey Wagner

Yeah, I think I'm not totally comfortable with the idea of splitting on the separator, but I think that's a tertiary concern.

If we can make the guarantee that all the tools made by this collaboration will always assume that SDFs are multi-molecule and never multi-conformer, then I'm happy with sticking them all in one file.

I know that there will come a day where somebody will find it so unbelievably convenient to just put multiple conformers in one SDF.

And we need the code to yell at them immediately if they do that. Or, you know, something needs to be very obviously wrong and we can yeah.

Just on an engineering level, we have to enforce that to basically assert that as input.

 

@14:56 - John Chodera (he/him/his)

People are providing single-conformer files.

 

@15:00 - Jeffrey Wagner

Yeah. Multi-molecule SDFs and never multi-conformer SDFs.

 

@15:12 - Richard Gowers

Go ahead. Maybe I don't run enough free energy calculations, but what's the big problem with multi-conformers? If you aren't sure if it's one of two poses, could you not just throw in both and sort of pick the best?

 

@15:25 - Jeffrey Wagner

The issue is that you don't know. So in an SD file, you can attach additional information about a structure in those key-value fields below, and in outputs from some programs, you'll have multiple conformers of your molecule, but maybe you'll have partial charges.

And only the first conformer will list the partial charges. So it will be like a key-value pair for, like, here's my atomic partial charges, and then the partial charges, and then the program won't say this because there's no way

to specify it, but it will say, I'm obviously a multi-conformer SDF. So when you see ten more instances of the same chemical graph as me, they all have the same partial charges.

And you need to use these partial charges in your calculation, but it won't repeat them ten times. It will only put them on the first one and then say, because I'm a multi-conformer file, you know the correct way to interpret this.

But the trick is that's not in the spec. You don't know that. And this very quickly leads to logical errors.

 

@16:27 - Richard Gowers

Yeah, sure. I thought the program in question would be attached to the partial charges to each version of the molecular record, sort of structured by that.

Are they not respecting that?

 

@16:37 - Jeffrey Wagner

No, I don't believe that's part of the spec. And so, yeah, somebody just linked to the OpenEye page that explains the confusion around this.

Basically, OpenEye tries to do the right thing, and now they're stuck with this legacy behavior that's extremely confusing because some programs attach only to the first conformer, and some programs attach to every conformer.

 

@17:02 - John Chodera (he/him/his)

As I understand it, the point of us providing these SDF files for the ligands is that we're trying to say, okay, please compute the binding free energy difference between these two ligands on relative or absolute scales.

And for convenience, we're providing a single posed geometry of what we think is reasonable for the specific protonation state that we think is relevant for this molecule, and a single conformer.

Right. If you did happen to provide two conformers for one molecule and one for the other, the behavior of exactly how you integrate those in a single edge transformation is kind of undefined and would be up to a particular program to deal with.

So I think we'll just say, let's only provide a single conformer for now, and if we do provide more, then we have to reconsider the behavior for how we want to store these files, et cetera.

Maybe we'll have a better serialization format for molecules by then.

 

@17:57 - Jeffrey Wagner

Yeah. So I'm happy we can put them into one file. I would vote for a pathway where we say multi-molecule SDFs are allowed, but never multi-conformer SDFs, because then we can engineer everything with the assumption that it's always distinct molecules and there will be no confusion.

But it means if we ever do want to do multi-conformer SDFs, we might need to deeply re-engineer or, like, extensively test the change.

Maybe we should both thumbs up: multiple molecules in an SDF always mean multiple molecules and never multiple conformers. I see a thumbs up from Richard.

Thumbs up from John. Okay, perfect. Yeah, we can dump them all into the same file. Thumbs up from Iván.

Thanks, Iván.

 

@19:00 - David Dotson

Okay, thank you. We'll end up hitting some of these in a bit. Thanks for the update on that. I'll try to get in touch with Melissa later this week just to get a sense of where things are at because I think she said she had already redocked everything and so she was pretty much done.

But I haven't seen a PR or a push to that PR last week. So just want to make sure that we don't lose track of it.

Ben Ries is working on protein serialization, so I know we'll get to him here in a second. gufe #42 will also follow the merge of that PR; it's dependent on getting protein serialization in to make all the tests pass for transformations and networks.

Mike, we'll also get to you here shortly on settings. I know you're working on that as well. Iván is working on the protocol in perses #1066.

So we'll also talk about that here shortly. After that is in place, I'll then make a Folding@home version of that protocol where I swap out certain protocol units, in particular the one that actually performs an MD simulation.

Instead, this is going to interact with the Folding@home work server to tell Folding@home to do this for us.

So that's a bit dependent on perses #1066. Currently I'm working mainly on pieces of the results server in particular.

In the last week we've been working on getting our AlchemicalNetwork serialized into Neo4j, and so that is now working.

I do need to finish out the roundtripping, so we have the full-fidelity ability to pull these things back out.

But I am able to represent the full data model in Neo4j. For myself, I'll be turning my attention to the compute portions, and these are represented here in the scheduler bits, this week.

So storage was kind of the first priority, being able to put things in and being able to pull things back out and then compute, which will operate on those components.

Any questions on where we're at currently? Okay, thank you. We'll jump into some detailed bits. Ben, do you want to give us an update on protein serialization?

 

@21:53 - Benjamin Ries

Yeah, I can try. So basically, I think the requirement that we talked about was to have from_pdb and to_pdb via OpenMM, and also the from functions, by Wednesday.

And this is kind of what I'm doing currently. So in theory, the features are implemented, but the unit tests need to be finished.

And I'm also checking correctness right now, and I need to update the documentation. So therefore, I think it's realistic to say that it will be done by tomorrow.

Tomorrow evening. Europe meaning morning in US.

 

@22:45 - David Dotson

Thank you for that. Are there any questions? You want to use this group for any blockers right now?

 

@22:54 - Benjamin Ries

Not really fixed.

 

@23:00 - John Chodera (he/him/his)

Just curious about unit tests for this. So the protein PDB files obviously provide a great set of unit tests.

In this case, what are you using for testing?

 

@23:10 - Benjamin Ries

The loaders and savers in which way I test the functionality. We had already some prewritten unit tests which I first want to integrate.

And then of course, David Swenson suggested a 1.5 test. I think, if I remember it correctly, please correct me if I'm saying this wrong, but I think the idea is to read in a PDB, write it out and then compare it to see if there is a severe difference, and I think that should be a proper one.

 

@23:48 - John Chodera (he/him/his)

It would probably have to be a temporary URL right now, but you could be able to retrieve the PDB files for the benchmark from the repo and then test them as part of the unit test.

 

@24:00 - Benjamin Ries

That might be a good I have to because I already did that because I worked on the benchmarks. So I already pulled one right now from there and probably having more.

 

@24:13 - John Chodera (he/him/his)

Nicer. Yeah. The key is if you can automate during testing time, pull them all and test them all and see if they fail.

That would be extraordinarily helpful to make sure that this at least reaches the near term and also detects whether the further term issues break.

 

@24:30 - Benjamin Ries

I see. Yeah, we can do that.

 

@24:35 - Jeffrey Wagner

Are you able to test for correctness of bond orders and formal charges or is this just aerial order?

 

@24:45 - Benjamin Ries

Bond orders, I see as a translation into the formal charges. So if some bond order is severely wrong, I also will have formal charges that are awkward.

That's one thing. And then on the other thing, I now define the orders. So it's a bit like I can only see it in the formal charge of something, like an amino acid that is charged that shouldn't be charged, or something like that.

Did that answer the question?

 

@25:16 - Jeffrey Wagner

Almost. The bond orders are translations of formal charges. And I think you implied, basically, that if you get it wrong, then you should expect RDKit or OpenEye to yell at you, which is probably right.

 

@25:28 - Benjamin Ries

I mean, they don't yell, but they tell me, oh, this is charged.

 

@25:33 - Jeffrey Wagner

Yeah. Okay. I know you had mentioned last week that OpenMM has a data field for bond orders, but not formal charges.

Are you defining only bond orders and then implying formal charges from that at the moment? Yes? Okay. I know you have a deadline in about 24 hours.

I think that's a bad idea, but you should meet your deadline. And now that we have our big release cut, I have some more availability.

And so I'd be happy to work with you on a second iteration of this. Okay, cool. But I won't get in your way for tomorrow evening.

 

@26:18 - Benjamin Ries

I would be happy to not make it more complicated right now, but I absolutely agree with you that this is not necessarily the most elegant solution to this problem.

 

@26:32 - Jeffrey Wagner

Yeah. Cool. Thank you, Ben.

 

@27:01 - David Dotson

Okay, cool. Any questions for Ben? Okay, thank you, Ben. Iván, do you want to give us an update on the nonequilibrium cycling protocol before we go there?

 

@27:14 - David Swenson

Can I make a really quick comment related to #42? Because it says in the agenda that it's mergeable.

I would say it's almost mergeable after #45. I think I need to write another unit test, but I'll be on that now; it was waiting for #45 to be in place before it was going to be something I was going to focus on.

 

@27:31 - David Dotson

So I'll get that done soon. Okay, thank you. Iván, do you want to give us an update on the nonequilibrium cycling protocol in perses?

 

@27:45 - Iván Pulido

Yeah, so I kind of hit a blocker. Basically, we've tested the nonequilibrium switching. We have many examples with protein mutations, and when I

try to reproduce examples for protein-ligand systems I'm running into some blockers. We'll have the perses dev sync later today. Hopefully we will discuss this and find out what's causing the issues. I already solved other issues that I saw with serialization and breaking changes with the new OpenFF release.

I understand what's going on there and that's already solved. But now I'm hitting this issue that seems to be a perses issue.

So we would need to discuss that in our perses meeting and work out why that is happening and what the solution for it is.

And hopefully it won't take too much. But I do think that I won't be completing this by the end of this week, as I said last week.

It will probably take me a few more days.

 

@29:19 - Jeffrey Wagner

Could I clarify, for the notes? So it's working for protein mutations, but it's not working for protein-ligand.

 

@29:25 - Iván Pulido

Right. Yeah. The problem is that those two pathways use different API points. So I have to develop the API points that the protein-ligand simulations are using, because I'm getting, like, NoneType errors from trying to use something from an object that is None, and things like that.

 

@29:53 - John Chodera (he/him/his)

Yeah, there's a ton of cruft there, unfortunately. We are going to work our way in, refactoring this down the chain as soon as

we get this top-level API in place, and we intend to unify them at some point. That was an unfortunate consequence of COVID, though.

 

@30:13 - Iván Pulido

But my hope is that with today's perses dev meeting, we can trace this error and have a roadmap to solve it and hopefully keep on working on this PR.

 

@30:30 - Jeffrey Wagner

Cool. Thank you.

 

@30:36 - David Dotson

Cool. Thank you, Iván. Is there anything this group can help you with?

 

@30:44 - Iván Pulido

Sorry, I missed that.

 

@30:45 - David Dotson

What was that? Are there any questions for this group that this group could help you with on this front?

 

@30:53 - Iván Pulido

No, maybe I just have to reach out to Mike to talk about the protocol settings objects. But other than that, we have a meeting, like a working session?

I would say that it's better to do that after the perses meeting. So it could be tomorrow or later today, because I would have more information and it would be clearer what we have to do.

 

@31:27 - David Dotson

Yeah, we can reschedule that working session for later today, if you like. That will be best.

 

@31:32 - Iván Pulido

Yeah.

 

@31:33 - David Dotson

Okay. Yeah, let's talk after the call then. We'll find a time.

 

@31:38 - Iván Pulido

Yeah, perfect.

 

@31:39 - David Dotson

When is the perses dev sync?

 

@31:42 - Iván Pulido

It is today at four, if I'm not mistaken. Four Eastern? 3:00 PM. Three Eastern.

 

@31:50 - David Dotson

Okay. Yeah. So we can schedule something in the early afternoon, probably you and I. Okay. We'll figure it out.

 

@31:59 - Iván Pulido

Thank you. Yeah. Okay.

 

@32:01 - David Dotson

Any questions for Iván? All right, thanks, Iván. Mike, on the question of protocol settings, do you want to give an update?

 

@32:15 - Mike Henry (he/him)

Yeah. So I spent some more dev cycles on this. I've added unit support to it now, but there are a couple of bugs with using generics with our base model that supports units.

And, I mean, that's really the biggest issue. I'm working with Matt on that front, so that we have a unified kind of approach to this, as well as adding some more convenience unit stuff.

So, like a positive float quantity, so that you don't have to write a validator to make sure the temperature is positive; you can just use that as the type,

put in the temperature, and get that check, as well as adding support for dimensionality. So far I made some opinionated choices, as in temperature in kelvin and distances in nanometers, but I don't think that's what the final product should be.

I think instead what we should say is units of length, units of temperature, and let users pass in what they want.

Because since we have unit support, whenever we actually need a value to be in a certain unit, we should be able to call the to-whatever-unit method and have it that way.
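The design described here, validating positivity and dimensionality at the settings boundary and converting to a concrete unit only where it is needed, can be sketched without any third-party packages. This is a stand-in for illustration only; the actual work discussed uses Pint and pydantic, and every name below (`UNITS`, `Quantity`, `require_positive`, `Settings`) is hypothetical.

```python
from dataclasses import dataclass

# Unit name -> (dimension, factor to that dimension's base unit).
# A tiny hand-written table standing in for a real units package.
UNITS = {
    "nanometer": ("length", 1.0),
    "angstrom": ("length", 0.1),
    "kelvin": ("temperature", 1.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    @property
    def dimension(self):
        return UNITS[self.unit][0]

    def to(self, unit):
        """Convert to another unit of the same dimension."""
        target_dimension, target_factor = UNITS[unit]
        if target_dimension != self.dimension:
            raise ValueError(f"cannot convert {self.unit} to {unit}")
        source_factor = UNITS[self.unit][1]
        return Quantity(self.value * source_factor / target_factor, unit)


def require_positive(quantity, dimension):
    """Reusable check: right dimension (any unit of it) and value > 0."""
    if quantity.dimension != dimension:
        raise ValueError(f"expected a {dimension}, got {quantity.unit}")
    if quantity.value <= 0:
        raise ValueError(f"{dimension} must be positive")
    return quantity


@dataclass(frozen=True)
class Settings:
    temperature: Quantity
    cutoff: Quantity

    def __post_init__(self):
        require_positive(self.temperature, "temperature")
        require_positive(self.cutoff, "length")
```

A user could then pass `Quantity(12.0, "angstrom")` as the cutoff, and code that needs nanometers would call `settings.cutoff.to("nanometer")` at the point of use rather than forcing one unit on the caller.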

 

@33:32 - David Dotson

Okay, cool. I know Richard is working on a protocol, and then of course Iván is working on one in perses.

Are you plugged into their activities?

 

@33:46 - Mike Henry (he/him)

I'm more plugged into Iván's. But yeah, I should sync up with Richard as well on that, and we might make the decision to get it working well enough that we have something, and then add the

more convenience-like bits in later, if that ends up being too much of a blocker. But I've raised the issue with Matt.

We're going to meet up some time to discuss it, because I don't want that to necessarily slow things down. Something like float positive quantities isn't a huge deal, and I think dimensionality checks are going to be beautiful once we get them in.

But that may also end up being either something we have to contribute to Pint, or something that is just going to take more time for us to test thoroughly.

I'm okay, at least at this stage, with being more opinionated and requiring the quantities passed in to be in a certain unit.

 

@34:38 - David Dotson

Yeah. And I think it's worth identifying those areas where this is clearly something we can iterate on in another iteration. We won't stop once

we have something that works. We will, of course, continue to work and improve on these things, so it's about identifying those places where you can get somewhere

that is a working product, something that we can use.

 

@35:00 - Mike Henry (he/him)

To kick the tires on it, so that we can actually use it, because I'm sure these are just theoretical concerns I've identified in my implementation.

I guarantee you we'll find more once we start consuming the protocol in other parts.

 

@35:14 - David Dotson

So yeah, don't want to hold up too much on this. And that'll make it clear what guardrails are really valuable, right?

Because there's probably an infinite number of guardrails you and I can think of right now.

 

@35:24 - Mike Henry (he/him)

The universe will always come along and invent someone that will overcome those guardrails. So there's a certain limit to that.

 

@35:31 - David Dotson

Yeah. So we'll find the practical bound in use. Any questions? Go ahead. Sorry.

 

@35:38 - Jeffrey Wagner

Oh, I wanted to ask, are you getting all the Matt time that you want? Because it seems like a great use of Matt's time to coordinate with you on this and make sure that we can handle more cases than just the OpenFF Toolkit with our units.

 

@35:49 - Mike Henry (he/him)

Yeah, I am. But maybe if you want to give him your blessing about this as being a collaborative use case, I might get more.

I feel like he's fine with it. I know he triages his time as well, so if this is something he knows is a valuable thing to contribute to, he might spend some more time on it.

I'm just excited for eventually writing the blog post that is, like, here is how you can use pydantic and this units package and make killer data models that are going to be just wonderful and exactly what we've wanted for so long.

So that's kind of my end goal for this.

 

@36:29 - Jeffrey Wagner

Yeah, I agree. Okay. I'll tell him to spend all the time he wants with you on that because this is great.

Thank you.

 

@36:36 - Mike Henry (he/him)

Cool.

 

@36:39 - Levi Naden

Mike, just an aside, so as not to derail things: I'm going to reach out to you after this about doing pydantic models and helping teach people about that.

I've got some material for that from the MolSSI side of things that I think we might be able to coordinate on and make a better package.

I'll reach out to you. Yeah.

 

@36:56 - Mike Henry (he/him)

Beautiful. Yeah, I definitely want that. I would say I'm an amateur; I've just started using pydantic, so I don't know nearly as much as what could be out there.

And I certainly want to make sure that we follow the best practices that have been established with it, because people are going to copy and paste this code.

So I'd like it to be, like, the example. So I appreciate that. Yeah, no problem.

 

@37:18 - Levi Naden

So I'll reach out to you after the meeting.

 

@37:29 - David Dotson

Okay, cool. Mike, do you feel like you have what you need?

 

@37:33 - Mike Henry (he/him)

Yep. Okay.

 

@37:35 - David Dotson

Are there any additional questions for Mike? Okay, thank you. My last thing was a benchmark update. We discussed this already.

I haven't heard back from Melissa, so I will try to get an update from her myself. I think, Iván, you're working with her pretty closely.

 

@38:04 - Iván Pulido

Well, not that closely, I guess, but she has asked me a few things, and I know she's been trying to make it as reproducible as possible.

And she mentioned some problems with how Schrödinger deals with protein prep, that sometimes it does things in an inconvenient way.

So I know last time we spoke, she told me that she was using pdb-tools from, I don't know.

And apparently she was getting good results with that. That's what I know about that. She also said she had many discussions with people, I don't know who, about whether to include

or not some waters that come with the structures. So I don't know, maybe it was with airplane.

 

@39:12 - John Chodera (he/him/his)

From our end, there was discussion that waters and ions could be included in the PDB, but the other molecules that need information about the chemistry need to be in a separate

SDF or something like that. We call them spectator ligands or cofactors, because there's not enough information to easily reconstruct the chemical information unless you're using the new OpenFF

biopolymer stuff. Sorry, even then, you don't actually support arbitrary ligands from the Chemical Component Dictionary yet.

 

@39:40 - Jeffrey Wagner

Right, right. Yeah, we don't support arbitrary ligands from the CCD.

 

@39:46 - John Chodera (he/him/his)

Got it. Is that what you meant, Iván? Or is it a different aspect?

 

@39:51 - Iván Pulido

Yeah, it is related to that. So, yeah, I think for us, ideally, we would like... so, how perses works:

when you specify a cofactor or spectator, you basically merge them into a single structure. In the example I've run, it's even merged into the same PDB, as far as I remember.

So I don't know if ideally we want that separately or not, but probably we do, because as you said we don't get the whole cheminformatic information.

But yeah, it's related to that, but also related to basically just having random items in the structure coming from the PDB itself and whether she needed to strip them out or not.

Okay.

 

@40:53 - David Dotson

Does she have what she needs to make some of those decisions or.

 

@41:01 - Iván Pulido

I cannot say. Maybe John has more information then.

 

@41:05 - John Chodera (he/him/his)

I don't know. And I don't have time to check in with her this week, unfortunately. But I can set up a meeting with her for Monday to see if there's anything else that she needs before this meeting next week.

 

@41:14 - David Dotson

Okay, yeah, that'd be great. I just want to make sure that if she's stuck on something, on making a decision, then we find ways to unblock her with either this group or folks from this group. I'll reach out again to her and see if there's anything we can help with.

Okay. Any other questions or comments on that topic? Okay. If not, that's it. Do we have any other topics folks would like to discuss?

 

@41:55 - John Chodera (he/him/his)

I just wanted to give everybody a heads up that the Open Force Field PIs on the NIH grant are going to meet on Wednesday this week to brainstorm what we need next for the renewal application, which is going to be due before March of 2023.

Gosh. And one of the things that has come up is: how can we measure how many people are using our tools, and all the different ways in which that happens?

So I'll put a little thing in Slack in this channel too. We've always had this problem that we don't think about this until the very last minute, and then it's very hard to measure impact.

So if you folks have insight as to how we can measure how many people are using this, that would be great. Obviously, we should have a citation

people can copy and paste. We talked about having a citation for the version of the released library as well as the manuscript that talks about things, but what other ways can we measure it?

Like, if we eventually put in place a little Python tool that does the automated download for you when you want to access the thing, could we keep track of how many times people use it, things like that?

So it's just: can we think soon about how we can measure impact, so that it will help all of our organizations in the future say something about how wide an impact this is actually having?

Where are those discussions currently happening? We just have a Google Doc we've been dumping a bunch of stuff into, thinking about what we can go and scrape from the various sources, like conda downloads, etc.

I will paste the current thoughts about how to measure impact. Mostly this is retrospective, but it might spur discussion about what we should put in place for this project, or these projects, to track how much people are using them.

Like, should we monitor how many free energy calculations we've done? That would be an easy way to help a funding agency or funder understand

the scale that we're operating at. Just various things like this to think about measuring without being too intrusive. It's better to do that from the beginning rather than at the end.

 

@44:12 - David Dotson

Agreed. In the context of this project here, because this stuff is not really something that anyone can just use,

it's harder to say anything about impact. Right. protein-ligand-benchmark qualifies. Other folks may use that. Hopefully they do.

So there's protein-ligand-benchmark, there's gufe. And OpenFE, of course, has their own sets of repos.

 

@44:41 - John Chodera (he/him/his)

You can still track, like, how many calculations does it run? How many unique users do we end up having, et cetera.

 

@44:50 - David Dotson

fah-alchemy.

 

@44:51 - John Chodera (he/him/his)

Yeah, fah-alchemy is pretty heavily tied to Folding@home. Oh, sure, yeah. But that also means.

Action items

Decisions