2022-08-30 Protein-ligand benchmarks meeting notes

Participants

@David Dotson
Jenke Scheen
Benjamin Ries
Levi Naden
@Jeffrey Wagner
@David W.H. Swenson
@Iván Pulido
@Irfan Alibay

Goals

DD : update to milestones and deadlines - seeking approval
- (Complete) Protein-Ligand Benchmarks Automation via Folding@Home
DD : fah-alchemy - current board status
- fah-alchemy : Phase 1 - MVP
- 4 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking
  - we have some chance of making this deadline and having a minimally-usable system deployed, but will be tight
- gufe#36 in review; ultimately needed for perses#1066
- also working to resolve gufe#42 and gufe#45 (@David W.H. Swenson, Ben Ries, @Richard Gowers), which will complete coverage of our GufeSerializable approach across existing gufe objects
- @David Dotson focusing development effort on Executor (service API), Scheduler (compute), and ResultServer (storage) for FAH
- mskcc2 deployed on Chodera Lab AWS; using as dev host for mskcc1 prod host
  - Folding@Home work server
  - neo4j graph database
  - dask+distributed cluster
  - service API via FastAPI
DS, BR : Protein serialization update (gufe#45):
- DD : what are the assumptions for protein-ligand ChemicalSystems? Docked + protonated + capped?
IP : Nonequilibrium Cycling Protocol (perses#1066) update:
MH : ProtocolSettings taxonomy update (gufe#37):
MB : protein-ligand-benchmark - 0.3.0 update
- status of #52: are we blocked anywhere? Do we have docked structures with assay conditions, or should we proceed with what we currently have right now?
  - need CLI tool that produces prepared target, docked ligands

Discussion topics

Item	Notes
DD : update to milestones and deadlines - seeking approval
- (Complete) Protein-Ligand Benchmarks Automation via Folding@Home
- Unable to do, missing approvers
DD : `fah-alchemy` - current board status
- fah-alchemy : Phase 1 - MVP
- 4 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking
  - we have some chance of making this deadline and having a minimally-usable system deployed, but will be tight
- `gufe`#36 in review; ultimately needed for `perses`#1066
  - DSwenson will review
- also working to resolve `gufe`#42 and `gufe`#45 (@David W.H. Swenson, Ben Ries, @Richard Gowers), which will complete coverage of our `GufeSerializable` approach across existing `gufe` objects
  - DS – BR is working on this, I think he has what he needs.
- @David Dotson focusing development effort on Executor (service API, fah-alche), Scheduler (compute), and ResultServer (storage) for FAH
  - DD – fah-alchemy #18, #17, #21
- `mskcc2` deployed on Chodera Lab AWS; using as dev host for `mskcc1` prod host
  - Folding@Home work server
  - neo4j graph database
  - dask+distributed cluster
  - service API via FastAPI
DS, BR : Protein serialization update (`gufe`#45):
- DS – Overall picture is to vendor in some of OpenMM’s app.pdbfile and extend the XML file to include bond order; from that we can make an RDKit molecule.
  - Our protein component is really an “everything that isn’t a ligand” component, so the serialized representation needs to hold multiple molecules.
  - For now we’ll put everything in gufe, but later we’ll upstream it to OpenMM.
- JW – will this be upstreamed to `openff-toolkit` to use its Topology object?
  - DS – we do want that, but in the short term we need something that works
  - BR – it’s also part of the general serialization class that allows us to backpropagate back into an OpenFF object?
- JW – I feel like it’s a combination of actual limitations, short term limitations, and miscommunication
  - would like to have another go at creating a clean path for use of openff-toolkit `Topology`
  - DS – that would be our ideal path; we’re focused on something that works in the short term
- JW – your representation is going to be an RDKit Molecule?
  - DS – that is the plan for now; unclear what challenges that will present
  - would like a cleaner solution
  - there isn’t a single `Topology.from_pdb` yet; would like to revisit this in a few months to replace `gufe` implementation with something like this
- BR – deadline?
  - DD – by end of the week (EOW) is ideal; this blocks our efforts for ensuring we can actually do protein-ligand free energy calculations via protocols
  - basically limited to small molecules and solvents if we don’t have this
- JW – on bond orders: support in OpenMM added, but probably not used much
- DD : what are the assumptions for protein-ligand `ChemicalSystem`s? Docked + protonated + capped?
  - IA – Anything in the PLBenchmark reqs is what we need: capped, protonated, already docked (we’re not offering a docking pipeline).
  - DD – This will be important for JS to understand what gaps to fill to get to appropriate input.
  - JS – This sounds good. I assume it’s the same as the best practices paper.
  - DD – I’d like to square this up with the changes in PLBenchmarks #52. That is the most detailed source of information.
  - IA – Re docking preparation – MCS isn’t simple enough to automate in the current iteration. Also, it’s not clear that we’ll have all the input structures prepared in #52 in time; you can also use the `openfe-benchmarks` repo as a resource:
  - DD – IA, could you be the contact person for JS moving forward?
  - IA – Sure.
- JS – I know Richard in the past wanted to use mmCIF? Not sure if still the case?
  - DS – We’d love it if people moved to a better format, but we’ll realistically always need to support PDB.
  - JW – we’re in the same boat; would love to just support mmCIF, but realistically all the good datasets out there (and many users) use PDB. So OpenFF will eventually support mmCIF but for now it’s just PDB.
- IP – I have a task to reach out to MB, to help finish #52. I’m concerned that we may be doing some redundant work here, so I’d like clarification on what we should be doing.
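For the protein serialization item above (`gufe`#45), the two stated requirements are that bond orders survive serialization and that one component can hold several molecules (protein plus crystallographic waters, for example). The sketch below illustrates only those two requirements with the Python standard library. It is not the `gufe` implementation and does not use OpenMM or RDKit; the `Atom`, `Bond`, and `Component` names and the JSON layout are hypothetical, chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Atom:
    name: str      # PDB-style atom name, e.g. "C"
    element: str   # element symbol
    residue: str   # residue name the atom belongs to


@dataclass(frozen=True)
class Bond:
    a: int         # index of the first atom
    b: int         # index of the second atom
    order: int     # integer bond order (1, 2 or 3); no 1.5 / aromatic orders


@dataclass(frozen=True)
class Component:
    """Hypothetical 'everything that isn't a ligand' container."""
    atoms: tuple
    bonds: tuple

    def to_json(self) -> str:
        # asdict recurses into the nested Atom/Bond dataclasses
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Component":
        raw = json.loads(text)
        return cls(
            atoms=tuple(Atom(**a) for a in raw["atoms"]),
            bonds=tuple(Bond(**b) for b in raw["bonds"]),
        )

    def molecules(self) -> list:
        """Group atom indices into bonded molecules (union-find over bonds)."""
        parent = list(range(len(self.atoms)))

        def find(i: int) -> int:
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for bond in self.bonds:
            parent[find(bond.a)] = find(bond.b)

        groups = {}
        for i in range(len(self.atoms)):
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())


# A backbone carbonyl fragment (C=O) plus one crystallographic water.
example = Component(
    atoms=(
        Atom("C", "C", "ALA"),
        Atom("O", "O", "ALA"),
        Atom("O", "O", "HOH"),
        Atom("H1", "H", "HOH"),
        Atom("H2", "H", "HOH"),
    ),
    bonds=(Bond(0, 1, 2), Bond(2, 3, 1), Bond(2, 4, 1)),
)

restored = Component.from_json(example.to_json())
print(restored == example)      # round trip preserves atoms and bond orders
print(restored.molecules())     # two separate molecules in one component
```

Formal charges are not stored in this sketch; as raised in the call, they cannot always be deduced from bond orders alone, so a real format may need to hold them explicitly.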
IP : Nonequilibrium Cycling Protocol (`perses`#1066) update:
- IP – Nothing to report. Catching some performance issues with Perses and that’s opened a Pandora’s box. Ideally this will be resolved by the end of next week.
- DD – Let’s chat in our working session today.
MH : `ProtocolSettings` taxonomy update (`gufe`#37):
MB : `protein-ligand-benchmark` - 0.3.0 update
- status of #52: are we blocked anywhere? Do we have docked structures with assay conditions, or should we proceed with what we currently have right now?
  - need CLI tool that produces prepared target, docked ligands
- MB – Went through, re-prepared structures using Schrodinger-based CLI scripts. I’m reviewing now, things look pretty consistent. There’s one that I know I need to re-dock and so I’m sanity-checking that.
- DD – Awesome. Any other blockers that IA/IP could assist with?
- MB – Nope, I think I’ve got it handled. I wanted everything to be python scripts but Schrodinger software is way harder to access that way, so would CLI scripts be acceptable as a final form?
- DD – Yes, totally.
- MB – Great. I’ll push the current form of things tomorrow.
- IA – I can redo #58 if you ping me once this is pushed.
- MB – Will do.
- MB – Could also put assay conditions into the same PR
- DD – That’d be great, if you have time.
- MB – Do we need automated CLIs to do the docking jobs as well as the protein prep? I do the docking using Maestro - which drops an input file and a bash script to the file system - but I do the docking using the Maestro GUI. Should I upload the input file and bash script?
- DD – As long as we have enough info that someone with access to Schrodinger tools could reproduce this, I think we’re good.
- JW – Do you do any pose selection by eye, or take the top ranked pose?
- MB – I do docking with core constraints and take the top ranked pose. That should make it more reproducible.
Other topics
- all features for openff-toolkit 0.11 release are complete; working on some finishing touches (docs, etc.); should see release by EOW (possibly EOD)
Transcripts (still need to be reviewed)
F@H interface meeting - August 30 (recording: 45 mins)
@00:00 - David Dotson
Updated some of the deadlines based on our discussions from last week. And so we don't have everyone we need to actually approve this. I'm going to kick it to next week. Any objections to that? Okay. Thank you. About board status. So this is our MVP board. So we're currently four weeks out from the October 1 deadline for ASAP and now biopolymer benchmarking. I do believe still that we have some chance of making this deadline, having a minimally usable system deployed. It's going to be a tight deadline. I think it's still possible. As far as top-level priorities for me: gufe number 36, this is the rework of the deck system that's currently in review. We do ultimately need that for perses 1066. We'll talk about that further down. Swenson, I wanted to just ask you, is that something you're able to review sometime this week?
@01:10 - David Swenson
Yeah, I already started on it yesterday, and I'll continue through today, hopefully get it done today or tomorrow.
@01:19 - David Dotson
Excellent. Okay, cool. Yeah, I'll be responsive to any comments you make. I'll try to do a rapid loop on that. Okay, thank you for that. We're also working to resolve gufe number 42 and number 45. And I know, Swenson, you're working with Ben on this? I know Richard's out. That will complete our coverage of GufeSerializable objects across all of gufe. Do you want to give a brief update on, really, just, we can wait. We'll hit number 45 here. Do you have any comments, though?
@02:00 - David Swenson
Nothing other than to say that it's in Ben's hands right now. If he needs any help, I'm there to help him. But I think that he's got what he needs.
@02:08 - David Dotson
Okay, cool. Thank you. As for me, I'm focusing development efforts on the three deployable components that we need for the actual fully deployed system.
So we need a service API; I need to put in an issue number for this. I'll add those later. Basically, they're represented in here. So we need an executor. That's our service API. We need a scheduler, and that talks to dask in this case. And then we need a result server that talks to neo4j and S3. So I'm working on all three of those components in an integrated fashion, currently on mskcc2, so we stood that up last week. That's a VM running in an AWS environment. This is intended to be the dev host for mskcc1, which is our production host. This has a Folding@Home work server on it. It's already running. It's part of the Folding@Home network. It has a neo4j graph database deployed on it. We'll be deploying a dask plus distributed cluster on that as well. And then that will also host the service API. So it's going to be all on one host, this whole system. So the host is up, actively developing on it right now. That is my update on where we are on the board. Any questions for me? Okay, we'll walk through some of the items that are also on here, so I want to make sure that these things aren't lost. Swenson and Ries, do you want to give an update on number 45?
@03:57 - David Swenson
I can just give sort of an overall picture of where we're headed with it real quick. I think what we're going to do is basically vendor in some of what is in OpenMM's PDB file and the related functionality there, and extend the XML files that are used in that to include bond order, which is the main thing that we are missing here. And then from that we can make an RDKit molecule and serialize that. That's kind of the path we're taking to get a PDB serialized. The challenge here being that our protein component is really kind of an everything-else component, everything but the ligand and the salt.
And so we really need a multi-molecule serialization here, something that can go straight from the PDB, which contains crystallographic waters and things like that, and give us something we can serialize. So that's what we are trying to make sure we have available.
@04:51 - David Dotson
Okay, and you're currently working on that thing. You said you need that but you're currently work. I think that's something that's going to have to exist out of the two. Just to be clear for now.
@05:02 - David Swenson
Right. So what we're doing right now is putting something that goes into just gufe. We're going to dump this into gufe for now. We'd like to actually upstream this to OpenMM. These XML files don't include bond orders, but the topology actually has space for that. The OpenMM Topology object has a field for bond order that is just not used.
@05:25 - David Dotson
Okay, cool.
@05:26 - David Swenson
And so I think that we're going to see if we can upstream these. But in terms of getting it done, we're going to put it in by vendoring it.
@05:35 - David Dotson
Yeah, that's perfect. Yeah, I think that approach makes sense. We'll implement what we need immediately in gufe, and then the things that are possible to upstream, we'll do that. That will take a longer time scale. It's something that won't happen within the next few weeks, but we need to be able to move fast. I think you're taking the right approach.
@05:55 - David Swenson
Exactly my thought.
@05:56 - David Dotson
Yeah. Excellent. Thank you.
@05:59 - David Swenson
Narrow. Anything else to say about the actual implementation of that since he's working on the implementation.
@06:04 - Benjamin Ries
I don't know if that has anything to add. Not so much, to be honest. I'm actually doing what you described. Also assigned that.
@06:25 - David Dotson
Absolutely. Responsibility. Yeah.
@06:30 - Jeffrey Wagner
I wonder good.
Will this have a pathway in the future to harmonize with OpenFF molecules, or have we kind of missed the boat on that?
@06:38 - David Swenson
We would like to. I think my understanding right now, obviously correct me if I'm wrong, is that the OpenFF Molecule is pretty strict on the definition of molecule, that it is actually a topologically connected thing, where what we need right now is a component containing multiple molecules, which I think can fit within an OpenFF Topology object, but we just don't really have the path to get everything we need in there quickly. I think there's still code because we can't do a Topology from PDB; that doesn't seem to exist in a generic way right now. But I think we're absolutely open to having a way to move this into an OpenFF object. We want that for sure, but we need in the short term something that can work with OpenMM. So just the thing that looks like OpenMM will do that.
@07:39 - Benjamin Ries
I mean, it's also part of the general serialization class where you can back propagate from gufe to OpenFF.
@07:48 - David Swenson
Right. There we would want to have a way to have a component output as an OpenFF object. What's a little bit weird here is that we have previously been doing these as OpenFF molecules. So our ligand, for example, outputs as an OpenFF molecule, which makes sense because that is the correct thing to do. But we have other components here that don't fit into the model of a molecule. They fit into the model of a topology. So it's a little bit weird, and we have to figure out how to make our API make sense, because it's weird if to_openff gives you different things depending on which component you get.
@08:35 - Jeffrey Wagner
Okay. Yeah. I feel like there's a combination of actual technical limitations, temporary technical limitations, and miscommunications here. But in terms of getting stuff resolved in the next month, I'm trying to figure out the best way to coordinate this. Okay.
I think it would be nice if this was designed such that we could have another go at it in the future.
@09:12 - David Swenson
Yes. I don't think that we are necessarily saying that our serialization format here is going to be the permanent serialization format we use. We would much rather be using OpenFF's tools under the hood, but we need the easy PDB onboarding, and we need the serialization to work and exist. That's what we're trying to do in the short term, is to get something that works in the future.
@09:42 - Jeffrey Wagner
Okay. Yeah. And so your representation is going to be serialized RDKit molecules?
@09:48 - David Swenson
Yes, I believe that's the case, Ben.
@09:56 - Jeffrey Wagner
Decent molecules be more than one chemical. More than one mutually found species. I was under that things then you're more the RDKit expert than I am, so you might be the person to answer that. I don't have a ton of RDKit experience. Nice. Okay. Yeah, I think it may be. Yeah. You're in a tough spot. I'm sorry. If we had the biopolymerdale done in January like it was supposed to be, this would be a lot easier. And so I think this is kind of on us again.
@10:36 - David Swenson
We're just seeing a workaround for the time being. As long as we have the workaround for the time being, we're fine. We just need to get something so that we can move forward. I think that a cleaner solution in the future would be great. We're just trying to get something short term.
@10:51 - Jeffrey Wagner
Got it. Okay. I don't know. We may have some of the functionality you need. There isn't a single `Topology.from_pdb`, but no, let's not take half measures. Yeah. Okay. It sounds like you're doing what is optimal for you in the next few months once we get our head back on.
@11:20 - David Swenson
I would love to help you replace that with something from OpenFF, and I apologize if that's not ready yet. No worries. That would be great.
@11:44 - David Dotson
Well, thank you. Any other questions? Ben or David? David.
And Ben, do you feel like you have what you need from this group? All right. Thank you.
@12:03 - Jeffrey Wagner
Actually, what's your deadline for this? Like a month ago or something?
@12:09 - David Dotson
Pretty much, yeah.
@12:13 - David Swenson
It is a thing that blocks all future things on serialization, because we've serialized everything sort of below that in the hierarchy, and we've got the tools in place to serialize everything above that in the hierarchy. So we just need this one bit of serialization, and we should be able to move forward with our storage stuff completely.
@12:33 - David Dotson
Unless all you want to do is hydration free energies. Yeah, but if we want to do any sort of protein-ligand free energy calculations, then we need this component.
@12:54 - Jeffrey Wagner
Yeah. So sorry. I'll just make one last push at this. No, never mind. Never mind. Okay. Yeah. Get your job done and then we can talk later. Cool.
@13:13 - David Dotson
Okay. Yeah, but the deadline for this is basically as soon as possible. Probably by the end of the week, if possible, having at least something that round trips and works. Does that sound reasonable?
@13:32 - David Swenson
What's your thoughts? Since you're the one doing the grunt work here? I can certainly help if it gets stuck on that. I think that is not a crazy timeline.
@13:42 - Benjamin Ries
End of the week. Yes, I'm going to try that, but I guess the bond fixing, which I still need to implement with the bond orders, that might probably take some more time. Do you want to have the precise deadline, or do you want to have something like end of the week or next week? Sorry. My first time on fixing deadlines at falling at all.
@14:20 - David Dotson
I'd like to say by the end of the week, if we can.
@14:23 - Benjamin Ries
I see.
@14:26 - David Dotson
So what this means is I'm currently doing testing cycles on my end for storage and retrieval into, like, neo4j of these objects.
And so I need to be able to do that with proteins in order to build networks that are protein free energy networks. And I need to be able to do that in order to do calculations with actual protein-ligand systems. Right. So until we have this component, it's really hard to do a whole lot of anything.
@14:55 - Benjamin Ries
Right. Okay.
@14:56 - David Dotson
And so it's a huge blocker. That's why as soon as possible is the word; the end of this week would be absolutely great. If there are difficulties, please let us know. So reach out to me, reach out to David as soon as they show up, and then we can even jump on a call or a working session. I'm happy to do it whenever time.
@15:20 - Benjamin Ries
Yeah, then agree on end of the week, Friday morning.
@15:26 - David Swenson
I can definitely do a lot to help with the actual stuff in the PDB file things because I've worked at that code before, I know my way around that very well. What I really don't want to try and do is figure out all the bond orders for everything that needs to go into the XML file, and I'm hoping I can count on you to do the grunt work of that. Okay.
@15:50 - Jeffrey Wagner
And I might add on that front, depending on your workflow, there's actually two really important things that you need. One of them is bond orders and the other one is formal charge. And I think in some cases it's easy to deduce formal charge from bond orders. Like if you have the graph you can do it. But then there's weird cases; I think in the long run it's not going to work for much more chemistry than just proteins. There's weird cases where maybe sulfur and phosphorus can have, like, valences or, like, formal charges that are one off from each other, and so you can count the number of bonds it has but then it's still ambiguous what the formal charge is.
So it may be worth looking into whether the OpenMM XML file can not just hold bond orders, but if it can also hold formal charges; that would save you a lot of headache in the long run.
@16:51 - David Swenson
Yeah, I believe that is something that can. I have to double check this, but I'm pretty sure that's something that can be added at least. The nice thing about bond orders is that there's literally a thing in a topology that's just not used. It's like it's designed to have bond orders and nobody ever put them in.
@17:07 - Jeffrey Wagner
Got this. I remember looking at this a few years ago. Chodera, before he was super cheminformatics-y, asked Peter to put in bond orders. And Peter is like, why do I want bond orders? And John's like, no, it's really important for OpenFF. And then they got it in and we looked and we were like, what was the formal charge? And so I think that's the story of why there is a completely unused bond order field there.
@17:35 - Benjamin Ries
How would you add formal charge to... okay, you would need to place it on an atom.
@17:43 - David Swenson
Right. There are atom entities in the XML as well to define your residue, because you need to know the atom names.
@17:52 - Benjamin Ries
I think they are, right. I only see one right now, but maybe there's something I'm missing.
@17:58 - David Swenson
I think there's a different XML file. Well, I think that may be the case.
@18:01 - Benjamin Ries
That could be.
@18:03 - David Swenson
The residues XML file includes a bunch of stuff. Okay, but this is a detail, I think, for protein.
@18:11 - Benjamin Ries
We can very safely calculate that from bonds if necessary. And I'm not going to promise that any cofactor is correct with that.
@18:22 - David Swenson
Of course. I think we can say, as a first implementation, if we find that there are edge cases where this doesn't work, we can try and get something to work on that in the future.
@18:33 - Jeffrey Wagner
Okay. Yeah.
Ben, I may offer you help later this week, depending on how my schedule goes. And you can feel free to not take that because I don't want to make your life harder. Let's see in which direction it goes. Okay, sounds good.
@18:55 - David Dotson
Hey Ben. Thank you. Just remember, you've got resources here, so don't hesitate to reach out.
@19:01 - Benjamin Ries
Okay.
@19:03 - David Dotson
I did want to ask, what are some of the assumptions for the protein-ligand chemical systems? So, in our model, we have chemical networks. They're composed of chemical systems joined by transformations. So for a chemical system that has a protein, a small molecule component and a solvent, what are some of the assumptions that, say, the protein component makes about what's already been done to the protein? Does it just need to be capped? Does it need to be protonated? Do we need to make sure that the protein and ligand coordinates are such that they're in the docking pose that you want?
@19:51 - Irfan Alibay
So anything that's in the protein-ligand-benchmark is the assumption you can make. So it needs to be protonated, capped if needed, like essentially ready. I don't think we'd be offering a docking pipeline at this point.
@20:05 - David Dotson
Yes, I think that's reasonable. This was a question that John specifically had, and this is also relevant for, Jenke, your work as to what needs to exist outside of the system, the things that the system won't do for you. Right. So these are things that we'll need to make sure we're doing, for example, in ASAP's use of it, Jenke. So Irfan said anything that's in protein-ligand-benchmark. So I don't know if you're familiar with this repo.
@20:41 - Jenke Scheen
Yeah, I'm very familiar. I think so far, even in the best practices paper, I should be able to find more or less the steps needed as well.
@20:51 - David Dotson
Okay, so we are currently a bit stuck on this. I was hoping to have Melissa join us. I reached out. I haven't heard back from her. So we're currently stuck on number 52 in protein-ligand-benchmark. This is a redocking and re-preparation of all of the targets and all of the ligands for each target. But the form that this is taking is in terms of what's present among the PDB files, what assumptions are made about the ligands and relative coordinates. These are all things that we want to emulate pretty closely in ASAP's use of the system. So I would say the best resource I can point you to here is to study that repo. Sounds like you're already familiar with it. So that should be a small ask, but study that PR in particular, if you can, and that would give you some insights as to what kind of things we need to do on the ASAP side to utilize the deployed system.
@21:54 - Jenke Scheen
Yeah, I will. Okay. To be honest, brexit preparation Is one of the main vein. My PhD left for you.
@22:07 - David Dotson
Okay, good.
@22:07 - Jenke Scheen
I think you said.
@22:08 - David Dotson
All right. Got you.
@22:10 - Jenke Scheen
Good.
@22:10 - David Dotson
I'm glad to hear that. Fantastic.
@22:15 - Jenke Scheen
Any additional questions on this topic, then?
@22:18 - Irfan Alibay
Just a couple I wanted to make. There were follow up discussions Nelson and now are having with docking preparation. So just, for example, doing a simple MCS-restrained docking can be problematic in certain cases, because if you have just one call and you're trying to mainly align several ligands, you kind of have to do, like, a network basis. Yes. Something we're not doing here. There's essentially work to be done, and so making sure you get good minimum alignment for your dogs is something that's not currently done in protein-ligand-benchmark. Also for the sake of, because I don't know if Melissa is away or when, but also.
Getting everything prepped on time for your deadline might be a bit difficult. We have sort of pseudo-vendored some of the systems in openfe-benchmarks as a temporary solution, which we'll be posting results for soonish. So the stuff that we run our BFFs for the ECS, I think I've also just added, Ben and I have just been adding some random results there anyway, but I'll hopefully have something there, so as an initial test, that could be a good place to start.
@23:42 - David Dotson
Well, thank you for that. Jenke. Yeah, so I put the link in the notes; these will show up once we publish. To the openfe-benchmarks repo, so I'll drop it into the chat. That might be worth checking as well because that moves a bit faster, since the only stakeholder there is OpenFE, so it's their internal benchmarking.
@24:06 - Jenke Scheen
Okay.
@24:09 - David Dotson
I'd say that our work in protein-ligand-benchmark will trail that a bit, but, Irfan, would you be willing to operate as a resource for Jenke going forward?
@24:22 - Irfan Alibay
Yeah, sure. Yeah, definitely. Happy to.
@24:26 - David Dotson
Okay, excellent.
@24:29 - Jenke Scheen
I just have a question. Just to my understanding. I remember Richard a while back talking about, at least for OpenFE, that he wants to do PDB. He wants to do mmCIF for protein. I wonder if that's still the case. Is that something that you're still talking about?
@24:51 - Irfan Alibay
I can't speak for Richard. Maybe Swenson can on that, but I think there is a plan. Maybe, but I don't think it's convenient.
@25:00 - Jenke Scheen
Got you.
@25:01 - David Swenson
I guess officially today, I am speaking for Richard, but I didn't want that responsibility. But no, I think that we would love it if people moved to a better format. But the fact is, we will always need to be able to support PDB. So whether we might put the benchmark in another format as a way to sort of say, we recommend you do this, is, I suppose, a possibility.
But yeah, we have to support PDB for the foreseeable future.
@25:36 - Jenke Scheen
Okay.
@25:37 - Jeffrey Wagner
Yeah. From the OpenFF side, we're kind of in the same boat. In a perfect world, we would only support mmCIF. But in reality, all the data in existence right now, like all the good data sets, are in PDB, and so we've got our PDB support first, and then we'll do mmCIF in a future release.
@26:02 - Jenke Scheen
In a decade. We live in a utopia with just mmCIF.
@26:07 - David Swenson
I'm pretty sure they were saying that a decade ago.
@26:11 - Jeffrey Wagner
Everybody gets an mmCIF in a post-scarcity world.
@26:18 - David Dotson
Excellent. Jenke, do you have a way of contacting Irfan? I'm on both the Slacks, so if you're on either the OpenFF, hold on, OpenFE... you are on the OpenFE Slack, Jenke?
@26:40 - Jenke Scheen
Yes.
@26:41 - David Dotson
Oh, perfect. Okay, then you can contact Irfan directly through that. Yeah, I know. We're spread out over a few different Slack teams. Right. So I know you're on the cook lab, you're on the ASAP, but you're also on OpenFE, so that should be sufficient.
@26:56 - Jenke Scheen
You should reach in through OpenFE.
@26:59 - David Dotson
Okay, cool.
@27:02 - Iván Pulido
I want to say that I have the task to reach out to Melissa and help her with some of the issues, to help her finish what she's been doing with this PR. I'm just worried that we might be doing some redundant work here. So yeah, just let me know what we should be doing from Chodera lab, especially Melissa, and I'll try to help her with that.
@27:37 - Irfan Alibay
So this is becoming, unfortunately, in benchmark discussion.
@27:41 - David Dotson
I think the main thing that was left... Melissa just reached out to me. She may jump on in the next few seconds, but hold on.
@27:52 - Irfan Alibay
All right, let us speak.
@27:55 - David Dotson
We summoned her just by speaking her name. She's typing. Okay, Iván. She'll hop on shortly.
Did you still want to say yes?
@28:11 - Iván Pulido
I know she may be better that she tells us what she's been working on. That's fine.
@28:25 - David Dotson
Any other comments on protein serialization? Thank you. She joins us. I'd like to make use of her time here. Iván, do you want to go ahead and start? Give an update on the nonequilibrium cycling protocol in perses?
@28:58 - Iván Pulido
Sure. Well, basically the update is that there is no update. We've been catching some performance issues with perses, and this opened like a Pandora's box, but we are hopeful we can finish that tomorrow. So after that, I'll be working directly with this PR, and hopefully it will be ready for next week. By the end of next week, if all goes well.
@29:31 - David Dotson
Okay, excellent. And we have our working session today still. Does that work for you?
@29:38 - Iván Pulido
Yeah, I don't think I have anything to discuss in that session because I'm still figuring out these things, and I'll let you know when I hit a blocker.
@29:49 - David Dotson
So, yeah, we don't need to meet today unless you feel like we do. I'm still happy to jump on things. Okay? Yeah. Cool. Any questions? Mike's not with us, so let's skip this update on protocol settings. I don't see Melissa yet. Give it a few minutes here. I'll turn the finger again. Does anyone have any topics they'd like to hit?
@30:40 - Jeffrey Wagner
I can give a brief update, which is more or less all the functionality for the new toolkit release, the 0.11 release, is in. And we're just fixing up the documentation and the examples now. Yeah, more or less. The roof. Is approved. And once we get a few, like, last technical things done, we'll be cutting the release. So you should see 0.11 any time now for the OpenFF Toolkit.
@31:26 - David Dotson
Did I catch it right? By end of week?
@31:28 - Jeffrey Wagner
Probably, yeah. Possibly even by the end of today.
@31:34 - David Dotson
Okay, awesome.
@31:35 - Jeffrey Wagner Thank you.
@31:45 - David Dotson Let's see. She's in the web now.
@32:00 - Jeffrey Wagner Oh, and actually, I have more tips for Ben. I know this isn't... I suppose it's unstructured time. Don't use bond orders of 1.5.
@32:10 - Benjamin Ries You're going to be tempted to use bond orders of 1.5, but don't do it, okay? No resonance structures or something in between. It's all clearly defined. That's what you mean, right? Okay.
@32:27 - Jeffrey Wagner Yeah. You'll be safe with stuff like tyrosine. Like, every other model in the world agrees that tyrosine is aromatic. But different aromaticity models disagree about, like, tryptophan.
@32:42 - Benjamin Ries It will be a biochemical review, so there are no delocalized electrons for this first implementation.
@32:51 - Jeffrey Wagner Excellent.
@32:53 - Benjamin Ries Very good.
@32:55 - David Dotson Hi, Melissa.
@32:57 - Jeffrey Wagner Hey.
@32:58 - Melissa Boby Sorry, I didn't know that on this call I had an agenda item here, so I can give you a quick update, since you were just waiting on me for issue #52. So I went through and re-prepared all of the structures using the Schrödinger command line so that it's replicable for anybody. So I can just pass over the command-line input that was used. Rather than going through the GUI, I was just going through and making sure that they were all consistent. And they do seem to be. So far there was just one ligand set that I had switched. I guess I can find it. Sorry, I've got a million windows open, but there was one that I needed to re-dock. So those have been re-docked, and I'm just making sure that they look the same. And then I will push the re-prepared structures to the repo.
@34:09 - David Dotson Fantastic. Thank you. Thank you again for producing the CLI version of your workflow. That's critical. So thank you for that.
And then are there any other blockers from your perspective, or anything else that we need to address or that you'd like to raise?
@34:32 - Melissa Boby I don't think so. I've been trying to script this out so that I could actually have it just as a full Python script for you guys, but Schrödinger does not make that very easy. So I just want to make sure that it's all right to just pass over the command-line input and have it just be that instead of having a full script, because I worked on that for, like, a week or so and it just kept breaking things.
@34:58 - David Dotson I understand. Yeah. I think that's already better than what we currently have, which is that we don't have any. I think we have a paragraph, so I think I'll take that.
@35:08 - Melissa Boby That's perfectly fine. Sure. Sounds good, then.
@35:12 - David Dotson Thanks for all your efforts.
@35:14 - Melissa Boby I don't mean to diminish that or minimize that. No worries. So, yeah, I think probably tomorrow I can push the rest of it up.
@35:25 - David Dotson Okay, excellent.
@35:30 - Irfan Alibay If you can just ping me, Melissa, then... I'll be on hold, but I might have time, and I think we'll try and redo #58 so that we do all the weird fixes that we need to.
@35:42 - Melissa Boby Absolutely.
@35:46 - David Dotson Well, thank you. And I think then what I can do is... I just wanted to roll through, since we've got a little bit more time. Thanks again for joining us. Checking the 0.3.0 milestone, because we also had assay-specific information. That's something that I can do as a separate PR, and I think I can just do that myself, because you've already given us a great spreadsheet that has all of that information. Is that pretty up to date still in this link? As far as I know it is still up to date. Okay.
@36:24 - Melissa Boby Yeah. And in the meantime, is it worth just preparing the systems with the specific conditions, or do you want me to hold off on that?
@36:34 - David Dotson If you want to just roll those into the PR, you can too. I'm perfectly fine with that, if that's something that's easy for you. Is that what you're suggesting, or am I misunderstanding?
@36:47 - Melissa Boby Yeah, that's what I'm suggesting. I can pull that into the PR if you'd like.
@36:54 - David Dotson Yeah, please do. I'm just saying don't feel like you have to, but if you want to, I'll take it. It certainly reduces the burden here.
@37:02 - Melissa Boby I could do it at my leisure, but yeah, I could do that in between projects. Okay, cool. Thank you.
@37:12 - David Dotson Any questions you have for us?
@37:17 - Melissa Boby Nope. I think I'm good.
@37:20 - David Dotson Okay. Any other questions for Melissa? Thank you for all your work on this, Melissa. I know that this is, like, perhaps a bit of a distraction compared to your other projects, so I do appreciate all your effort on this.
@37:40 - Melissa Boby No worries.
@37:41 - Jeffrey Wagner And I totally feel you on the Schrödinger automation thing. I, too, tried to automate it in Python, and I just wound up there with something that is basically Python calling a bash script. And it was so dumb. That was exactly what I ended up with.
@37:55 - Melissa Boby And I was like, I can't, this is stupid.
@37:58 - David Dotson And there's definitely a way to do it.
@38:00 - Melissa Boby Supposedly they have, like, a script guru who doesn't actually listen. I don't know.
@38:06 - Jeffrey Wagner It's so bad. Yeah, no, I think CLI tools are great. CLI tools are probably even better than Python tools for interfacing with the Schrödinger suite.
@38:14 - Melissa Boby Definitely. Although I will say honestly, when it comes to CLI stuff, I feel comfortable with this because I've manually checked through these, but I like using the GUI to do the protein prep if I'm not doing batch prep, because it makes a lot more sense to go through and manually check some of these issues that crop up.
So just a heads up on that. I've curated these, but going forward, as you use these CLI tools, you may want to have an intermediary check.
@38:47 - David Dotson Yeah, I know. Something we'd like to do, and I think we have an issue for it on protein-ligand-benchmark, is basically validators. Like CI validators for new entries, probably using things like MDAnalysis or other tools to check the assumptions given structures, given new targets, given new ligands. Do a whole suite of checks on whether the poses are reasonable, whether there are clashes, things like that. So any weird stuff that might come out, because I agree with you. Using an automated tool, there's no guarantee that what you're going to get is valuable without checking.
@39:31 - Melissa Boby Yeah. Oh, actually, that reminds me. On the automated CLI interface kind of thing, do we need that for the docked structures as well, or is it okay that we just provide those? I mean, I have the input files, but yeah. It actually is a gigantic pain in the butt to set up docking jobs with Schrödinger through the CLI. It is way easier using Maestro, which produces input files that are applicable.
@40:03 - David Dotson Yeah. So let me just try to understand. Go ahead.
@40:11 - Melissa Boby So whenever I do the docking, I set it up using Maestro because it's easier, honestly, and more consistent. It does produce an input file that codifies everything that is written out there. So it's consistent. I don't do it through the command line because it's a pain, and it's way easier to do it in Maestro, and faster. So do you want me to just upload, basically, the input file that I generate whenever I'm docking, or do you need a CLI for docking as well? Or do we not need any of that? Can we just put the ligands... the CLI tool doesn't...
@41:00 - David Dotson Doesn't tell straight tools to do docking. What does it tell you?
@41:09 - Melissa Boby The way you would do the docking with the CLI is through basically a chain of input arguments, right?
Whenever you set it up in Maestro, it writes all of those input arguments out for you and then initiates the run. So I've just been doing it that way. All the input arguments are the same. I'm just not launching it through the CLI, because it is often more consistent, and the arguments are written more consistently and more accurately, when doing it through Maestro.
@41:43 - David Dotson So, just to clarify, to try to restate: Maestro generates an input file that you would then consume using your CLI tool. And then that would do...
@41:54 - Melissa Boby Maestro generates an input file and a bash script that initiates the run, and also packages everything very tidily: the grid and the ligands to be docked. It puts them all in one folder in a directory, and then you can just run that. You execute the shell script, and it runs everything. That works much more tidily on my cluster and on my computer than manually writing out the CLI. And that's, in fact, the way they recommend doing it through Schrödinger. So Irfan just pinged in the chat that he'd be happy with just the input file, which gives all of the information for putting it in the CLI if you were so inclined. So if that works, I'll just pass over the input files as well.
@42:48 - David Dotson Okay, yeah, if that's sufficient information. My objective is to capture enough, as long as we have enough information, that someone with access to Schrödinger could reproduce what you've done. Yeah, that should get you there. Okay. As long as we're not missing a key piece. That's what I was afraid of. No, you are not missing a key piece.
@43:11 - Melissa Boby I'm saying this is probably the more complete way to do it, because it packages everything up into one tidy little directory.
@43:17 - David Dotson Okay. I'm comfortable with that, then. Thanks, Irfan, as well, for the clarification.
@43:23 - Jeffrey Wagner Melissa, out of curiosity, do you always take just, like, the top-ranked pose, and that's the starting one?
@43:32 - Melissa Boby Or is there, like, further filtering by eye that happens? So, with this, I always take the top-ranked pose because I'm using a core constraint for the docking based off of the cognate ligand that's in the crystal structure. And basically, I can ask it to put out more poses, but they all more or less look the same when I've done that as a sanity check. So I've just taken the top pose.
@43:56 - Jeffrey Wagner Okay, perfect. Yeah, I think that's good for reproducibility.
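
Editor's sketch: the CI validator idea raised at 38:47 (automated checks that docked poses are reasonable and free of clashes) could look roughly like the following. This is a minimal illustration only, not the project's implementation. The function name `find_clashes`, the 2.0 Å cutoff, and the toy coordinates are all assumptions made for the example; a real validator would more likely read structures with a tool such as MDAnalysis and use element-aware distance criteria.

```python
import math


def find_clashes(protein_atoms, ligand_atoms, min_distance=2.0):
    """Return (protein_index, ligand_index, distance) for each atom pair
    closer than min_distance (angstroms).

    The 2.0 angstrom default is an illustrative assumption, not a
    threshold agreed on in the meeting.
    """
    clashes = []
    for i, p_xyz in enumerate(protein_atoms):
        for j, l_xyz in enumerate(ligand_atoms):
            distance = math.dist(p_xyz, l_xyz)
            if distance < min_distance:
                clashes.append((i, j, distance))
    return clashes


# Hypothetical toy coordinates (angstroms), standing in for a prepared
# target and two candidate docked poses.
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
good_pose = [(2.5, 3.0, 0.0)]
bad_pose = [(0.5, 0.0, 0.0)]

for name, pose in [("good_pose", good_pose), ("bad_pose", bad_pose)]:
    clashes = find_clashes(protein, pose)
    status = "OK" if not clashes else f"{len(clashes)} clash(es)"
    print(f"{name}: {status}")
```

In a CI setting, a check like this would run on every new target or ligand entry and fail the build when any clash is reported, which gives the intermediary check suggested for structures produced by automated CLI tools.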
