In this episode, Ted sits down with Justin McCallon, CEO & Founder at Callidus Legal AI, to discuss how AI is transforming legal workflows and the challenges law firms face in AI adoption. From understanding model selection to building trust in AI-driven legal tools, Justin shares his expertise in leveraging AI to enhance efficiency while maintaining accuracy and compliance. With AI rapidly evolving, law firms must navigate the balance between innovation and tradition, making this conversation essential for legal professionals looking to integrate AI strategically.
In this episode, Justin shares insights on how to:
Select the right AI models for legal applications
Address concerns around AI accuracy and hallucinations
Build trust in AI through transparency and audit trails
Identify high-value AI use cases that save time and resources
Implement AI while maintaining traditional billing models
Key takeaways:
AI can significantly enhance legal workflows but requires strategic implementation
Model selection is critical for accuracy and efficiency in legal tasks
Trust in AI is built through transparency, security, and proper oversight
Law firms should start small, iterate, and develop AI policies for responsible adoption
AI is a tool to support, not replace, legal professionals
About Justin McCallon
Justin McCallon is the CEO & Founder of Callidus Legal AI, bringing his expertise as a former attorney specializing in M&A and commercial bankruptcy. With a deep understanding of both law and technology, he co-led the transformation of AT&T’s legal department, focusing on optimizing workflows and leveraging AI for efficiency. Now, he helps law firms strategically integrate AI to enhance legal practice while maintaining accuracy and compliance.
What will happen is that the market will all move toward AI. Then if you’re the one laggard who’s not using it at all, it’s going to be pretty obvious. Groups are going to know about that and they’re not going to use you, because you produce less legal work than the alternative groups.
1
00:00:03,487 --> 00:00:05,224
Justin, how are you this morning?
2
00:00:05,400 --> 00:00:06,792
Doing well, good morning.
3
00:00:06,803 --> 00:00:11,272
Yeah, I appreciate you jumping on here with me for a few minutes.
4
00:00:11,478 --> 00:00:12,565
Absolutely.
5
00:00:12,981 --> 00:00:15,581
So you and I connected at TLTF.
6
00:00:15,581 --> 00:00:28,261
I think we were having lunch and we were talking about AI and reasoning and you were
sitting next to me and chimed in and had some really good thoughts on that topic.
7
00:00:28,261 --> 00:00:38,561
And I think we've, we've covered that on previous episodes, but, um, you and I had another
conversation and I thought you had some really good insights to share.
8
00:00:38,561 --> 00:00:41,469
It sounds like you dive pretty deep.
9
00:00:41,469 --> 00:00:46,272
It's always good to hear perspective from folks that really, really jump in.
10
00:00:46,272 --> 00:00:51,314
But before we jump into the content here, let's just get you introduced.
11
00:00:51,314 --> 00:00:55,736
So you currently lead, is it Callidus Legal AI?
12
00:00:56,677 --> 00:00:57,338
Okay.
13
00:00:57,338 --> 00:01:11,475
And you focus on pairing AI with lawyers to enhance core legal work. You're deep in AI
and ML, and you're a former practicing attorney who did M&A work.
14
00:01:11,549 --> 00:01:14,301
which I think is interesting.
15
00:01:14,301 --> 00:01:17,542
Tell us more about your background and what you're doing today.
16
00:01:17,976 --> 00:01:24,569
Yeah, I started off practicing M&A and corporate restructuring, and ended up working for AT&T after
that for a while.
17
00:01:24,569 --> 00:01:29,791
And I led the AT&T legal transformation with our deputy GC.
18
00:01:29,791 --> 00:01:33,412
It was really successful and very interesting for me.
19
00:01:33,552 --> 00:01:38,224
The group looked at just understanding what are our attorneys doing?
20
00:01:38,224 --> 00:01:40,675
What are they doing that's not the highest priority?
21
00:01:40,675 --> 00:01:42,036
How can they reprioritize?
22
00:01:42,036 --> 00:01:48,012
How can we think about bringing in-house some work that we're using outside counsel for,
so that we can
23
00:01:48,012 --> 00:01:52,574
add more efficiency to the internal resources and have them do more of the work
internally?
24
00:01:52,574 --> 00:01:59,807
Where are we missing key insights and adding liability and where are we over-focused where
we shouldn't be?
25
00:01:59,807 --> 00:02:04,849
And how can we just rethink some of the workflows to where we're operating more
effectively?
26
00:02:04,849 --> 00:02:05,989
So I did that for a bit.
27
00:02:05,989 --> 00:02:11,111
And then my next gig was running a data science and engineering org.
28
00:02:11,111 --> 00:02:12,541
And we launched the first Gen
29
00:02:12,541 --> 00:02:16,713
AI product for the company's subsidiary, DirecTV.
30
00:02:16,770 --> 00:02:18,031
which was pretty informative.
31
00:02:18,031 --> 00:02:20,092
It was right after ChatGPT came out.
32
00:02:20,092 --> 00:02:30,759
And to me, the reason I started my startup was that it was so obvious, if you paired the legal
transformation work with the Gen AI work, there was going to be a big opportunity.
33
00:02:30,759 --> 00:02:36,682
And I didn't think it was going to be something from day one where ChatGPT could just
replace lawyers' jobs or anything like that.
34
00:02:36,682 --> 00:02:43,146
But I thought over time, this looked like a great starting block for something that could
be really powerful.
35
00:02:43,476 --> 00:02:44,436
Interesting.
36
00:02:44,436 --> 00:02:44,656
Yeah.
37
00:02:44,656 --> 00:02:56,896
And I remember you and I had some subsequent dialogue on LinkedIn
talking about the Stanford paper, and also the Apple
38
00:02:56,896 --> 00:03:01,156
Intelligence paper on the GSM8K battery of tests.
39
00:03:01,156 --> 00:03:06,736
I think they call it GSM8K Adaptive. GSM is grade school math.
40
00:03:06,736 --> 00:03:10,188
And then there were 8,000 questions that
41
00:03:10,260 --> 00:03:13,980
were used to evaluate how well AI performed.
42
00:03:14,720 --> 00:03:28,320
And Apple did a study on that, and the adaptive part is where they
changed minor details about the questions to see how the models would perform.
43
00:03:28,460 --> 00:03:37,080
And they degraded quite a bit anywhere from, I think, at the low end, the degradation was
like 30%.
44
00:03:37,080 --> 00:03:38,476
So at the time,
45
00:03:38,579 --> 00:03:39,789
Maybe that was o1.
46
00:03:39,789 --> 00:03:53,044
I forget exactly what model on the OpenAI side was the latest and greatest, all the
way down to like a 70% degradation just by inserting irrelevant facts into the questions
47
00:03:53,044 --> 00:03:54,375
or changing the names.
48
00:03:54,375 --> 00:04:04,939
And as you pointed out, that has since been resolved, which makes
me wonder, all right, how did they do that?
49
00:04:05,019 --> 00:04:07,240
Did they game the system at all?
50
00:04:07,240 --> 00:04:08,370
Like, Hey, we've got a,
51
00:04:08,370 --> 00:04:19,675
we've got a weakness here, let's apply a band-aid? Or was there a fundamental adaptation
that they implemented that helped?
52
00:04:19,675 --> 00:04:33,231
But I think when you ran those same questions through wherever we were at that
point, maybe it was 4o, you had different output, like it answered successfully.
53
00:04:33,231 --> 00:04:34,982
Am I remembering that correctly?
54
00:04:35,304 --> 00:04:37,215
Yeah, I think pretty close.
55
00:04:37,215 --> 00:04:43,398
I think with the Apple paper, I could be wrong, but I thought they used a battery
of models.
56
00:04:43,398 --> 00:04:48,360
The only one that was somewhat advanced was GPT-4, the original GPT-4.
57
00:04:48,360 --> 00:04:55,283
And now we have quite a lot better models with o3-mini and GPT-4.5.
58
00:04:55,283 --> 00:05:02,478
And if you look at the benchmarks, the benchmark I like the most is LiveBench, where they
59
00:05:02,478 --> 00:05:09,838
hide the questions, you can't really game the system, they change the questions regularly,
and they do a full battery of tests.
60
00:05:09,998 --> 00:05:15,478
GPT-4 scored about a 45, and the best models now score about a 76.
61
00:05:15,478 --> 00:05:19,278
So they've come a long way in those benchmark tests.
62
00:05:19,278 --> 00:05:30,018
And when you use the top models now to do the same questions that Apple had, and continue
to vary different pieces and add irrelevant information so that you're sure that it
63
00:05:30,018 --> 00:05:32,418
wasn't trained on any of that information.
64
00:05:32,610 --> 00:05:34,632
They're answering every question correctly.
65
00:05:34,632 --> 00:05:42,897
And so I had sent over a handful of examples yesterday just to kind of prove my point
empirically that this is testable, this is falsifiable.
66
00:05:42,977 --> 00:05:47,640
You can run the test yourself and see, no, the AI actually is able to solve these things.
67
00:05:47,640 --> 00:05:57,787
And as far as how they did it, I'm not sure all of the specifics, but I think a lot of it
is on the post-training side where they're teaching it to, after they've completed the
68
00:05:57,787 --> 00:06:02,242
pre-training, they're teaching the model how to be more effective with the information it
does have.
69
00:06:02,242 --> 00:06:04,427
And then the reasoners are very good.
70
00:06:04,427 --> 00:06:11,540
anything that they're able to do to add this reasoning capability is definitely enhancing
the answers.
71
00:06:11,540 --> 00:06:12,080
Yeah.
72
00:06:12,080 --> 00:06:17,180
And there's so much movement in the space.
73
00:06:17,180 --> 00:06:20,760
I can't even keep up, and I use it all the time.
74
00:06:20,760 --> 00:06:28,500
Like, I don't know, five, seven, ten times a day. But you know, you've got Grok 3,
you've got Claude 3.7.
75
00:06:28,500 --> 00:06:38,040
You've now got o3-mini, 4.5, and apparently GPT-5 is on the way.
76
00:06:38,040 --> 00:06:40,532
You know, there's DeepSeek.
77
00:06:40,532 --> 00:06:44,155
There's whatever Alibaba's model is.
78
00:06:44,155 --> 00:06:55,445
I mean, there's Mistral, there's Llama. It's impossible to keep up unless you're doing
this full time, which, you know, I'm not.
79
00:06:55,846 --> 00:07:03,893
So I looked at some of the tests that you threw at o3-mini, and I thought it did really
well.
80
00:07:03,893 --> 00:07:08,040
I just kind of breezed through it, but why don't you tell us some of the...
81
00:07:08,040 --> 00:07:11,002
some of the tests you threw at it and how it performed.
82
00:07:11,470 --> 00:07:12,070
Yeah, yeah.
83
00:07:12,070 --> 00:07:22,410
What I was trying to do is get a sense of how strong the model is on the types of things
where people are challenging it, saying AI is just not able to do these fairly simple tasks.
84
00:07:22,410 --> 00:07:35,110
And so I ran through a handful of examples, one being let's find a case that was not in
the training set and let's go have it find the case text online and then give us a full
85
00:07:35,110 --> 00:07:39,758
summary of like the holding and the material facts and so forth and give us legal
analysis.
86
00:07:39,758 --> 00:07:43,840
I think that was one where people were concerned AI is just not capable of doing that.
87
00:07:44,201 --> 00:07:45,298
I read it.
88
00:07:45,298 --> 00:07:48,763
I thought it did a fantastic job summarizing the case.
89
00:07:48,963 --> 00:07:58,113
I gave it some questions like solve complex numerical problems that also deal with
linguistics that are hard to even understand the question being asked.
90
00:07:58,113 --> 00:07:59,329
It did well there.
91
00:07:59,329 --> 00:08:01,518
It does well in constrained poetry.
92
00:08:01,518 --> 00:08:07,746
I was surprised it did well on world-model questions, where I basically
had it
93
00:08:07,746 --> 00:08:16,920
run a scenario where I'm dumping marbles out of a container and then putting super
glue in and then moving them around the house and seeing where they end up,
94
00:08:16,920 --> 00:08:18,030
walking through the steps.
95
00:08:18,030 --> 00:08:26,134
And it did pretty well on pretty much all of those things, to where my point of view is,
between that and GPT-4.5,
96
00:08:26,134 --> 00:08:36,318
Now you pretty much have something that can reason like a smart human can reason and it
can help in a pretty wide variety of ways from a chat window.
97
00:08:36,318 --> 00:08:37,622
There's still some
98
00:08:37,622 --> 00:08:45,566
issues where these tools don't have full capabilities that a human would have outside of
the chat window where we can pull additional resources.
99
00:08:45,566 --> 00:08:52,904
But if you're resource constrained and you're just talking to somebody intelligent, it's
going to be pretty similar to what these models can do now.
100
00:08:52,904 --> 00:08:53,664
Yeah.
101
00:08:53,664 --> 00:08:59,427
And you and I kicked around the Stanford paper too, which at this point is almost a year
old.
102
00:08:59,427 --> 00:09:01,728
It's actually over a year old from their first iteration.
103
00:09:01,728 --> 00:09:06,990
They did a subsequent adjustment and re-release in, I think May of last year.
104
00:09:06,990 --> 00:09:16,324
But some of the challenges that the Stanford paper highlighted were, you know, and they
categorized too many things as hallucinations in my opinion.
105
00:09:16,324 --> 00:09:20,786
But I think overall, I got a lot of insight from reading the paper.
106
00:09:20,840 --> 00:09:29,093
It said that AI misunderstands holdings, it has trouble distinguishing between legal actors,
and it has difficulty respecting the order of authority.
107
00:09:29,093 --> 00:09:37,264
It fabricates. Do you feel like these specific issues have gotten
better?
108
00:09:39,142 --> 00:09:40,623
They've gotten better.
109
00:09:40,843 --> 00:09:53,288
There are tests on hallucination rates for the different models, and the reasoners are about
half the hallucination rate of GPT-4, and GPT-4.5 is also about half the hallucination
110
00:09:53,288 --> 00:09:54,888
rate of GPT-4.
111
00:09:54,949 --> 00:09:58,710
That said, hallucinations are still an issue for these models.
112
00:09:58,790 --> 00:10:03,552
Legal tech companies can solve those issues, and this is where domain-specific software
comes in.
113
00:10:03,552 --> 00:10:06,924
There's different algorithms you can run to help there.
114
00:10:06,924 --> 00:10:08,128
For example,
115
00:10:08,128 --> 00:10:13,399
Let's say that you have an issue where you tend to get hallucinated cases out of the LLMs.
116
00:10:13,399 --> 00:10:15,640
Well, I think everyone's kind of solved the problem now.
117
00:10:15,640 --> 00:10:26,443
We've been doing this for a long time, where you take the cases from the LLM, you have it
list out the relevant cases, and then you have a secondary external data source that
118
00:10:26,443 --> 00:10:29,544
has a list of all the cases that you have API access to.
119
00:10:29,544 --> 00:10:33,625
And then you check the Bluebook citation to say, is this a real case or not?
120
00:10:33,625 --> 00:10:37,974
And then if it is, let's go check relevancy to ensure this is relevant to the answer.
121
00:10:37,974 --> 00:10:39,865
And if you get two checks, you say, OK, good.
122
00:10:39,865 --> 00:10:41,207
This is a real case.
123
00:10:41,207 --> 00:10:42,037
It's relevant.
124
00:10:42,037 --> 00:10:46,500
This is going to be passed on to the user, and they're going to be able to access that
case.
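As a rough sketch, the two-check loop Justin describes might look like the following, where `case_db` and `is_relevant` are hypothetical stand-ins for the external citation database and the relevancy check, not Callidus's actual API:

```python
def verify_citations(llm_cases, case_db, is_relevant):
    """Keep only cases that pass both checks: the case exists in an
    external data source, and it is relevant to the question asked."""
    verified = []
    for case in llm_cases:
        # Check 1: does the Bluebook citation match a real case in the
        # external database (in practice, an API lookup)?
        record = case_db.get(case["citation"])
        if record is None:
            continue  # hallucinated citation: never shown to the user
        # Check 2: is the real case actually relevant to the answer?
        if not is_relevant(record, case["question"]):
            continue
        verified.append(record)  # two checks passed: pass it on
    return verified
```

Only cases that survive both checks get passed on to the user.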
125
00:10:46,601 --> 00:10:51,845
so this is where, yeah, it's true that I think they had a good insight.
126
00:10:51,845 --> 00:11:01,553
LLMs will continue to hallucinate and cause problems, but LLM software that has
domain-specific engineering on top of it can solve those issues.
127
00:11:01,553 --> 00:11:03,672
And then the other one being
128
00:11:03,672 --> 00:11:08,805
Hey, can LLMs actually pull out the legal actors, and can they figure out the
holdings?
129
00:11:08,805 --> 00:11:19,402
That one, I think they do pretty well now. The top models are able to understand the
holdings pretty well, and you can test it and see whether that's true empirically
130
00:11:19,402 --> 00:11:21,193
yourself pretty easily.
131
00:11:21,194 --> 00:11:27,397
In all my tests, and we use this a lot and do a lot of evaluations, they're quite good for
the top models now.
132
00:11:27,412 --> 00:11:29,723
What about respecting the order of authority?
133
00:11:30,814 --> 00:11:32,755
That's another one that they get.
134
00:11:32,755 --> 00:11:37,158
You might have to prompt engineer it a bit, and this is where the domain software comes in
again.
135
00:11:37,158 --> 00:11:46,244
But if you prompt engineer it well, it fully understands that the Supreme Court is
superior to a state Supreme Court, the US Supreme Court versus state Supreme Court.
136
00:11:46,244 --> 00:11:51,947
It fully understands that that court's superior to a trial court and so forth.
137
00:11:52,567 --> 00:11:57,320
We use this every day, and this is something that it's able to do very consistently.
138
00:11:57,621 --> 00:12:04,902
So how would you assess the current state of AI capabilities in legal research and
analysis as we sit today?
139
00:12:05,390 --> 00:12:09,230
Yeah, I would say raw LLMs out of the box, quite bad.
140
00:12:09,230 --> 00:12:12,010
I wouldn't use it for legal research.
141
00:12:12,010 --> 00:12:14,590
And again, they're going to hallucinate everything.
142
00:12:14,590 --> 00:12:16,590
They're going to miss some insights.
143
00:12:16,590 --> 00:12:25,770
They're going to have instances where, because they don't have in the pre-training data
the full knowledge about all the cases and all the statutes for that state, they're going
144
00:12:25,770 --> 00:12:32,830
to take majority rules and assume that those are right for that state, even though your
state might deviate from them or use a minority rule.
145
00:12:33,030 --> 00:12:35,052
And so there's going to be a bunch of issues.
146
00:12:35,052 --> 00:12:38,044
You'll get kind of poorly formatted responses.
147
00:12:38,044 --> 00:12:41,976
If you ask it to draft a full brief, it'll give you like two pages.
148
00:12:42,057 --> 00:12:44,318
All of those things are problems.
149
00:12:44,438 --> 00:12:51,603
If you use good domain specific software, though, these are all engineering problems that
are solvable by the legal tech companies.
150
00:12:51,603 --> 00:12:56,466
And a lot of us have started to or very substantially solve those issues.
151
00:12:56,827 --> 00:13:01,550
And so if you use good software, you can expect an extensive
152
00:13:01,550 --> 00:13:07,392
30 page brief, no case hallucinations, hopefully no holding hallucinations.
153
00:13:07,572 --> 00:13:10,094
You can expect that it's properly formatted.
154
00:13:10,094 --> 00:13:20,848
You can expect that it goes into the details regarding the state law that's in
question, looking at the legal authorities to pull out the insights so that it's not
155
00:13:20,848 --> 00:13:22,298
just relying on majority rules.
156
00:13:22,298 --> 00:13:25,640
So all of those things that I think good software is able to do.
157
00:13:25,640 --> 00:13:31,446
That said, I would strongly suggest that we focus on software that keeps the attorney in
the loop.
158
00:13:31,446 --> 00:13:34,929
and lets the lawyer audit the output.
159
00:13:34,929 --> 00:13:39,092
So I wouldn't want just, hey, here's my fact pattern.
160
00:13:39,092 --> 00:13:44,295
I'm just going to let the AI go off and run and just draft a full 30-page brief.
161
00:13:44,295 --> 00:13:46,997
I don't think that's a good solution right now.
162
00:13:46,997 --> 00:13:56,385
I think what the AI does well is it synthesizes large amounts of information, bubbles them
up to the lawyer and probably gets it right the vast majority of the time, but the lawyer
163
00:13:56,385 --> 00:13:59,660
is still going to make the judgment call about which direction to go.
164
00:13:59,660 --> 00:14:01,353
And then the lawyer says, yes, go here.
165
00:14:01,353 --> 00:14:03,237
Don't pursue this and so forth.
166
00:14:03,237 --> 00:14:07,385
And then you work together with the AI to get a great answer very fast.
167
00:14:07,385 --> 00:14:10,900
And that's where I would say the focus should be.
168
00:14:11,304 --> 00:14:13,916
Yeah, and I have seen it's been a couple of months.
169
00:14:13,916 --> 00:14:14,947
I think it was late last year.
170
00:14:14,947 --> 00:14:30,037
I saw a chart of, gosh, comprehension, I guess, for lack of a better
term, across different-sized RAG prompts.
171
00:14:30,037 --> 00:14:33,520
And it trails off dramatically the larger it gets.
172
00:14:33,520 --> 00:14:39,476
You know, like, Gemini 2 has
173
00:14:39,476 --> 00:14:41,876
a 1 million token context window, right?
174
00:14:41,876 --> 00:14:43,656
Which is pretty significant.
175
00:14:43,656 --> 00:14:47,156
I think Claude is a couple hundred thousand, GPT a little lower.
176
00:14:47,156 --> 00:14:48,396
These are always moving.
177
00:14:48,396 --> 00:14:51,656
I might not have the current state of things.
178
00:14:51,656 --> 00:14:51,796
Yeah.
179
00:14:51,796 --> 00:14:52,656
Yeah.
180
00:14:52,656 --> 00:15:03,436
But I saw kind of a performance metric where they threw large amounts of documents
through RAG at these models.
181
00:15:03,436 --> 00:15:08,404
And it trailed off pretty substantially in terms of missing, you know, like
182
00:15:08,404 --> 00:15:16,375
key facts during summarization as the number of tokens increased.
183
00:15:16,375 --> 00:15:19,770
Are we getting any better there or is that still a limitation?
184
00:15:20,024 --> 00:15:23,096
Getting better is a limitation, but it can be engineered around.
185
00:15:23,096 --> 00:15:24,257
This is another one.
186
00:15:24,257 --> 00:15:31,641
We've had to do this where a client has, say, a 40-page document, and it may be a hundred
40-page documents.
187
00:15:31,641 --> 00:15:34,783
And we're trying for each one to pull out all the payment terms.
188
00:15:34,783 --> 00:15:43,268
This is an area where out of the box, LLMs do pretty poorly without a tremendous amount of
prompt engineering and kind of just general engineering.
189
00:15:43,268 --> 00:15:46,786
So what we need to do is break down the problem to where
190
00:15:46,786 --> 00:15:55,038
we're only pushing through like a page at a time, maybe a little bit more than that,
giving it enough context and then giving very detailed prompts on exactly what to look for
191
00:15:55,038 --> 00:15:56,550
and what not to look for.
192
00:15:56,590 --> 00:16:01,312
You have to do all of that in a pretty domain specific way to get good answers.
193
00:16:01,312 --> 00:16:07,915
And so I think if you're just using a raw LLM without a lot of engineering work, they're
not gonna do very well here.
194
00:16:10,116 --> 00:16:11,056
Chunking's a big part of that.
195
00:16:11,056 --> 00:16:15,098
Yeah, you'll chunk the paper and then run in parallel
196
00:16:15,310 --> 00:16:22,978
like a hundred different agents basically to each have their one page to review and then
summarize in groups.
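The chunk-and-parallelize pattern Justin describes can be sketched like this; `extract_terms` stands in for the per-chunk LLM call with its detailed prompt, and the page size is a made-up number:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_pages(text, page_chars=3000):
    """Split a document into roughly page-sized chunks."""
    return [text[i:i + page_chars] for i in range(0, len(text), page_chars)]

def extract_payment_terms(documents, extract_terms, max_workers=8):
    """Run the per-chunk extractor in parallel and merge per document."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for doc in documents:
            chunks = chunk_pages(doc)
            # one "agent" per chunk, each reviewing its own page
            per_chunk = pool.map(extract_terms, chunks)
            results.append([t for terms in per_chunk for t in terms])
    return results
```

In practice the chunk boundaries and the extraction prompt both need domain-specific tuning, as described above.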
197
00:16:23,142 --> 00:16:24,312
Interesting.
198
00:16:24,413 --> 00:16:24,723
Yeah.
199
00:16:24,723 --> 00:16:37,703
You know, another challenge, and I don't consider myself an AI expert, more
of an enthusiast, but I really do put AI through its paces on real-world stuff
200
00:16:37,703 --> 00:16:41,272
mostly and find a huge variation.
201
00:16:41,272 --> 00:16:43,537
I also find it quite confusing.
202
00:16:43,537 --> 00:16:48,460
All the fragmentation, like just within the OpenAI world, just the number of models.
203
00:16:48,460 --> 00:16:51,006
And I know GPT-5 is supposed to solve that.
204
00:16:51,006 --> 00:17:00,661
but I'm still really curious how that's gonna work, because you know,
today, I mean, what do you have, eight in the dropdown?
205
00:17:00,661 --> 00:17:01,261
You know what I mean?
206
00:17:01,261 --> 00:17:03,822
Like eight models you can choose from.
207
00:17:04,483 --> 00:17:05,963
That's a big impediment.
208
00:17:05,963 --> 00:17:15,748
As somebody who pays really close attention to this, I still don't have a firm handle
on when to use what. It seems like a moving target.
209
00:17:16,526 --> 00:17:19,707
Yeah, and it's pretty tough for a casual user.
210
00:17:19,707 --> 00:17:24,169
You're far, far more knowledgeable about this than the average user.
211
00:17:25,730 --> 00:17:30,512
Basically, the factors you need to consider are A, how fast do I need a response?
212
00:17:30,512 --> 00:17:33,413
B, what's the context window that I need?
213
00:17:33,593 --> 00:17:37,135
C, do I need a model with a lot of pre-training data?
214
00:17:37,135 --> 00:17:43,137
So as in, it has a lot of knowledge that I need to pull from, or do I need something more
that's reasoning well?
215
00:17:43,137 --> 00:17:45,720
And based on those factors, you can choose the right model.
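Those three factors can be illustrated with a toy rule-based picker; the model catalog, names, and numbers here are all made up for the sketch:

```python
MODELS = [
    # (name, relative speed, context window in tokens, knowledge level)
    ("fast-small",    "fast", 128_000, "low"),
    ("big-knowledge", "slow", 200_000, "high"),
    ("reasoner",      "slow", 200_000, "mid"),
]

def choose_model(need_fast, context_tokens, need_knowledge):
    """Pick the first model satisfying speed, context, and knowledge needs."""
    for name, speed, window, knowledge in MODELS:
        if need_fast and speed != "fast":
            continue  # factor A: response speed
        if context_tokens > window:
            continue  # factor B: context window
        if need_knowledge and knowledge != "high":
            continue  # factor C: pre-training knowledge
        return name
    return None  # no model fits all the constraints
```

A real system would weigh these against eval scores rather than hard rules, but the decision inputs are the same three factors.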
216
00:17:45,720 --> 00:17:53,737
But I'm in this every day and this is my business, so I'm familiar. For your casual
user, you have no idea which one to use.
217
00:17:53,737 --> 00:18:02,914
And yeah, GPT-5 though will help with that to where it's going to basically just figure
out your question and then suggest the best model internally and then just give you that
218
00:18:02,914 --> 00:18:05,386
best model without you needing to even think about it.
219
00:18:05,386 --> 00:18:15,016
I think ironically, a lot of the time GPT-5 will be basically GPT-4 where they're just
gonna say, well, GPT-4 is good enough to answer this, go ahead and move forward.
220
00:18:15,016 --> 00:18:17,789
because most questions it gets are actually pretty easy.
221
00:18:17,789 --> 00:18:22,272
There's a handful of hard questions that people push on every now and then.
222
00:18:22,353 --> 00:18:31,220
That said, again, I would stress that the legal tech groups are a lot better for solving
domain-specific tasks than these models are anyway.
223
00:18:31,220 --> 00:18:37,426
And what's happening is basically we're standing on top of the best models for the
specific task we're working on.
224
00:18:37,426 --> 00:18:43,020
We're choosing the best one, knowing exactly the tool that we need to use.
225
00:18:43,214 --> 00:18:52,730
And oftentimes we're using a combination of two or three, sometimes even from different
groups, to where that combination, plus a lot of prompt engineering and other engineering
226
00:18:52,730 --> 00:18:55,761
on top of it, can yield pretty good results.
227
00:18:56,242 --> 00:19:07,748
And I think what you'll see is, in general, the legal tech companies are going to be about
two years ahead of the raw LLMs, as far as their ability to practice law more or less, or
228
00:19:07,748 --> 00:19:11,850
support someone who's practicing law to be an amplifier of that person.
229
00:19:12,003 --> 00:19:23,766
And in general, I don't think it's very user-friendly just to work from a chat
window, versus a nice template that's easy to follow, just like a webpage.
230
00:19:24,210 --> 00:19:24,801
Yeah.
231
00:19:24,801 --> 00:19:32,266
So the model selection, that has to be done algorithmically, correct?
232
00:19:32,487 --> 00:19:36,520
what does the process look like for selecting the right model?
233
00:19:36,520 --> 00:19:38,412
Just maybe in how you do it.
234
00:19:38,412 --> 00:19:40,322
I think OpenAI is somewhat opaque.
235
00:19:40,322 --> 00:19:43,085
I'm not sure that they provide transparency around that.
236
00:19:43,085 --> 00:19:49,180
But just in broad brushstrokes, like, how does it determine which path to take?
237
00:19:49,870 --> 00:19:55,210
Yeah, for us, we don't do it in a fully algorithmic way.
238
00:19:55,210 --> 00:20:00,750
We have across our app probably 100, 200 different API calls.
239
00:20:00,750 --> 00:20:05,870
And for each one of those, we have a general view on, is this going to need speed?
240
00:20:05,870 --> 00:20:09,810
Is it going to need the ability to instruction follow really well?
241
00:20:09,810 --> 00:20:13,730
Is it going to need high pre-training knowledge and so forth?
242
00:20:13,730 --> 00:20:18,702
And then based on those factors, we'll say it's probably one of these three models that we
should use.
243
00:20:18,702 --> 00:20:26,616
And then we run evals anywhere important to say, okay, let's actually see what score
these models get on our evaluations.
244
00:20:26,616 --> 00:20:40,634
And so that could be an evaluation, for instance, of how many cases they're returning
that are accurate, where we'll try to kind of do a full analysis on, okay,
245
00:20:40,634 --> 00:20:42,855
here's an evaluation question.
246
00:20:42,855 --> 00:20:47,822
Let's have real attorneys do the work and figure out what cases you would want to cite.
247
00:20:47,822 --> 00:20:55,782
And then as you figured out what cases you want to cite, we're going to score these cases
to say, this is like a five, this case is a three, this is a one.
248
00:20:55,782 --> 00:21:03,262
And as far as importance, now let's have all the models do their work and give the cases
that they think are most relevant, and we're going to score those.
249
00:21:03,262 --> 00:21:10,862
So we have a lot of those automations in place and then whenever a new model comes out, we
just run it through the system of tests and say, okay, it's going to be good here, here
250
00:21:10,862 --> 00:21:11,422
and here.
251
00:21:11,422 --> 00:21:12,742
It's not going to be very good here.
252
00:21:12,742 --> 00:21:14,790
And we can move forward that way.
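A toy version of that scoring setup, with hypothetical names: attorneys assign an importance score (say 5, 3, or 1) to each case they would cite, and each model's returned cases are scored against that answer key:

```python
def score_model(returned_citations, answer_key):
    """Sum the attorney-assigned scores for the cases a model returned;
    cases not in the answer key (including hallucinations) score zero."""
    return sum(answer_key.get(c, 0) for c in returned_citations)

def rank_models(model_outputs, answer_key):
    """Rank candidate models by total eval score, best first."""
    scores = {name: score_model(cases, answer_key)
              for name, cases in model_outputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

When a new model comes out, you run it through the same battery and see where it lands.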
253
00:21:15,218 --> 00:21:15,668
Interesting.
254
00:21:15,668 --> 00:21:22,781
Yeah, it seems like, you know, finding the right balance between speed and quality is the
sweet spot, right?
255
00:21:22,781 --> 00:21:31,535
You can't slow the process down too much or you're going to impact efficiency, but you
need to strike that balance.
256
00:21:31,535 --> 00:21:34,766
It seems like that is the strategy. Is that accurate?
257
00:21:35,148 --> 00:21:37,439
Yeah, it's a fun challenge.
258
00:21:37,638 --> 00:21:47,382
A lot of what we do is we'll have a seven part workflow, for instance, and when the user
does step two, we're kicking off a pretty slow model that's really smart.
259
00:21:47,382 --> 00:21:53,843
And then when they get to step six, that slow model is done with the analysis, and then
it's inserting the answer for the user.
260
00:21:53,843 --> 00:21:59,805
Then it's done all that work in the background while they're filling out other information
that's not as relevant to the answer.
261
00:22:00,025 --> 00:22:02,106
And so we do a lot of that.
262
00:22:02,166 --> 00:22:02,732
then
263
00:22:02,732 --> 00:22:06,684
Sometimes you just use the fast model because it's a fairly easy answer.
264
00:22:06,764 --> 00:22:07,955
So we'll do some of that.
265
00:22:07,955 --> 00:22:19,521
And it's just an interesting game of how do we think about the legal implications, how do
we think about the AI driven implications and the technology implications, and then how do
266
00:22:19,521 --> 00:22:25,384
we think about a good user experience and pair all that together to give something that
makes sense cohesively.
267
00:22:25,396 --> 00:22:26,096
Yeah.
268
00:22:26,096 --> 00:22:29,596
You know, recently I've seen interesting benchmarks.
269
00:22:29,596 --> 00:22:32,596
Was it Vals AI that put it together?
270
00:22:32,596 --> 00:22:33,576
I'm not sure if you've seen it.
271
00:22:33,576 --> 00:22:51,216
It's just been in the last week or so. I don't know if it was a benchmark or a
study, but it looked at real scenarios, legal workflows, and measured efficiency.
272
00:22:51,436 --> 00:22:52,850
Again, there's so much stuff
273
00:22:52,850 --> 00:22:54,411
flying at you these days.
274
00:22:54,411 --> 00:23:05,207
I don't have it memorized, but it seems like there's more of a focus now on legal-specific
use cases and how these models perform in those scenarios.
275
00:23:05,247 --> 00:23:07,378
Are you seeing more of that now?
276
00:23:07,922 --> 00:23:09,263
I need to check out that study.
277
00:23:09,263 --> 00:23:11,725
I actually haven't seen it.
278
00:23:11,725 --> 00:23:15,009
We love the idea of doing more legal benchmarks.
279
00:23:15,030 --> 00:23:21,136
That's an area where we've really taken a lot of time to try to build a tool that's useful
from that perspective.
280
00:23:21,136 --> 00:23:25,280
And I think it's useful to the end user as well.
281
00:23:25,301 --> 00:23:27,422
But no, I haven't seen that specific study.
282
00:23:27,422 --> 00:23:30,836
I do like the idea though of pursuing that.
283
00:23:31,775 --> 00:23:37,650
Yeah, this stuff... the post I saw on it was March 4th, three days ago.
284
00:23:37,650 --> 00:23:39,761
And again, it's just so hard to keep up with.
285
00:23:39,761 --> 00:23:45,216
And there's so much that even, you know, after you read it, three more things fly at you.
286
00:23:45,216 --> 00:23:52,962
It's like... So, what about AI strategies in general for law firms?
287
00:23:52,962 --> 00:24:00,888
So, you know, I have been critical of law firms that seem to
288
00:24:01,170 --> 00:24:07,614
immediately deploy tactically versus figuring out strategically what they want to do.
289
00:24:07,614 --> 00:24:09,986
And strategy includes a lot of different things.
290
00:24:09,986 --> 00:24:15,059
It can include where to first focus your AI efforts.
291
00:24:15,059 --> 00:24:21,714
It could include the organizational design within the firm that's going to support those
efforts.
292
00:24:22,395 --> 00:24:27,088
It can define the risk tolerance that the firm is willing to take.
293
00:24:27,088 --> 00:24:30,610
You know, because we still have...
294
00:24:30,610 --> 00:24:35,342
I saw an interesting study from the Legal Value Network.
295
00:24:35,342 --> 00:24:39,484
They do an LPM survey every year.
296
00:24:39,484 --> 00:24:43,905
One question that was asked was, what percentage of your clients...
297
00:24:44,426 --> 00:24:47,747
So they talked to, I think, 80 law firm GCs.
298
00:24:48,608 --> 00:24:57,431
And what percentage of your clients either discourage or prohibit the use of AI in their
matters?
299
00:24:57,431 --> 00:24:59,592
And the number was 42%.
300
00:24:59,592 --> 00:25:10,258
which seems shockingly high, because I saw another study from the Blickstein Group, the
LDO law department one.
301
00:25:10,258 --> 00:25:12,300
I forget what the acronym stands for.
302
00:25:12,300 --> 00:25:23,766
And anyway, almost 60% of the law firm client GCs that they talked to said that law firms
aren't using technology enough to drive down costs.
303
00:25:23,766 --> 00:25:26,378
And those are two very conflicting data points.
304
00:25:26,378 --> 00:25:27,004
It's like,
305
00:25:27,004 --> 00:25:33,244
OK, you want me to drive down costs, but you've got OCGs that prevent me from implementing
the technology.
306
00:25:33,244 --> 00:25:39,738
I can't use it on your matters. I don't know, do you feel like that's a disconnect
in the marketplace still?
307
00:25:40,152 --> 00:25:41,863
I think it's very bimodal.
308
00:25:41,863 --> 00:25:52,911
I think that you have a lot of attorneys on one side or the other where some really want
to embrace the newest technology all in and others are very cautious about it.
309
00:25:52,911 --> 00:25:57,534
And there's not as many groups in the middle as you'd expect.
310
00:25:57,534 --> 00:25:59,795
And so it's not like your normal bell curve.
311
00:25:59,936 --> 00:26:01,957
And so I think that's what's going on.
312
00:26:01,957 --> 00:26:09,474
And I think the organizational strategy and kind of transformation lens is a really tough
and interesting question for
313
00:26:09,474 --> 00:26:11,715
organizational leaders to think about.
314
00:26:11,715 --> 00:26:14,436
I think we probably disagree a little bit on this one.
315
00:26:14,436 --> 00:26:24,880
I have more of an engineering mindset on it, where I think the way to go is start small,
iterate, and run your strategy work in parallel.
316
00:26:25,500 --> 00:26:35,765
We've just seen so many instances where a company really wants to get into AI, they're
strategizing about it and a year later they haven't really done anything and they don't
317
00:26:35,765 --> 00:26:37,560
really get it because they have
318
00:26:37,560 --> 00:26:41,113
their senior leaders doing strategy stuff without being very hands-on.
319
00:26:41,113 --> 00:26:42,774
They don't really get it.
320
00:26:42,774 --> 00:26:53,582
I think if you take the time to have a small group of people that are really invested in
using AI every day, try out some leading tools, go in the right direction.
321
00:26:53,582 --> 00:26:55,363
Don't do anything just crazy.
322
00:26:55,363 --> 00:26:57,875
And then just don't put in any client information.
323
00:26:57,875 --> 00:27:04,640
Just do everything based on synthesized or sanitized
data.
324
00:27:04,666 --> 00:27:10,570
I think if you do that, you can get a pretty good sense of, now I get what people are
using this for.
325
00:27:10,570 --> 00:27:15,773
We could use it here, here, and here, but I can't use it in this area because it's going
to have issues.
326
00:27:15,773 --> 00:27:20,796
Or this is decent software in this way, but not in this other way.
327
00:27:20,796 --> 00:27:23,358
Now I can make informed strategic decisions.
328
00:27:23,358 --> 00:27:29,161
I think that if you kind of do that pairing, that's probably what I think would be the
best approach.
329
00:27:29,202 --> 00:27:29,482
Yeah.
330
00:27:29,482 --> 00:27:31,403
Well, we're, we're aligned on part of that.
331
00:27:31,403 --> 00:27:43,798
So I think that striking the right risk-reward balance is key, and that should be the
number one driver of the approach.
332
00:27:43,838 --> 00:27:44,398
Right.
333
00:27:44,398 --> 00:27:54,863
I think that jumping right in on the practice side and, you know, going whole hog with
attorneys who have super high opportunity costs and low tolerance for missteps is a
334
00:27:54,863 --> 00:27:55,643
mistake.
335
00:27:55,643 --> 00:27:56,943
So we're aligned on that.
336
00:27:56,943 --> 00:27:58,549
I guess where I get hung up,
337
00:27:58,549 --> 00:28:03,480
is that, I'm going to quote another study here, or survey.
338
00:28:03,480 --> 00:28:13,753
Thomson Reuters did one, the professional services Gen AI survey, which came out late last
year, and only 10% of law firms, one out of 10, have a Gen AI policy.
339
00:28:13,913 --> 00:28:19,114
So in order to write a policy, I think you need a strategy first, right?
340
00:28:19,114 --> 00:28:26,674
A policy is an outgrowth of a strategy, but nine out of 10 don't have one.
341
00:28:26,674 --> 00:28:37,690
So what you have now is law firm users who don't have proper guidance on, hey, what can I
use the public models for?
342
00:28:37,690 --> 00:28:38,691
Can I use them at all?
343
00:28:38,691 --> 00:28:40,412
Do I use it on my phone?
344
00:28:40,412 --> 00:28:44,254
Can I use it on my personal laptop when I'm not connected to the VPN?
345
00:28:44,254 --> 00:28:49,196
Like all of those questions not being answered, I think creates unnecessary risk.
346
00:28:49,196 --> 00:28:56,370
Maybe at a certain, you know, altitude defining the strategy and incrementally
347
00:28:56,370 --> 00:29:00,529
working your way down more granularly, maybe that's the right balance.
348
00:29:00,684 --> 00:29:01,935
I think we're in sync there.
349
00:29:01,935 --> 00:29:07,059
I think it's crazy that you wouldn't have a Gen AI policy at this point.
350
00:29:07,059 --> 00:29:12,604
I think our company, DirecTV, had one three months in after ChatGPT.
351
00:29:12,604 --> 00:29:20,371
I was on the executive board there, and we thought immediately, we have to give the
company employees something, some guidance.
352
00:29:20,371 --> 00:29:22,132
And yeah, I think you're exactly right.
353
00:29:22,132 --> 00:29:29,578
You start high, you make it a little bit overly restrictive, then you dig into the details
and you realize, okay, here's where we can open up
354
00:29:29,622 --> 00:29:38,376
a little bit more, here's where we can be a little bit less or more forgiving on the use
of the tools and just be smart about that.
355
00:29:38,858 --> 00:29:44,415
But yeah, if you're working in a law firm and you don't have a strategy, I think you
definitely should start working on that right away.
356
00:29:44,415 --> 00:29:45,125
Yeah.
357
00:29:45,125 --> 00:29:55,458
And what do you think about this? It's another opinion of mine, some
may agree or disagree, but I see a lot of law firm C-suite and director-level roles, both in
358
00:29:55,458 --> 00:30:06,651
the innovation and AI world, that are brought in without any sort of strategy,
essentially just to figure it out.
359
00:30:06,731 --> 00:30:11,922
And normally I like an agile approach, but the problem with
360
00:30:12,264 --> 00:30:28,148
this approach in law firms is that those resources are typically not sufficiently
empowered to make change, and law firm decision-making is so friction-heavy that it feels
361
00:30:28,148 --> 00:30:31,329
like you're setting these leaders up.
362
00:30:31,449 --> 00:30:36,091
You're not setting them up for success, because the tone has to be set at the top,
right?
363
00:30:36,091 --> 00:30:40,472
Again, around risk-taking, around where they want to
364
00:30:41,556 --> 00:30:44,436
add value within the business.
365
00:30:45,516 --> 00:30:56,096
You know, all of these things need to happen at the most senior level. And bringing
somebody in, even if it's a C-suite role, but especially at the director level, like, do
366
00:30:56,096 --> 00:31:01,896
you really think this person's going to have the political capital to make recommendations
and have those get implemented?
367
00:31:01,896 --> 00:31:03,136
How long is that going to take?
368
00:31:03,136 --> 00:31:06,376
Like they'll be there three years before anything gets, I don't know.
369
00:31:06,376 --> 00:31:08,776
Do you have any thoughts on the sequence?
370
00:31:09,452 --> 00:31:10,442
A couple thoughts.
371
00:31:10,442 --> 00:31:20,055
I think it's a tough problem, for one, in the fact that you usually have a lot of
partners and managing partners that are making decisions collectively.
372
00:31:20,055 --> 00:31:24,906
That's just inherently harder to kind of move the ship and all that.
373
00:31:25,026 --> 00:31:35,499
That said, I would say when we speak with most of the senior leaders at firms, I don't
think they're that deep on what's possible with Gen AI, how the value of it is very
374
00:31:35,499 --> 00:31:37,450
specific, or implementation, or any of that.
375
00:31:37,450 --> 00:31:38,850
What I'd recommend is
376
00:31:38,850 --> 00:31:50,098
Think about the core values that you care about, like risk versus the impact to your
business from an acceleration perspective, or the ability to add more insight, and all
377
00:31:50,098 --> 00:31:54,441
those high level values with maybe confidentiality and security and all that.
378
00:31:54,441 --> 00:32:00,726
And just in a very general sense, align at the highest level on what trade-offs you wanna
make.
379
00:32:00,726 --> 00:32:06,562
And then once you have that general view, then empower somebody who is
380
00:32:06,562 --> 00:32:16,665
very knowledgeable in the area to give very specific recommendations of, given what you
said from a value standpoint, here's how we can implement an end-to-end strategy around
381
00:32:16,665 --> 00:32:21,626
Gen AI that makes sense and is aligned with what you're guiding me on.
382
00:32:21,626 --> 00:32:32,615
And then I think in parallel, I would really try to have some subset of users be very
engaged in using a tool and getting a good sense and getting learnings from that and
383
00:32:32,615 --> 00:32:35,630
having the groups present jointly.
384
00:32:35,795 --> 00:32:37,258
to the managing partners.
385
00:32:37,258 --> 00:32:39,723
I think that's probably a good recipe for success.
386
00:32:39,804 --> 00:32:40,324
Yeah.
387
00:32:40,324 --> 00:32:54,728
And, you know, I have advocated for bringing consultants in for that part of the journey, just
because I worry that bringing in, you know, again, a director level role to manage this,
388
00:32:54,728 --> 00:33:05,311
is just a tougher sell than if the executive committee brings in
consultants. And you know what, there's a gap in the marketplace right now.
389
00:33:05,311 --> 00:33:08,776
There aren't many people like you, who really know this stuff,
390
00:33:08,776 --> 00:33:10,677
who aren't sitting in a seat like yours.
391
00:33:10,677 --> 00:33:23,704
There's so much capital being deployed in this area of tech that if you have these
skillsets, going out and selling your time hourly is not the best way to capture economic
392
00:33:23,704 --> 00:33:24,465
value.
393
00:33:24,465 --> 00:33:27,276
It's to do something like you're doing with a startup.
394
00:33:27,276 --> 00:33:34,190
And as a result, I think there's a big gap in the consulting world with people who really
know their stuff.
395
00:33:34,190 --> 00:33:36,731
So I do sympathize.
396
00:33:36,731 --> 00:33:38,365
Do you see that gap?
397
00:33:38,365 --> 00:33:39,354
as well.
398
00:33:40,280 --> 00:33:41,780
I think we're aligned there.
399
00:33:41,780 --> 00:33:44,811
It's a really tough problem for law firms because of that.
400
00:33:45,252 --> 00:33:56,875
I mean, one thing you could try to do is work with a leader at a vendor and just say, hey,
look, I can't use your software, but we'd love to form a longer term relationship over
401
00:33:56,875 --> 00:33:57,615
time.
402
00:33:57,615 --> 00:34:06,138
And can you just give us some general guidance on how we can be effective, knowing that
that person is going to be a little bit biased. That's one thing you can do.
403
00:34:07,375 --> 00:34:16,683
I do think that, in trying to find the right consultant, there are some out there and
you might be able to find one, but it's tough, and you might need to just rely on finding
404
00:34:16,683 --> 00:34:23,248
your most tech forward partner to take a lead position and say, hey, you've got to get
really deep on this stuff.
405
00:34:23,248 --> 00:34:35,798
And I think one thing you need to be cautious about is if you find someone who's not very
kind of forward from a transformation perspective, they're going to move very slowly.
406
00:34:35,798 --> 00:34:40,825
relative to somebody who's just like, hey, we need to stop everything and figure out how
to do this effectively.
407
00:34:40,825 --> 00:34:44,570
That person's gonna have enough friction thrown at them to slow them down anyway.
408
00:34:44,570 --> 00:34:46,642
But I would start with someone like that.
409
00:34:46,642 --> 00:34:48,373
Yeah, that makes sense.
410
00:34:48,373 --> 00:34:53,214
A lot of partners still have books of business.
411
00:34:53,975 --> 00:34:57,276
It's a tough problem for sure.
412
00:34:57,276 --> 00:34:58,377
No easy answers.
413
00:34:58,377 --> 00:35:06,780
How should law firms think about balancing efficiency gains and the impact to the billable
hour?
414
00:35:07,406 --> 00:35:15,586
Yeah, this is one we get all the time: okay, maybe someday, or today, your software is
good enough to where you're adding efficiency.
415
00:35:15,586 --> 00:35:17,146
I'm just going to bill less, right?
416
00:35:17,146 --> 00:35:19,226
So why do I even want this software?
417
00:35:19,686 --> 00:35:20,726
A few thoughts on that.
418
00:35:20,726 --> 00:35:25,366
One, in a lot of cases, attorneys aren't always billing by the billable hour.
419
00:35:25,366 --> 00:35:33,986
It could be contingency, they could be in-house, or it could be a cost-per-X type
of model, where it's like, I'm going to charge you per demand letter I write, or something
420
00:35:33,986 --> 00:35:34,670
like that.
421
00:35:34,670 --> 00:35:42,970
For those that do need to do the billable hour, which is the majority of attorneys, my
view is that it's kind of like computers.
422
00:35:43,310 --> 00:35:53,610
It's not like, 10 years after the computer came out, lawyers were still spending most of
their time going to law libraries and manually checking out books and reading through
423
00:35:53,610 --> 00:35:53,950
books.
424
00:35:53,950 --> 00:35:55,810
It's just not as efficient.
425
00:35:56,010 --> 00:35:59,510
What will happen is that the market will all move toward AI.
426
00:35:59,510 --> 00:36:02,606
Then if you're the one laggard who's not using it at all,
427
00:36:02,606 --> 00:36:04,586
it's just going to be pretty obvious.
428
00:36:04,586 --> 00:36:12,666
Groups are going to know about that and they're not going to use you because you produce
less legal work than the alternative groups.
429
00:36:13,446 --> 00:36:15,546
And so that's where I see the market going.
430
00:36:15,546 --> 00:36:18,665
The other benefit is lawyers write off a lot of their time.
431
00:36:18,665 --> 00:36:22,866
I mean, if you work a 10 hour day, you might bill six and a half hours on average.
432
00:36:22,866 --> 00:36:29,046
And a lot of that time is because you're doing background legal research work or
background work to get up to speed.
433
00:36:29,046 --> 00:36:30,626
AI does that really well.
434
00:36:30,732 --> 00:36:34,865
Your goal as a law firm would probably be to have higher revenue per attorney.
435
00:36:34,865 --> 00:36:38,527
And if attorneys are billing a higher percentage of their time, you're meeting that goal.
436
00:36:38,527 --> 00:36:42,150
So I think that there's a lot of talk about the billable hour.
437
00:36:42,150 --> 00:36:43,931
And I think it's not going away.
438
00:36:43,931 --> 00:36:48,434
Maybe some on the margins, maybe there's some changes.
439
00:36:48,434 --> 00:36:53,678
But I think that lawyers are going to want to be efficient.
440
00:36:53,678 --> 00:36:56,660
And over time, they're going to lean on these tools.
441
00:36:56,660 --> 00:37:00,248
I think the hesitancy with AI has been more
442
00:37:00,248 --> 00:37:04,983
that there's a lot of traps and a lot of just "this wasn't very good" type of outputs.
443
00:37:04,983 --> 00:37:10,918
And I think that the industry is getting pretty close to where those types of issues are
going away pretty fast.
444
00:37:10,918 --> 00:37:11,828
Yeah.
445
00:37:12,009 --> 00:37:16,651
Well, what about high-value use cases on the practice side?
446
00:37:16,651 --> 00:37:22,173
Like, I know you and I talked about document review and timeline creation.
447
00:37:22,173 --> 00:37:24,831
And I thought the timeline creation was an interesting one.
448
00:37:24,831 --> 00:37:37,030
I'm not a lawyer and I don't know how often that scenario comes into play, but any
thoughts on where the high-value use cases are within the practice today?
449
00:37:37,474 --> 00:37:49,104
Yeah, the areas that I think are most interesting are where AI can synthesize very large
amounts of data and get a pretty much fully accurate answer almost every time.
450
00:37:49,545 --> 00:37:54,250
And so a couple of areas that really make sense, like you mentioned, timelines.
451
00:37:54,250 --> 00:38:03,286
You can ingest all of your documents, like a discovery set that's all relevant documents,
throw that into the AI.
452
00:38:03,286 --> 00:38:09,872
and say, hey, pull out all the relevant pieces and create a timeline based on that, and
then use that to draft a statement of facts.
453
00:38:09,872 --> 00:38:11,773
That's gonna be pretty good.
454
00:38:11,773 --> 00:38:15,517
And that's not that hard to set up to do really well.
455
00:38:15,517 --> 00:38:26,486
I've been seeing a lot of users use our tool who are just like, wow, this saved a ton of
time. Before, I was kind of nervous about letting AI answer legal research questions.
456
00:38:26,486 --> 00:38:29,528
But when I see this, this is super useful.
457
00:38:29,629 --> 00:38:31,660
Very similar concept with doc review.
458
00:38:31,660 --> 00:38:41,753
You can automate your doc review and put in hundreds of thousands of pages of files that
AI is looking through to see whether it's relevant based on the context you give it and
459
00:38:41,753 --> 00:38:43,693
based on what you're asking to search for.
460
00:38:43,693 --> 00:38:51,415
And a very high percentage of the actually relevant files will be pulled out, prioritized,
and then synthesized in a summary.
461
00:38:51,516 --> 00:38:58,898
Those types of tools are extremely useful where they might save thousands of hours of time
to get your first pass in doc review.
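The first-pass idea above can be sketched as a simple score-and-rank loop. This is purely a hypothetical illustration, with `classify` standing in for whatever relevance call a real review tool would make; none of the names are a product's actual API.

```python
# `classify` stands in for an LLM relevance call returning a 0-1 score;
# every name here is hypothetical, not a real product API.
def first_pass_review(files, matter_context, classify, threshold=0.5):
    hits = []
    for name, text in files.items():
        score = classify(matter_context, text)
        if score >= threshold:
            hits.append((score, name))
    # Highest-scoring documents surface first for human review.
    return [name for score, name in sorted(hits, reverse=True)]

# Tiny demo with a keyword stub standing in for the model:
stub_classify = lambda ctx, text: 0.9 if "merger" in text else 0.1
ranked = first_pass_review(
    {"a.txt": "merger agreement draft", "b.txt": "catering invoice"},
    "M&A dispute", stub_classify)
```

The human reviewer still makes the final call; the loop only decides the order and cut of what they look at first.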
462
00:39:00,102 --> 00:39:06,428
I would say that the error rate is pretty similar to humans at this point, for
well-engineered software.
463
00:39:06,428 --> 00:39:18,799
And there are other things that are similar. For example, you might have an
expert who has to say, here's all of the information I relied on before I go take the stand.
464
00:39:18,799 --> 00:39:26,446
And can you create the reliance list based on all of these 200 files that I uploaded to
your system?
465
00:39:26,446 --> 00:39:36,066
And say, here are the pieces I relied on. It might take an attorney 100 hours to build
that list; we'll do that with 100% precision if it's well engineered, and you've just saved
466
00:39:36,066 --> 00:39:36,906
that time.
467
00:39:36,906 --> 00:39:45,346
So those areas are the ones that I'd say are the most interesting that I would really
recommend groups try out with a good vendor.
468
00:39:45,346 --> 00:39:49,666
And then there's others where I think legal research is a really hard problem.
469
00:39:49,666 --> 00:39:55,054
It's the first one we started tackling, but just think about all the decisions that the
lawyer needs to make.
470
00:39:55,054 --> 00:39:59,814
When they do legal research, they're thinking about: is this a motion to
dismiss?
471
00:39:59,814 --> 00:40:01,534
Is it a motion for summary judgment?
472
00:40:01,594 --> 00:40:03,294
Is it a trial?
473
00:40:03,694 --> 00:40:06,354
Am I drafting the original complaint?
474
00:40:06,354 --> 00:40:09,954
I'm gonna have very different cases that I use in all of those scenarios.
475
00:40:09,954 --> 00:40:13,854
I've gotta understand the relevancy of the cases, the procedural posture of the case.
476
00:40:13,854 --> 00:40:22,874
I need to think about whether in that case the court ruled for or against my client or the
person that's analogous to my client.
477
00:40:22,874 --> 00:40:24,662
And there's so many factors.
478
00:40:24,662 --> 00:40:32,709
I think we're getting very close to where I feel pretty good about our analysis, but we
still want the lawyer heavily in the loop through the process.
479
00:40:32,710 --> 00:40:35,823
But the other areas are just AI just does it really well.
480
00:40:35,823 --> 00:40:37,014
It's not as complicated.
481
00:40:37,014 --> 00:40:45,072
And I definitely recommend you get started on those areas and then dip your feet in some
of the peripheral areas like billing or areas where it's a little bit less related to core
482
00:40:45,072 --> 00:40:45,962
work too.
483
00:40:46,494 --> 00:40:54,791
So in the scenario of creating a timeline, on the surface, to me that doesn't
sound like something that requires a point solution.
484
00:40:54,791 --> 00:41:05,530
Like, can the general models do it? I'm not a big fan of Copilot at the moment, but do you
need a specifically trained platform to do that effectively?
485
00:41:05,998 --> 00:41:08,979
I think eventually the general models will be able to do it.
486
00:41:09,939 --> 00:41:17,642
I don't know that any of the general models can take like 100 different documents being
uploaded at once.
487
00:41:17,642 --> 00:41:26,984
If you just use even some of the better ones that follow instructions really well, and that
have big context windows, they're still gonna miss a lot if you don't do deeper
488
00:41:26,984 --> 00:41:27,534
algorithms.
489
00:41:27,534 --> 00:41:32,826
So for example, for us, it's kind of what I mentioned earlier.
490
00:41:32,826 --> 00:41:39,602
If we were to just say, Gemini, you do this pretty well, go ahead and pull out a timeline,
it's going to get a lot of it right.
491
00:41:39,602 --> 00:41:40,893
It's also going to miss a lot.
492
00:41:40,893 --> 00:41:44,276
If we say, Gemini, we're going to give you one page at a time.
493
00:41:44,276 --> 00:41:45,247
Here's the full context.
494
00:41:45,247 --> 00:41:47,098
Here's exactly what I want you to look for.
495
00:41:47,098 --> 00:41:48,849
And here's the things you might get tripped up on.
496
00:41:48,849 --> 00:41:50,251
Here's how to solve it.
497
00:41:50,251 --> 00:41:51,802
And now go pull these out.
498
00:41:51,802 --> 00:41:53,703
Then we're going to get really good results.
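The page-at-a-time approach Justin describes can be sketched roughly like this. Every function and prompt here is a hypothetical illustration of the pattern (full context, explicit pitfalls, one focused call per page), not the vendor's actual prompts.

```python
# A hypothetical sketch of page-at-a-time extraction, not a real product's
# prompts or API.
def build_page_prompt(page_text: str, case_context: str) -> str:
    return (
        "You are extracting timeline events for a litigation matter.\n"
        f"Case context: {case_context}\n"
        "Pitfalls: ignore dates in boilerplate headers; distinguish "
        "event dates from filing dates.\n"
        "Return one line per event as 'YYYY-MM-DD: description'.\n\n"
        f"Page:\n{page_text}"
    )

def extract_timeline(pages, case_context, call_model):
    events = []
    for page in pages:
        # One focused call per page, with full context and known traps,
        # misses far less than one giant whole-corpus prompt.
        events.extend(call_model(build_page_prompt(page, case_context)).splitlines())
    return sorted(events)  # ISO dates sort chronologically as strings

# Demo with a stubbed model call:
stub = lambda p: ("2021-03-02: complaint filed" if "complaint" in p
                  else "2020-01-15: contract signed")
timeline = extract_timeline(
    ["contract signed by the parties", "complaint filed in district court"],
    "Breach of contract dispute", stub)
```

The "deeper algorithms" are mostly in the decomposition: the per-page loop and the pitfall instructions are what keep the model from skimming.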
499
00:41:53,844 --> 00:42:00,030
And so maybe in a couple of years, the tools will be very reliable in a general sense to
be able to do that.
500
00:42:00,030 --> 00:42:00,842
I think they're
501
00:42:00,842 --> 00:42:06,106
not there today; the raw LLMs aren't, but we're not the only group doing timelines.
502
00:42:06,127 --> 00:42:07,668
Wexler does those really well.
503
00:42:07,668 --> 00:42:09,428
I'm sure other groups do as well.
504
00:42:09,428 --> 00:42:09,768
Yeah.
505
00:42:09,768 --> 00:42:15,448
So you need a layer of engineering on top to manage that today.
506
00:42:16,368 --> 00:42:18,168
That's interesting.
507
00:42:20,488 --> 00:42:27,148
What about building trust with AI in the firms?
508
00:42:27,548 --> 00:42:38,208
And this goes deeper than just within the firms, to the clients ultimately. As you can see,
with almost half still discouraging or prohibiting
509
00:42:38,208 --> 00:42:42,374
use, there's still a lack of trust with these tools.
510
00:42:42,374 --> 00:42:44,846
How do we bridge that gap?
511
00:42:45,934 --> 00:42:50,154
The trust gap can come from a few different reasons.
512
00:42:50,154 --> 00:42:52,234
So one, it could be a security issue.
513
00:42:52,234 --> 00:42:54,434
Two, it could be a confidentiality issue.
514
00:42:54,434 --> 00:42:57,414
And then three, it could be an accuracy or hallucination issue.
515
00:42:57,414 --> 00:43:03,914
So from a security standpoint, you obviously want to make sure that the model's not
training on any information that you share with it.
516
00:43:03,914 --> 00:43:10,334
But most of the tools are able to satisfy that requirement pretty easily.
517
00:43:10,334 --> 00:43:15,750
And even now, if you're using like a Pro or Plus account with ChatGPT, it's doing that as
well.
518
00:43:17,123 --> 00:43:21,466
You still have a lot of security holes that can happen for anything in the cloud.
519
00:43:21,466 --> 00:43:25,899
So it's helpful to see that the group is SOC 2 compliant and has that certification.
520
00:43:25,899 --> 00:43:33,294
It's helpful to ensure that the group's following best practices as far as encryption:
they're encrypting at rest and in transit.
521
00:43:33,575 --> 00:43:37,578
It's a nice-to-have, I think, to say that your PII is scrubbed as well.
522
00:43:37,578 --> 00:43:41,741
That's something you might want to look for if you're extra cautious for something
particularly sensitive.
523
00:43:43,694 --> 00:43:49,679
And so that's helping with security and for the most part, confidentiality as well.
524
00:43:49,679 --> 00:43:54,402
You might want to look for groups that set up double encryption or mutual encryption.
525
00:43:54,402 --> 00:44:01,728
Or end-to-end encryption, where they're able to encrypt the data to where even their
engineers can't see your data sets.
526
00:44:01,728 --> 00:44:04,600
That's possible and technically something that you can do.
527
00:44:04,600 --> 00:44:09,693
And so anything that's extremely sensitive, you might want to ask for that.
528
00:44:09,934 --> 00:44:10,695
But
529
00:44:10,695 --> 00:44:17,829
If you do those two things, you should be in a pretty good position where
you're meeting those requirements.
530
00:44:17,829 --> 00:44:28,815
From an accuracy and hallucination standpoint, to me, the way you solve that
is: keep the client in the loop, make audit trails, and build things together.
531
00:44:28,876 --> 00:44:39,694
So if you have software that says, okay, these are the cases I used, click the link to see
the cases, double click to say, here's the material facts I relied on, here's the...
532
00:44:39,694 --> 00:44:42,594
quotes I relied on to generate this holding, all that stuff.
533
00:44:42,594 --> 00:44:52,254
If your software is able to do that, I think that it's able to really satisfy the concerns
that lawyers might have of, hold on, this might not even be real, is this something I can
534
00:44:52,254 --> 00:44:53,514
rely on?
535
00:44:53,514 --> 00:45:00,074
And you want that audit trail to be something that it's much faster to audit than to just
do the work from start to finish.
536
00:45:00,422 --> 00:45:00,712
Right.
537
00:45:00,712 --> 00:45:07,116
Because that's always the rub: am I really being more efficient if I have to go
back and double-check everything?
538
00:45:07,116 --> 00:45:12,659
It really impacts the ROI equation when you have to do that.
539
00:45:12,659 --> 00:45:15,601
Well, this has been a really good conversation.
540
00:45:15,601 --> 00:45:20,824
I knew it would be; we've had some good dialogue in the past.
541
00:45:20,824 --> 00:45:27,258
Before we wrap up here, how do people find out more about what you're doing at
Callidus Legal?
542
00:45:27,662 --> 00:45:37,822
Yeah, check us out at callidusai.com, that's C-A-L-L-I-D-U-S-A-I dot com, or shoot me a
message at justin at callidusai.com.
543
00:45:37,822 --> 00:45:39,202
And I really appreciate you having me on.
544
00:45:39,202 --> 00:45:40,602
This was a great talk.
545
00:45:40,602 --> 00:45:41,162
Thanks.
546
00:45:41,162 --> 00:45:42,263
Yeah, absolutely.
547
00:45:42,263 --> 00:45:43,924
All right, have a great weekend.
548
00:45:44,286 --> 00:45:45,507
All right, take care.
00:00:05,224
Justin, how are you this morning?
2
00:00:05,400 --> 00:00:06,792
Doing well, good morning.
3
00:00:06,803 --> 00:00:11,272
Yeah, I appreciate you jumping on here with me for a few minutes.
4
00:00:11,478 --> 00:00:12,565
Absolutely.
5
00:00:12,981 --> 00:00:15,581
So you and I connected at TLTF.
6
00:00:15,581 --> 00:00:28,261
I think we were having lunch and we were talking about AI and reasoning and you were
sitting next to me and chimed in and had some really good thoughts on that topic.
7
00:00:28,261 --> 00:00:38,561
And I think we've, we've covered that on previous episodes, but, um, you and I had another
conversation and I thought you had some really good insights to share.
8
00:00:38,561 --> 00:00:41,469
It sounds like you dive pretty deep.
9
00:00:41,469 --> 00:00:46,272
which it's always good to hear perspective from folks that really, really jump in.
10
00:00:46,272 --> 00:00:51,314
But before we jump into the content here, let's just get you introduced.
11
00:00:51,314 --> 00:00:55,736
So you currently lead, is it Callidus Legal AI?
12
00:00:56,677 --> 00:00:57,338
Okay.
13
00:00:57,338 --> 00:01:11,475
And your focus is on pairing AI with lawyers to enhance core legal work. You're deep in AI
and ML, a former practicing attorney doing M&A work,
14
00:01:11,549 --> 00:01:14,301
which I think is interesting.
15
00:01:14,301 --> 00:01:17,542
Tell us more about your background and what you're doing today.
16
00:01:17,976 --> 00:01:24,569
Yeah, I started off practicing M&A, corporate restructuring, and ended up working for AT&T after
that for a while.
17
00:01:24,569 --> 00:01:29,791
And I led the AT&T legal transformation with our deputy GC.
18
00:01:29,791 --> 00:01:33,412
It was really successful and very interesting for me.
19
00:01:33,552 --> 00:01:38,224
The group looked at just understanding what are our attorneys doing?
20
00:01:38,224 --> 00:01:40,675
What are they doing that's not the highest priority?
21
00:01:40,675 --> 00:01:42,036
How can they reprioritize?
22
00:01:42,036 --> 00:01:48,012
How can we think about bringing in-house some work that we're using outside counsel for
that we can
23
00:01:48,012 --> 00:01:52,574
add more efficiency to the internal resources and have them do more of the work
internally?
24
00:01:52,574 --> 00:01:59,807
Where are we missing key insights and adding liability and where are we over-focused where
we shouldn't be?
25
00:01:59,807 --> 00:02:04,849
And how can we just rethink some of the workflows to where we're operating more
effectively?
26
00:02:04,849 --> 00:02:05,989
So I did that for a bit.
27
00:02:05,989 --> 00:02:11,111
And then my next gig was running a data science and engineering org.
28
00:02:11,111 --> 00:02:12,541
And we launched the first GenAI
29
00:02:12,541 --> 00:02:16,713
product for the company's subsidiary, DIRECTV.
30
00:02:16,770 --> 00:02:18,031
which was pretty informative.
31
00:02:18,031 --> 00:02:20,092
It was right after ChatGPT came out.
32
00:02:20,092 --> 00:02:30,759
And to me, the reason I started my startup was it was so obvious that if you paired the legal
transformation work with the GenAI work, there was going to be a big opportunity.
33
00:02:30,759 --> 00:02:36,682
And I didn't think it was going to be something from day one where ChatGPT could just
replace lawyers' jobs or anything like that.
34
00:02:36,682 --> 00:02:43,146
But I thought over time, this looked like a great starting block for something that could
be really powerful.
35
00:02:43,476 --> 00:02:44,436
Interesting.
36
00:02:44,436 --> 00:02:44,656
Yeah.
37
00:02:44,656 --> 00:02:56,896
And I remember you and I had some subsequent, we had some subsequent dialogue on LinkedIn
talking about, we were talking about the Stanford paper and how, um, and also the Apple
38
00:02:56,896 --> 00:03:01,156
Intelligence paper on the GSM8K battery of tests.
39
00:03:01,156 --> 00:03:06,736
I think they call it GSM8K adaptive, which, um, GSM is grade school math.
40
00:03:06,736 --> 00:03:10,188
And then there were 8,000 questions that
41
00:03:10,260 --> 00:03:13,980
were used to evaluate how well AI performed.
42
00:03:14,720 --> 00:03:28,320
And Apple Intelligence did a study on that, and the adaptive part is where they
changed minor details about the questions to see how the models would perform.
43
00:03:28,460 --> 00:03:37,080
And they degraded quite a bit anywhere from, I think, at the low end, the degradation was
like 30%.
44
00:03:37,080 --> 00:03:38,476
So at the time,
45
00:03:38,579 --> 00:03:39,789
Maybe that was one.
46
00:03:39,789 --> 00:03:53,044
I forget exactly what model on the OpenAI side was the latest and greatest, um, all the
way down to like a 70% degradation just by inserting irrelevant facts about the questions
47
00:03:53,044 --> 00:03:54,375
or changing the names.
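The perturbation approach described here, swapping names and inserting irrelevant facts while the arithmetic stays fixed, can be sketched in a few lines of Python. The template, names, and filler sentences below are illustrative inventions, not the benchmark's actual items:

```python
import random

# A GSM8K-style word problem as a template: swapping surface details
# (the person's name, an irrelevant clause) should not change the
# correct answer if a model truly reasons rather than pattern-matches.
TEMPLATE = "{name} has {n} apples and buys {m} more.{noise} How many apples does {name} have?"

NAMES = ["Sophie", "Liam", "Priya"]          # illustrative names
IRRELEVANT = [                               # illustrative distractor clauses
    "",
    " The apples were picked on a Tuesday.",
    " {name}'s cousin prefers oranges.",
]

def perturb(n: int, m: int, seed: int) -> tuple[str, int]:
    """Return a surface-perturbed question and its unchanged ground-truth answer."""
    rng = random.Random(seed)
    name = rng.choice(NAMES)
    noise = rng.choice(IRRELEVANT).format(name=name)
    question = TEMPLATE.format(name=name, n=n, m=m, noise=noise)
    return question, n + m  # the answer is invariant under the perturbation
```

A robust model should answer every perturbed variant correctly; the degradation the paper measured is the gap between the original and perturbed accuracy.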
48
00:03:54,375 --> 00:04:04,939
Um, that's, and as you pointed out, that has since been resolved and you know, which makes
me wonder, all right, did they do that?
49
00:04:05,019 --> 00:04:07,240
Did they game the system at all?
50
00:04:07,240 --> 00:04:08,370
Like, Hey, we've got a,
51
00:04:08,370 --> 00:04:19,675
we've got a weakness here, let's apply a band aid or was there a fundamental adaptation
that they implemented that helped?
52
00:04:19,675 --> 00:04:33,231
But I think you had, when you ran those same questions through wherever we were at that
point, maybe it was 4.0, you had different output, like it answered successfully.
53
00:04:33,231 --> 00:04:34,982
Am I remembering that correctly?
54
00:04:35,304 --> 00:04:37,215
Yeah, I think pretty close.
55
00:04:37,215 --> 00:04:43,398
I think that the Apple paper, I could be wrong, but I thought it was they used a battery
of models.
56
00:04:43,398 --> 00:04:48,360
The only one that was somewhat advanced was GPT-4, the original GPT-4.
57
00:04:48,360 --> 00:04:55,283
And now we have quite a lot better models with o3-mini and GPT-4.5.
58
00:04:55,283 --> 00:05:02,478
And if you look at the benchmarks, the benchmark I like the most is LiveBench, where they
59
00:05:02,478 --> 00:05:09,838
hide the questions, you can't really game the system, they change the questions regularly,
and they do a full battery of tests.
60
00:05:09,998 --> 00:05:15,478
GPT-4 scored about a 45, and the best models now score about a 76.
61
00:05:15,478 --> 00:05:19,278
So they've come a long way in those benchmark tests.
62
00:05:19,278 --> 00:05:30,018
And when you use the top models now to do the same questions that Apple had, and continue
to vary different pieces and add irrelevant information so that you're sure that it
63
00:05:30,018 --> 00:05:32,418
wasn't trained on any of that information.
64
00:05:32,610 --> 00:05:34,632
they're answering every question correctly.
65
00:05:34,632 --> 00:05:42,897
And so I had sent over a handful of examples yesterday just to kind of prove my point
empirically that this is testable, this is falsifiable.
66
00:05:42,977 --> 00:05:47,640
You can run the test yourself and see, no, the AI actually is able to solve these things.
67
00:05:47,640 --> 00:05:57,787
And as far as how they did it, I'm not sure all of the specifics, but I think a lot of it
is on the post-training side where they're teaching it to, after they've completed the
68
00:05:57,787 --> 00:06:02,242
pre-training, they're teaching the model how to be more effective with the information it
does have.
69
00:06:02,242 --> 00:06:04,427
And then the reasoners are very good.
70
00:06:04,427 --> 00:06:11,540
anything that they're able to do to add this reasoning capability is definitely enhancing
the answers.
71
00:06:11,540 --> 00:06:12,080
Yeah.
72
00:06:12,080 --> 00:06:17,180
And there's, there's so much movement in the space.
73
00:06:17,180 --> 00:06:20,760
I can't even keep up, and I use it all the time.
74
00:06:20,760 --> 00:06:28,500
Like, I don't know, five, seven, 10 times a day, but you know, you've got Grok 3,
you've got Claude 3.7.
75
00:06:28,500 --> 00:06:38,040
You've now got, um, o3-mini, uh, 4.5, and apparently GPT-5 is on the way.
76
00:06:38,040 --> 00:06:40,532
Um, you know, there's DeepSeek.
77
00:06:40,532 --> 00:06:44,155
There's whatever Alibaba's model is.
78
00:06:44,155 --> 00:06:55,445
I mean, there's Mistral, there's Llama, like it's impossible to keep up unless you're doing
this full time, which, you know, I'm not.
79
00:06:55,846 --> 00:07:03,893
So I looked at some of the tests that you threw at o3-mini and I thought it did really
well.
80
00:07:03,893 --> 00:07:08,040
I didn't, I just kind of breezed through it, but why don't you kind of tell us some of the...
81
00:07:08,040 --> 00:07:11,002
some of the tests you threw at it and how it performed.
82
00:07:11,470 --> 00:07:12,070
Yeah, yeah.
83
00:07:12,070 --> 00:07:22,410
What I was trying to do is get a sense of how strong the model is for the types of things
that people are challenging, saying AI is just not able to do these fairly simple tasks.
84
00:07:22,410 --> 00:07:35,110
And so I ran through a handful of examples, one being let's find a case that was not in
the training set and let's go have it find the case text online and then give us a full
85
00:07:35,110 --> 00:07:39,758
summary of like the holding and the material facts and so forth and give us legal
analysis.
86
00:07:39,758 --> 00:07:43,840
I think that was one people were concerned AI is just not capable of doing that.
87
00:07:44,201 --> 00:07:45,298
I read it.
88
00:07:45,298 --> 00:07:48,763
I thought it did a fantastic job summarizing the case.
89
00:07:48,963 --> 00:07:58,113
I gave it some questions like solve complex numerical problems that also deal with
linguistics that are hard to even understand the question being asked.
90
00:07:58,113 --> 00:07:59,329
It did well there.
91
00:07:59,329 --> 00:08:01,518
It does well in constrained poetry.
92
00:08:01,518 --> 00:08:07,746
It did well on just I was surprised it did well on world model questions where I basically
had it.
93
00:08:07,746 --> 00:08:16,920
run a scenario where I'm like dumping marbles out of a container and then putting super
glue in and then moving them around the house and seeing where they end up and what
94
00:08:16,920 --> 00:08:18,030
walking through the steps.
95
00:08:18,030 --> 00:08:26,134
And it did pretty well on pretty much all of those things to where my point of view is
between that and GPT-4.5.
96
00:08:26,134 --> 00:08:36,318
Now you pretty much have something that can reason like a smart human can reason and it
can help in a pretty wide variety of ways from a chat window.
97
00:08:36,318 --> 00:08:37,622
There's still some
98
00:08:37,622 --> 00:08:45,566
issues where these tools don't have full capabilities that a human would have outside of
the chat window where we can pull additional resources.
99
00:08:45,566 --> 00:08:52,904
But if you're resource constrained and you're just talking to somebody intelligent, it's
going to be pretty similar to what these models can do now.
100
00:08:52,904 --> 00:08:53,664
Yeah.
101
00:08:53,664 --> 00:08:59,427
And you and I kicked around the Stanford paper too, which at this point is almost a year
old.
102
00:08:59,427 --> 00:09:01,728
It's actually over a year old from their first iteration.
103
00:09:01,728 --> 00:09:06,990
They did a subsequent adjustment and re-release in, I think May of last year.
104
00:09:06,990 --> 00:09:16,324
But some of the challenges that the Stanford paper highlighted was, you know, they
categorized too many things as hallucinations in my opinion.
105
00:09:16,324 --> 00:09:20,786
But I think overall, I got a lot of insight from reading the paper.
106
00:09:20,840 --> 00:09:29,093
but that AI misunderstands holdings, it has trouble distinguishing between legal actors,
it has difficulty respecting the order of authority.
107
00:09:29,093 --> 00:09:37,264
It fabricates. Do you feel like these specific issues have gotten
better?
108
00:09:39,142 --> 00:09:40,623
They've gotten better.
109
00:09:40,843 --> 00:09:53,288
There's tests on hallucination rates for the different models, and the reasoners are about
half the hallucination rate of GPT-4, and GPT-4.5 is also about half the hallucination
110
00:09:53,288 --> 00:09:54,888
rate of GPT-4.
111
00:09:54,949 --> 00:09:58,710
That said, hallucinations are still an issue for these models.
112
00:09:58,790 --> 00:10:03,552
Legal tech companies can solve those issues, and this is where domain-specific software
comes in.
113
00:10:03,552 --> 00:10:06,924
There's different algorithms you can run to help there.
114
00:10:06,924 --> 00:10:08,128
For example,
115
00:10:08,128 --> 00:10:13,399
Let's say that you have an issue where you tend to get hallucinated cases out of the LLMs.
116
00:10:13,399 --> 00:10:15,640
Well, I think everyone's kind of solved the problem now.
117
00:10:15,640 --> 00:10:26,443
We've been doing this for a long time, where you take the cases from the LLM, you have it
list out the relevant cases, and then you have a secondary external data source that
118
00:10:26,443 --> 00:10:29,544
has a list of all the cases that you have API access to.
119
00:10:29,544 --> 00:10:33,625
And then you check the Bluebook citation to say, is this a real case or not?
120
00:10:33,625 --> 00:10:37,974
And then if it is, let's go check relevancy to ensure this is relevant to the answer.
121
00:10:37,974 --> 00:10:39,865
And if you get two checks, you say, OK, good.
122
00:10:39,865 --> 00:10:41,207
This is a real case.
123
00:10:41,207 --> 00:10:42,037
It's relevant.
124
00:10:42,037 --> 00:10:46,500
This is going to be passed on to the user, and they're going to be able to access that
case.
125
00:10:46,601 --> 00:10:51,845
so this is where, yeah, it's true that I think they had a good insight.
126
00:10:51,845 --> 00:11:01,553
LLMs will continue to hallucinate and cause problems, but LLM software that has
domain-specific engineering on top of it can solve those issues.
127
00:11:01,553 --> 00:11:03,672
And then the other one being
128
00:11:03,672 --> 00:11:08,805
Hey, can LLMs actually like pull out the legal actors and how can they figure out the
holdings?
129
00:11:08,805 --> 00:11:19,402
That one, I think, is handled pretty well now with the top models. They're able to understand
the holdings pretty well, and you can test it yourself and see whether that's true empirically
130
00:11:19,402 --> 00:11:21,193
pretty easily.
131
00:11:21,194 --> 00:11:27,397
In all my tests, and we use this a lot and do a lot of evaluations, they're quite good for
the top models now.
132
00:11:27,412 --> 00:11:29,723
What about respecting the order of authority?
133
00:11:30,814 --> 00:11:32,755
That's another one that they get.
134
00:11:32,755 --> 00:11:37,158
You might have to prompt engineer it a bit, and this is where the domain software comes in
again.
135
00:11:37,158 --> 00:11:46,244
But if you prompt engineer it well, it fully understands that the Supreme Court is
superior to a state Supreme Court, the US Supreme Court versus state Supreme Court.
136
00:11:46,244 --> 00:11:51,947
It fully understands that that court's superior to a trial court and so forth.
137
00:11:52,567 --> 00:11:57,320
We use this every day, and this is something that it's able to do very consistently.
138
00:11:57,621 --> 00:12:04,902
So how would you assess the current state of AI capabilities in legal research and
analysis as we sit today?
139
00:12:05,390 --> 00:12:09,230
Yeah, I would say raw LLMs out of the box, quite bad.
140
00:12:09,230 --> 00:12:12,010
I wouldn't use it for legal research.
141
00:12:12,010 --> 00:12:14,590
And again, they're going to hallucinate everything.
142
00:12:14,590 --> 00:12:16,590
They're going to miss some insights.
143
00:12:16,590 --> 00:12:25,770
They're going to have instances where, because they don't have in the pre-training data
the full knowledge about all the cases and all the statutes for that state, they're going
144
00:12:25,770 --> 00:12:32,830
to take majority rules and assume that those are right for that state, even though your
state might be playing with them or using a minority rule.
145
00:12:33,030 --> 00:12:35,052
And so there's going to be a bunch of issues.
146
00:12:35,052 --> 00:12:38,044
You'll get kind of poorly formatted responses.
147
00:12:38,044 --> 00:12:41,976
If you ask it to draft a full brief, it'll give you like two pages.
148
00:12:42,057 --> 00:12:44,318
All of those things are problems.
149
00:12:44,438 --> 00:12:51,603
If you use good domain specific software, though, these are all engineering problems that
are solvable by the legal tech companies.
150
00:12:51,603 --> 00:12:56,466
And a lot of us have started to or very substantially solve those issues.
151
00:12:56,827 --> 00:13:01,550
And so if you use good software, you can expect an extensive
152
00:13:01,550 --> 00:13:07,392
30 page brief, no case hallucinations, hopefully no holding hallucinations.
153
00:13:07,572 --> 00:13:10,094
You can expect that it's properly formatted.
154
00:13:10,094 --> 00:13:20,848
You can expect that it does go into the details regarding the state law that's in
question, actually looking at the legal authorities to pull out the insights so that it's not
155
00:13:20,848 --> 00:13:22,298
just relying on majority rules.
156
00:13:22,298 --> 00:13:25,640
So all of those things that I think good software is able to do.
157
00:13:25,640 --> 00:13:31,446
That said, I would strongly suggest that we focus on software that keeps the attorney in
the loop.
158
00:13:31,446 --> 00:13:34,929
and lets the lawyer audit the output.
159
00:13:34,929 --> 00:13:39,092
So I wouldn't want just, hey, here's the full, here's my fact pattern.
160
00:13:39,092 --> 00:13:44,295
I'm just going to let the AI go off and run and just draft a full 30 page brief.
161
00:13:44,295 --> 00:13:46,997
I don't think that's a good solution right now.
162
00:13:46,997 --> 00:13:56,385
I think what the AI does well is it synthesizes large amounts of information, bubbles them
up to the lawyer and probably gets it right the vast majority of the time, but the lawyer
163
00:13:56,385 --> 00:13:59,660
is still going to make the judgment call about which direction to go.
164
00:13:59,660 --> 00:14:01,353
And then the lawyer says, yes, go here.
165
00:14:01,353 --> 00:14:03,237
Don't pursue this and so forth.
166
00:14:03,237 --> 00:14:07,385
And then you work together with the AI to get a great answer very fast.
167
00:14:07,385 --> 00:14:10,900
And that's where I would say the focus should be.
168
00:14:11,304 --> 00:14:13,916
Yeah, and I have seen it's been a couple of months.
169
00:14:13,916 --> 00:14:14,947
I think it was late last year.
170
00:14:14,947 --> 00:14:30,037
I saw a chart of the amount of, gosh, comprehension, I guess, for lack of a better
term, of different sized RAG prompts.
171
00:14:30,037 --> 00:14:33,520
So, and it trails off dramatically.
172
00:14:33,520 --> 00:14:39,476
The larger, you know, like, Gemini 2 has a
173
00:14:39,476 --> 00:14:41,876
a 1 million token context window, right?
174
00:14:41,876 --> 00:14:43,656
Which is pretty significant.
175
00:14:43,656 --> 00:14:47,156
I think Claude is a couple hundred thousand, GPT a little lower.
176
00:14:47,156 --> 00:14:48,396
These are always moving.
177
00:14:48,396 --> 00:14:51,656
I'm, I might not be current on the state of things.
178
00:14:51,656 --> 00:14:51,796
Yeah.
179
00:14:51,796 --> 00:14:52,656
Yeah.
180
00:14:52,656 --> 00:15:03,436
Um, but I, I saw kind of a performance metric that threw large amounts of documents
through RAG at these models.
181
00:15:03,436 --> 00:15:08,404
And it trailed off pretty substantially in terms of missing, you know, like
182
00:15:08,404 --> 00:15:16,375
key facts during summarization as the number of tokens increased.
183
00:15:16,375 --> 00:15:19,770
Are we getting any better there or is that still a limitation?
184
00:15:20,024 --> 00:15:23,096
It's getting better. It is a limitation, but it can be engineered around.
185
00:15:23,096 --> 00:15:24,257
This is another one.
186
00:15:24,257 --> 00:15:31,641
We've had to do this where a client has, say, a 40-page document, and it may be a hundred
40-page documents.
187
00:15:31,641 --> 00:15:34,783
And we're trying for each one to pull out all the payment terms.
188
00:15:34,783 --> 00:15:43,268
This is an area where out of the box, LLMs do pretty poorly without a tremendous amount of
prompt engineering and kind of just general engineering.
189
00:15:43,268 --> 00:15:46,786
So what we need to do is break down the problems to where
190
00:15:46,786 --> 00:15:55,038
we're only pushing through like a page at a time, maybe a little bit more than that,
giving it enough context and then giving very detailed prompts on exactly what to look for
191
00:15:55,038 --> 00:15:56,550
and what not to look for.
192
00:15:56,590 --> 00:16:01,312
You have to do all of that in a pretty domain specific way to get good answers.
193
00:16:01,312 --> 00:16:07,915
And so I think if you're just using a raw LLM without a lot of engineering work, they're
not gonna do very well here.
194
00:16:10,116 --> 00:16:11,056
Chunking's a big part of that.
195
00:16:11,056 --> 00:16:15,098
Yeah, you'll chunk the paper and then run in parallel.
196
00:16:15,310 --> 00:16:22,978
like a hundred different agents basically to each have their one page to review and then
summarize in groups.
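The chunk-and-fan-out pattern described here might look roughly like the following sketch, where `extract_terms` stands in for an LLM call carrying a detailed, domain-specific prompt, and the page size and worker count are arbitrary assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: split a long document into roughly page-sized chunks, run an
# extraction prompt on each chunk in parallel (one "agent" per chunk),
# then merge the per-chunk results in order, dropping duplicates.
def chunk_pages(text: str, page_chars: int = 3000) -> list[str]:
    return [text[i:i + page_chars] for i in range(0, len(text), page_chars)]

def extract_from_document(text: str, extract_terms) -> list[str]:
    chunks = chunk_pages(text)
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_chunk = list(pool.map(extract_terms, chunks))  # parallel fan-out
    seen, merged = set(), []
    for terms in per_chunk:
        for t in terms:
            if t not in seen:              # keep first occurrence only
                seen.add(t)
                merged.append(t)
    return merged
```

In practice each worker would be an LLM API call rather than a local function, but the fan-out/merge shape is the same.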
197
00:16:23,142 --> 00:16:24,312
Interesting.
198
00:16:24,413 --> 00:16:24,723
Yeah.
199
00:16:24,723 --> 00:16:37,703
You know, another challenge just as a, I'm not, I don't consider myself an AI expert, more
of an enthusiast, but I, I, I really do put, um, AI through its paces on real world stuff
200
00:16:37,703 --> 00:16:41,272
mostly and find a huge variation.
201
00:16:41,272 --> 00:16:43,537
Um, I also find it quite confusing.
202
00:16:43,537 --> 00:16:48,460
All the fragmentation, like just within the OpenAI world, just the number.
203
00:16:48,460 --> 00:16:51,006
And I know GPT-5 is supposed to solve that.
204
00:16:51,006 --> 00:17:00,661
but I'm still really curious how that's gonna, how that's gonna work, because, you know,
today, I mean, what do you have, eight in the drop down?
205
00:17:00,661 --> 00:17:01,261
You know what I mean?
206
00:17:01,261 --> 00:17:03,822
Like eight models you can choose from.
207
00:17:04,483 --> 00:17:05,963
that's, that's a big impediment.
208
00:17:05,963 --> 00:17:15,748
As somebody who pays really close attention to this, I still don't have a firm handle
on, you know, when to use what. Um, it seems like a moving target.
209
00:17:16,526 --> 00:17:19,707
Yeah, and it's pretty tough for a casual user.
210
00:17:19,707 --> 00:17:24,169
You're far, far more knowledgeable about this than the average user.
211
00:17:25,730 --> 00:17:30,512
Basically, the factors you need to consider are A, how fast do I need a response?
212
00:17:30,512 --> 00:17:33,413
B, what's the context window that I need?
213
00:17:33,593 --> 00:17:37,135
C, do I need a model with a lot of pre-training data?
214
00:17:37,135 --> 00:17:43,137
So as in, it has a lot of knowledge that I need to pull from, or do I need something more
that's reasoning well?
215
00:17:43,137 --> 00:17:45,720
And based on those factors, you can choose the right model.
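Those three factors can be expressed as a simple decision rule. The model names and the context-window cutoff below are illustrative placeholders, not real product tiers:

```python
# Sketch of the three-factor selection heuristic: latency (A), context
# window (B), and knowledge-vs-reasoning (C). The names and the 200k
# cutoff are made-up placeholders for whatever tiers are current.
def pick_model(need_fast: bool, context_tokens: int, need_reasoning: bool) -> str:
    if context_tokens > 200_000:
        return "long-context-model"      # factor B dominates: huge input
    if need_reasoning:
        return "reasoning-model"         # factor C: multi-step analysis
    if need_fast:
        return "small-fast-model"        # factor A: latency-sensitive call
    return "general-knowledge-model"     # default: broad pretraining recall
```

A real system would also weigh cost per token and per-task eval scores, as discussed later in the conversation.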
216
00:17:45,720 --> 00:17:53,737
But I'm in this every day and this is my business, so I'm familiar. For your casual
user, you have no idea which one to use.
217
00:17:53,737 --> 00:18:02,914
And yeah, GPT-5 though will help with that to where it's going to basically just figure
out your question and then suggest the best model internally and then just give you that
218
00:18:02,914 --> 00:18:05,386
best model without you needing to even think about it.
219
00:18:05,386 --> 00:18:15,016
I think ironically, a lot of the time GPT-5 will be basically GPT-4 where they're just
gonna say, well, GPT-4 is good enough to answer this, go ahead and move forward.
220
00:18:15,016 --> 00:18:17,789
because most questions that it gets are actually pretty easy.
221
00:18:17,789 --> 00:18:22,272
There's a handful of hard questions that people push on every now and then.
222
00:18:22,353 --> 00:18:31,220
That said, again, I would stress that the legal tech groups are a lot better for solving
domain-specific tasks than these models are anyway.
223
00:18:31,220 --> 00:18:37,426
And what's happening is basically we're standing on top of the best models for the
specific task we're working on.
224
00:18:37,426 --> 00:18:43,020
We're choosing the best one, knowing exactly the tool that we need to use.
225
00:18:43,214 --> 00:18:52,730
And oftentimes we're using a combination of two or three, sometimes even from different
groups, to where that combination, plus a lot of prompt engineering and other engineering
226
00:18:52,730 --> 00:18:55,761
on top of it, can yield pretty good results.
227
00:18:56,242 --> 00:19:07,748
And I think what you'll see is, in general, the legal tech companies are going to be about
two years ahead of the raw LLMs, as far as their ability to practice law more or less, or
228
00:19:07,748 --> 00:19:11,850
support someone who's practicing law to be an amplifier of that person.
229
00:19:12,003 --> 00:19:23,766
And so in general, I don't think it's very user-friendly just to work from a chat
window, versus a nice template that's easy to follow, just like a webpage.
230
00:19:24,210 --> 00:19:24,801
Yeah.
231
00:19:24,801 --> 00:19:32,266
So the model selection, that has to be done algorithmically, correct?
232
00:19:32,487 --> 00:19:36,520
What does the process look like for selecting the right model?
233
00:19:36,520 --> 00:19:38,412
Just maybe in how you do it.
234
00:19:38,412 --> 00:19:40,322
I think OpenAI is somewhat opaque.
235
00:19:40,322 --> 00:19:43,085
I'm not sure that they provide transparency around that.
236
00:19:43,085 --> 00:19:49,180
But just in broad brushstrokes, like, how does it determine which path to take?
237
00:19:49,870 --> 00:19:55,210
Yeah, for us, we decide, we don't do it in a fully algorithmic way.
238
00:19:55,210 --> 00:20:00,750
We have across our app probably 100, 200 different API calls.
239
00:20:00,750 --> 00:20:05,870
And for each one of those, we have a general view on, is this going to need speed?
240
00:20:05,870 --> 00:20:09,810
Is it going to need the ability to instruction follow really well?
241
00:20:09,810 --> 00:20:13,730
Is it going to need high pre-training knowledge and so forth?
242
00:20:13,730 --> 00:20:18,702
And then based on those factors, we'll say it's probably one of these three models that we
should use.
243
00:20:18,702 --> 00:20:26,616
and then we run evals and anywhere important to say, okay, let's actually see what score
these models get on our evaluations.
244
00:20:26,616 --> 00:20:40,634
And so that could be an evaluation, for instance, of how many cases are they returning
that are accurate out of the, where we'll try to kind of do a full analysis on, okay,
245
00:20:40,634 --> 00:20:42,855
here's an evaluation question.
246
00:20:42,855 --> 00:20:47,822
Let's have real attorneys do the work and figure out what cases you would want to cite.
247
00:20:47,822 --> 00:20:55,782
And then once you've figured out what cases you want to cite, we're going to score those cases
to say, this is like a five, this case is a three, this is a one.
248
00:20:55,782 --> 00:21:03,262
And as far as importance, now let's have all the models do their work and give the cases
that they think are most relevant, and we're going to score those.
249
00:21:03,262 --> 00:21:10,862
So we have a lot of those automations in place and then whenever a new model comes out, we
just run it through the system of tests and say, okay, it's going to be good here, here
250
00:21:10,862 --> 00:21:11,422
and here.
251
00:21:11,422 --> 00:21:12,742
It's not going to be very good here.
252
00:21:12,742 --> 00:21:14,790
And we can move forward that way.
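The scoring scheme described here, where attorneys assign each relevant case an importance of five, three, or one and a model run is graded by how much of that importance its returned cases capture, could be sketched like this. All cites and scores below are made-up examples:

```python
# Sketch of the case-retrieval eval: attorneys build a gold set mapping
# each citable case to an importance score (e.g. 5/3/1), and a model run
# is graded by the fraction of total importance its returned cases cover.
def score_retrieval(gold_scores: dict[str, int], returned: list[str]) -> float:
    """Fraction of attorney-assigned importance captured by the model's cases."""
    total = sum(gold_scores.values())
    if total == 0:
        return 0.0
    # de-duplicate so a repeated cite isn't double-counted
    captured = sum(gold_scores.get(cite, 0) for cite in set(returned))
    return captured / total
```

Running every new model through the same gold sets gives the comparable per-task scores described above.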
253
00:21:15,218 --> 00:21:15,668
Interesting.
254
00:21:15,668 --> 00:21:22,781
Yeah, it seems like, you know, finding the right balance between speed and quality is the
sweet spot, right?
255
00:21:22,781 --> 00:21:31,535
You can't slow the process down too much or you're going to impact efficiency, but you
need to, it's striking that balance.
256
00:21:31,535 --> 00:21:34,766
It seems like that is the strategy. Is that accurate?
257
00:21:35,148 --> 00:21:37,439
Yeah, it's a fun challenge.
258
00:21:37,638 --> 00:21:47,382
A lot of what we do is we'll have a seven part workflow, for instance, and when the user
does step two, we're kicking off a pretty slow model that's really smart.
259
00:21:47,382 --> 00:21:53,843
And then when they get to step six, that slow model is done with the analysis, and then
it's inserting the answer for the user.
260
00:21:53,843 --> 00:21:59,805
Then it's done all that work in the background while they're filling out other information
that's not as relevant to the answer.
261
00:22:00,025 --> 00:22:02,106
And so we do a lot of that.
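The background-prefetch pattern described here maps naturally onto async tasks: start the slow model when the user reaches the early step, keep collecting form input, and await the result only when the later step needs it. This is a minimal sketch with stand-in coroutines, not the actual workflow engine:

```python
import asyncio

# Sketch: fire the slow, smart model at step 2, let the user keep filling
# out the remaining steps, and await the analysis only at step 6, by which
# point it has usually finished in the background. `slow_analysis` and
# `collect_steps` are hypothetical stand-ins.
async def run_workflow(slow_analysis, collect_steps):
    task = asyncio.create_task(slow_analysis())   # kicked off at step 2
    form_data = await collect_steps()             # steps 3-5 proceed meanwhile
    analysis = await task                         # step 6: result is ready
    return {"form": form_data, "analysis": analysis}
```

The user-perceived latency is then the longer of the two paths rather than their sum.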
262
00:22:02,166 --> 00:22:02,732
then
263
00:22:02,732 --> 00:22:06,684
Sometimes you just use the fast model because it's a fairly easy answer.
264
00:22:06,764 --> 00:22:07,955
So we'll do some of that.
265
00:22:07,955 --> 00:22:19,521
And it's just an interesting game of how do we think about the legal implications, how do
we think about the AI driven implications and the technology implications, and then how do
266
00:22:19,521 --> 00:22:25,384
we think about a good user experience and pair all that together to give something that
makes sense cohesively.
267
00:22:25,396 --> 00:22:26,096
Yeah.
268
00:22:26,096 --> 00:22:29,596
You know, recently I've seen interesting benchmarks.
269
00:22:29,596 --> 00:22:32,596
Was it Vals AI that put it together?
270
00:22:32,596 --> 00:22:33,576
I'm not sure if you've seen it.
271
00:22:33,576 --> 00:22:51,216
It's just been maybe in the last week or so. I don't know if it was a benchmark or a
study, but it talked about real scenarios, legal workflows, in which they measured efficiency.
272
00:22:51,436 --> 00:22:52,850
Again, there's so much stuff.
273
00:22:52,850 --> 00:22:54,411
flying at you these days.
274
00:22:54,411 --> 00:23:05,207
I don't have it memorized, but it seems like there's more of a focus now on legal-specific
use cases and how these models perform in those scenarios.
275
00:23:05,247 --> 00:23:07,378
Are you seeing more of that now?
276
00:23:07,922 --> 00:23:09,263
I need to check out that study.
277
00:23:09,263 --> 00:23:11,725
I actually haven't seen it.
278
00:23:11,725 --> 00:23:15,009
We love the idea of doing more legal benchmarks.
279
00:23:15,030 --> 00:23:21,136
That's an area where we've really taken a lot of time to try to build a tool that's useful
from that perspective.
280
00:23:21,136 --> 00:23:25,280
And I think it's useful to the end user as well.
281
00:23:25,301 --> 00:23:27,422
But no, I haven't seen that specific study.
282
00:23:27,422 --> 00:23:30,836
I do like the idea though of pursuing that.
283
00:23:31,775 --> 00:23:37,650
Yeah, this stuff, it was March 4th, three days ago, um, the post I saw on it.
284
00:23:37,650 --> 00:23:39,761
And again, it's just so hard to keep up with.
285
00:23:39,761 --> 00:23:45,216
And there's so much that even, you know, after you read it, three more things fly at you.
286
00:23:45,216 --> 00:23:52,962
It's like, so what about, um, AI strategies in general for law firms?
287
00:23:52,962 --> 00:24:00,888
So, you know, I, I have been critical of law firms that seem to
288
00:24:01,170 --> 00:24:07,614
immediately deploy tactically versus figuring out strategically what they want to do.
289
00:24:07,614 --> 00:24:09,986
And strategy includes a lot of different things.
290
00:24:09,986 --> 00:24:15,059
It can include where to first focus your AI efforts.
291
00:24:15,059 --> 00:24:21,714
It could include the organizational design within the firm that's going to support those
efforts.
292
00:24:22,395 --> 00:24:27,088
It can define the risk tolerance that the firm is willing to take.
293
00:24:27,088 --> 00:24:30,610
You know, because we still have, you know, we still have
294
00:24:30,610 --> 00:24:35,342
I saw an interesting study from the Legal Value Network.
295
00:24:35,342 --> 00:24:39,484
They do an LPM survey every year.
296
00:24:39,484 --> 00:24:43,905
One question that was asked was, what percentage of your clients?
297
00:24:44,426 --> 00:24:47,747
So they talked to, I think, 80 law firm GCs.
298
00:24:48,608 --> 00:24:57,431
And what percentage of your clients either discourage or prohibit the use of AI in their
matters?
299
00:24:57,431 --> 00:24:59,592
And the number was 42%.
300
00:24:59,592 --> 00:25:10,258
which seems shockingly high, because I saw another study from the Blickstein Group, it's
the LDO, law department.
301
00:25:10,258 --> 00:25:12,300
I forget what the acronym stands for.
302
00:25:12,300 --> 00:25:23,766
And anyway, almost 60% of the law firm clients' GCs that they talked to said that law firms aren't using technology enough to drive down costs.
303
00:25:23,766 --> 00:25:26,378
And those are two very conflicting data points.
304
00:25:26,378 --> 00:25:27,004
It's like,
305
00:25:27,004 --> 00:25:33,244
OK, you want me to drive down costs, but you've got OCGs that prevent me from implementing the technology.
306
00:25:33,244 --> 00:25:39,738
I can't use them on your matters. Like, I don't know, do you feel like that's a disconnect in the marketplace still?
307
00:25:40,152 --> 00:25:41,863
I think it's very bimodal.
308
00:25:41,863 --> 00:25:52,911
I think that you have a lot of attorneys on one side or the other where some really want
to embrace the newest technology all in and others are very cautious about it.
309
00:25:52,911 --> 00:25:57,534
And there's not as many groups in the middle as you'd expect.
310
00:25:57,534 --> 00:25:59,795
And so it's not like your normal bell curve.
311
00:25:59,936 --> 00:26:01,957
And so I think that's what's going on.
312
00:26:01,957 --> 00:26:09,474
And I think the organizational strategy and kind of transformation lens is a really tough
and interesting question for
313
00:26:09,474 --> 00:26:11,715
organizational leaders to think about.
314
00:26:11,715 --> 00:26:14,436
I think we probably disagree a little bit on this one.
315
00:26:14,436 --> 00:26:24,880
I have more of an engineering mindset on it where I think the way to go is start small and
iterate and then run in parallel your strategy.
316
00:26:25,500 --> 00:26:35,765
We've just seen so many instances where a company really wants to get into AI, they're
strategizing about it and a year later they haven't really done anything and they don't
317
00:26:35,765 --> 00:26:37,560
really get it because they have
318
00:26:37,560 --> 00:26:41,113
their senior leaders doing strategy stuff without being very hands-on.
319
00:26:41,113 --> 00:26:42,774
They don't really get it.
320
00:26:42,774 --> 00:26:53,582
I think if you take the time to have a small group of people that are really invested in
using AI every day, try out some leading tools, go in the right direction.
321
00:26:53,582 --> 00:26:55,363
Don't do anything just crazy.
322
00:26:55,363 --> 00:26:57,875
And then just don't put in any client information.
323
00:26:57,875 --> 00:27:04,640
Just do everything based on just very kind of random or kind of synthesized or sanitized
data.
324
00:27:04,666 --> 00:27:10,570
I think if you do that, you can get a pretty good sense of, now I get what people are
using this for.
325
00:27:10,570 --> 00:27:15,773
We could use it here, here, and here, but I can't use it in this area because it's going
to have issues.
326
00:27:15,773 --> 00:27:20,796
Or this is decent software in this way, but not in this other way.
327
00:27:20,796 --> 00:27:23,358
Now I can make informed strategic decisions.
328
00:27:23,358 --> 00:27:29,161
I think that if you kind of do that pairing, that's probably what I think would be the
best approach.
329
00:27:29,202 --> 00:27:29,482
Yeah.
330
00:27:29,482 --> 00:27:31,403
Well, we're, we're aligned on part of that.
331
00:27:31,403 --> 00:27:43,798
So I think that striking the right risk-reward balance is key, and that should be the number one, um, driver of the approach.
332
00:27:43,838 --> 00:27:44,398
Right.
333
00:27:44,398 --> 00:27:54,863
I think that jumping right in on the practice side and, you know, going whole hog with attorneys who have super high opportunity costs and low tolerance for missteps is a
334
00:27:54,863 --> 00:27:55,643
mistake.
335
00:27:55,643 --> 00:27:56,943
So we're aligned on that.
336
00:27:56,943 --> 00:27:58,549
I guess where I get hung up,
337
00:27:58,549 --> 00:28:03,480
is that, I'm going to quote another study here, or survey.
338
00:28:03,480 --> 00:28:13,753
Thomson Reuters did one, the professional services GenAI survey, which came out late last year, and only 10% of law firms, one out of ten, have a GenAI policy.
339
00:28:13,913 --> 00:28:19,114
So in order to write a policy, I think you need a strategy first, right?
340
00:28:19,114 --> 00:28:26,674
A policy is an outgrowth of a strategy, but nine out of ten don't have one.
341
00:28:26,674 --> 00:28:37,690
So what you have now is law firm users who don't have proper guidance on, hey, what can I
use the public models for?
342
00:28:37,690 --> 00:28:38,691
Can I use them at all?
343
00:28:38,691 --> 00:28:40,412
Do I use it on my phone?
344
00:28:40,412 --> 00:28:44,254
Can I use it on my personal laptop when I'm not connected to the VPN?
345
00:28:44,254 --> 00:28:49,196
Like all of those questions not being answered, I think creates unnecessary risk.
346
00:28:49,196 --> 00:28:56,370
Maybe at a certain, you know, altitude defining the strategy and incrementally
347
00:28:56,370 --> 00:29:00,529
working your way down more granularly, maybe that's the right balance.
348
00:29:00,684 --> 00:29:01,935
I think we're in sync there.
349
00:29:01,935 --> 00:29:07,059
I think it's crazy that you wouldn't have a GenAI policy at this point.
350
00:29:07,059 --> 00:29:12,604
I think our company, at DirecTV, I think we had one three months in after ChatGPT.
351
00:29:12,604 --> 00:29:20,371
I was on the executive board there and we thought just immediately we have to give the
company employees something to give some guidance.
352
00:29:20,371 --> 00:29:22,132
And yeah, I think you're exactly right.
353
00:29:22,132 --> 00:29:29,578
You start high, you make it a little bit overly restrictive, then you dig into the details
and you realize, okay, here's where we can open up.
354
00:29:29,622 --> 00:29:38,376
a little bit more, here's where we can be a little bit less or more forgiving on the use
of the tools and just be smart about that.
355
00:29:38,858 --> 00:29:44,415
But yeah, if you're working in a law firm and you don't have a strategy, I think you should definitely start working on that right away.
356
00:29:44,415 --> 00:29:45,125
Yeah.
357
00:29:45,125 --> 00:29:55,458
And what do you think about this? Another, you know, opinion of mine, some may agree, some may disagree, but I see a lot of law firm C-suite and director-level roles, both on
358
00:29:55,458 --> 00:30:06,651
the innovation and AI side, that are brought in without any sort of strategy, essentially brought in to just kind of figure it out.
359
00:30:06,731 --> 00:30:11,922
And normally I like an agile approach, but the problem with
360
00:30:12,264 --> 00:30:28,148
this approach in law firms is that those resources are typically not sufficiently empowered to make change, and law firm decision-making is so friction-heavy that it feels
361
00:30:28,148 --> 00:30:31,329
like you're setting these leaders up.
362
00:30:31,449 --> 00:30:36,091
You're not setting them up for success, because the tone has to be set at the top, right?
right?
363
00:30:36,091 --> 00:30:40,472
Again, around risk-taking, around where they want to
364
00:30:41,556 --> 00:30:44,436
add value within the business.
365
00:30:45,516 --> 00:30:56,096
You know, just all of these things that need to happen at the most senior level. And bringing somebody in, even if it's C-suite, but at the director level, like, do
366
00:30:56,096 --> 00:31:01,896
you really think this person's going to have the political capital to make recommendations and have those get adopted?
367
00:31:01,896 --> 00:31:03,136
How long is that going to take?
368
00:31:03,136 --> 00:31:06,376
Like, they'll be there three years before anything gets done, I don't know.
369
00:31:06,376 --> 00:31:08,776
Do you have any thoughts on the sequence?
370
00:31:09,452 --> 00:31:10,442
A couple thoughts.
371
00:31:10,442 --> 00:31:20,055
I think it's a tough problem, for one, in the fact that you usually have a lot of partners and managing partners that are making decisions collectively.
372
00:31:20,055 --> 00:31:24,906
That's just inherently harder to kind of move the ship and all that.
373
00:31:25,026 --> 00:31:35,499
That said, I would say when we speak with most of the senior leaders at firms, I don't think they're that deep on what's possible with GenAI, how the value of it is very
374
00:31:35,499 --> 00:31:37,450
specific to the implementation, any of that.
375
00:31:37,450 --> 00:31:38,850
What I'd recommend is
376
00:31:38,850 --> 00:31:50,098
Think about the core values that you care about, like risk versus the impact to your
business from an acceleration perspective, or the ability to add more insight, and all
377
00:31:50,098 --> 00:31:54,441
those high level values with maybe confidentiality and security and all that.
378
00:31:54,441 --> 00:32:00,726
And just in a very general sense, align at the highest level on what trade-offs you wanna
make.
379
00:32:00,726 --> 00:32:06,562
And then once you have that general view, then empower somebody who is
380
00:32:06,562 --> 00:32:16,665
very knowledgeable in the area to give very specific recommendations of, given what you
said from a value standpoint, here's how we can implement an end-to-end strategy around
381
00:32:16,665 --> 00:32:21,626
GenAI that makes sense and is aligned with what you're guiding me on.
382
00:32:21,626 --> 00:32:32,615
And then I think in parallel, I would really try to have some subset of users be very
engaged in using a tool and getting a good sense and getting learnings from that and
383
00:32:32,615 --> 00:32:35,630
having the groups present jointly.
384
00:32:35,795 --> 00:32:37,258
to the managing partners.
385
00:32:37,258 --> 00:32:39,723
I think that's probably a good recipe for success.
386
00:32:39,804 --> 00:32:40,324
Yeah.
387
00:32:40,324 --> 00:32:54,728
And, you know, I have advocated for bringing consultants in for that part of the journey, just because I worry that bringing in, you know, again, a director-level role to manage this,
388
00:32:54,728 --> 00:33:05,311
um, is just a tougher sell than if, you know, the executive committee brings in consultants. And you know what, there's a gap in the marketplace right now.
389
00:33:05,311 --> 00:33:08,776
There are not many people like you, who really know this stuff and
390
00:33:08,776 --> 00:33:10,677
are sitting in a seat like yours.
391
00:33:10,677 --> 00:33:23,704
There's so much capital being deployed in this area of tech that if you have these
skillsets, going out and selling your time hourly is not the best way to capture economic
392
00:33:23,704 --> 00:33:24,465
value.
393
00:33:24,465 --> 00:33:27,276
It's to do something like you're doing with a startup.
394
00:33:27,276 --> 00:33:34,190
And as a result, I think there's a big gap in the consulting world with people who really
know their stuff.
395
00:33:34,190 --> 00:33:36,731
So I do sympathize.
396
00:33:36,731 --> 00:33:38,365
Do you see that gap?
397
00:33:38,365 --> 00:33:39,354
as well.
398
00:33:40,280 --> 00:33:41,780
I think we're aligned there.
399
00:33:41,780 --> 00:33:44,811
It's a really tough problem for law firms because of that.
400
00:33:45,252 --> 00:33:56,875
I mean, one thing you could try to do is work with a leader at a vendor and just say, hey, look, I can't use your software, but we'd love to form a longer-term relationship over
401
00:33:56,875 --> 00:33:57,615
time.
402
00:33:57,615 --> 00:34:06,138
And can you just give us some general guidance on how we can be effective? Knowing that that person is going to be a little bit biased, that's one thing you can do.
403
00:34:07,375 --> 00:34:16,683
I do think that trying to find the right consultant, there are some out there and you might be able to find one, but it's tough, and you might need to just rely on finding
404
00:34:16,683 --> 00:34:23,248
your most tech forward partner to take a lead position and say, hey, you've got to get
really deep on this stuff.
405
00:34:23,248 --> 00:34:35,798
And I think one thing you need to be cautious about is if you find someone who's not very
kind of forward from a transformation perspective, they're going to move very slowly.
406
00:34:35,798 --> 00:34:40,825
relative to somebody who's just like, hey, we need to stop everything and figure out how
to do this effectively.
407
00:34:40,825 --> 00:34:44,570
That person's gonna have enough friction thrown at them to slow them down anyway.
408
00:34:44,570 --> 00:34:46,642
But I would start with someone like that.
409
00:34:46,642 --> 00:34:48,373
Yeah, that makes sense.
410
00:34:48,373 --> 00:34:53,214
A lot of partners still have books of business.
411
00:34:53,975 --> 00:34:57,276
It's a tough problem for sure.
412
00:34:57,276 --> 00:34:58,377
No easy answers.
413
00:34:58,377 --> 00:35:06,780
How should law firms think about balancing efficiency gains and the impact to the billable
hour?
414
00:35:07,406 --> 00:35:15,586
Yeah, this is one we get all the time: okay, maybe someday, or even today, your software is good enough to where you're adding efficiency.
415
00:35:15,586 --> 00:35:17,146
I'm just going to bill less, right?
416
00:35:17,146 --> 00:35:19,226
So why do I even want this software?
417
00:35:19,686 --> 00:35:20,726
A few thoughts on that.
418
00:35:20,726 --> 00:35:25,366
One, in a lot of cases, attorneys aren't always billing by the billable hour.
419
00:35:25,366 --> 00:35:33,986
It could be contingency, they could be in-house, or it could be a cost-per-X type of model where it's like, I'm going to charge you per demand letter I write or something
420
00:35:33,986 --> 00:35:34,670
like that.
421
00:35:34,670 --> 00:35:42,970
For those that do need to do the billable hour, which is the majority of attorneys, my
view is that it's kind of like computers.
422
00:35:43,310 --> 00:35:53,610
It's not like, 10 years after the computer came out, lawyers were still spending most of their time going to law libraries and manually checking out books and reading through
423
00:35:53,610 --> 00:35:53,950
books.
424
00:35:53,950 --> 00:35:55,810
It's just not as efficient.
425
00:35:56,010 --> 00:35:59,510
What will happen is that the market will all move toward AI.
426
00:35:59,510 --> 00:36:02,606
Then if you're the one laggard who's not using it at all,
427
00:36:02,606 --> 00:36:04,586
it's just going to be pretty obvious.
428
00:36:04,586 --> 00:36:12,666
Groups are going to know about that and they're not going to use you because you produce
less legal work than the alternative groups.
429
00:36:13,446 --> 00:36:15,546
And so that's where I see the market going.
430
00:36:15,546 --> 00:36:18,665
The other benefit is lawyers write off a lot of their time.
431
00:36:18,665 --> 00:36:22,866
I mean, if you work a 10 hour day, you might bill six and a half hours on average.
432
00:36:22,866 --> 00:36:29,046
And a lot of that time is because you're doing background legal research work or
background work to get up to speed.
433
00:36:29,046 --> 00:36:30,626
AI does that really well.
434
00:36:30,732 --> 00:36:34,865
And your goal as a law firm would probably be to have higher revenue per attorney.
435
00:36:34,865 --> 00:36:38,527
And if attorneys are billing a higher percentage of their time, you're meeting that goal.
436
00:36:38,527 --> 00:36:42,150
So I think that there's a lot of talk about the billable hour.
437
00:36:42,150 --> 00:36:43,931
And I think it's not going away.
438
00:36:43,931 --> 00:36:48,434
Maybe some on the margins, maybe there's some changes.
439
00:36:48,434 --> 00:36:53,678
But I think that lawyers are going to want to be efficient.
440
00:36:53,678 --> 00:36:56,660
And over time, they're going to lean on these tools.
441
00:36:56,660 --> 00:37:00,248
I think the hesitancy with AI has been more
442
00:37:00,248 --> 00:37:04,983
that there's a lot of traps and a lot of just "this wasn't very good" type of outputs.
443
00:37:04,983 --> 00:37:10,918
And I think that the industry is getting pretty close to where those types of issues are
going away pretty fast.
444
00:37:10,918 --> 00:37:11,828
Yeah.
445
00:37:12,009 --> 00:37:16,651
Well, what about high-value use cases on the practice side?
446
00:37:16,651 --> 00:37:22,173
Like, I know you and I talked about document review and timeline creation.
447
00:37:22,173 --> 00:37:24,831
And I thought the timeline creation was an interesting one.
448
00:37:24,831 --> 00:37:37,030
I'm not a lawyer and I don't know how often that scenario comes into play, but any thoughts on, you know, where the high-value use cases are within the practice today?
449
00:37:37,474 --> 00:37:49,104
Yeah, the areas that I think are most interesting are where AI can synthesize very large
amounts of data and get a pretty much fully accurate answer almost every time.
450
00:37:49,545 --> 00:37:54,250
And so a couple of areas that really make sense, like you mentioned, timelines.
451
00:37:54,250 --> 00:38:03,286
You can ingest all of your documents, like a discovery set that's all relevant documents,
throw that into the AI.
452
00:38:03,286 --> 00:38:09,872
and say, hey, pull out all the relevant pieces and create a timeline based on that, and
then use that to draft a statement of facts.
453
00:38:09,872 --> 00:38:11,773
That's gonna be pretty good.
454
00:38:11,773 --> 00:38:15,517
And that's not that hard to set up to do really well.
455
00:38:15,517 --> 00:38:26,486
I've been seeing a lot of users use our tool who are just like, wow, this saved a ton of time. Before, I was kind of nervous about letting AI answer legal research questions.
456
00:38:26,486 --> 00:38:29,528
But when I see this, this is super useful.
457
00:38:29,629 --> 00:38:31,660
Very similar concept with doc review.
458
00:38:31,660 --> 00:38:41,753
You can automate your doc review and put in hundreds of thousands of pages of files that
AI is looking through to see whether it's relevant based on the context you give it and
459
00:38:41,753 --> 00:38:43,693
based on what you're asking to search for.
460
00:38:43,693 --> 00:38:51,415
And a very high percentage of the actually relevant files will be pulled out, prioritized,
and then synthesized in a summary.
461
00:38:51,516 --> 00:38:58,898
Those types of tools are extremely useful where they might save thousands of hours of time
to get your first pass in doc review.
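[Editor's note: the first-pass doc review Justin describes — score each file for relevance against the matter context, then prioritize for human review — can be sketched roughly as below. The keyword-overlap scorer is a runnable placeholder for the LLM call; all names, the threshold, and the sample documents are illustrative assumptions, not any vendor's implementation.]

```python
from dataclasses import dataclass

@dataclass
class ReviewResult:
    doc_id: str
    relevant: bool
    score: float      # 0.0-1.0 relevance confidence
    rationale: str    # short explanation kept for the audit trail

def classify_page(text: str, issue_keywords: list[str]) -> tuple[float, str]:
    """Placeholder for the model call: a real system would send the page
    plus matter context to an LLM and parse a structured verdict. Here a
    keyword-overlap score stands in so the sketch is runnable."""
    text_lower = text.lower()
    hits = [k for k in issue_keywords if k.lower() in text_lower]
    score = len(hits) / max(len(issue_keywords), 1)
    return score, f"matched terms: {hits}" if hits else "no matched terms"

def first_pass_review(docs: dict[str, str], issue_keywords: list[str],
                      threshold: float = 0.3) -> list[ReviewResult]:
    """Score every document and surface likely-relevant files first."""
    results = []
    for doc_id, text in docs.items():
        score, why = classify_page(text, issue_keywords)
        results.append(ReviewResult(doc_id, score >= threshold, score, why))
    # Prioritize: highest-scoring documents go to human review first.
    return sorted(results, key=lambda r: r.score, reverse=True)

docs = {
    "email_001": "Re: breach of the supply contract and late delivery penalties",
    "memo_114": "Quarterly parking garage maintenance schedule",
}
ranked = first_pass_review(docs, ["breach", "contract", "delivery"])
print(ranked[0].doc_id)  # the contract email ranks first
```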
462
00:39:00,102 --> 00:39:06,428
I would say that the error rate is pretty similar to humans at this point, for well-engineered software.
463
00:39:06,428 --> 00:39:18,799
And there are other things that are similar, like cases where you have to have an expert say, here's all of the information I relied on before I go take the stand.
464
00:39:18,799 --> 00:39:26,446
And can you create the reliance list based on all of these 200 files that I uploaded to
your system?
465
00:39:26,446 --> 00:39:36,066
and say, here are the pieces I relied on. It might take an attorney 100 hours to build that list; we'll do that with 100% precision if it's well engineered, and you've just saved
466
00:39:36,066 --> 00:39:36,906
that time.
467
00:39:36,906 --> 00:39:45,346
So those areas are the ones that I'd say are the most interesting that I would really
recommend groups try out with a good vendor.
468
00:39:45,346 --> 00:39:49,666
And then there's others where I think legal research is a really hard problem.
469
00:39:49,666 --> 00:39:55,054
It's the first one we started tackling, but just think about all the decisions that the
lawyer needs to make.
470
00:39:55,054 --> 00:39:59,814
When they do legal research, they're thinking about: what kind of motion is this? Is it a motion to dismiss?
471
00:39:59,814 --> 00:40:01,534
Is it a motion for summary judgment?
472
00:40:01,594 --> 00:40:03,294
Is it a trial?
473
00:40:03,694 --> 00:40:06,354
Am I drafting the original complaint?
474
00:40:06,354 --> 00:40:09,954
I'm gonna have very different cases that I use in all of those scenarios.
475
00:40:09,954 --> 00:40:13,854
I've gotta understand the relevancy of the cases, the procedural posture of the case.
476
00:40:13,854 --> 00:40:22,874
I need to think about whether in that case the court ruled for or against my client or the
person that's analogous to my client.
477
00:40:22,874 --> 00:40:24,662
And there's so many factors.
478
00:40:24,662 --> 00:40:32,709
I think we're getting very close to where I feel pretty good about our analysis, but we
still want the lawyer heavily in the loop through the process.
479
00:40:32,710 --> 00:40:35,823
But the other areas, AI just does them really well.
480
00:40:35,823 --> 00:40:37,014
It's not as complicated.
481
00:40:37,014 --> 00:40:45,072
And I definitely recommend you get started on those areas and then dip your feet in some
of the peripheral areas like billing or areas where it's a little bit less related to core
482
00:40:45,072 --> 00:40:45,962
work too.
483
00:40:46,494 --> 00:40:54,791
So, like, in the scenario of creating a timeline: on the surface, to me, that doesn't sound like something that requires a point solution.
484
00:40:54,791 --> 00:41:05,530
Like, can the general models do it? I'm not a big fan of Copilot at the moment, but do you need a specifically trained platform to do that effectively?
485
00:41:05,998 --> 00:41:08,979
I think eventually the general models will be able to do it.
486
00:41:09,939 --> 00:41:17,642
I don't know that any of the general models can take like 100 different documents being
uploaded at once.
487
00:41:17,642 --> 00:41:26,984
If you just use even some of the better ones that instruction-follow really well, that have big context windows, they're still gonna miss a lot if you don't do deeper
488
00:41:26,984 --> 00:41:27,534
algorithms.
489
00:41:27,534 --> 00:41:32,826
So for example, for us, it's kind of what I mentioned earlier, we're...
490
00:41:32,826 --> 00:41:39,602
If we were to just say, Gemini, you do this pretty well, go ahead and pull out a timeline,
they're going to get a lot of it right.
491
00:41:39,602 --> 00:41:40,893
They're going to miss a lot.
492
00:41:40,893 --> 00:41:44,276
If we say, Gemini, we're going to give you one page at a time.
493
00:41:44,276 --> 00:41:45,247
Here's the full context.
494
00:41:45,247 --> 00:41:47,098
Here's exactly what I want you to look for.
495
00:41:47,098 --> 00:41:48,849
And here's the things you might get tripped up on.
496
00:41:48,849 --> 00:41:50,251
Here's how to solve it.
497
00:41:50,251 --> 00:41:51,802
And now go pull these out.
498
00:41:51,802 --> 00:41:53,703
Then we're going to get really good results.
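[Editor's note: the one-page-at-a-time approach described here can be sketched as follows. The prompt text and the regex-based `extract_events` stub are assumptions standing in for the real model call and pipeline, not Callidus's implementation.]

```python
import re
from datetime import date

# Per-page instructions of the kind described above: full matter context,
# exactly what to look for, and known pitfalls, repeated for every page so
# the model never has to hold the whole document set at once.
PROMPT_TEMPLATE = """Matter context: {context}
Task: list every dated event on this page as YYYY-MM-DD | description.
Pitfalls: ignore dates in boilerplate headers and signature blocks.
Page text:
{page}"""

def extract_events(page: str) -> list[tuple[date, str]]:
    """Stand-in for the LLM call. A real pipeline would send
    PROMPT_TEMPLATE.format(...) to a model and parse its reply; here a
    regex pulls 'YYYY-MM-DD <description>' lines so the sketch runs."""
    events = []
    for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})\s+(.+)", page):
        y, mo, d, desc = m.groups()
        events.append((date(int(y), int(mo), int(d)), desc.strip()))
    return events

def build_timeline(pages: list[str]) -> list[tuple[date, str]]:
    """Process one page at a time, then merge into one chronology."""
    all_events = []
    for page in pages:
        all_events.extend(extract_events(page))
    # Chronological order, ready to draft a statement of facts from.
    return sorted(all_events)

pages = [
    "2021-06-01 Supplier misses first delivery deadline",
    "2020-03-15 Parties sign the supply agreement",
]
timeline = build_timeline(pages)
print(timeline[0][1])  # earliest event comes first
```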
499
00:41:53,844 --> 00:42:00,030
And so maybe in a couple of years, the tools will be very reliable in a general sense to
be able to do that.
500
00:42:00,030 --> 00:42:00,842
I think they're
501
00:42:00,842 --> 00:42:06,106
not, the raw LLMs aren't today. But we're not the only group doing timelines.
502
00:42:06,127 --> 00:42:07,668
YXLR does those really well.
503
00:42:07,668 --> 00:42:09,428
I'm sure other groups do as well.
504
00:42:09,428 --> 00:42:09,768
Yeah.
505
00:42:09,768 --> 00:42:15,448
So you need a layer of engineering on top to manage that today.
506
00:42:16,368 --> 00:42:18,168
That's interesting.
507
00:42:20,488 --> 00:42:27,148
What about building trust with AI in the firms?
508
00:42:27,548 --> 00:42:38,208
And this goes deeper than just within the firms, the clients ultimately, as you can see
with almost half still discouraging or prohibiting
509
00:42:38,208 --> 00:42:42,374
use, there's still a lack of trust with these tools.
510
00:42:42,374 --> 00:42:44,846
How do we bridge that gap?
511
00:42:45,934 --> 00:42:50,154
The trust gap can come up for a few different reasons.
512
00:42:50,154 --> 00:42:52,234
So one, it could be a security issue.
513
00:42:52,234 --> 00:42:54,434
Two, it could be a confidentiality issue.
514
00:42:54,434 --> 00:42:57,414
And then three, it could be like an accuracy or hallucination issue.
515
00:42:57,414 --> 00:43:03,914
So from a security standpoint, you obviously want to make sure that the model's not
training on any information that you share with it.
516
00:43:03,914 --> 00:43:10,334
But most of the tools are able to satisfy that requirement pretty easily.
517
00:43:10,334 --> 00:43:15,750
And even now, if you're using like a Pro or Plus account with ChatGPT, it's doing that as well.
518
00:43:17,123 --> 00:43:21,466
You still have a lot of security holes that can happen for anything in the cloud.
519
00:43:21,466 --> 00:43:25,899
So it's helpful to see that the group is SOC 2 compliant and has that certification.
520
00:43:25,899 --> 00:43:33,294
It's helpful to ensure that the group's following best practices as far as encryption: they're encrypting at rest and in transit.
521
00:43:33,575 --> 00:43:37,578
It's a nice-to-have, I think, to say that you PII-scrub as well.
522
00:43:37,578 --> 00:43:41,741
That's something you might want to look for if you're extra cautious for something
particularly sensitive.
523
00:43:43,694 --> 00:43:49,679
And so that's helping with security and for the most part, confidentiality as well.
524
00:43:49,679 --> 00:43:54,402
You might want to look for groups that set up double encryption or mutual encryption.
525
00:43:54,402 --> 00:44:01,728
Or end-to-end encryption, where they're able to encrypt the data to where even their engineers can't see your data sets.
526
00:44:01,728 --> 00:44:04,600
That's possible and technically something that you can do.
527
00:44:04,600 --> 00:44:09,693
And so anything that's extremely sensitive, you might want to ask for that.
528
00:44:09,934 --> 00:44:10,695
But
529
00:44:10,695 --> 00:44:17,829
If you do those two things, you should be in a pretty good position where you're meeting those requirements.
530
00:44:17,829 --> 00:44:28,815
From an accuracy and hallucination standpoint, to me, the way you solve that is: keep the client in the loop, make audit trails, and build things together.
531
00:44:28,876 --> 00:44:39,694
So if you have software that says, okay, these are the cases I used, click the link to see
the cases, double click to say, here's the material facts I relied on, here's the...
532
00:44:39,694 --> 00:44:42,594
quotes I relied on to generate this holding, all that stuff.
533
00:44:42,594 --> 00:44:52,254
If your software is able to do that, I think that it's able to really satisfy the concerns
that lawyers might have of, hold on, this might not even be real, is this something I can
534
00:44:52,254 --> 00:44:53,514
rely on?
535
00:44:53,514 --> 00:45:00,074
And you want that audit trail to be something that it's much faster to audit than to just
do the work from start to finish.
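[Editor's note: the audit trail described above — every generated statement carries links back to the cases and quotes it relied on, so checking is faster than redoing the research — might be structured like this minimal sketch. Field names and the example case are illustrative, not any vendor's schema.]

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    case_name: str   # the opinion relied on
    url: str         # link the reviewing lawyer can click through to
    quote: str       # exact passage the holding was drawn from

@dataclass
class Proposition:
    text: str                      # the AI-generated statement
    citations: list[Citation] = field(default_factory=list)

    def auditable(self) -> bool:
        """A proposition passes review only if it is sourced at all;
        unsourced output is flagged rather than trusted."""
        return len(self.citations) > 0

prop = Proposition(
    text="The limitation period was tolled by the 2021 acknowledgment letter.",
    citations=[Citation("Doe v. Roe", "https://example.com/doe-v-roe",
                        "an acknowledgment in writing restarts the period")],
)
# A reviewer verifies by reading one linked quote, not re-running the research.
print(prop.auditable())  # True
```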
536
00:45:00,422 --> 00:45:00,712
Right.
537
00:45:00,712 --> 00:45:07,116
Because that's always the rub is like, am I really being more efficient if I have to go
back and double check everything?
538
00:45:07,116 --> 00:45:12,659
It really impacts the ROI equation when you have to do that.
539
00:45:12,659 --> 00:45:15,601
Well, this has been a really good conversation.
540
00:45:15,601 --> 00:45:20,824
I knew it would be, just, uh, we've had some good dialogue in the past.
541
00:45:20,824 --> 00:45:27,258
Before we wrap up here, how do people find out more about what you're doing at Callidus Legal?
542
00:45:27,662 --> 00:45:37,822
Yeah, check us out at callidusai.com, C-A-L-L-I-D-U-S-A-I dot com, or shoot me a message at justin at callidusai.com.
543
00:45:37,822 --> 00:45:39,202
And I really appreciate you having me on.
544
00:45:39,202 --> 00:45:40,602
This was a great talk.
545
00:45:40,602 --> 00:45:41,162
Thanks.
546
00:45:41,162 --> 00:45:42,263
Yeah, absolutely.
547
00:45:42,263 --> 00:45:43,924
All right, have a great weekend.
548
00:45:44,286 --> 00:45:45,507
All right, take care.