In this episode, Ted sits down with Justin McCallon, CEO & Founder at Callidus Legal AI, to discuss how AI is transforming legal workflows and the challenges law firms face in AI adoption. From understanding model selection to building trust in AI-driven legal tools, Justin shares his expertise in leveraging AI to enhance efficiency while maintaining accuracy and compliance. With AI rapidly evolving, law firms must navigate the balance between innovation and tradition, making this conversation essential for legal professionals looking to integrate AI strategically.
In this episode, Justin shares insights on how to:
Select the right AI models for legal applications
Address concerns around AI accuracy and hallucinations
Build trust in AI through transparency and audit trails
Identify high-value AI use cases that save time and resources
Implement AI while maintaining traditional billing models
Key takeaways:
AI can significantly enhance legal workflows but requires strategic implementation
Model selection is critical for accuracy and efficiency in legal tasks
Trust in AI is built through transparency, security, and proper oversight
Law firms should start small, iterate, and develop AI policies for responsible adoption
AI is a tool to support, not replace, legal professionals
About Justin McCallon
Justin McCallon is the CEO & Founder of Callidus Legal AI, bringing his expertise as a former attorney specializing in M&A and commercial bankruptcy. With a deep understanding of both law and technology, he co-led the transformation of AT&T’s legal department, focusing on optimizing workflows and leveraging AI for efficiency. Now, he helps law firms strategically integrate AI to enhance legal practice while maintaining accuracy and compliance.
What will happen is that the market will all move toward AI. Then if you’re the one laggard who’s not using it at all, it’s going to be pretty obvious. Groups are going to know about that and they’re not going to use you, because you produce less legal work than the alternative groups.
1
00:00:03,487 --> 00:00:05,224
Justin, how are you this morning?
2
00:00:05,400 --> 00:00:06,792
Doing well, good morning.
3
00:00:06,803 --> 00:00:11,272
Yeah, I appreciate you jumping on here with me for a few minutes.
4
00:00:11,478 --> 00:00:12,565
Absolutely.
5
00:00:12,981 --> 00:00:15,581
So you and I connected at TLTF.
6
00:00:15,581 --> 00:00:28,261
I think we were having lunch and we were talking about AI and reasoning and you were
sitting next to me and chimed in and had some really good thoughts on that topic.
7
00:00:28,261 --> 00:00:38,561
And I think we've, we've covered that on previous episodes, but, um, you and I had another
conversation and I thought you had some really good insights to share.
8
00:00:38,561 --> 00:00:41,469
It sounds like you dive pretty deep.
9
00:00:41,469 --> 00:00:46,272
It's always good to hear perspective from folks that really, really jump in.
10
00:00:46,272 --> 00:00:51,314
But before we jump into the content here, let's just get you introduced.
11
00:00:51,314 --> 00:00:55,736
So you currently lead, is it Callidus Legal AI?
12
00:00:56,677 --> 00:00:57,338
Okay.
13
00:00:57,338 --> 00:01:11,475
And you focus on pairing AI with lawyers to enhance core legal work. You're deep in AI
and ML, and you're a former practicing attorney who did M&A work.
14
00:01:11,549 --> 00:01:14,301
which I think is interesting.
15
00:01:14,301 --> 00:01:17,542
Tell us more about your background and what you're doing today.
16
00:01:17,976 --> 00:01:24,569
Yeah, I started off practicing M&A and corporate restructuring, and ended up working for AT&T after
that for a while.
17
00:01:24,569 --> 00:01:29,791
And I led the AT&T legal transformation with our deputy GC.
18
00:01:29,791 --> 00:01:33,412
It was really successful and very interesting for me.
19
00:01:33,552 --> 00:01:38,224
The group looked at just understanding what are our attorneys doing?
20
00:01:38,224 --> 00:01:40,675
What are they doing that's not the highest priority?
21
00:01:40,675 --> 00:01:42,036
How can they reprioritize?
22
00:01:42,036 --> 00:01:48,012
How can we think about bringing in-house some work that we're using outside counsel for,
so that we can
23
00:01:48,012 --> 00:01:52,574
add more efficiency to the internal resources and have them do more of the work
internally?
24
00:01:52,574 --> 00:01:59,807
Where are we missing key insights and adding liability and where are we over-focused where
we shouldn't be?
25
00:01:59,807 --> 00:02:04,849
And how can we just rethink some of the workflows to where we're operating more
effectively?
26
00:02:04,849 --> 00:02:05,989
So I did that for a bit.
27
00:02:05,989 --> 00:02:11,111
And then my next gig was running a data science and engineering org.
28
00:02:11,111 --> 00:02:12,541
And we launched the first Gen
29
00:02:12,541 --> 00:02:16,713
AI product for the company's subsidiary, DirecTV.
30
00:02:16,770 --> 00:02:18,031
which was pretty informative.
31
00:02:18,031 --> 00:02:20,092
It was right after ChatGPT came out.
32
00:02:20,092 --> 00:02:30,759
And to me, the reason I started my startup was that it was so obvious, if you paired the legal
transformation work with the Gen AI work, there was going to be a big opportunity.
33
00:02:30,759 --> 00:02:36,682
And I didn't think it was going to be something from day one where ChatGPT could just
replace lawyers' jobs or anything like that.
34
00:02:36,682 --> 00:02:43,146
But I thought over time, this looked like a great starting block for something that could
be really powerful.
35
00:02:43,476 --> 00:02:44,436
Interesting.
36
00:02:44,436 --> 00:02:44,656
Yeah.
37
00:02:44,656 --> 00:02:56,896
And I remember you and I had some subsequent dialogue on LinkedIn
talking about the Stanford paper, and also the Apple
38
00:02:56,896 --> 00:03:01,156
Intelligence paper on the GSM8K battery of tests.
39
00:03:01,156 --> 00:03:06,736
I think they call it GSM8K Adaptive. GSM is grade school math.
40
00:03:06,736 --> 00:03:10,188
And then there were 8,000 questions that
41
00:03:10,260 --> 00:03:13,980
were used to evaluate how well AI performed.
42
00:03:14,720 --> 00:03:28,320
And Apple did a study on that, and the adaptive part is where they
changed minor details about the questions to see how the models would perform.
43
00:03:28,460 --> 00:03:37,080
And they degraded quite a bit anywhere from, I think, at the low end, the degradation was
like 30%.
44
00:03:37,080 --> 00:03:38,476
So at the time,
45
00:03:38,579 --> 00:03:39,789
Maybe that was o1.
46
00:03:39,789 --> 00:03:53,044
I forget exactly what model on the OpenAI side was the latest and greatest, all the
way down to like a 70% degradation just by inserting irrelevant facts into the questions
47
00:03:53,044 --> 00:03:54,375
or changing the names.
48
00:03:54,375 --> 00:04:04,939
And as you pointed out, that has since been resolved, which makes
me wonder, all right, how did they do that?
49
00:04:05,019 --> 00:04:07,240
Did they game the system at all?
50
00:04:07,240 --> 00:04:08,370
Like, Hey, we've got a,
51
00:04:08,370 --> 00:04:19,675
we've got a weakness here, let's apply a band-aid? Or was there a fundamental adaptation
that they implemented that helped?
52
00:04:19,675 --> 00:04:33,231
But I think when you ran those same questions through wherever we were at that
point, maybe it was 4o, you had different output, like it answered successfully.
53
00:04:33,231 --> 00:04:34,982
Am I remembering that correctly?
54
00:04:35,304 --> 00:04:37,215
Yeah, I think pretty close.
55
00:04:37,215 --> 00:04:43,398
I think with the Apple paper, I could be wrong, but I thought they used a battery
of models.
56
00:04:43,398 --> 00:04:48,360
The only one that was somewhat advanced was GPT-4, the original GPT-4.
57
00:04:48,360 --> 00:04:55,283
And now we have quite a lot better models with o3-mini and GPT-4.5.
58
00:04:55,283 --> 00:05:02,478
And if you look at the benchmarks, the benchmark I like the most is LiveBench, where they
59
00:05:02,478 --> 00:05:09,838
hide the questions, you can't really game the system, they change the questions regularly,
and they do a full battery of tests.
60
00:05:09,998 --> 00:05:15,478
GPT-4 scored about a 45, and the best models now score about a 76.
61
00:05:15,478 --> 00:05:19,278
So they've come a long way in those benchmark tests.
62
00:05:19,278 --> 00:05:30,018
And when you use the top models now to do the same questions that Apple had, and continue
to vary different pieces and add irrelevant information so that you're sure that it
63
00:05:30,018 --> 00:05:32,418
wasn't trained on any of that information.
64
00:05:32,610 --> 00:05:34,632
They're answering every question correctly.
65
00:05:34,632 --> 00:05:42,897
And so I had sent over a handful of examples yesterday just to kind of prove my point
empirically that this is testable, this is falsifiable.
66
00:05:42,977 --> 00:05:47,640
You can run the test yourself and see, no, the AI actually is able to solve these things.
67
00:05:47,640 --> 00:05:57,787
And as far as how they did it, I'm not sure all of the specifics, but I think a lot of it
is on the post-training side where they're teaching it to, after they've completed the
68
00:05:57,787 --> 00:06:02,242
pre-training, they're teaching the model how to be more effective with the information it
does have.
69
00:06:02,242 --> 00:06:04,427
And then the reasoners are very good.
70
00:06:04,427 --> 00:06:11,540
anything that they're able to do to add this reasoning capability is definitely enhancing
the answers.
71
00:06:11,540 --> 00:06:12,080
Yeah.
72
00:06:12,080 --> 00:06:17,180
And there's so much movement in the space.
73
00:06:17,180 --> 00:06:20,760
I can't even keep up, and I use it all the time.
74
00:06:20,760 --> 00:06:28,500
Like, I don't know, five, seven, ten times a day. But you know, you've got Grok 3,
you've got Claude 3.7.
75
00:06:28,500 --> 00:06:38,040
You've now got o3-mini, 4.5, and apparently GPT-5 is on the way.
76
00:06:38,040 --> 00:06:40,532
You know, there's DeepSeek.
77
00:06:40,532 --> 00:06:44,155
There's whatever Alibaba's model is.
78
00:06:44,155 --> 00:06:55,445
I mean, there's Mistral, there's Llama. It's impossible to keep up unless you're doing
this full time, which, you know, I'm not.
79
00:06:55,846 --> 00:07:03,893
So I looked at some of the tests that you threw at o3-mini, and I thought it did really
well.
80
00:07:03,893 --> 00:07:08,040
I just kind of breezed through it, but why don't you tell us some of the...
81
00:07:08,040 --> 00:07:11,002
some of the tests you threw at it and how it performed.
82
00:07:11,470 --> 00:07:12,070
Yeah, yeah.
83
00:07:12,070 --> 00:07:22,410
What I was trying to do is get a sense of how strong the model is on the types of things
where people are challenging it, saying AI is just not able to do these fairly simple tasks.
84
00:07:22,410 --> 00:07:35,110
And so I ran through a handful of examples, one being let's find a case that was not in
the training set and let's go have it find the case text online and then give us a full
85
00:07:35,110 --> 00:07:39,758
summary of like the holding and the material facts and so forth and give us legal
analysis.
86
00:07:39,758 --> 00:07:43,840
I think that was one where people were concerned AI is just not capable of doing that.
87
00:07:44,201 --> 00:07:45,298
I read it.
88
00:07:45,298 --> 00:07:48,763
I thought it did a fantastic job summarizing the case.
89
00:07:48,963 --> 00:07:58,113
I gave it some questions like solve complex numerical problems that also deal with
linguistics that are hard to even understand the question being asked.
90
00:07:58,113 --> 00:07:59,329
It did well there.
91
00:07:59,329 --> 00:08:01,518
It does well in constrained poetry.
92
00:08:01,518 --> 00:08:07,746
I was surprised it did well on world-model questions, where I basically
had it
93
00:08:07,746 --> 00:08:16,920
run a scenario where I'm dumping marbles out of a container and then putting super
glue in and then moving them around the house and seeing where they end up,
94
00:08:16,920 --> 00:08:18,030
walking through the steps.
95
00:08:18,030 --> 00:08:26,134
And it did pretty well on pretty much all of those things, to where my point of view is,
between that and GPT-4.5,
96
00:08:26,134 --> 00:08:36,318
Now you pretty much have something that can reason like a smart human can reason and it
can help in a pretty wide variety of ways from a chat window.
97
00:08:36,318 --> 00:08:37,622
There's still some
98
00:08:37,622 --> 00:08:45,566
issues where these tools don't have full capabilities that a human would have outside of
the chat window where we can pull additional resources.
99
00:08:45,566 --> 00:08:52,904
But if you're resource constrained and you're just talking to somebody intelligent, it's
going to be pretty similar to what these models can do now.
100
00:08:52,904 --> 00:08:53,664
Yeah.
101
00:08:53,664 --> 00:08:59,427
And you and I kicked around the Stanford paper too, which at this point is almost a year
old.
102
00:08:59,427 --> 00:09:01,728
It's actually over a year old from their first iteration.
103
00:09:01,728 --> 00:09:06,990
They did a subsequent adjustment and re-release in, I think May of last year.
104
00:09:06,990 --> 00:09:16,324
But some of the challenges that the Stanford paper highlighted were, you know, and they
categorized too many things as hallucinations in my opinion.
105
00:09:16,324 --> 00:09:20,786
But I think overall, I got a lot of insight from reading the paper.
106
00:09:20,840 --> 00:09:29,093
It said that AI misunderstands holdings, it has trouble distinguishing between legal actors,
and it has difficulty respecting the order of authority.
107
00:09:29,093 --> 00:09:37,264
It fabricates. Do you feel like these specific issues have gotten
better?
108
00:09:39,142 --> 00:09:40,623
They've gotten better.
109
00:09:40,843 --> 00:09:53,288
There are tests on hallucination rates for the different models, and the reasoners are about
half the hallucination rate of GPT-4, and GPT-4.5 is also about half the hallucination
110
00:09:53,288 --> 00:09:54,888
rate of GPT-4.
111
00:09:54,949 --> 00:09:58,710
That said, hallucinations are still an issue for these models.
112
00:09:58,790 --> 00:10:03,552
Legal tech companies can solve those issues, and this is where domain-specific software
comes in.
113
00:10:03,552 --> 00:10:06,924
There's different algorithms you can run to help there.
114
00:10:06,924 --> 00:10:08,128
For example,
115
00:10:08,128 --> 00:10:13,399
Let's say that you have an issue where you tend to get hallucinated cases out of the LLMs.
116
00:10:13,399 --> 00:10:15,640
Well, I think everyone's kind of solved the problem now.
117
00:10:15,640 --> 00:10:26,443
We've been doing this for a long time, where you take the cases from the LLM, you have it
list out the relevant cases, and then you have a secondary external data source that
118
00:10:26,443 --> 00:10:29,544
has a list of all the cases that you have API access to.
119
00:10:29,544 --> 00:10:33,625
And then you check the Bluebook citation to say, is this a real case or not?
120
00:10:33,625 --> 00:10:37,974
And then if it is, let's go check relevancy to ensure this is relevant to the answer.
121
00:10:37,974 --> 00:10:39,865
And if you get two checks, you say, OK, good.
122
00:10:39,865 --> 00:10:41,207
This is a real case.
123
00:10:41,207 --> 00:10:42,037
It's relevant.
124
00:10:42,037 --> 00:10:46,500
This is going to be passed on to the user, and they're going to be able to access that
case.
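As a rough sketch, the two-check loop Justin describes might look like the following, where `case_db` and `is_relevant` are hypothetical stand-ins for the external citation database and the relevancy check, not Callidus's actual API:

```python
def verify_citations(llm_cases, case_db, is_relevant):
    """Keep only cases that pass both checks: the case exists in an
    external data source, and it is relevant to the question asked."""
    verified = []
    for case in llm_cases:
        # Check 1: does the Bluebook citation match a real case in the
        # external database (in practice, an API lookup)?
        record = case_db.get(case["citation"])
        if record is None:
            continue  # hallucinated citation: never shown to the user
        # Check 2: is the real case actually relevant to the answer?
        if not is_relevant(record, case["question"]):
            continue
        verified.append(record)  # two checks passed: pass it on
    return verified
```

Only cases that survive both checks get passed on to the user.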
125
00:10:46,601 --> 00:10:51,845
so this is where, yeah, it's true that I think they had a good insight.
126
00:10:51,845 --> 00:11:01,553
LLMs will continue to hallucinate and cause problems, but LLM software that has
domain-specific engineering on top of it can solve those issues.
127
00:11:01,553 --> 00:11:03,672
And then the other one being
128
00:11:03,672 --> 00:11:08,805
Hey, can LLMs actually pull out the legal actors, and can they figure out the
holdings?
129
00:11:08,805 --> 00:11:19,402
That one, I think they do pretty well now. The top models are able to understand the
holdings pretty well, and you can test it and see whether that's true empirically
130
00:11:19,402 --> 00:11:21,193
yourself pretty easily.
131
00:11:21,194 --> 00:11:27,397
In all my tests, and we use this a lot and do a lot of evaluations, they're quite good for
the top models now.
132
00:11:27,412 --> 00:11:29,723
What about respecting the order of authority?
133
00:11:30,814 --> 00:11:32,755
That's another one that they get.
134
00:11:32,755 --> 00:11:37,158
You might have to prompt engineer it a bit, and this is where the domain software comes in
again.
135
00:11:37,158 --> 00:11:46,244
But if you prompt engineer it well, it fully understands that the Supreme Court is
superior to a state Supreme Court, the US Supreme Court versus state Supreme Court.
136
00:11:46,244 --> 00:11:51,947
It fully understands that that court's superior to a trial court and so forth.
137
00:11:52,567 --> 00:11:57,320
We use this every day, and this is something that it's able to do very consistently.
138
00:11:57,621 --> 00:12:04,902
So how would you assess the current state of AI capabilities in legal research and
analysis as we sit today?
139
00:12:05,390 --> 00:12:09,230
Yeah, I would say raw LLMs out of the box, quite bad.
140
00:12:09,230 --> 00:12:12,010
I wouldn't use it for legal research.
141
00:12:12,010 --> 00:12:14,590
And again, they're going to hallucinate everything.
142
00:12:14,590 --> 00:12:16,590
They're going to miss some insights.
143
00:12:16,590 --> 00:12:25,770
They're going to have instances where, because they don't have in the pre-training data
the full knowledge about all the cases and all the statutes for that state, they're going
144
00:12:25,770 --> 00:12:32,830
to take majority rules and assume that those are right for that state, even though your
state might deviate from them or use a minority rule.
145
00:12:33,030 --> 00:12:35,052
And so there's going to be a bunch of issues.
146
00:12:35,052 --> 00:12:38,044
You'll get kind of poorly formatted responses.
147
00:12:38,044 --> 00:12:41,976
If you ask it to draft a full brief, it'll give you like two pages.
148
00:12:42,057 --> 00:12:44,318
All of those things are problems.
149
00:12:44,438 --> 00:12:51,603
If you use good domain specific software, though, these are all engineering problems that
are solvable by the legal tech companies.
150
00:12:51,603 --> 00:12:56,466
And a lot of us have started to or very substantially solve those issues.
151
00:12:56,827 --> 00:13:01,550
And so if you use good software, you can expect an extensive
152
00:13:01,550 --> 00:13:07,392
30 page brief, no case hallucinations, hopefully no holding hallucinations.
153
00:13:07,572 --> 00:13:10,094
You can expect that it's properly formatted.
154
00:13:10,094 --> 00:13:20,848
You can expect that it goes into the details regarding the state law that's in
question, looking at the legal authorities to pull out the insights so that it's not
155
00:13:20,848 --> 00:13:22,298
just relying on majority rules.
156
00:13:22,298 --> 00:13:25,640
So all of those things that I think good software is able to do.
157
00:13:25,640 --> 00:13:31,446
That said, I would strongly suggest that we focus on software that keeps the attorney in
the loop.
158
00:13:31,446 --> 00:13:34,929
and lets the lawyer audit the output.
159
00:13:34,929 --> 00:13:39,092
So I wouldn't want just, hey, here's my fact pattern.
160
00:13:39,092 --> 00:13:44,295
I'm just going to let the AI go off and run and just draft a full 30-page brief.
161
00:13:44,295 --> 00:13:46,997
I don't think that's a good solution right now.
162
00:13:46,997 --> 00:13:56,385
I think what the AI does well is it synthesizes large amounts of information, bubbles them
up to the lawyer and probably gets it right the vast majority of the time, but the lawyer
163
00:13:56,385 --> 00:13:59,660
is still going to make the judgment call about which direction to go.
164
00:13:59,660 --> 00:14:01,353
And then the lawyer says, yes, go here.
165
00:14:01,353 --> 00:14:03,237
Don't pursue this and so forth.
166
00:14:03,237 --> 00:14:07,385
And then you work together with the AI to get a great answer very fast.
167
00:14:07,385 --> 00:14:10,900
And that's where I would say the focus should be.
168
00:14:11,304 --> 00:14:13,916
Yeah, and I have seen it's been a couple of months.
169
00:14:13,916 --> 00:14:14,947
I think it was late last year.
170
00:14:14,947 --> 00:14:30,037
I saw a chart of, gosh, comprehension, I guess, for lack of a better
term, across different-sized RAG prompts.
171
00:14:30,037 --> 00:14:33,520
And it trails off dramatically the larger it gets.
172
00:14:33,520 --> 00:14:39,476
You know, like, Gemini 2 has
173
00:14:39,476 --> 00:14:41,876
a 1 million token context window, right?
174
00:14:41,876 --> 00:14:43,656
Which is pretty significant.
175
00:14:43,656 --> 00:14:47,156
I think Claude is a couple hundred thousand, GPT a little lower.
176
00:14:47,156 --> 00:14:48,396
These are always moving.
177
00:14:48,396 --> 00:14:51,656
I might not have the current state of things.
178
00:14:51,656 --> 00:14:51,796
Yeah.
179
00:14:51,796 --> 00:14:52,656
Yeah.
180
00:14:52,656 --> 00:15:03,436
But I saw kind of a performance metric where they threw large amounts of documents
through RAG at these models.
181
00:15:03,436 --> 00:15:08,404
And it trailed off pretty substantially in terms of missing, you know, like
182
00:15:08,404 --> 00:15:16,375
key facts during summarization as the number of tokens increased.
183
00:15:16,375 --> 00:15:19,770
Are we getting any better there or is that still a limitation?
184
00:15:20,024 --> 00:15:23,096
Getting better is a limitation, but it can be engineered around.
185
00:15:23,096 --> 00:15:24,257
This is another one.
186
00:15:24,257 --> 00:15:31,641
We've had to do this where a client has, say, a 40-page document, and it may be a hundred
40-page documents.
187
00:15:31,641 --> 00:15:34,783
And we're trying for each one to pull out all the payment terms.
188
00:15:34,783 --> 00:15:43,268
This is an area where out of the box, LLMs do pretty poorly without a tremendous amount of
prompt engineering and kind of just general engineering.
189
00:15:43,268 --> 00:15:46,786
So what we need to do is break down the problem to where
190
00:15:46,786 --> 00:15:55,038
we're only pushing through like a page at a time, maybe a little bit more than that,
giving it enough context and then giving very detailed prompts on exactly what to look for
191
00:15:55,038 --> 00:15:56,550
and what not to look for.
192
00:15:56,590 --> 00:16:01,312
You have to do all of that in a pretty domain specific way to get good answers.
193
00:16:01,312 --> 00:16:07,915
And so I think if you're just using a raw LLM without a lot of engineering work, they're
not gonna do very well here.
194
00:16:10,116 --> 00:16:11,056
Chunking's a big part of that.
195
00:16:11,056 --> 00:16:15,098
Yeah, you'll chunk the paper and then run in parallel
196
00:16:15,310 --> 00:16:22,978
like a hundred different agents basically to each have their one page to review and then
summarize in groups.
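The chunk-and-parallelize pattern Justin describes can be sketched like this; `extract_terms` stands in for the per-chunk LLM call with its detailed prompt, and the page size is a made-up number:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_pages(text, page_chars=3000):
    """Split a document into roughly page-sized chunks."""
    return [text[i:i + page_chars] for i in range(0, len(text), page_chars)]

def extract_payment_terms(documents, extract_terms, max_workers=8):
    """Run the per-chunk extractor in parallel and merge per document."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for doc in documents:
            chunks = chunk_pages(doc)
            # one "agent" per chunk, each reviewing its own page
            per_chunk = pool.map(extract_terms, chunks)
            results.append([t for terms in per_chunk for t in terms])
    return results
```

In practice the chunk boundaries and the extraction prompt both need domain-specific tuning, as described above.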
197
00:16:23,142 --> 00:16:24,312
Interesting.
198
00:16:24,413 --> 00:16:24,723
Yeah.
199
00:16:24,723 --> 00:16:37,703
You know, another challenge, and I don't consider myself an AI expert, more
of an enthusiast, but I really do put AI through its paces on real-world stuff
200
00:16:37,703 --> 00:16:41,272
mostly and find a huge variation.
201
00:16:41,272 --> 00:16:43,537
I also find it quite confusing.
202
00:16:43,537 --> 00:16:48,460
All the fragmentation, like just within the OpenAI world, just the number of models.
203
00:16:48,460 --> 00:16:51,006
And I know GPT-5 is supposed to solve that.
204
00:16:51,006 --> 00:17:00,661
but I'm still really curious how that's gonna work, because you know,
today, I mean, what do you have, eight in the dropdown?
205
00:17:00,661 --> 00:17:01,261
You know what I mean?
206
00:17:01,261 --> 00:17:03,822
Like eight models you can choose from.
207
00:17:04,483 --> 00:17:05,963
That's a big impediment.
208
00:17:05,963 --> 00:17:15,748
As somebody who pays really close attention to this, I still don't have a firm handle
on when to use what. It seems like a moving target.
209
00:17:16,526 --> 00:17:19,707
Yeah, and it's pretty tough for a casual user.
210
00:17:19,707 --> 00:17:24,169
You're far, far more knowledgeable about this than the average user.
211
00:17:25,730 --> 00:17:30,512
Basically, the factors you need to consider are A, how fast do I need a response?
212
00:17:30,512 --> 00:17:33,413
B, what's the context window that I need?
213
00:17:33,593 --> 00:17:37,135
C, do I need a model with a lot of pre-training data?
214
00:17:37,135 --> 00:17:43,137
So as in, it has a lot of knowledge that I need to pull from, or do I need something more
that's reasoning well?
215
00:17:43,137 --> 00:17:45,720
And based on those factors, you can choose the right model.
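Those three factors can be illustrated with a toy rule-based picker; the model catalog, names, and numbers here are all made up for the sketch:

```python
MODELS = [
    # (name, relative speed, context window in tokens, knowledge level)
    ("fast-small",    "fast", 128_000, "low"),
    ("big-knowledge", "slow", 200_000, "high"),
    ("reasoner",      "slow", 200_000, "mid"),
]

def choose_model(need_fast, context_tokens, need_knowledge):
    """Pick the first model satisfying speed, context, and knowledge needs."""
    for name, speed, window, knowledge in MODELS:
        if need_fast and speed != "fast":
            continue  # factor A: response speed
        if context_tokens > window:
            continue  # factor B: context window
        if need_knowledge and knowledge != "high":
            continue  # factor C: pre-training knowledge
        return name
    return None  # no model fits all the constraints
```

A real system would weigh these against eval scores rather than hard rules, but the decision inputs are the same three factors.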
216
00:17:45,720 --> 00:17:53,737
But I'm in this every day and this is my business, so I'm familiar. For your casual
user, you have no idea which one to use.
217
00:17:53,737 --> 00:18:02,914
And yeah, GPT-5 though will help with that to where it's going to basically just figure
out your question and then suggest the best model internally and then just give you that
218
00:18:02,914 --> 00:18:05,386
best model without you needing to even think about it.
219
00:18:05,386 --> 00:18:15,016
I think ironically, a lot of the time GPT-5 will be basically GPT-4 where they're just
gonna say, well, GPT-4 is good enough to answer this, go ahead and move forward.
220
00:18:15,016 --> 00:18:17,789
because most questions it gets are actually pretty easy.
221
00:18:17,789 --> 00:18:22,272
There's a handful of hard questions that people push on every now and then.
222
00:18:22,353 --> 00:18:31,220
That said, again, I would stress that the legal tech groups are a lot better for solving
domain-specific tasks than these models are anyway.
223
00:18:31,220 --> 00:18:37,426
And what's happening is basically we're standing on top of the best models for the
specific task we're working on.
224
00:18:37,426 --> 00:18:43,020
We're choosing the best one, knowing exactly the tool that we need to use.
225
00:18:43,214 --> 00:18:52,730
And oftentimes we're using a combination of two or three, sometimes even from different
groups, to where that combination, plus a lot of prompt engineering and other engineering
226
00:18:52,730 --> 00:18:55,761
on top of it, can yield pretty good results.
227
00:18:56,242 --> 00:19:07,748
And I think what you'll see is, in general, the legal tech companies are going to be about
two years ahead of the raw LLMs, as far as their ability to practice law more or less, or
228
00:19:07,748 --> 00:19:11,850
support someone who's practicing law to be an amplifier of that person.
229
00:19:12,003 --> 00:19:23,766
And in general, I don't think it's very user-friendly just to work from a chat
window, versus a nice template that's easy to follow, just like a webpage.
230
00:19:24,210 --> 00:19:24,801
Yeah.
231
00:19:24,801 --> 00:19:32,266
So the model selection, that has to be done algorithmically, correct?
232
00:19:32,487 --> 00:19:36,520
what does the process look like for selecting the right model?
233
00:19:36,520 --> 00:19:38,412
Just maybe in how you do it.
234
00:19:38,412 --> 00:19:40,322
I think OpenAI is somewhat opaque.
235
00:19:40,322 --> 00:19:43,085
I'm not sure that they provide transparency around that.
236
00:19:43,085 --> 00:19:49,180
But just in broad brushstrokes, like, how does it determine which path to take?
237
00:19:49,870 --> 00:19:55,210
Yeah, for us, we don't do it in a fully algorithmic way.
238
00:19:55,210 --> 00:20:00,750
We have across our app probably 100, 200 different API calls.
239
00:20:00,750 --> 00:20:05,870
And for each one of those, we have a general view on, is this going to need speed?
240
00:20:05,870 --> 00:20:09,810
Is it going to need the ability to instruction follow really well?
241
00:20:09,810 --> 00:20:13,730
Is it going to need high pre-training knowledge and so forth?
242
00:20:13,730 --> 00:20:18,702
And then based on those factors, we'll say it's probably one of these three models that we
should use.
243
00:20:18,702 --> 00:20:26,616
And then we run evals anywhere important to say, okay, let's actually see what score
these models get on our evaluations.
244
00:20:26,616 --> 00:20:40,634
And so that could be an evaluation, for instance, of how many cases they're returning
that are accurate, where we'll try to kind of do a full analysis on, okay,
245
00:20:40,634 --> 00:20:42,855
here's an evaluation question.
246
00:20:42,855 --> 00:20:47,822
Let's have real attorneys do the work and figure out what cases you would want to cite.
247
00:20:47,822 --> 00:20:55,782
And then as you figured out what cases you want to cite, we're going to score these cases
to say, this is like a five, this case is a three, this is a one.
248
00:20:55,782 --> 00:21:03,262
And as far as importance, now let's have all the models do their work and give the cases
that they think are most relevant, and we're going to score those.
249
00:21:03,262 --> 00:21:10,862
So we have a lot of those automations in place and then whenever a new model comes out, we
just run it through the system of tests and say, okay, it's going to be good here, here
250
00:21:10,862 --> 00:21:11,422
and here.
251
00:21:11,422 --> 00:21:12,742
It's not going to be very good here.
252
00:21:12,742 --> 00:21:14,790
And we can move forward that way.
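A toy version of that scoring setup, with hypothetical names: attorneys assign an importance score (say 5, 3, or 1) to each case they would cite, and each model's returned cases are scored against that answer key:

```python
def score_model(returned_citations, answer_key):
    """Sum the attorney-assigned scores for the cases a model returned;
    cases not in the answer key (including hallucinations) score zero."""
    return sum(answer_key.get(c, 0) for c in returned_citations)

def rank_models(model_outputs, answer_key):
    """Rank candidate models by total eval score, best first."""
    scores = {name: score_model(cases, answer_key)
              for name, cases in model_outputs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

When a new model comes out, you run it through the same battery and see where it lands.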
253
00:21:15,218 --> 00:21:15,668
Interesting.
254
00:21:15,668 --> 00:21:22,781
Yeah, it seems like, you know, finding the right balance between speed and quality is the
sweet spot, right?
255
00:21:22,781 --> 00:21:31,535
You can't slow the process down too much or you're going to impact efficiency, but you
need to strike that balance.
256
00:21:31,535 --> 00:21:34,766
It seems like that is the strategy. Is that accurate?
257
00:21:35,148 --> 00:21:37,439
Yeah, it's a fun challenge.
258
00:21:37,638 --> 00:21:47,382
A lot of what we do is we'll have a seven part workflow, for instance, and when the user
does step two, we're kicking off a pretty slow model that's really smart.
259
00:21:47,382 --> 00:21:53,843
And then when they get to step six, that slow model is done with the analysis, and then
it's inserting the answer for the user.
260
00:21:53,843 --> 00:21:59,805
Then it's done all that work in the background while they're filling out other information
that's not as relevant to the answer.
261
00:22:00,025 --> 00:22:02,106
And so we do a lot of that.
262
00:22:02,166 --> 00:22:02,732
then
263
00:22:02,732 --> 00:22:06,684
Sometimes you just use the fast model because it's a fairly easy answer.
264
00:22:06,764 --> 00:22:07,955
So we'll do some of that.
265
00:22:07,955 --> 00:22:19,521
And it's just an interesting game of how do we think about the legal implications, how do
we think about the AI driven implications and the technology implications, and then how do
266
00:22:19,521 --> 00:22:25,384
we think about a good user experience and pair all that together to give something that
makes sense cohesively.
267
00:22:25,396 --> 00:22:26,096
Yeah.
268
00:22:26,096 --> 00:22:29,596
You know, recently I've seen interesting benchmarks.
269
00:22:29,596 --> 00:22:32,596
Was it Vals AI that put it together?
270
00:22:32,596 --> 00:22:33,576
I'm not sure if you've seen it.
271
00:22:33,576 --> 00:22:51,216
It's just been in the last week or so. I don't know if it was a benchmark or a
study, but it looked at real scenarios, legal workflows, and measured efficiency.
272
00:22:51,436 --> 00:22:52,850
Again, there's so much stuff
273
00:22:52,850 --> 00:22:54,411
flying at you these days.
274
00:22:54,411 --> 00:23:05,207
I don't have it memorized, but it seems like there's more of a focus now on legal-specific
use cases and how these models perform in those scenarios.
275
00:23:05,247 --> 00:23:07,378
Are you seeing more of that now?
276
00:23:07,922 --> 00:23:09,263
I need to check out that study.
277
00:23:09,263 --> 00:23:11,725
I actually haven't seen it.
278
00:23:11,725 --> 00:23:15,009
We love the idea of doing more legal benchmarks.
279
00:23:15,030 --> 00:23:21,136
That's an area where we've really taken a lot of time to try to build a tool that's useful
from that perspective.
280
00:23:21,136 --> 00:23:25,280
And I think it's useful to the end user as well.
281
00:23:25,301 --> 00:23:27,422
But no, I haven't seen that specific study.
282
00:23:27,422 --> 00:23:30,836
I do like the idea though of pursuing that.
283
00:23:31,775 --> 00:23:37,650
Yeah, this stuff... the post I saw on it was March 4th, three days ago.
284
00:23:37,650 --> 00:23:39,761
And again, it's just so hard to keep up with.
285
00:23:39,761 --> 00:23:45,216
And there's so much that even, you know, after you read it, three more things fly at you.
286
00:23:45,216 --> 00:23:52,962
It's like... So, what about AI strategies in general for law firms?
287
00:23:52,962 --> 00:24:00,888
So, you know, I have been critical of law firms that seem to
288
00:24:01,170 --> 00:24:07,614
immediately deploy tactically versus figuring out strategically what they want to do.
289
00:24:07,614 --> 00:24:09,986
And strategy includes a lot of different things.
290
00:24:09,986 --> 00:24:15,059
It can include where to first focus your AI efforts.
291
00:24:15,059 --> 00:24:21,714
It could include the organizational design within the firm that's going to support those
efforts.
292
00:24:22,395 --> 00:24:27,088
It can define the risk tolerance that the firm is willing to take.
293
00:24:27,088 --> 00:24:30,610
You know, because we still have...
294
00:24:30,610 --> 00:24:35,342
I saw an interesting study from the Legal Value Network.
295
00:24:35,342 --> 00:24:39,484
They do an LPM survey every year.
296
00:24:39,484 --> 00:24:43,905
One question that was asked was, what percentage of your clients...
297
00:24:44,426 --> 00:24:47,747
So they talked to, I think, 80 law firm GCs.
298
00:24:48,608 --> 00:24:57,431
And what percentage of your clients either discourage or prohibit the use of AI in their
matters?
299
00:24:57,431 --> 00:24:59,592
And the number was 42%.
300
00:24:59,592 --> 00:25:10,258
which seems shockingly high, because I saw another study from the Blickstein Group, the
LDO law department one.
301
00:25:10,258 --> 00:25:12,300
I forget what the acronym stands for.
302
00:25:12,300 --> 00:25:23,766
And anyway, almost 60% of the law firm client GCs that they talked to said that law firms
aren't using technology enough to drive down costs.
303
00:25:23,766 --> 00:25:26,378
And those are two very conflicting data points.
304
00:25:26,378 --> 00:25:27,004
It's like,
305
00:25:27,004 --> 00:25:33,244
OK, you want me to drive down costs, but you've got OCGs that prevent me from implementing
the technology.
306
00:25:33,244 --> 00:25:39,738
I can't use it on your matters. I don't know, do you feel like that's a disconnect
in the marketplace still?
307
00:25:40,152 --> 00:25:41,863
I think it's very bimodal.
308
00:25:41,863 --> 00:25:52,911
I think that you have a lot of attorneys on one side or the other where some really want
to embrace the newest technology all in and others are very cautious about it.
309
00:25:52,911 --> 00:25:57,534
And there's not as many groups in the middle as you'd expect.
310
00:25:57,534 --> 00:25:59,795
And so it's not like your normal bell curve.
311
00:25:59,936 --> 00:26:01,957
And so I think that's what's going on.
312
00:26:01,957 --> 00:26:09,474
And I think the organizational strategy and kind of transformation lens is a really tough
and interesting question for
313
00:26:09,474 --> 00:26:11,715
organizational leaders to think about.
314
00:26:11,715 --> 00:26:14,436
I think we probably disagree a little bit on this one.
315
00:26:14,436 --> 00:26:24,880
I have more of an engineering mindset on it, where I think the way to go is start small,
iterate, and run your strategy work in parallel.
316
00:26:25,500 --> 00:26:35,765
We've just seen so many instances where a company really wants to get into AI, they're
strategizing about it and a year later they haven't really done anything and they don't
317
00:26:35,765 --> 00:26:37,560
really get it because they have
318
00:26:37,560 --> 00:26:41,113
their senior leaders doing strategy stuff without being very hands-on.
319
00:26:41,113 --> 00:26:42,774
They don't really get it.
320
00:26:42,774 --> 00:26:53,582
I think if you take the time to have a small group of people that are really invested in
using AI every day, try out some leading tools, go in the right direction.
321
00:26:53,582 --> 00:26:55,363
Don't do anything just crazy.
322
00:26:55,363 --> 00:26:57,875
And then just don't put in any client information.
323
00:26:57,875 --> 00:27:04,640
Just do everything based on synthesized or sanitized
data.
324
00:27:04,666 --> 00:27:10,570
I think if you do that, you can get a pretty good sense of, now I get what people are
using this for.
325
00:27:10,570 --> 00:27:15,773
We could use it here, here, and here, but I can't use it in this area because it's going
to have issues.
326
00:27:15,773 --> 00:27:20,796
Or this is decent software in this way, but not in this other way.
327
00:27:20,796 --> 00:27:23,358
Now I can make informed strategic decisions.
328
00:27:23,358 --> 00:27:29,161
I think that if you kind of do that pairing, that's probably what I think would be the
best approach.
329
00:27:29,202 --> 00:27:29,482
Yeah.
330
00:27:29,482 --> 00:27:31,403
Well, we're, we're aligned on part of that.
331
00:27:31,403 --> 00:27:43,798
So I think that striking the right risk-reward balance is key, and that should be the
number one driver of the approach.
332
00:27:43,838 --> 00:27:44,398
Right.
333
00:27:44,398 --> 00:27:54,863
I think that jumping right in on the practice side and, you know, going whole hog with
attorneys who have super high opportunity costs and low tolerance for missteps is a
334
00:27:54,863 --> 00:27:55,643
mistake.
335
00:27:55,643 --> 00:27:56,943
So we're aligned on that.
336
00:27:56,943 --> 00:27:58,549
I guess where I get hung up,
337
00:27:58,549 --> 00:28:03,480
is that, I'm going to quote another study here, or survey.
338
00:28:03,480 --> 00:28:13,753
Thomson Reuters did one, the professional services Gen AI survey, which came out late last
year, and only 10% of law firms, one out of 10, have a Gen AI policy.
339
00:28:13,913 --> 00:28:19,114
So in order to write a policy, I think you need a strategy first, right?
340
00:28:19,114 --> 00:28:26,674
A policy is an outgrowth of a strategy, but nine out of 10 don't have one.
341
00:28:26,674 --> 00:28:37,690
So what you have now is law firm users who don't have proper guidance on, hey, what can I
use the public models for?
342
00:28:37,690 --> 00:28:38,691
Can I use them at all?
343
00:28:38,691 --> 00:28:40,412
Do I use it on my phone?
344
00:28:40,412 --> 00:28:44,254
Can I use it on my personal laptop when I'm not connected to the VPN?
345
00:28:44,254 --> 00:28:49,196
Like all of those questions not being answered, I think creates unnecessary risk.
346
00:28:49,196 --> 00:28:56,370
Maybe at a certain, you know, altitude defining the strategy and incrementally
347
00:28:56,370 --> 00:29:00,529
working your way down more granularly, maybe that's the right balance.
348
00:29:00,684 --> 00:29:01,935
I think we're in sync there.
349
00:29:01,935 --> 00:29:07,059
I think it's crazy that you wouldn't have a Gen AI policy at this point.
350
00:29:07,059 --> 00:29:12,604
I think our company, DirecTV, had one three months in after ChatGPT.
351
00:29:12,604 --> 00:29:20,371
I was on the executive board there, and we thought immediately, we have to give the
company employees something, some guidance.
352
00:29:20,371 --> 00:29:22,132
And yeah, I think you're exactly right.
353
00:29:22,132 --> 00:29:29,578
You start high, you make it a little bit overly restrictive, then you dig into the details
and you realize, okay, here's where we can open up
354
00:29:29,622 --> 00:29:38,376
a little bit more, here's where we can be a little bit less or more forgiving on the use
of the tools and just be smart about that.
355
00:29:38,858 --> 00:29:44,415
But yeah, if you're working in a law firm and you don't have a strategy, I think you
definitely should start working on that right away.
356
00:29:44,415 --> 00:29:45,125
Yeah.
357
00:29:45,125 --> 00:29:55,458
And what do you think about this? It's another opinion of mine, some
may agree or disagree, but I see a lot of law firm C-suite and director-level roles, both in
358
00:29:55,458 --> 00:30:06,651
the innovation and AI world, that are brought in without any sort of strategy,
essentially just to figure it out.
359
00:30:06,731 --> 00:30:11,922
And normally I like an agile approach, but the problem with
360
00:30:12,264 --> 00:30:28,148
this approach in law firms is that those resources are typically not sufficiently
empowered to make change, and law firm decision-making is so friction-heavy that it feels
361
00:30:28,148 --> 00:30:31,329
like you're setting these leaders up.
362
00:30:31,449 --> 00:30:36,091
You're not setting them up for success, because the tone has to be set at the top,
right?
363
00:30:36,091 --> 00:30:40,472
Again, around risk-taking, around where they want to
364
00:30:41,556 --> 00:30:44,436
add value within the business.
365
00:30:45,516 --> 00:30:56,096
You know, all of these things need to happen at the most senior level. And bringing
somebody in, even if it's a C-suite role, but especially at the director level, like, do
366
00:30:56,096 --> 00:31:01,896
you really think this person's going to have the political capital to make recommendations
and have those get implemented?
367
00:31:01,896 --> 00:31:03,136
How long is that going to take?
368
00:31:03,136 --> 00:31:06,376
Like they'll be there three years before anything gets, I don't know.
369
00:31:06,376 --> 00:31:08,776
Do you have any thoughts on the sequence?
370
00:31:09,452 --> 00:31:10,442
A couple thoughts.
371
00:31:10,442 --> 00:31:20,055
I think it's a tough problem, for one, in the fact that you usually have a lot of
partners and managing partners that are making decisions collectively.
372
00:31:20,055 --> 00:31:24,906
That's just inherently harder to kind of move the ship and all that.
373
00:31:25,026 --> 00:31:35,499
That said, I would say when we speak with most of the senior leaders at firms, I don't
think they're that deep on what's possible with Gen AI, how the value of it is very
374
00:31:35,499 --> 00:31:37,450
specific, or implementation, or any of that.
375
00:31:37,450 --> 00:31:38,850
What I'd recommend is
376
00:31:38,850 --> 00:31:50,098
Think about the core values that you care about, like risk versus the impact to your
business from an acceleration perspective, or the ability to add more insight, and all
377
00:31:50,098 --> 00:31:54,441
those high level values with maybe confidentiality and security and all that.
378
00:31:54,441 --> 00:32:00,726
And just in a very general sense, align at the highest level on what trade-offs you wanna
make.
379
00:32:00,726 --> 00:32:06,562
And then once you have that general view, then empower somebody who is
380
00:32:06,562 --> 00:32:16,665
very knowledgeable in the area to give very specific recommendations of, given what you
said from a value standpoint, here's how we can implement an end-to-end strategy around
381
00:32:16,665 --> 00:32:21,626
Gen AI that makes sense and is aligned with what you're guiding me on.
382
00:32:21,626 --> 00:32:32,615
And then I think in parallel, I would really try to have some subset of users be very
engaged in using a tool and getting a good sense and getting learnings from that and
383
00:32:32,615 --> 00:32:35,630
having the groups present jointly.
384
00:32:35,795 --> 00:32:37,258
to the managing partners.
385
00:32:37,258 --> 00:32:39,723
I think that's probably a good recipe for success.
386
00:32:39,804 --> 00:32:40,324
Yeah.
387
00:32:40,324 --> 00:32:54,728
And, you know, I have advocated for bringing consultants in for that part of the journey, just
because I worry that bringing in, you know, again, a director level role to manage this,
388
00:32:54,728 --> 00:33:05,311
is just a tougher sell than if the executive committee brings in
consultants. And you know what, there's a gap in the marketplace right now.
389
00:33:05,311 --> 00:33:08,776
There aren't many people like you, who really know this stuff,
390
00:33:08,776 --> 00:33:10,677
who aren't sitting in a seat like yours.
391
00:33:10,677 --> 00:33:23,704
There's so much capital being deployed in this area of tech that if you have these
skillsets, going out and selling your time hourly is not the best way to capture economic
392
00:33:23,704 --> 00:33:24,465
value.
393
00:33:24,465 --> 00:33:27,276
It's to do something like you're doing with a startup.
394
00:33:27,276 --> 00:33:34,190
And as a result, I think there's a big gap in the consulting world with people who really
know their stuff.
395
00:33:34,190 --> 00:33:36,731
So I do sympathize.
396
00:33:36,731 --> 00:33:38,365
Do you see that gap?
397
00:33:38,365 --> 00:33:39,354
as well.
398
00:33:40,280 --> 00:33:41,780
I think we're aligned there.
399
00:33:41,780 --> 00:33:44,811
It's a really tough problem for law firms because of that.
400
00:33:45,252 --> 00:33:56,875
I mean, one thing you could try to do is work with a leader at a vendor and just say, hey,
look, I can't use your software, but we'd love to form a longer term relationship over
401
00:33:56,875 --> 00:33:57,615
time.
402
00:33:57,615 --> 00:34:06,138
And can you just give us some general guidance on how we can be effective, knowing that
that person is going to be a little bit biased. That's one thing you can do.
403
00:34:07,375 --> 00:34:16,683
I do think that, in trying to find the right consultant, there are some out there and
you might be able to find one, but it's tough, and you might need to just rely on finding
404
00:34:16,683 --> 00:34:23,248
your most tech forward partner to take a lead position and say, hey, you've got to get
really deep on this stuff.
405
00:34:23,248 --> 00:34:35,798
And I think one thing you need to be cautious about is if you find someone who's not very
kind of forward from a transformation perspective, they're going to move very slowly.
406
00:34:35,798 --> 00:34:40,825
relative to somebody who's just like, hey, we need to stop everything and figure out how
to do this effectively.
407
00:34:40,825 --> 00:34:44,570
That person's gonna have enough friction thrown at them to slow them down anyway.
408
00:34:44,570 --> 00:34:46,642
But I would start with someone like that.
409
00:34:46,642 --> 00:34:48,373
Yeah, that makes sense.
410
00:34:48,373 --> 00:34:53,214
A lot of partners still have books of business.
411
00:34:53,975 --> 00:34:57,276
It's a tough problem for sure.
412
00:34:57,276 --> 00:34:58,377
No easy answers.
413
00:34:58,377 --> 00:35:06,780
How should law firms think about balancing efficiency gains and the impact to the billable
hour?
414
00:35:07,406 --> 00:35:15,586
Yeah, this is one we get all the time: okay, maybe someday, or today, your software is
good enough to where you're adding efficiency.
415
00:35:15,586 --> 00:35:17,146
I'm just going to bill less, right?
416
00:35:17,146 --> 00:35:19,226
So why do I even want this software?
417
00:35:19,686 --> 00:35:20,726
A few thoughts on that.
418
00:35:20,726 --> 00:35:25,366
One, in a lot of cases, attorneys aren't always billing by the billable hour.
419
00:35:25,366 --> 00:35:33,986
It could be contingency, they could be in-house, or it could be a cost-per-X type
of model, where it's like, I'm going to charge you per demand letter I write, or something
420
00:35:33,986 --> 00:35:34,670
like that.
421
00:35:34,670 --> 00:35:42,970
For those that do need to do the billable hour, which is the majority of attorneys, my
view is that it's kind of like computers.
422
00:35:43,310 --> 00:35:53,610
It's not like, 10 years after the computer came out, lawyers were still spending most of
their time going to law libraries and manually checking out books and reading through
423
00:35:53,610 --> 00:35:53,950
books.
424
00:35:53,950 --> 00:35:55,810
It's just not as efficient.
425
00:35:56,010 --> 00:35:59,510
What will happen is that the market will all move toward AI.
426
00:35:59,510 --> 00:36:02,606
Then if you're the one laggard who's not using it at all,
427
00:36:02,606 --> 00:36:04,586
it's just going to be pretty obvious.
428
00:36:04,586 --> 00:36:12,666
Groups are going to know about that and they're not going to use you because you produce
less legal work than the alternative groups.
429
00:36:13,446 --> 00:36:15,546
And so that's where I see the market going.
430
00:36:15,546 --> 00:36:18,665
The other benefit is lawyers write off a lot of their time.
431
00:36:18,665 --> 00:36:22,866
I mean, if you work a 10 hour day, you might bill six and a half hours on average.
432
00:36:22,866 --> 00:36:29,046
And a lot of that time is because you're doing background legal research work or
background work to get up to speed.
433
00:36:29,046 --> 00:36:30,626
AI does that really well.
434
00:36:30,732 --> 00:36:34,865
Your goal as a law firm would probably be to have higher revenue per attorney.
435
00:36:34,865 --> 00:36:38,527
And if attorneys are billing a higher percentage of their time, you're meeting that goal.
436
00:36:38,527 --> 00:36:42,150
So I think that there's a lot of talk about the billable hour.
437
00:36:42,150 --> 00:36:43,931
And I think it's not going away.
438
00:36:43,931 --> 00:36:48,434
Maybe some on the margins, maybe there's some changes.
439
00:36:48,434 --> 00:36:53,678
But I think that lawyers are going to want to be efficient.
440
00:36:53,678 --> 00:36:56,660
And over time, they're going to lean on these tools.
441
00:36:56,660 --> 00:37:00,248
I think the hesitancy with AI has been more
442
00:37:00,248 --> 00:37:04,983
that there's a lot of traps and a lot of just "this wasn't very good" type of outputs.
443
00:37:04,983 --> 00:37:10,918
And I think that the industry is getting pretty close to where those types of issues are
going away pretty fast.
444
00:37:10,918 --> 00:37:11,828
Yeah.
445
00:37:12,009 --> 00:37:16,651
Well, what about high-value use cases on the practice side?
446
00:37:16,651 --> 00:37:22,173
Like, I know you and I talked about document review and timeline creation.
447
00:37:22,173 --> 00:37:24,831
And I thought the timeline creation was an interesting one.
448
00:37:24,831 --> 00:37:37,030
I'm not a lawyer and I don't know how often that scenario comes into play, but any
thoughts on where the high-value use cases are within the practice today?
449
00:37:37,474 --> 00:37:49,104
Yeah, the areas that I think are most interesting are where AI can synthesize very large
amounts of data and get a pretty much fully accurate answer almost every time.
450
00:37:49,545 --> 00:37:54,250
And so a couple of areas that really make sense, like you mentioned, timelines.
451
00:37:54,250 --> 00:38:03,286
You can ingest all of your documents, like a discovery set that's all relevant documents,
throw that into the AI.
452
00:38:03,286 --> 00:38:09,872
and say, hey, pull out all the relevant pieces and create a timeline based on that, and
then use that to draft a statement of facts.
453
00:38:09,872 --> 00:38:11,773
That's gonna be pretty good.
454
00:38:11,773 --> 00:38:15,517
And that's not that hard to set up to do really well.
455
00:38:15,517 --> 00:38:26,486
I've been seeing a lot of users use our tool who are just like, wow, this saved a ton of
time. Before, I was kind of nervous about letting AI answer legal research questions.
456
00:38:26,486 --> 00:38:29,528
But when I see this, this is super useful.
457
00:38:29,629 --> 00:38:31,660
Very similar concept with doc review.
458
00:38:31,660 --> 00:38:41,753
You can automate your doc review and put in hundreds of thousands of pages of files that
AI is looking through to see whether it's relevant based on the context you give it and
459
00:38:41,753 --> 00:38:43,693
based on what you're asking to search for.
460
00:38:43,693 --> 00:38:51,415
And a very high percentage of the actually relevant files will be pulled out, prioritized,
and then synthesized in a summary.
461
00:38:51,516 --> 00:38:58,898
Those types of tools are extremely useful where they might save thousands of hours of time
to get your first pass in doc review.
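The first-pass idea above can be sketched as a simple score-and-rank loop. This is purely a hypothetical illustration, with `classify` standing in for whatever relevance call a real review tool would make; none of the names are a product's actual API.

```python
# `classify` stands in for an LLM relevance call returning a 0-1 score;
# every name here is hypothetical, not a real product API.
def first_pass_review(files, matter_context, classify, threshold=0.5):
    hits = []
    for name, text in files.items():
        score = classify(matter_context, text)
        if score >= threshold:
            hits.append((score, name))
    # Highest-scoring documents surface first for human review.
    return [name for score, name in sorted(hits, reverse=True)]

# Tiny demo with a keyword stub standing in for the model:
stub_classify = lambda ctx, text: 0.9 if "merger" in text else 0.1
ranked = first_pass_review(
    {"a.txt": "merger agreement draft", "b.txt": "catering invoice"},
    "M&A dispute", stub_classify)
```

The human reviewer still makes the final call; the loop only decides the order and cut of what they look at first.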
462
00:39:00,102 --> 00:39:06,428
I would say that the error rate is pretty similar to humans at this point, for
well-engineered software.
463
00:39:06,428 --> 00:39:18,799
And there are other things that are similar. For example, you might have an
expert who has to say, here's all of the information I relied on before I go take the stand.
464
00:39:18,799 --> 00:39:26,446
And can you create the reliance list based on all of these 200 files that I uploaded to
your system?
465
00:39:26,446 --> 00:39:36,066
And say, here are the pieces I relied on. It might take an attorney 100 hours to build
that list; we'll do that with 100% precision if it's well engineered, and you've just saved
466
00:39:36,066 --> 00:39:36,906
that time.
467
00:39:36,906 --> 00:39:45,346
So those areas are the ones that I'd say are the most interesting that I would really
recommend groups try out with a good vendor.
468
00:39:45,346 --> 00:39:49,666
And then there's others where I think legal research is a really hard problem.
469
00:39:49,666 --> 00:39:55,054
It's the first one we started tackling, but just think about all the decisions that the
lawyer needs to make.
470
00:39:55,054 --> 00:39:59,814
When they do legal research, they're thinking about: is this a motion to
dismiss?
471
00:39:59,814 --> 00:40:01,534
Is it a motion for summary judgment?
472
00:40:01,594 --> 00:40:03,294
Is it a trial?
473
00:40:03,694 --> 00:40:06,354
Am I drafting the original complaint?
474
00:40:06,354 --> 00:40:09,954
I'm gonna have very different cases that I use in all of those scenarios.
475
00:40:09,954 --> 00:40:13,854
I've gotta understand the relevancy of the cases, the procedural posture of the case.
476
00:40:13,854 --> 00:40:22,874
I need to think about whether in that case the court ruled for or against my client or the
person that's analogous to my client.
477
00:40:22,874 --> 00:40:24,662
And there's so many factors.
478
00:40:24,662 --> 00:40:32,709
I think we're getting very close to where I feel pretty good about our analysis, but we
still want the lawyer heavily in the loop through the process.
479
00:40:32,710 --> 00:40:35,823
But the other areas are just AI just does it really well.
480
00:40:35,823 --> 00:40:37,014
It's not as complicated.
481
00:40:37,014 --> 00:40:45,072
And I definitely recommend you get started on those areas and then dip your feet in some
of the peripheral areas like billing or areas where it's a little bit less related to core
482
00:40:45,072 --> 00:40:45,962
work too.
483
00:40:46,494 --> 00:40:54,791
So in the scenario of creating a timeline, on the surface, to me that doesn't
sound like something that requires a point solution.
484
00:40:54,791 --> 00:41:05,530
Like, can the general models do it? I'm not a big fan of Copilot at the moment, but do you
need a specifically trained platform to do that effectively?
485
00:41:05,998 --> 00:41:08,979
I think eventually the general models will be able to do it.
486
00:41:09,939 --> 00:41:17,642
I don't know that any of the general models can take like 100 different documents being
uploaded at once.
487
00:41:17,642 --> 00:41:26,984
If you just use even some of the better ones that follow instructions really well, and that
have big context windows, they're still gonna miss a lot if you don't do deeper
488
00:41:26,984 --> 00:41:27,534
algorithms.
489
00:41:27,534 --> 00:41:32,826
So for example, for us, it's kind of what I mentioned earlier.
490
00:41:32,826 --> 00:41:39,602
If we were to just say, Gemini, you do this pretty well, go ahead and pull out a timeline,
it's going to get a lot of it right.
491
00:41:39,602 --> 00:41:40,893
It's also going to miss a lot.
492
00:41:40,893 --> 00:41:44,276
If we say, Gemini, we're going to give you one page at a time.
493
00:41:44,276 --> 00:41:45,247
Here's the full context.
494
00:41:45,247 --> 00:41:47,098
Here's exactly what I want you to look for.
495
00:41:47,098 --> 00:41:48,849
And here's the things you might get tripped up on.
496
00:41:48,849 --> 00:41:50,251
Here's how to solve it.
497
00:41:50,251 --> 00:41:51,802
And now go pull these out.
498
00:41:51,802 --> 00:41:53,703
Then we're going to get really good results.
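The page-at-a-time approach Justin describes can be sketched roughly like this. Every function and prompt here is a hypothetical illustration of the pattern (full context, explicit pitfalls, one focused call per page), not the vendor's actual prompts.

```python
# A hypothetical sketch of page-at-a-time extraction, not a real product's
# prompts or API.
def build_page_prompt(page_text: str, case_context: str) -> str:
    return (
        "You are extracting timeline events for a litigation matter.\n"
        f"Case context: {case_context}\n"
        "Pitfalls: ignore dates in boilerplate headers; distinguish "
        "event dates from filing dates.\n"
        "Return one line per event as 'YYYY-MM-DD: description'.\n\n"
        f"Page:\n{page_text}"
    )

def extract_timeline(pages, case_context, call_model):
    events = []
    for page in pages:
        # One focused call per page, with full context and known traps,
        # misses far less than one giant whole-corpus prompt.
        events.extend(call_model(build_page_prompt(page, case_context)).splitlines())
    return sorted(events)  # ISO dates sort chronologically as strings

# Demo with a stubbed model call:
stub = lambda p: ("2021-03-02: complaint filed" if "complaint" in p
                  else "2020-01-15: contract signed")
timeline = extract_timeline(
    ["contract signed by the parties", "complaint filed in district court"],
    "Breach of contract dispute", stub)
```

The "deeper algorithms" are mostly in the decomposition: the per-page loop and the pitfall instructions are what keep the model from skimming.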
499
00:41:53,844 --> 00:42:00,030
And so maybe in a couple of years, the tools will be very reliable in a general sense to
be able to do that.
500
00:42:00,030 --> 00:42:00,842
I think they're
501
00:42:00,842 --> 00:42:06,106
not there today; the raw LLMs aren't, but we're not the only group doing timelines.
502
00:42:06,127 --> 00:42:07,668
Wexler does those really well.
503
00:42:07,668 --> 00:42:09,428
I'm sure other groups do as well.
504
00:42:09,428 --> 00:42:09,768
Yeah.
505
00:42:09,768 --> 00:42:15,448
So you need a layer of engineering on top to manage that today.
506
00:42:16,368 --> 00:42:18,168
That's interesting.
507
00:42:20,488 --> 00:42:27,148
What about building trust with AI in the firms?
508
00:42:27,548 --> 00:42:38,208
And this goes deeper than just within the firms, to the clients ultimately. As you can see,
with almost half still discouraging or prohibiting
509
00:42:38,208 --> 00:42:42,374
use, there's still a lack of trust with these tools.
510
00:42:42,374 --> 00:42:44,846
How do we bridge that gap?
511
00:42:45,934 --> 00:42:50,154
The trust gap can come from a few different reasons.
512
00:42:50,154 --> 00:42:52,234
So one, it could be a security issue.
513
00:42:52,234 --> 00:42:54,434
Two, it could be a confidentiality issue.
514
00:42:54,434 --> 00:42:57,414
And then three, it could be an accuracy or hallucination issue.
515
00:42:57,414 --> 00:43:03,914
So from a security standpoint, you obviously want to make sure that the model's not
training on any information that you share with it.
516
00:43:03,914 --> 00:43:10,334
But most of the tools are able to satisfy that requirement pretty easily.
517
00:43:10,334 --> 00:43:15,750
And even now, if you're using like a Pro or Plus account with ChatGPT, it's doing that as
well.
518
00:43:17,123 --> 00:43:21,466
You still have a lot of security holes that can happen for anything in the cloud.
519
00:43:21,466 --> 00:43:25,899
So it's helpful to see that the group is SOC 2 compliant and has that certification.
520
00:43:25,899 --> 00:43:33,294
It's helpful to ensure that the group's following best practices as far as encryption:
they're encrypting at rest and in transit.
521
00:43:33,575 --> 00:43:37,578
It's a nice-to-have, I think, to say that your PII is scrubbed as well.
522
00:43:37,578 --> 00:43:41,741
That's something you might want to look for if you're extra cautious for something
particularly sensitive.
523
00:43:43,694 --> 00:43:49,679
And so that's helping with security and for the most part, confidentiality as well.
524
00:43:49,679 --> 00:43:54,402
You might want to look for groups that set up double encryption or mutual encryption.
525
00:43:54,402 --> 00:44:01,728
Or end-to-end encryption, where they're able to encrypt the data to where even their
engineers can't see your data sets.
526
00:44:01,728 --> 00:44:04,600
That's possible and technically something that you can do.
527
00:44:04,600 --> 00:44:09,693
And so anything that's extremely sensitive, you might want to ask for that.
528
00:44:09,934 --> 00:44:10,695
But
529
00:44:10,695 --> 00:44:17,829
If you do those two things, you should be in a pretty good position where
you're meeting those requirements.
530
00:44:17,829 --> 00:44:28,815
From an accuracy and hallucination standpoint, to me, the way you solve that
is: keep the client in the loop, make audit trails, and build things together.
531
00:44:28,876 --> 00:44:39,694
So if you have software that says, okay, these are the cases I used, click the link to see
the cases, double click to say, here's the material facts I relied on, here's the...
532
00:44:39,694 --> 00:44:42,594
quotes I relied on to generate this holding, all that stuff.
533
00:44:42,594 --> 00:44:52,254
If your software is able to do that, I think that it's able to really satisfy the concerns
that lawyers might have of, hold on, this might not even be real, is this something I can
534
00:44:52,254 --> 00:44:53,514
rely on?
535
00:44:53,514 --> 00:45:00,074
And you want that audit trail to be something that it's much faster to audit than to just
do the work from start to finish.
536
00:45:00,422 --> 00:45:00,712
Right.
537
00:45:00,712 --> 00:45:07,116
Because that's always the rub: am I really being more efficient if I have to go
back and double-check everything?
538
00:45:07,116 --> 00:45:12,659
It really impacts the ROI equation when you have to do that.
539
00:45:12,659 --> 00:45:15,601
Well, this has been a really good conversation.
540
00:45:15,601 --> 00:45:20,824
I knew it would be; we've had some good dialogue in the past.
541
00:45:20,824 --> 00:45:27,258
Before we wrap up here, how do people find out more about what you're doing at
Callidus Legal?
542
00:45:27,662 --> 00:45:37,822
Yeah, check us out at callidusai.com, that's C-A-L-L-I-D-U-S-A-I dot com, or shoot me a
message at justin at callidusai.com.
543
00:45:37,822 --> 00:45:39,202
And I really appreciate you having me on.
544
00:45:39,202 --> 00:45:40,602
This was a great talk.
545
00:45:40,602 --> 00:45:41,162
Thanks.
546
00:45:41,162 --> 00:45:42,263
Yeah, absolutely.
547
00:45:42,263 --> 00:45:43,924
All right, have a great weekend.
548
00:45:44,286 --> 00:45:45,507
All right, take care.
00:00:05,224
Justin, how are you this morning?
2
00:00:05,400 --> 00:00:06,792
Doing well, good morning.
3
00:00:06,803 --> 00:00:11,272
Yeah, I appreciate you jumping on here with me for a few minutes.
4
00:00:11,478 --> 00:00:12,565
Absolutely.
5
00:00:12,981 --> 00:00:15,581
So you and I connected at TLTF.
6
00:00:15,581 --> 00:00:28,261
I think we were having lunch and we were talking about AI and reasoning and you were
sitting next to me and chimed in and had some really good thoughts on that topic.
7
00:00:28,261 --> 00:00:38,561
And I think we've, we've covered that on previous episodes, but, um, you and I had another
conversation and I thought you had some really good insights to share.
8
00:00:38,561 --> 00:00:41,469
It sounds like you dive pretty deep.
9
00:00:41,469 --> 00:00:46,272
which it's always good to hear perspective from folks that really, really jump in.
10
00:00:46,272 --> 00:00:51,314
But before we jump into the content here, let's just get you introduced.
11
00:00:51,314 --> 00:00:55,736
So you currently lead, is it Callidus Legal AI?
12
00:00:56,677 --> 00:00:57,338
Okay.
13
00:00:57,338 --> 00:01:11,475
And your focus is on pairing AI with lawyers to enhance core legal work. You're deep in AI
and ML, a former practicing attorney doing M&A work,
14
00:01:11,549 --> 00:01:14,301
which I think is interesting.
15
00:01:14,301 --> 00:01:17,542
Tell us more about your background and what you're doing today.
16
00:01:17,976 --> 00:01:24,569
Yeah, I started off practicing M&A, corporate restructuring, and ended up working for AT&T after
that for a while.
17
00:01:24,569 --> 00:01:29,791
And I led the AT&T legal transformation with our deputy GC.
18
00:01:29,791 --> 00:01:33,412
It was really successful and very interesting for me.
19
00:01:33,552 --> 00:01:38,224
The group looked at just understanding what are our attorneys doing?
20
00:01:38,224 --> 00:01:40,675
What are they doing that's not the highest priority?
21
00:01:40,675 --> 00:01:42,036
How can they reprioritize?
22
00:01:42,036 --> 00:01:48,012
How can we think about bringing in-house some work that we're using outside counsel for
that we can
23
00:01:48,012 --> 00:01:52,574
add more efficiency to the internal resources and have them do more of the work
internally?
24
00:01:52,574 --> 00:01:59,807
Where are we missing key insights and adding liability and where are we over-focused where
we shouldn't be?
25
00:01:59,807 --> 00:02:04,849
And how can we just rethink some of the workflows to where we're operating more
effectively?
26
00:02:04,849 --> 00:02:05,989
So I did that for a bit.
27
00:02:05,989 --> 00:02:11,111
And then my next gig was running a data science and engineering org.
28
00:02:11,111 --> 00:02:12,541
And we launched the first GenAI
29
00:02:12,541 --> 00:02:16,713
product for the company's subsidiary, DIRECTV.
30
00:02:16,770 --> 00:02:18,031
which was pretty informative.
31
00:02:18,031 --> 00:02:20,092
It was right after ChatGPT came out.
32
00:02:20,092 --> 00:02:30,759
And to me, the reason I started my startup was it was so obvious that if you paired the legal
transformation work with the GenAI work, there was going to be a big opportunity.
33
00:02:30,759 --> 00:02:36,682
And I didn't think it was going to be something from day one where ChatGPT could just
replace lawyers' jobs or anything like that.
34
00:02:36,682 --> 00:02:43,146
But I thought over time, this looked like a great starting block for something that could
be really powerful.
35
00:02:43,476 --> 00:02:44,436
Interesting.
36
00:02:44,436 --> 00:02:44,656
Yeah.
37
00:02:44,656 --> 00:02:56,896
And I remember you and I had some subsequent, we had some subsequent dialogue on LinkedIn
talking about, we were talking about the Stanford paper and how, um, and also the Apple
38
00:02:56,896 --> 00:03:01,156
Intelligence paper on the GSM8K battery of tests.
39
00:03:01,156 --> 00:03:06,736
I think they call it GSM8K adaptive, which, um, GSM is grade school math.
40
00:03:06,736 --> 00:03:10,188
And then there were 8,000 questions that
41
00:03:10,260 --> 00:03:13,980
were used to evaluate how well AI performed.
42
00:03:14,720 --> 00:03:28,320
And Apple Intelligence did a study on that, and the adaptive part is where they
changed minor details about the questions to see how the models would perform.
43
00:03:28,460 --> 00:03:37,080
And they degraded quite a bit anywhere from, I think, at the low end, the degradation was
like 30%.
44
00:03:37,080 --> 00:03:38,476
So at the time,
45
00:03:38,579 --> 00:03:39,789
Maybe that was one.
46
00:03:39,789 --> 00:03:53,044
I forget exactly what model on the OpenAI side was the latest and greatest, um, all the
way down to like a 70% degradation just by inserting irrelevant facts about the questions
47
00:03:53,044 --> 00:03:54,375
or changing the names.
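The perturbation approach described here, swapping names and inserting irrelevant facts while the arithmetic stays fixed, can be sketched in a few lines of Python. The template, names, and filler sentences below are illustrative inventions, not the benchmark's actual items:

```python
import random

# A GSM8K-style word problem as a template: swapping surface details
# (the person's name, an irrelevant clause) should not change the
# correct answer if a model truly reasons rather than pattern-matches.
TEMPLATE = "{name} has {n} apples and buys {m} more.{noise} How many apples does {name} have?"

NAMES = ["Sophie", "Liam", "Priya"]          # illustrative names
IRRELEVANT = [                               # illustrative distractor clauses
    "",
    " The apples were picked on a Tuesday.",
    " {name}'s cousin prefers oranges.",
]

def perturb(n: int, m: int, seed: int) -> tuple[str, int]:
    """Return a surface-perturbed question and its unchanged ground-truth answer."""
    rng = random.Random(seed)
    name = rng.choice(NAMES)
    noise = rng.choice(IRRELEVANT).format(name=name)
    question = TEMPLATE.format(name=name, n=n, m=m, noise=noise)
    return question, n + m  # the answer is invariant under the perturbation
```

A robust model should answer every perturbed variant correctly; the degradation the paper measured is the gap between the original and perturbed accuracy.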
48
00:03:54,375 --> 00:04:04,939
Um, that's, and as you pointed out, that has since been resolved and you know, which makes
me wonder, all right, did they do that?
49
00:04:05,019 --> 00:04:07,240
Did they game the system at all?
50
00:04:07,240 --> 00:04:08,370
Like, Hey, we've got a,
51
00:04:08,370 --> 00:04:19,675
we've got a weakness here, let's apply a band aid or was there a fundamental adaptation
that they implemented that helped?
52
00:04:19,675 --> 00:04:33,231
But I think you had, when you ran those same questions through wherever we were at that
point, maybe it was 4.0, you had different output, like it answered successfully.
53
00:04:33,231 --> 00:04:34,982
Am I remembering that correctly?
54
00:04:35,304 --> 00:04:37,215
Yeah, I think pretty close.
55
00:04:37,215 --> 00:04:43,398
I think that the Apple paper, I could be wrong, but I thought it was they used a battery
of models.
56
00:04:43,398 --> 00:04:48,360
The only one that was somewhat advanced was GPT-4, the original GPT-4.
57
00:04:48,360 --> 00:04:55,283
And now we have quite a lot better models with o3-mini and GPT-4.5.
58
00:04:55,283 --> 00:05:02,478
And if you look at the benchmarks, the benchmark I like the most is LiveBench, where they
59
00:05:02,478 --> 00:05:09,838
hide the questions, you can't really game the system, they change the questions regularly,
and they do a full battery of tests.
60
00:05:09,998 --> 00:05:15,478
GPT-4 scored about a 45, and the best models now score about a 76.
61
00:05:15,478 --> 00:05:19,278
So they've come a long way in those benchmark tests.
62
00:05:19,278 --> 00:05:30,018
And when you use the top models now to do the same questions that Apple had, and continue
to vary different pieces and add irrelevant information so that you're sure that it
63
00:05:30,018 --> 00:05:32,418
wasn't trained on any of that information.
64
00:05:32,610 --> 00:05:34,632
they're answering every question correctly.
65
00:05:34,632 --> 00:05:42,897
And so I had sent over a handful of examples yesterday just to kind of prove my point
empirically that this is testable, this is falsifiable.
66
00:05:42,977 --> 00:05:47,640
You can run the test yourself and see, no, the AI actually is able to solve these things.
67
00:05:47,640 --> 00:05:57,787
And as far as how they did it, I'm not sure all of the specifics, but I think a lot of it
is on the post-training side where they're teaching it to, after they've completed the
68
00:05:57,787 --> 00:06:02,242
pre-training, they're teaching the model how to be more effective with the information it
does have.
69
00:06:02,242 --> 00:06:04,427
And then the reasoners are very good.
70
00:06:04,427 --> 00:06:11,540
anything that they're able to do to add this reasoning capability is definitely enhancing
the answers.
71
00:06:11,540 --> 00:06:12,080
Yeah.
72
00:06:12,080 --> 00:06:17,180
And there's, there's so much movement in the space.
73
00:06:17,180 --> 00:06:20,760
I can't even keep up, and I use it all the time.
74
00:06:20,760 --> 00:06:28,500
Like, I don't know, five, seven, 10 times a day, but you know, you've got Grok 3,
you've got Claude 3.7.
75
00:06:28,500 --> 00:06:38,040
You've now got, um, o3-mini, uh, 4.5, and apparently GPT-5 is on the way.
76
00:06:38,040 --> 00:06:40,532
Um, you know, there's DeepSeek.
77
00:06:40,532 --> 00:06:44,155
There's whatever Alibaba's model is.
78
00:06:44,155 --> 00:06:55,445
I mean, there's Mistral, there's Llama, like it's impossible to keep up unless you're doing
this full time, which, you know, I'm not.
79
00:06:55,846 --> 00:07:03,893
So I looked at some of the tests that you threw at o3-mini and I thought it did really
well.
80
00:07:03,893 --> 00:07:08,040
I didn't, I just kind of breezed through it, but why don't you kind of tell us some of the...
81
00:07:08,040 --> 00:07:11,002
some of the tests you threw at it and how it performed.
82
00:07:11,470 --> 00:07:12,070
Yeah, yeah.
83
00:07:12,070 --> 00:07:22,410
What I was trying to do is get a sense of how strong the model is for the types of things
that people are challenging, saying AI is just not able to do these fairly simple tasks.
84
00:07:22,410 --> 00:07:35,110
And so I ran through a handful of examples, one being let's find a case that was not in
the training set and let's go have it find the case text online and then give us a full
85
00:07:35,110 --> 00:07:39,758
summary of like the holding and the material facts and so forth and give us legal
analysis.
86
00:07:39,758 --> 00:07:43,840
I think that was one people were concerned AI is just not capable of doing that.
87
00:07:44,201 --> 00:07:45,298
I read it.
88
00:07:45,298 --> 00:07:48,763
I thought it did a fantastic job summarizing the case.
89
00:07:48,963 --> 00:07:58,113
I gave it some questions like solve complex numerical problems that also deal with
linguistics that are hard to even understand the question being asked.
90
00:07:58,113 --> 00:07:59,329
It did well there.
91
00:07:59,329 --> 00:08:01,518
It does well in constrained poetry.
92
00:08:01,518 --> 00:08:07,746
It did well on just I was surprised it did well on world model questions where I basically
had it.
93
00:08:07,746 --> 00:08:16,920
run a scenario where I'm like dumping marbles out of a container and then putting super
glue in and then moving them around the house and seeing where they end up and what
94
00:08:16,920 --> 00:08:18,030
walking through the steps.
95
00:08:18,030 --> 00:08:26,134
And it did pretty well on pretty much all of those things to where my point of view is
between that and GPT-4.5.
96
00:08:26,134 --> 00:08:36,318
Now you pretty much have something that can reason like a smart human can reason and it
can help in a pretty wide variety of ways from a chat window.
97
00:08:36,318 --> 00:08:37,622
There's still some
98
00:08:37,622 --> 00:08:45,566
issues where these tools don't have full capabilities that a human would have outside of
the chat window where we can pull additional resources.
99
00:08:45,566 --> 00:08:52,904
But if you're resource constrained and you're just talking to somebody intelligent, it's
going to be pretty similar to what these models can do now.
100
00:08:52,904 --> 00:08:53,664
Yeah.
101
00:08:53,664 --> 00:08:59,427
And you and I kicked around the Stanford paper too, which at this point is almost a year
old.
102
00:08:59,427 --> 00:09:01,728
It's actually over a year old from their first iteration.
103
00:09:01,728 --> 00:09:06,990
They did a subsequent adjustment and re-release in, I think May of last year.
104
00:09:06,990 --> 00:09:16,324
But some of the challenges that the Stanford paper highlighted was, you know, they
categorized too many things as hallucinations in my opinion.
105
00:09:16,324 --> 00:09:20,786
But I think overall, I got a lot of insight from reading the paper.
106
00:09:20,840 --> 00:09:29,093
but that AI misunderstands holdings, it has trouble distinguishing between legal actors,
it has difficulty respecting the order of authority.
107
00:09:29,093 --> 00:09:37,264
It fabricates. Do you feel like these specific issues have gotten
better?
108
00:09:39,142 --> 00:09:40,623
They've gotten better.
109
00:09:40,843 --> 00:09:53,288
There's tests on hallucination rates for the different models, and the reasoners are about
half the hallucination rate of GPT-4, and GPT-4.5 is also about half the hallucination
110
00:09:53,288 --> 00:09:54,888
rate of GPT-4.
111
00:09:54,949 --> 00:09:58,710
That said, hallucinations are still an issue for these models.
112
00:09:58,790 --> 00:10:03,552
Legal tech companies can solve those issues, and this is where domain-specific software
comes in.
113
00:10:03,552 --> 00:10:06,924
There's different algorithms you can run to help there.
114
00:10:06,924 --> 00:10:08,128
For example,
115
00:10:08,128 --> 00:10:13,399
Let's say that you have an issue where you tend to get hallucinated cases out of the LLMs.
116
00:10:13,399 --> 00:10:15,640
Well, I think everyone's kind of solved the problem now.
117
00:10:15,640 --> 00:10:26,443
We've been doing this for a long time, where you take the cases from the LLM, you have it
list out the relevant cases, and then you have a secondary external data source that
118
00:10:26,443 --> 00:10:29,544
has a list of all the cases that you have API access to.
119
00:10:29,544 --> 00:10:33,625
And then you check the Bluebook citation to say, is this a real case or not?
120
00:10:33,625 --> 00:10:37,974
And then if it is, let's go check relevancy to ensure this is relevant to the answer.
121
00:10:37,974 --> 00:10:39,865
And if you get two checks, you say, OK, good.
122
00:10:39,865 --> 00:10:41,207
This is a real case.
123
00:10:41,207 --> 00:10:42,037
It's relevant.
124
00:10:42,037 --> 00:10:46,500
This is going to be passed on to the user, and they're going to be able to access that
case.
125
00:10:46,601 --> 00:10:51,845
so this is where, yeah, it's true that I think they had a good insight.
126
00:10:51,845 --> 00:11:01,553
LLMs will continue to hallucinate and cause problems, but LLM software that has
domain-specific engineering on top of it can solve those issues.
127
00:11:01,553 --> 00:11:03,672
And then the other one being
128
00:11:03,672 --> 00:11:08,805
Hey, can LLMs actually like pull out the legal actors and how can they figure out the
holdings?
129
00:11:08,805 --> 00:11:19,402
That one, I think, is handled pretty well now with the top models. They're able to understand
the holdings pretty well, and you can test it yourself and see whether that's true empirically
130
00:11:19,402 --> 00:11:21,193
pretty easily.
131
00:11:21,194 --> 00:11:27,397
In all my tests, and we use this a lot and do a lot of evaluations, they're quite good for
the top models now.
132
00:11:27,412 --> 00:11:29,723
What about respecting the order of authority?
133
00:11:30,814 --> 00:11:32,755
That's another one that they get.
134
00:11:32,755 --> 00:11:37,158
You might have to prompt engineer it a bit, and this is where the domain software comes in
again.
135
00:11:37,158 --> 00:11:46,244
But if you prompt engineer it well, it fully understands that the Supreme Court is
superior to a state Supreme Court, the US Supreme Court versus state Supreme Court.
136
00:11:46,244 --> 00:11:51,947
It fully understands that that court's superior to a trial court and so forth.
137
00:11:52,567 --> 00:11:57,320
We use this every day, and this is something that it's able to do very consistently.
138
00:11:57,621 --> 00:12:04,902
So how would you assess the current state of AI capabilities in legal research and
analysis as we sit today?
139
00:12:05,390 --> 00:12:09,230
Yeah, I would say raw LLMs out of the box, quite bad.
140
00:12:09,230 --> 00:12:12,010
I wouldn't use it for legal research.
141
00:12:12,010 --> 00:12:14,590
And again, they're going to hallucinate everything.
142
00:12:14,590 --> 00:12:16,590
They're going to miss some insights.
143
00:12:16,590 --> 00:12:25,770
They're going to have instances where, because they don't have in the pre-training data
the full knowledge about all the cases and all the statutes for that state, they're going
144
00:12:25,770 --> 00:12:32,830
to take majority rules and assume that those are right for that state, even though your
state might be playing with them or using a minority rule.
145
00:12:33,030 --> 00:12:35,052
And so there's going to be a bunch of issues.
146
00:12:35,052 --> 00:12:38,044
You'll get kind of poorly formatted responses.
147
00:12:38,044 --> 00:12:41,976
If you ask it to draft a full brief, it'll give you like two pages.
148
00:12:42,057 --> 00:12:44,318
All of those things are problems.
149
00:12:44,438 --> 00:12:51,603
If you use good domain specific software, though, these are all engineering problems that
are solvable by the legal tech companies.
150
00:12:51,603 --> 00:12:56,466
And a lot of us have started to or very substantially solve those issues.
151
00:12:56,827 --> 00:13:01,550
And so if you use good software, you can expect an extensive
152
00:13:01,550 --> 00:13:07,392
30 page brief, no case hallucinations, hopefully no holding hallucinations.
153
00:13:07,572 --> 00:13:10,094
You can expect that it's properly formatted.
154
00:13:10,094 --> 00:13:20,848
You can expect that it does go into the details regarding the state law that's in
question, actually looking at the legal authorities to pull out the insights so that it's not
155
00:13:20,848 --> 00:13:22,298
just relying on majority rules.
156
00:13:22,298 --> 00:13:25,640
So all of those things that I think good software is able to do.
157
00:13:25,640 --> 00:13:31,446
That said, I would strongly suggest that we focus on software that keeps the attorney in
the loop.
158
00:13:31,446 --> 00:13:34,929
and lets the lawyer audit the output.
159
00:13:34,929 --> 00:13:39,092
So I wouldn't want just, hey, here's the full, here's my fact pattern.
160
00:13:39,092 --> 00:13:44,295
I'm just going to let the AI go off and run and just draft a full 30 page brief.
161
00:13:44,295 --> 00:13:46,997
I don't think that's a good solution right now.
162
00:13:46,997 --> 00:13:56,385
I think what the AI does well is it synthesizes large amounts of information, bubbles them
up to the lawyer and probably gets it right the vast majority of the time, but the lawyer
163
00:13:56,385 --> 00:13:59,660
is still going to make the judgment call about which direction to go.
164
00:13:59,660 --> 00:14:01,353
And then the lawyer says, yes, go here.
165
00:14:01,353 --> 00:14:03,237
Don't pursue this and so forth.
166
00:14:03,237 --> 00:14:07,385
And then you work together with the AI to get a great answer very fast.
167
00:14:07,385 --> 00:14:10,900
And that's where I would say the focus should be.
168
00:14:11,304 --> 00:14:13,916
Yeah, and I have seen it's been a couple of months.
169
00:14:13,916 --> 00:14:14,947
I think it was late last year.
170
00:14:14,947 --> 00:14:30,037
I saw a chart of the amount of, gosh, comprehension, I guess, for lack of a better
term, of different sized RAG prompts.
171
00:14:30,037 --> 00:14:33,520
So, and it trails off dramatically.
172
00:14:33,520 --> 00:14:39,476
The larger, you know, like, Gemini 2 has a
173
00:14:39,476 --> 00:14:41,876
a 1 million token context window, right?
174
00:14:41,876 --> 00:14:43,656
Which is pretty significant.
175
00:14:43,656 --> 00:14:47,156
I think Claude is a couple hundred thousand, GPT a little lower.
176
00:14:47,156 --> 00:14:48,396
These are always moving.
177
00:14:48,396 --> 00:14:51,656
I'm, I might not be current on the state of things.
178
00:14:51,656 --> 00:14:51,796
Yeah.
179
00:14:51,796 --> 00:14:52,656
Yeah.
180
00:14:52,656 --> 00:15:03,436
Um, but I, I saw kind of a performance metric that threw large amounts of documents
through RAG at these models.
181
00:15:03,436 --> 00:15:08,404
And it trailed off pretty substantially in terms of missing, you know, like
182
00:15:08,404 --> 00:15:16,375
key facts during summarization as the number of tokens increased.
183
00:15:16,375 --> 00:15:19,770
Are we getting any better there or is that still a limitation?
184
00:15:20,024 --> 00:15:23,096
It's getting better. It is a limitation, but it can be engineered around.
185
00:15:23,096 --> 00:15:24,257
This is another one.
186
00:15:24,257 --> 00:15:31,641
We've had to do this where a client has, say, a 40-page document, and it may be a hundred
40-page documents.
187
00:15:31,641 --> 00:15:34,783
And we're trying for each one to pull out all the payment terms.
188
00:15:34,783 --> 00:15:43,268
This is an area where out of the box, LLMs do pretty poorly without a tremendous amount of
prompt engineering and kind of just general engineering.
189
00:15:43,268 --> 00:15:46,786
So what we need to do is break down the problems to where
190
00:15:46,786 --> 00:15:55,038
we're only pushing through like a page at a time, maybe a little bit more than that,
giving it enough context and then giving very detailed prompts on exactly what to look for
191
00:15:55,038 --> 00:15:56,550
and what not to look for.
192
00:15:56,590 --> 00:16:01,312
You have to do all of that in a pretty domain specific way to get good answers.
193
00:16:01,312 --> 00:16:07,915
And so I think if you're just using a raw LLM without a lot of engineering work, they're
not gonna do very well here.
194
00:16:10,116 --> 00:16:11,056
Chunking's a big part of that.
195
00:16:11,056 --> 00:16:15,098
Yeah, you'll chunk the paper and then run in parallel.
196
00:16:15,310 --> 00:16:22,978
like a hundred different agents basically to each have their one page to review and then
summarize in groups.
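The chunk-and-fan-out pattern described here might look roughly like the following sketch, where `extract_terms` stands in for an LLM call carrying a detailed, domain-specific prompt, and the page size and worker count are arbitrary assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: split a long document into roughly page-sized chunks, run an
# extraction prompt on each chunk in parallel (one "agent" per chunk),
# then merge the per-chunk results in order, dropping duplicates.
def chunk_pages(text: str, page_chars: int = 3000) -> list[str]:
    return [text[i:i + page_chars] for i in range(0, len(text), page_chars)]

def extract_from_document(text: str, extract_terms) -> list[str]:
    chunks = chunk_pages(text)
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_chunk = list(pool.map(extract_terms, chunks))  # parallel fan-out
    seen, merged = set(), []
    for terms in per_chunk:
        for t in terms:
            if t not in seen:              # keep first occurrence only
                seen.add(t)
                merged.append(t)
    return merged
```

In practice each worker would be an LLM API call rather than a local function, but the fan-out/merge shape is the same.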
197
00:16:23,142 --> 00:16:24,312
Interesting.
198
00:16:24,413 --> 00:16:24,723
Yeah.
199
00:16:24,723 --> 00:16:37,703
You know, another challenge just as a, I'm not, I don't consider myself an AI expert, more
of an enthusiast, but I, I, I really do put, um, AI through its paces on real world stuff
200
00:16:37,703 --> 00:16:41,272
mostly and find a huge variation.
201
00:16:41,272 --> 00:16:43,537
Um, I also find it quite confusing.
202
00:16:43,537 --> 00:16:48,460
All the fragmentation, like just within the OpenAI world, just the number.
203
00:16:48,460 --> 00:16:51,006
And I know GPT-5 is supposed to solve that.
204
00:16:51,006 --> 00:17:00,661
but I'm still really curious how that's gonna, how that's gonna work, because, you know,
today, I mean, what do you have, eight in the drop down?
205
00:17:00,661 --> 00:17:01,261
You know what I mean?
206
00:17:01,261 --> 00:17:03,822
Like eight models you can choose from.
207
00:17:04,483 --> 00:17:05,963
that's, that's a big impediment.
208
00:17:05,963 --> 00:17:15,748
As somebody who pays really close attention to this, I still don't have a firm handle
on, you know, when to use what. Um, it seems like a moving target.
209
00:17:16,526 --> 00:17:19,707
Yeah, and it's pretty tough for a casual user.
210
00:17:19,707 --> 00:17:24,169
You're far, far more knowledgeable about this than the average user.
211
00:17:25,730 --> 00:17:30,512
Basically, the factors you need to consider are A, how fast do I need a response?
212
00:17:30,512 --> 00:17:33,413
B, what's the context window that I need?
213
00:17:33,593 --> 00:17:37,135
C, do I need a model with a lot of pre-training data?
214
00:17:37,135 --> 00:17:43,137
So as in, it has a lot of knowledge that I need to pull from, or do I need something more
that's reasoning well?
215
00:17:43,137 --> 00:17:45,720
And based on those factors, you can choose the right model.
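Those three factors can be expressed as a simple decision rule. The model names and the context-window cutoff below are illustrative placeholders, not real product tiers:

```python
# Sketch of the three-factor selection heuristic: latency (A), context
# window (B), and knowledge-vs-reasoning (C). The names and the 200k
# cutoff are made-up placeholders for whatever tiers are current.
def pick_model(need_fast: bool, context_tokens: int, need_reasoning: bool) -> str:
    if context_tokens > 200_000:
        return "long-context-model"      # factor B dominates: huge input
    if need_reasoning:
        return "reasoning-model"         # factor C: multi-step analysis
    if need_fast:
        return "small-fast-model"        # factor A: latency-sensitive call
    return "general-knowledge-model"     # default: broad pretraining recall
```

A real system would also weigh cost per token and per-task eval scores, as discussed later in the conversation.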
216
00:17:45,720 --> 00:17:53,737
But I'm in this every day and this is my business, so I'm familiar. For your casual
user, you have no idea which one to use.
217
00:17:53,737 --> 00:18:02,914
And yeah, GPT-5 though will help with that to where it's going to basically just figure
out your question and then suggest the best model internally and then just give you that
218
00:18:02,914 --> 00:18:05,386
best model without you needing to even think about it.
219
00:18:05,386 --> 00:18:15,016
I think ironically, a lot of the time GPT-5 will be basically GPT-4 where they're just
gonna say, well, GPT-4 is good enough to answer this, go ahead and move forward.
220
00:18:15,016 --> 00:18:17,789
because most questions that it gets are actually pretty easy.
221
00:18:17,789 --> 00:18:22,272
There's a handful of hard questions that people push on every now and then.
222
00:18:22,353 --> 00:18:31,220
That said, again, I would stress that the legal tech groups are a lot better for solving
domain-specific tasks than these models are anyway.
223
00:18:31,220 --> 00:18:37,426
And what's happening is basically we're standing on top of the best models for the
specific task we're working on.
224
00:18:37,426 --> 00:18:43,020
We're choosing the best one, knowing exactly the tool that we need to use.
225
00:18:43,214 --> 00:18:52,730
And oftentimes we're using a combination of two or three, sometimes even from different
groups, to where that combination, plus a lot of prompt engineering and other engineering
226
00:18:52,730 --> 00:18:55,761
on top of it, can yield pretty good results.
227
00:18:56,242 --> 00:19:07,748
And I think what you'll see is, in general, the legal tech companies are going to be about
two years ahead of the raw LLMs, as far as their ability to practice law more or less, or
228
00:19:07,748 --> 00:19:11,850
support someone who's practicing law to be an amplifier of that person.
229
00:19:12,003 --> 00:19:23,766
And so in general, I don't think it's very user-friendly just to work from a chat
window, versus a nice template that's easy to follow, just like a webpage.
230
00:19:24,210 --> 00:19:24,801
Yeah.
231
00:19:24,801 --> 00:19:32,266
So the model selection, that has to be done algorithmically, correct?
232
00:19:32,487 --> 00:19:36,520
What does the process look like for selecting the right model?
233
00:19:36,520 --> 00:19:38,412
Just maybe in how you do it.
234
00:19:38,412 --> 00:19:40,322
I think OpenAI is somewhat opaque.
235
00:19:40,322 --> 00:19:43,085
I'm not sure that they provide transparency around that.
236
00:19:43,085 --> 00:19:49,180
But just in broad brushstrokes, like, how does it determine which path to take?
237
00:19:49,870 --> 00:19:55,210
Yeah, for us, we decide, we don't do it in a fully algorithmic way.
238
00:19:55,210 --> 00:20:00,750
We have across our app probably 100, 200 different API calls.
239
00:20:00,750 --> 00:20:05,870
And for each one of those, we have a general view on, is this going to need speed?
240
00:20:05,870 --> 00:20:09,810
Is it going to need the ability to instruction follow really well?
241
00:20:09,810 --> 00:20:13,730
Is it going to need high pre-training knowledge and so forth?
242
00:20:13,730 --> 00:20:18,702
And then based on those factors, we'll say it's probably one of these three models that we
should use.
243
00:20:18,702 --> 00:20:26,616
and then we run evals and anywhere important to say, okay, let's actually see what score
these models get on our evaluations.
244
00:20:26,616 --> 00:20:40,634
And so that could be an evaluation, for instance, of how many cases are they returning
that are accurate out of the, where we'll try to kind of do a full analysis on, okay,
245
00:20:40,634 --> 00:20:42,855
here's an evaluation question.
246
00:20:42,855 --> 00:20:47,822
Let's have real attorneys do the work and figure out what cases you would want to cite.
247
00:20:47,822 --> 00:20:55,782
And then once you've figured out what cases you want to cite, we're going to score those cases
to say, this is like a five, this case is a three, this is a one.
248
00:20:55,782 --> 00:21:03,262
And as far as importance, now let's have all the models do their work and give the cases
that they think are most relevant, and we're going to score those.
249
00:21:03,262 --> 00:21:10,862
So we have a lot of those automations in place and then whenever a new model comes out, we
just run it through the system of tests and say, okay, it's going to be good here, here
250
00:21:10,862 --> 00:21:11,422
and here.
251
00:21:11,422 --> 00:21:12,742
It's not going to be very good here.
252
00:21:12,742 --> 00:21:14,790
And we can move forward that way.
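The scoring scheme described here, where attorneys assign each relevant case an importance of five, three, or one and a model run is graded by how much of that importance its returned cases capture, could be sketched like this. All cites and scores below are made-up examples:

```python
# Sketch of the case-retrieval eval: attorneys build a gold set mapping
# each citable case to an importance score (e.g. 5/3/1), and a model run
# is graded by the fraction of total importance its returned cases cover.
def score_retrieval(gold_scores: dict[str, int], returned: list[str]) -> float:
    """Fraction of attorney-assigned importance captured by the model's cases."""
    total = sum(gold_scores.values())
    if total == 0:
        return 0.0
    # de-duplicate so a repeated cite isn't double-counted
    captured = sum(gold_scores.get(cite, 0) for cite in set(returned))
    return captured / total
```

Running every new model through the same gold sets gives the comparable per-task scores described above.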
253
00:21:15,218 --> 00:21:15,668
Interesting.
254
00:21:15,668 --> 00:21:22,781
Yeah, it seems like, you know, finding the right balance between speed and quality is the
sweet spot, right?
255
00:21:22,781 --> 00:21:31,535
You can't slow the process down too much or you're going to impact efficiency, but you
need to, it's striking that balance.
256
00:21:31,535 --> 00:21:34,766
It seems like that is the strategy. Is that accurate?
257
00:21:35,148 --> 00:21:37,439
Yeah, it's a fun challenge.
258
00:21:37,638 --> 00:21:47,382
A lot of what we do is we'll have a seven part workflow, for instance, and when the user
does step two, we're kicking off a pretty slow model that's really smart.
259
00:21:47,382 --> 00:21:53,843
And then when they get to step six, that slow model is done with the analysis, and then
it's inserting the answer for the user.
260
00:21:53,843 --> 00:21:59,805
Then it's done all that work in the background while they're filling out other information
that's not as relevant to the answer.
261
00:22:00,025 --> 00:22:02,106
And so we do a lot of that.
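The background-prefetch pattern described here maps naturally onto async tasks: start the slow model when the user reaches the early step, keep collecting form input, and await the result only when the later step needs it. This is a minimal sketch with stand-in coroutines, not the actual workflow engine:

```python
import asyncio

# Sketch: fire the slow, smart model at step 2, let the user keep filling
# out the remaining steps, and await the analysis only at step 6, by which
# point it has usually finished in the background. `slow_analysis` and
# `collect_steps` are hypothetical stand-ins.
async def run_workflow(slow_analysis, collect_steps):
    task = asyncio.create_task(slow_analysis())   # kicked off at step 2
    form_data = await collect_steps()             # steps 3-5 proceed meanwhile
    analysis = await task                         # step 6: result is ready
    return {"form": form_data, "analysis": analysis}
```

The user-perceived latency is then the longer of the two paths rather than their sum.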
262
00:22:02,166 --> 00:22:02,732
then
263
00:22:02,732 --> 00:22:06,684
Sometimes you just use the fast model because it's a fairly easy answer.
264
00:22:06,764 --> 00:22:07,955
So we'll do some of that.
265
00:22:07,955 --> 00:22:19,521
And it's just an interesting game of how do we think about the legal implications, how do
we think about the AI driven implications and the technology implications, and then how do
266
00:22:19,521 --> 00:22:25,384
we think about a good user experience and pair all that together to give something that
makes sense cohesively.
267
00:22:25,396 --> 00:22:26,096
Yeah.
268
00:22:26,096 --> 00:22:29,596
You know, recently I've seen interesting benchmarks.
269
00:22:29,596 --> 00:22:32,596
Was it Vals AI that put it together?
270
00:22:32,596 --> 00:22:33,576
I'm not sure if you've seen it.
271
00:22:33,576 --> 00:22:51,216
It's just been maybe in the last week or so. I don't know if it was a benchmark or a
study, but it talked about real scenarios, legal workflows, in which they measured efficiency.
272
00:22:51,436 --> 00:22:52,850
Again, there's so much stuff.
273
00:22:52,850 --> 00:22:54,411
flying at you these days.
274
00:22:54,411 --> 00:23:05,207
I don't have it memorized, but it seems like there's more of a focus now on legal-specific
use cases and how these models perform in those scenarios.
275
00:23:05,247 --> 00:23:07,378
Are you seeing more of that now?
276
00:23:07,922 --> 00:23:09,263
I need to check out that study.
277
00:23:09,263 --> 00:23:11,725
I actually haven't seen it.
278
00:23:11,725 --> 00:23:15,009
We love the idea of doing more legal benchmarks.
279
00:23:15,030 --> 00:23:21,136
That's an area where we've really taken a lot of time to try to build a tool that's useful
from that perspective.
280
00:23:21,136 --> 00:23:25,280
And I think it's useful to the end user as well.
281
00:23:25,301 --> 00:23:27,422
But no, I haven't seen that specific study.
282
00:23:27,422 --> 00:23:30,836
I do like the idea though of pursuing that.
283
00:23:31,775 --> 00:23:37,650
Yeah, this stuff, it was March 4th, three days ago, um, the post I saw on it.
284
00:23:37,650 --> 00:23:39,761
And again, it's just so hard to keep up with.
285
00:23:39,761 --> 00:23:45,216
And there's so much that even, you know, after you read it, three more things fly at you.
286
00:23:45,216 --> 00:23:52,962
It's like, so what about, um, AI strategies in general for law firms?
287
00:23:52,962 --> 00:24:00,888
So, you know, I, I have been critical of law firms that seem to
288
00:24:01,170 --> 00:24:07,614
immediately deploy tactically versus figuring out strategically what they want to do.
289
00:24:07,614 --> 00:24:09,986
And strategy includes a lot of different things.
290
00:24:09,986 --> 00:24:15,059
It can include where to first focus your AI efforts.
291
00:24:15,059 --> 00:24:21,714
It could include the organizational design within the firm that's going to support those
efforts.
292
00:24:22,395 --> 00:24:27,088
It can define the risk tolerance that the firm is willing to take.
293
00:24:27,088 --> 00:24:30,610
You know, because we still have, you know, we still have
294
00:24:30,610 --> 00:24:35,342
I saw an interesting study from the Legal Value Network.
295
00:24:35,342 --> 00:24:39,484
They do an LPM survey every year.
296
00:24:39,484 --> 00:24:43,905
One question that was asked was, what percentage of your clients?
297
00:24:44,426 --> 00:24:47,747
So they talked to, I think, 80 law firm GCs.
298
00:24:48,608 --> 00:24:57,431
And what percentage of your clients either discourage or prohibit the use of AI in their
matters?
299
00:24:57,431 --> 00:24:59,592
And the number was 42%.
300
00:24:59,592 --> 00:25:10,258
which seems shockingly high, because I saw another study from the Blickstein Group, it's
the LDO, law department.
301
00:25:10,258 --> 00:25:12,300
I forget what the acronym stands for.
302
00:25:12,300 --> 00:25:23,766
And anyway, almost 60% of the law firm clients' GCs that they talked to said that law firms aren't using technology enough to drive down costs.
303
00:25:23,766 --> 00:25:26,378
And those are two very conflicting data points.
304
00:25:26,378 --> 00:25:27,004
It's like,
305
00:25:27,004 --> 00:25:33,244
OK, you want me to drive down costs, but you've got OCGs that prevent me from implementing the technology.
306
00:25:33,244 --> 00:25:39,738
I can't use them on your matters. Like, I don't know, do you feel like that's a disconnect in the marketplace still?
307
00:25:40,152 --> 00:25:41,863
I think it's very bimodal.
308
00:25:41,863 --> 00:25:52,911
I think that you have a lot of attorneys on one side or the other where some really want
to embrace the newest technology all in and others are very cautious about it.
309
00:25:52,911 --> 00:25:57,534
And there's not as many groups in the middle as you'd expect.
310
00:25:57,534 --> 00:25:59,795
And so it's not like your normal bell curve.
311
00:25:59,936 --> 00:26:01,957
And so I think that's what's going on.
312
00:26:01,957 --> 00:26:09,474
And I think the organizational strategy and kind of transformation lens is a really tough
and interesting question for
313
00:26:09,474 --> 00:26:11,715
organizational leaders to think about.
314
00:26:11,715 --> 00:26:14,436
I think we probably disagree a little bit on this one.
315
00:26:14,436 --> 00:26:24,880
I have more of an engineering mindset on it where I think the way to go is start small and
iterate and then run in parallel your strategy.
316
00:26:25,500 --> 00:26:35,765
We've just seen so many instances where a company really wants to get into AI, they're
strategizing about it and a year later they haven't really done anything and they don't
317
00:26:35,765 --> 00:26:37,560
really get it because they have
318
00:26:37,560 --> 00:26:41,113
their senior leaders doing strategy stuff without being very hands-on.
319
00:26:41,113 --> 00:26:42,774
They don't really get it.
320
00:26:42,774 --> 00:26:53,582
I think if you take the time to have a small group of people that are really invested in
using AI every day, try out some leading tools, go in the right direction.
321
00:26:53,582 --> 00:26:55,363
Don't do anything just crazy.
322
00:26:55,363 --> 00:26:57,875
And then just don't put in any client information.
323
00:26:57,875 --> 00:27:04,640
Just do everything based on just very kind of random or kind of synthesized or sanitized
data.
324
00:27:04,666 --> 00:27:10,570
I think if you do that, you can get a pretty good sense of, now I get what people are
using this for.
325
00:27:10,570 --> 00:27:15,773
We could use it here, here, and here, but I can't use it in this area because it's going
to have issues.
326
00:27:15,773 --> 00:27:20,796
Or this is decent software in this way, but not in this other way.
327
00:27:20,796 --> 00:27:23,358
Now I can make informed strategic decisions.
328
00:27:23,358 --> 00:27:29,161
I think that if you kind of do that pairing, that's probably what I think would be the
best approach.
329
00:27:29,202 --> 00:27:29,482
Yeah.
330
00:27:29,482 --> 00:27:31,403
Well, we're, we're aligned on part of that.
331
00:27:31,403 --> 00:27:43,798
So I think that striking the right risk-reward balance is key, and that should be the number one, um, driver of the approach.
332
00:27:43,838 --> 00:27:44,398
Right.
333
00:27:44,398 --> 00:27:54,863
I think that jumping right in on the practice side and, you know, going whole hog with attorneys who have super high opportunity costs and low tolerance for missteps is a
334
00:27:54,863 --> 00:27:55,643
mistake.
335
00:27:55,643 --> 00:27:56,943
So we're aligned on that.
336
00:27:56,943 --> 00:27:58,549
I guess where I get hung up,
337
00:27:58,549 --> 00:28:03,480
is that, I'm going to quote another study here, or survey.
338
00:28:03,480 --> 00:28:13,753
Thomson Reuters did one, the professional services GenAI survey, which came out late last year, and only 10% of law firms, one out of ten, have a GenAI policy.
339
00:28:13,913 --> 00:28:19,114
So in order to write a policy, I think you need a strategy first, right?
340
00:28:19,114 --> 00:28:26,674
A policy is an outgrowth of a strategy, but nine out of ten don't have one.
341
00:28:26,674 --> 00:28:37,690
So what you have now is law firm users who don't have proper guidance on, hey, what can I
use the public models for?
342
00:28:37,690 --> 00:28:38,691
Can I use them at all?
343
00:28:38,691 --> 00:28:40,412
Do I use it on my phone?
344
00:28:40,412 --> 00:28:44,254
Can I use it on my personal laptop when I'm not connected to the VPN?
345
00:28:44,254 --> 00:28:49,196
Like all of those questions not being answered, I think creates unnecessary risk.
346
00:28:49,196 --> 00:28:56,370
Maybe at a certain, you know, altitude defining the strategy and incrementally
347
00:28:56,370 --> 00:29:00,529
working your way down more granularly, maybe that's the right balance.
348
00:29:00,684 --> 00:29:01,935
I think we're in sync there.
349
00:29:01,935 --> 00:29:07,059
I think it's crazy that you wouldn't have a GenAI policy at this point.
350
00:29:07,059 --> 00:29:12,604
I think our company, at DirecTV, I think we had one three months in after ChatGPT.
351
00:29:12,604 --> 00:29:20,371
I was on the executive board there and we thought just immediately we have to give the
company employees something to give some guidance.
352
00:29:20,371 --> 00:29:22,132
And yeah, I think you're exactly right.
353
00:29:22,132 --> 00:29:29,578
You start high, you make it a little bit overly restrictive, then you dig into the details
and you realize, okay, here's where we can open up.
354
00:29:29,622 --> 00:29:38,376
a little bit more, here's where we can be a little bit less or more forgiving on the use
of the tools and just be smart about that.
355
00:29:38,858 --> 00:29:44,415
But yeah, if you're working in a law firm and you don't have a strategy, I think you should definitely start working on that right away.
356
00:29:44,415 --> 00:29:45,125
Yeah.
357
00:29:45,125 --> 00:29:55,458
And what do you think about this? Another, you know, opinion of mine, some may agree, some may disagree, but I see a lot of law firm C-suite and director-level roles, both on
358
00:29:55,458 --> 00:30:06,651
the innovation and AI side, that are brought in without any sort of strategy, essentially brought in to just kind of figure it out.
359
00:30:06,731 --> 00:30:11,922
And normally I like an agile approach, but the problem with
360
00:30:12,264 --> 00:30:28,148
this approach in law firms is that those resources are typically not sufficiently empowered to make change, and law firm decision-making is so friction-heavy that it feels
361
00:30:28,148 --> 00:30:31,329
like you're setting these leaders up.
362
00:30:31,449 --> 00:30:36,091
You're not setting them up for success, because the tone has to be set at the top, right?
right?
363
00:30:36,091 --> 00:30:40,472
Again, around risk-taking, around where they want to
364
00:30:41,556 --> 00:30:44,436
add value within the business.
365
00:30:45,516 --> 00:30:56,096
You know, just all of these things that need to happen at the most senior level. And bringing somebody in, even if it's C-suite, but at the director level, like, do
366
00:30:56,096 --> 00:31:01,896
you really think this person's going to have the political capital to make recommendations and have those get adopted?
367
00:31:01,896 --> 00:31:03,136
How long is that going to take?
368
00:31:03,136 --> 00:31:06,376
Like, they'll be there three years before anything gets done, I don't know.
369
00:31:06,376 --> 00:31:08,776
Do you have any thoughts on the sequence?
370
00:31:09,452 --> 00:31:10,442
A couple thoughts.
371
00:31:10,442 --> 00:31:20,055
I think it's a tough problem, for one, in the fact that you usually have a lot of partners and managing partners that are making decisions collectively.
372
00:31:20,055 --> 00:31:24,906
That's just inherently harder to kind of move the ship and all that.
373
00:31:25,026 --> 00:31:35,499
That said, I would say when we speak with most of the senior leaders at firms, I don't think they're that deep on what's possible with GenAI, how the value of it is very
374
00:31:35,499 --> 00:31:37,450
specific to the implementation, any of that.
375
00:31:37,450 --> 00:31:38,850
What I'd recommend is
376
00:31:38,850 --> 00:31:50,098
Think about the core values that you care about, like risk versus the impact to your
business from an acceleration perspective, or the ability to add more insight, and all
377
00:31:50,098 --> 00:31:54,441
those high level values with maybe confidentiality and security and all that.
378
00:31:54,441 --> 00:32:00,726
And just in a very general sense, align at the highest level on what trade-offs you wanna
make.
379
00:32:00,726 --> 00:32:06,562
And then once you have that general view, then empower somebody who is
380
00:32:06,562 --> 00:32:16,665
very knowledgeable in the area to give very specific recommendations of, given what you
said from a value standpoint, here's how we can implement an end-to-end strategy around
381
00:32:16,665 --> 00:32:21,626
GenAI that makes sense and is aligned with what you're guiding me on.
382
00:32:21,626 --> 00:32:32,615
And then I think in parallel, I would really try to have some subset of users be very
engaged in using a tool and getting a good sense and getting learnings from that and
383
00:32:32,615 --> 00:32:35,630
having the groups present jointly.
384
00:32:35,795 --> 00:32:37,258
to the managing partners.
385
00:32:37,258 --> 00:32:39,723
I think that's probably a good recipe for success.
386
00:32:39,804 --> 00:32:40,324
Yeah.
387
00:32:40,324 --> 00:32:54,728
And, you know, I have advocated for bringing consultants in for that part of the journey, just because I worry that bringing in, you know, again, a director-level role to manage this,
388
00:32:54,728 --> 00:33:05,311
um, is just a tougher sell than if, you know, the executive committee brings in consultants. And you know what, there's a gap in the marketplace right now.
389
00:33:05,311 --> 00:33:08,776
There are not many people like you, who really know this stuff and
390
00:33:08,776 --> 00:33:10,677
are sitting in a seat like yours.
391
00:33:10,677 --> 00:33:23,704
There's so much capital being deployed in this area of tech that if you have these
skillsets, going out and selling your time hourly is not the best way to capture economic
392
00:33:23,704 --> 00:33:24,465
value.
393
00:33:24,465 --> 00:33:27,276
It's to do something like you're doing with a startup.
394
00:33:27,276 --> 00:33:34,190
And as a result, I think there's a big gap in the consulting world with people who really
know their stuff.
395
00:33:34,190 --> 00:33:36,731
So I do sympathize.
396
00:33:36,731 --> 00:33:38,365
Do you see that gap?
397
00:33:38,365 --> 00:33:39,354
as well.
398
00:33:40,280 --> 00:33:41,780
I think we're aligned there.
399
00:33:41,780 --> 00:33:44,811
It's a really tough problem for law firms because of that.
400
00:33:45,252 --> 00:33:56,875
I mean, one thing you could try to do is work with a leader at a vendor and just say, hey, look, I can't use your software, but we'd love to form a longer-term relationship over
401
00:33:56,875 --> 00:33:57,615
time.
402
00:33:57,615 --> 00:34:06,138
And can you just give us some general guidance on how we can be effective? Knowing that that person is going to be a little bit biased, that's one thing you can do.
403
00:34:07,375 --> 00:34:16,683
I do think that trying to find the right consultant, there are some out there and you might be able to find one, but it's tough, and you might need to just rely on finding
404
00:34:16,683 --> 00:34:23,248
your most tech forward partner to take a lead position and say, hey, you've got to get
really deep on this stuff.
405
00:34:23,248 --> 00:34:35,798
And I think one thing you need to be cautious about is if you find someone who's not very
kind of forward from a transformation perspective, they're going to move very slowly.
406
00:34:35,798 --> 00:34:40,825
relative to somebody who's just like, hey, we need to stop everything and figure out how
to do this effectively.
407
00:34:40,825 --> 00:34:44,570
That person's gonna have enough friction thrown at them to slow them down anyway.
408
00:34:44,570 --> 00:34:46,642
But I would start with someone like that.
409
00:34:46,642 --> 00:34:48,373
Yeah, that makes sense.
410
00:34:48,373 --> 00:34:53,214
A lot of partners still have books of business.
411
00:34:53,975 --> 00:34:57,276
It's a tough problem for sure.
412
00:34:57,276 --> 00:34:58,377
No easy answers.
413
00:34:58,377 --> 00:35:06,780
How should law firms think about balancing efficiency gains and the impact to the billable
hour?
414
00:35:07,406 --> 00:35:15,586
Yeah, this is one we get all the time: okay, maybe someday, or even today, your software is good enough to where you're adding efficiency.
415
00:35:15,586 --> 00:35:17,146
I'm just going to bill less, right?
416
00:35:17,146 --> 00:35:19,226
So why do I even want this software?
417
00:35:19,686 --> 00:35:20,726
A few thoughts on that.
418
00:35:20,726 --> 00:35:25,366
One, in a lot of cases, attorneys aren't always billing by the billable hour.
419
00:35:25,366 --> 00:35:33,986
It could be contingency, they could be in-house, or it could be a cost-per-X type of model where it's like, I'm going to charge you per demand letter I write or something
420
00:35:33,986 --> 00:35:34,670
like that.
421
00:35:34,670 --> 00:35:42,970
For those that do need to do the billable hour, which is the majority of attorneys, my
view is that it's kind of like computers.
422
00:35:43,310 --> 00:35:53,610
It's not like, 10 years after the computer came out, lawyers were still spending most of their time going to law libraries and manually checking out books and reading through
423
00:35:53,610 --> 00:35:53,950
books.
424
00:35:53,950 --> 00:35:55,810
It's just not as efficient.
425
00:35:56,010 --> 00:35:59,510
What will happen is that the market will all move toward AI.
426
00:35:59,510 --> 00:36:02,606
Then if you're the one laggard who's not using it at all,
427
00:36:02,606 --> 00:36:04,586
it's just going to be pretty obvious.
428
00:36:04,586 --> 00:36:12,666
Groups are going to know about that and they're not going to use you because you produce
less legal work than the alternative groups.
429
00:36:13,446 --> 00:36:15,546
And so that's where I see the market going.
430
00:36:15,546 --> 00:36:18,665
The other benefit is lawyers write off a lot of their time.
431
00:36:18,665 --> 00:36:22,866
I mean, if you work a 10 hour day, you might bill six and a half hours on average.
432
00:36:22,866 --> 00:36:29,046
And a lot of that time is because you're doing background legal research work or
background work to get up to speed.
433
00:36:29,046 --> 00:36:30,626
AI does that really well.
434
00:36:30,732 --> 00:36:34,865
And your goal as a law firm would probably be to have higher revenue per attorney.
435
00:36:34,865 --> 00:36:38,527
And if attorneys are billing a higher percentage of their time, you're meeting that goal.
436
00:36:38,527 --> 00:36:42,150
So I think that there's a lot of talk about the billable hour.
437
00:36:42,150 --> 00:36:43,931
And I think it's not going away.
438
00:36:43,931 --> 00:36:48,434
Maybe some on the margins, maybe there's some changes.
439
00:36:48,434 --> 00:36:53,678
But I think that lawyers are going to want to be efficient.
440
00:36:53,678 --> 00:36:56,660
And over time, they're going to lean on these tools.
441
00:36:56,660 --> 00:37:00,248
I think the hesitancy with AI has been more
442
00:37:00,248 --> 00:37:04,983
that there's a lot of traps and a lot of just "this wasn't very good" type of outputs.
443
00:37:04,983 --> 00:37:10,918
And I think that the industry is getting pretty close to where those types of issues are
going away pretty fast.
444
00:37:10,918 --> 00:37:11,828
Yeah.
445
00:37:12,009 --> 00:37:16,651
Well, what about high-value use cases on the practice side?
446
00:37:16,651 --> 00:37:22,173
Like, I know you and I talked about document review and timeline creation.
447
00:37:22,173 --> 00:37:24,831
And I thought the timeline creation was an interesting one.
448
00:37:24,831 --> 00:37:37,030
I'm not a lawyer and I don't know how often that scenario comes into play, but any thoughts on, you know, where the high-value use cases are within the practice today?
449
00:37:37,474 --> 00:37:49,104
Yeah, the areas that I think are most interesting are where AI can synthesize very large
amounts of data and get a pretty much fully accurate answer almost every time.
450
00:37:49,545 --> 00:37:54,250
And so a couple of areas that really make sense, like you mentioned, timelines.
451
00:37:54,250 --> 00:38:03,286
You can ingest all of your documents, like a discovery set that's all relevant documents,
throw that into the AI.
452
00:38:03,286 --> 00:38:09,872
and say, hey, pull out all the relevant pieces and create a timeline based on that, and
then use that to draft a statement of facts.
453
00:38:09,872 --> 00:38:11,773
That's gonna be pretty good.
454
00:38:11,773 --> 00:38:15,517
And that's not that hard to set up to do really well.
455
00:38:15,517 --> 00:38:26,486
I've been seeing a lot of users use our tool who are just like, wow, this saved a ton of time. Before, I was kind of nervous about letting AI answer legal research questions.
456
00:38:26,486 --> 00:38:29,528
But when I see this, this is super useful.
457
00:38:29,629 --> 00:38:31,660
Very similar concept with doc review.
458
00:38:31,660 --> 00:38:41,753
You can automate your doc review and put in hundreds of thousands of pages of files that
AI is looking through to see whether it's relevant based on the context you give it and
459
00:38:41,753 --> 00:38:43,693
based on what you're asking to search for.
460
00:38:43,693 --> 00:38:51,415
And a very high percentage of the actually relevant files will be pulled out, prioritized,
and then synthesized in a summary.
461
00:38:51,516 --> 00:38:58,898
Those types of tools are extremely useful where they might save thousands of hours of time
to get your first pass in doc review.
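[Editor's note: the first-pass doc review Justin describes — score each file for relevance against the matter context, then prioritize for human review — can be sketched roughly as below. The keyword-overlap scorer is a runnable placeholder for the LLM call; all names, the threshold, and the sample documents are illustrative assumptions, not any vendor's implementation.]

```python
from dataclasses import dataclass

@dataclass
class ReviewResult:
    doc_id: str
    relevant: bool
    score: float      # 0.0-1.0 relevance confidence
    rationale: str    # short explanation kept for the audit trail

def classify_page(text: str, issue_keywords: list[str]) -> tuple[float, str]:
    """Placeholder for the model call: a real system would send the page
    plus matter context to an LLM and parse a structured verdict. Here a
    keyword-overlap score stands in so the sketch is runnable."""
    text_lower = text.lower()
    hits = [k for k in issue_keywords if k.lower() in text_lower]
    score = len(hits) / max(len(issue_keywords), 1)
    return score, f"matched terms: {hits}" if hits else "no matched terms"

def first_pass_review(docs: dict[str, str], issue_keywords: list[str],
                      threshold: float = 0.3) -> list[ReviewResult]:
    """Score every document and surface likely-relevant files first."""
    results = []
    for doc_id, text in docs.items():
        score, why = classify_page(text, issue_keywords)
        results.append(ReviewResult(doc_id, score >= threshold, score, why))
    # Prioritize: highest-scoring documents go to human review first.
    return sorted(results, key=lambda r: r.score, reverse=True)

docs = {
    "email_001": "Re: breach of the supply contract and late delivery penalties",
    "memo_114": "Quarterly parking garage maintenance schedule",
}
ranked = first_pass_review(docs, ["breach", "contract", "delivery"])
print(ranked[0].doc_id)  # the contract email ranks first
```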
462
00:39:00,102 --> 00:39:06,428
I would say that the error rate is pretty similar to humans at this point, for well-engineered software.
463
00:39:06,428 --> 00:39:18,799
And there are other things that are similar, like cases where you have to have an expert say, here's all of the information I relied on before I go take the stand.
464
00:39:18,799 --> 00:39:26,446
And can you create the reliance list based on all of these 200 files that I uploaded to
your system?
465
00:39:26,446 --> 00:39:36,066
and say, here are the pieces I relied on. It might take an attorney 100 hours to build that list; we'll do that with 100% precision if it's well engineered, and you've just saved
466
00:39:36,066 --> 00:39:36,906
that time.
467
00:39:36,906 --> 00:39:45,346
So those areas are the ones that I'd say are the most interesting that I would really
recommend groups try out with a good vendor.
468
00:39:45,346 --> 00:39:49,666
And then there's others where I think legal research is a really hard problem.
469
00:39:49,666 --> 00:39:55,054
It's the first one we started tackling, but just think about all the decisions that the
lawyer needs to make.
470
00:39:55,054 --> 00:39:59,814
When they do legal research, they're thinking about: what kind of motion is this? Is it a motion to dismiss?
471
00:39:59,814 --> 00:40:01,534
Is it a motion for summary judgment?
472
00:40:01,594 --> 00:40:03,294
Is it a trial?
473
00:40:03,694 --> 00:40:06,354
Am I drafting the original complaint?
474
00:40:06,354 --> 00:40:09,954
I'm gonna have very different cases that I use in all of those scenarios.
475
00:40:09,954 --> 00:40:13,854
I've gotta understand the relevancy of the cases, the procedural posture of the case.
476
00:40:13,854 --> 00:40:22,874
I need to think about whether in that case the court ruled for or against my client or the
person that's analogous to my client.
477
00:40:22,874 --> 00:40:24,662
And there's so many factors.
478
00:40:24,662 --> 00:40:32,709
I think we're getting very close to where I feel pretty good about our analysis, but we
still want the lawyer heavily in the loop through the process.
479
00:40:32,710 --> 00:40:35,823
But the other areas, AI just does them really well.
480
00:40:35,823 --> 00:40:37,014
It's not as complicated.
481
00:40:37,014 --> 00:40:45,072
And I definitely recommend you get started on those areas and then dip your feet in some
of the peripheral areas like billing or areas where it's a little bit less related to core
482
00:40:45,072 --> 00:40:45,962
work too.
483
00:40:46,494 --> 00:40:54,791
So, like, in the scenario of creating a timeline: on the surface, to me, that doesn't sound like something that requires a point solution.
484
00:40:54,791 --> 00:41:05,530
Like, can the general models do it? I'm not a big fan of Copilot at the moment, but do you need a specifically trained platform to do that effectively?
485
00:41:05,998 --> 00:41:08,979
I think eventually the general models will be able to do it.
486
00:41:09,939 --> 00:41:17,642
I don't know that any of the general models can take like 100 different documents being
uploaded at once.
487
00:41:17,642 --> 00:41:26,984
If you just use even some of the better ones that instruction-follow really well, that have big context windows, they're still gonna miss a lot if you don't do deeper
488
00:41:26,984 --> 00:41:27,534
algorithms.
489
00:41:27,534 --> 00:41:32,826
So for example, for us, it's kind of what I mentioned earlier, we're...
490
00:41:32,826 --> 00:41:39,602
If we were to just say, Gemini, you do this pretty well, go ahead and pull out a timeline,
they're going to get a lot of it right.
491
00:41:39,602 --> 00:41:40,893
They're going to miss a lot.
492
00:41:40,893 --> 00:41:44,276
If we say, Gemini, we're going to give you one page at a time.
493
00:41:44,276 --> 00:41:45,247
Here's the full context.
494
00:41:45,247 --> 00:41:47,098
Here's exactly what I want you to look for.
495
00:41:47,098 --> 00:41:48,849
And here's the things you might get tripped up on.
496
00:41:48,849 --> 00:41:50,251
Here's how to solve it.
497
00:41:50,251 --> 00:41:51,802
And now go pull these out.
498
00:41:51,802 --> 00:41:53,703
Then we're going to get really good results.
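[Editor's note: the one-page-at-a-time approach described here can be sketched as follows. The prompt text and the regex-based `extract_events` stub are assumptions standing in for the real model call and pipeline, not Callidus's implementation.]

```python
import re
from datetime import date

# Per-page instructions of the kind described above: full matter context,
# exactly what to look for, and known pitfalls, repeated for every page so
# the model never has to hold the whole document set at once.
PROMPT_TEMPLATE = """Matter context: {context}
Task: list every dated event on this page as YYYY-MM-DD | description.
Pitfalls: ignore dates in boilerplate headers and signature blocks.
Page text:
{page}"""

def extract_events(page: str) -> list[tuple[date, str]]:
    """Stand-in for the LLM call. A real pipeline would send
    PROMPT_TEMPLATE.format(...) to a model and parse its reply; here a
    regex pulls 'YYYY-MM-DD <description>' lines so the sketch runs."""
    events = []
    for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})\s+(.+)", page):
        y, mo, d, desc = m.groups()
        events.append((date(int(y), int(mo), int(d)), desc.strip()))
    return events

def build_timeline(pages: list[str]) -> list[tuple[date, str]]:
    """Process one page at a time, then merge into one chronology."""
    all_events = []
    for page in pages:
        all_events.extend(extract_events(page))
    # Chronological order, ready to draft a statement of facts from.
    return sorted(all_events)

pages = [
    "2021-06-01 Supplier misses first delivery deadline",
    "2020-03-15 Parties sign the supply agreement",
]
timeline = build_timeline(pages)
print(timeline[0][1])  # earliest event comes first
```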
499
00:41:53,844 --> 00:42:00,030
And so maybe in a couple of years, the tools will be very reliable in a general sense to
be able to do that.
500
00:42:00,030 --> 00:42:00,842
I think they're
501
00:42:00,842 --> 00:42:06,106
not, the raw LLMs aren't today. But we're not the only group doing timelines.
502
00:42:06,127 --> 00:42:07,668
YXLR does those really well.
503
00:42:07,668 --> 00:42:09,428
I'm sure other groups do as well.
504
00:42:09,428 --> 00:42:09,768
Yeah.
505
00:42:09,768 --> 00:42:15,448
So you need a layer of engineering on top to manage that today.
506
00:42:16,368 --> 00:42:18,168
That's interesting.
507
00:42:20,488 --> 00:42:27,148
What about building trust with AI in the firms?
508
00:42:27,548 --> 00:42:38,208
And this goes deeper than just within the firms, the clients ultimately, as you can see
with almost half still discouraging or prohibiting
509
00:42:38,208 --> 00:42:42,374
use, there's still a lack of trust with these tools.
510
00:42:42,374 --> 00:42:44,846
How do we bridge that gap?
511
00:42:45,934 --> 00:42:50,154
The trust gap can come up for a few different reasons.
512
00:42:50,154 --> 00:42:52,234
So one, it could be a security issue.
513
00:42:52,234 --> 00:42:54,434
Two, it could be a confidentiality issue.
514
00:42:54,434 --> 00:42:57,414
And then three, it could be like an accuracy or hallucination issue.
515
00:42:57,414 --> 00:43:03,914
So from a security standpoint, you obviously want to make sure that the model's not
training on any information that you share with it.
516
00:43:03,914 --> 00:43:10,334
But most of the tools are able to satisfy that requirement pretty easily.
517
00:43:10,334 --> 00:43:15,750
And even now, if you're using like a Pro or Plus account with ChatGPT, it's doing that as well.
518
00:43:17,123 --> 00:43:21,466
You still have a lot of security holes that can happen for anything in the cloud.
519
00:43:21,466 --> 00:43:25,899
So it's helpful to see that the group is SOC 2 compliant and has that certification.
520
00:43:25,899 --> 00:43:33,294
It's helpful to ensure that the group's following best practices as far as encryption: they're encrypting at rest and in transit.
521
00:43:33,575 --> 00:43:37,578
It's a nice-to-have, I think, to say that you PII-scrub as well.
522
00:43:37,578 --> 00:43:41,741
That's something you might want to look for if you're extra cautious for something
particularly sensitive.
523
00:43:43,694 --> 00:43:49,679
And so that's helping with security and for the most part, confidentiality as well.
524
00:43:49,679 --> 00:43:54,402
You might want to look for groups that set up double encryption or mutual encryption.
525
00:43:54,402 --> 00:44:01,728
Or end-to-end encryption, where they're able to encrypt the data to where even their engineers can't see your data sets.
526
00:44:01,728 --> 00:44:04,600
That's possible and technically something that you can do.
527
00:44:04,600 --> 00:44:09,693
And so anything that's extremely sensitive, you might want to ask for that.
528
00:44:09,934 --> 00:44:10,695
But
529
00:44:10,695 --> 00:44:17,829
If you do those two things, you should be in a pretty good position where you're meeting those requirements.
530
00:44:17,829 --> 00:44:28,815
From an accuracy and hallucination standpoint, to me, the way you solve that is: keep the client in the loop, make audit trails, and build things together.
531
00:44:28,876 --> 00:44:39,694
So if you have software that says, okay, these are the cases I used, click the link to see
the cases, double click to say, here's the material facts I relied on, here's the...
532
00:44:39,694 --> 00:44:42,594
quotes I relied on to generate this holding, all that stuff.
533
00:44:42,594 --> 00:44:52,254
If your software is able to do that, I think that it's able to really satisfy the concerns
that lawyers might have of, hold on, this might not even be real, is this something I can
534
00:44:52,254 --> 00:44:53,514
rely on?
535
00:44:53,514 --> 00:45:00,074
And you want that audit trail to be something that it's much faster to audit than to just
do the work from start to finish.
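[Editor's note: the audit trail described above — every generated statement carries links back to the cases and quotes it relied on, so checking is faster than redoing the research — might be structured like this minimal sketch. Field names and the example case are illustrative, not any vendor's schema.]

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    case_name: str   # the opinion relied on
    url: str         # link the reviewing lawyer can click through to
    quote: str       # exact passage the holding was drawn from

@dataclass
class Proposition:
    text: str                      # the AI-generated statement
    citations: list[Citation] = field(default_factory=list)

    def auditable(self) -> bool:
        """A proposition passes review only if it is sourced at all;
        unsourced output is flagged rather than trusted."""
        return len(self.citations) > 0

prop = Proposition(
    text="The limitation period was tolled by the 2021 acknowledgment letter.",
    citations=[Citation("Doe v. Roe", "https://example.com/doe-v-roe",
                        "an acknowledgment in writing restarts the period")],
)
# A reviewer verifies by reading one linked quote, not re-running the research.
print(prop.auditable())  # True
```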
536
00:45:00,422 --> 00:45:00,712
Right.
537
00:45:00,712 --> 00:45:07,116
Because that's always the rub is like, am I really being more efficient if I have to go
back and double check everything?
538
00:45:07,116 --> 00:45:12,659
It really impacts the ROI equation when you have to do that.
539
00:45:12,659 --> 00:45:15,601
Well, this has been a really good conversation.
540
00:45:15,601 --> 00:45:20,824
I knew it would be, just, uh, we've had some good dialogue in the past.
541
00:45:20,824 --> 00:45:27,258
Before we wrap up here, how do people find out more about what you're doing at Callidus Legal?
542
00:45:27,662 --> 00:45:37,822
Yeah, check us out at callidusai.com, C-A-L-L-I-D-U-S-A-I dot com, or shoot me a message at justin at callidusai.com.
543
00:45:37,822 --> 00:45:39,202
And I really appreciate you having me on.
544
00:45:39,202 --> 00:45:40,602
This was a great talk.
545
00:45:40,602 --> 00:45:41,162
Thanks.
546
00:45:41,162 --> 00:45:42,263
Yeah, absolutely.
547
00:45:42,263 --> 00:45:43,924
All right, have a great weekend.
548
00:45:44,286 --> 00:45:45,507
All right, take care.