← All Webinars | L.A.B.S. #7
AI in Practice: Part 2 | What Fine-Tuning an AI Model Really Means
Explore the essentials of fine-tuning AI models, clear up common misconceptions, and discover practical real-world applications.
Level: Advanced
Witness the enhanced performance of a fine-tuned model and get introduced to RAG and its significance.
Watch the highlights: https://blog.testsys.com/2024/09/05/ai-in-practice-your-quick-guide-to-practical-ai-use/
Interested in partnering on a webinar? Share your ideas at webinars@testsys.com.
1
00:00:05.395 --> 00:00:05.745
Great.
2
00:00:05.845 --> 00:00:08.545
Hi, everyone. Happy afternoon.
3
00:00:09.285 --> 00:00:11.305
We will get started in just a moment here.
4
00:00:11.305 --> 00:00:12.665
We're letting people into the room.
5
00:00:47.045 --> 00:00:49.375
Welcome everyone. We'll get started in just a moment.
6
00:01:15.465 --> 00:01:17.835
Welcome everyone. Just a moment. We'll get started.
7
00:01:18.025 --> 00:01:20.155
Just giving people time to come in
8
00:01:20.175 --> 00:01:22.595
and grab their lunch and join us.
9
00:01:46.005 --> 00:01:48.435
We're gonna go ahead and get started. So, hi everyone.
10
00:01:48.435 --> 00:01:49.875
Thank you for joining us today.
11
00:01:49.985 --> 00:01:53.475
This is part two of our ITS Summer Demo Days series,
12
00:01:53.895 --> 00:01:54.955
AI in Practice.
13
00:01:55.535 --> 00:01:58.515
I'm Amanda Crowley, the Director of Marketing here at ITS.
14
00:01:58.935 --> 00:02:00.835
Uh, I'll be your host for the series.
15
00:02:01.495 --> 00:02:04.115
Uh, two housekeeping things before we get started.
16
00:02:04.335 --> 00:02:06.395
The first is we'll be using the Q&A
17
00:02:06.395 --> 00:02:08.995
feature, which is located at the bottom of Zoom.
18
00:02:09.375 --> 00:02:12.275
If you have comments or questions, anything you wanna ask
19
00:02:12.855 --> 00:02:15.315
our, um, two presenters, just put them there
20
00:02:15.375 --> 00:02:17.315
and we're gonna be answering them in real time.
21
00:02:17.935 --> 00:02:19.595
Uh, second to that, the recording, uh,
22
00:02:19.595 --> 00:02:20.755
the webinar will be recorded.
23
00:02:21.375 --> 00:02:24.195
So, uh, we will share the link with you afterwards.
24
00:02:24.575 --> 00:02:25.995
So if you happen to have to drop
25
00:02:26.015 --> 00:02:27.515
or you wanna share this with your colleagues,
26
00:02:27.515 --> 00:02:29.315
that will definitely be available to you.
27
00:02:30.135 --> 00:02:32.635
So thank you so much for being here with us today.
28
00:02:33.215 --> 00:02:34.715
Um, we have Chris Glacken.
29
00:02:34.845 --> 00:02:35.875
Chris is our Director
30
00:02:35.875 --> 00:02:38.035
of Innovative Technologies here at ITS,
31
00:02:38.455 --> 00:02:40.275
and joining him is Kyle Miller.
32
00:02:40.705 --> 00:02:43.875
Kyle is our Manager of Item Workshop, uh,
33
00:02:43.875 --> 00:02:45.675
which is the ITS Item Bank.
34
00:02:46.055 --> 00:02:47.475
So thank you again. And with that,
35
00:02:47.505 --> 00:02:48.515
I'll turn it over to them.
36
00:02:50.145 --> 00:02:51.845
Thanks, Amanda. Hey, everyone.
37
00:02:52.465 --> 00:02:57.005
Um, so this, uh, webinar, uh, is titled, um,
38
00:02:57.115 --> 00:03:00.005
what does fine tuning, uh, really mean?
39
00:03:00.065 --> 00:03:02.285
Uh, we're gonna get into a few more things, uh,
40
00:03:02.285 --> 00:03:03.445
than just fine tuning.
41
00:03:04.075 --> 00:03:08.125
What we'd really like to cover is, uh, if, if you get into,
42
00:03:08.665 --> 00:03:11.605
uh, an AI, you get that, um, in your workplace,
43
00:03:11.665 --> 00:03:13.165
you start using it, um,
44
00:03:13.425 --> 00:03:17.565
and you find that it doesn't really, uh, meet all
45
00:03:17.645 --> 00:03:21.085
of your needs, um, what are your options, uh, for,
46
00:03:21.425 --> 00:03:22.805
uh, customization?
47
00:03:23.145 --> 00:03:26.885
So, um, Chris, why don't we start with, um,
48
00:03:27.325 --> 00:03:30.445
I think we have kind of four options for customization.
49
00:03:30.865 --> 00:03:34.845
Um, why don't we start with, uh, training, uh, a base model
50
00:03:35.305 --> 00:03:37.605
and, uh, then fine tuning that model.
51
00:03:37.785 --> 00:03:42.045
So, um, if you could, uh, kind of describe, um,
52
00:03:42.315 --> 00:03:44.165
what it is to train a model, what it is
53
00:03:44.165 --> 00:03:45.285
to fine tune a model,
54
00:03:45.505 --> 00:03:47.445
and, um, what the differences are,
55
00:03:47.545 --> 00:03:48.805
you know, between those two things.
56
00:03:50.055 --> 00:03:52.185
Sure. So, pre-training
57
00:03:52.185 --> 00:03:55.345
or training a model is something that if you're looking
58
00:03:55.345 --> 00:03:57.305
to do it, you're probably not on this call.
59
00:03:57.445 --> 00:03:59.545
Um, it, it requires a lot of resources
60
00:03:59.645 --> 00:04:00.665
and a lot of knowledge.
61
00:04:00.725 --> 00:04:02.785
Um, it requires a lot of money and a lot of tech.
62
00:04:02.805 --> 00:04:04.985
So if you think about pre-training, um,
63
00:04:05.605 --> 00:04:07.345
if you take the first model, it typically comes
64
00:04:07.345 --> 00:04:08.585
to mind is GPT, right?
65
00:04:08.665 --> 00:04:10.545
GPT-4, generative pre-trained transformer.
66
00:04:10.605 --> 00:04:12.425
So it's something that's trained on the
67
00:04:12.745 --> 00:04:14.305
whole internet, right?
68
00:04:14.325 --> 00:04:17.305
So it has all that documentation, all those teams
69
00:04:17.325 --> 00:04:18.585
behind it, all of that.
70
00:04:18.605 --> 00:04:20.705
And so to try to take on a task like that yourself,
71
00:04:20.705 --> 00:04:23.025
that's a pretty big ask,
72
00:04:23.045 --> 00:04:24.185
and not a lot of, um,
73
00:04:24.185 --> 00:04:26.985
organizations are even gonna have the documentation
74
00:04:26.985 --> 00:04:28.905
and text available to support something like that.
75
00:04:29.165 --> 00:04:31.265
So that's where fine tuning comes into play.
76
00:04:31.335 --> 00:04:33.545
What fine-tuning is gonna do, it's gonna take one
77
00:04:33.545 --> 00:04:35.865
of those foundational models, um, GPT-3
78
00:04:35.865 --> 00:04:37.745
that I even see they're offering GPT-4
79
00:04:37.745 --> 00:04:39.185
and GPT-4o right now.
80
00:04:39.445 --> 00:04:41.385
But basically what you're gonna do is take one of those, um,
81
00:04:41.405 --> 00:04:45.545
models, foundational models, so GPT-4, GPT-3.5 Turbo.
82
00:04:45.645 --> 00:04:47.585
You're gonna take those and you're gonna fine tune it
83
00:04:47.585 --> 00:04:48.745
and to fine tune it, what
84
00:04:48.745 --> 00:04:51.305
that means is you're really just adjusting the weights
85
00:04:51.405 --> 00:04:55.145
to kind of address some nuances and get some certain styles
86
00:04:55.145 --> 00:04:56.825
and expectations back, certain
87
00:04:56.825 --> 00:04:58.145
formatting that you want back.
88
00:04:58.365 --> 00:05:00.065
So it's a whole lot more lightweight,
89
00:05:00.085 --> 00:05:01.985
it still requires a good bit of knowledge.
90
00:05:02.285 --> 00:05:04.065
Um, so you can't just blindly do it.
91
00:05:04.065 --> 00:05:06.345
It's gonna take certainly a bit of trial and error.
92
00:05:06.565 --> 00:05:09.625
Um, but it is, it's something that's more realistic
93
00:05:09.725 --> 00:05:12.505
for organizations looking to get a more, um,
94
00:05:13.095 --> 00:05:15.305
nuanced response back from the generative model,
95
00:05:17.045 --> 00:05:18.045
Right? So, um,
96
00:05:18.045 --> 00:05:20.875
training a, a foundational model, we,
97
00:05:20.975 --> 00:05:24.195
we probably don't need to, uh, talk about that, uh, anymore
98
00:05:24.255 --> 00:05:28.035
or dive any deeper, um, uh, that, like you said,
99
00:05:28.155 --> 00:05:31.355
that's reserved for the, the corporations that have millions
100
00:05:31.355 --> 00:05:32.835
of dollars to throw at AI.
101
00:05:33.415 --> 00:05:37.515
Um, so if fine tuning is not really working for you,
102
00:05:37.575 --> 00:05:40.355
you need something more than just getting a particular
103
00:05:40.525 --> 00:05:42.275
style, uh, out of an AI.
104
00:05:42.275 --> 00:05:45.795
You need to, uh, give it, uh, new facts, uh, new
105
00:05:46.305 --> 00:05:47.355
knowledge bases.
106
00:05:47.935 --> 00:05:51.915
Um, we have, uh, retrieval augmented generation,
107
00:05:52.335 --> 00:05:55.235
and we have, um, context stuffing.
108
00:05:55.535 --> 00:05:59.125
Uh, so, uh, let's talk about, let's talk about those next,
109
00:05:59.345 --> 00:06:00.525
um, context
110
00:06:00.565 --> 00:06:02.965
stuffing I think is, is pretty easy.
111
00:06:03.145 --> 00:06:05.605
We can, um, we can just define that and then,
112
00:06:05.785 --> 00:06:07.205
and then move past it.
113
00:06:07.585 --> 00:06:11.405
Um, that's really, uh, just taking an entire document
114
00:06:11.545 --> 00:06:14.525
and putting it in your prompt along with the question, uh,
115
00:06:14.525 --> 00:06:16.805
that you have about that document, right?
116
00:06:16.905 --> 00:06:21.005
So if I wanted to know, uh, the names of the characters in,
117
00:06:21.105 --> 00:06:23.205
uh, the novel War and Peace, right?
118
00:06:23.725 --> 00:06:25.045
I would just attach War
119
00:06:25.045 --> 00:06:26.605
and Peace to my, uh,
120
00:06:27.185 --> 00:06:29.965
to my prompt, send the question, and it would answer, right?
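As a rough sketch, context stuffing is just prompt assembly in the chat-messages shape; the function name and wording here are invented for illustration:

```python
# Context stuffing: attach the entire document to the prompt and ask the
# question directly. Simple, but you pay for every token of the document
# on every request. Function name and wording are illustrative.
def stuff_context(document_text, question):
    return [
        {"role": "system",
         "content": "Answer using only the attached document."},
        {"role": "user",
         "content": f"{document_text}\n\nQuestion: {question}"},
    ]

prompt = stuff_context(
    "<imagine the full text of War and Peace pasted here>",
    "Who are the main characters?",
)
```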
121
00:06:29.985 --> 00:06:33.445
So, um, let's talk about RAG though: retrieval,
122
00:06:33.565 --> 00:06:34.725
augmented generation.
123
00:06:35.265 --> 00:06:39.205
Um, how is that different, uh, from, from context stuffing
124
00:06:39.205 --> 00:06:41.045
and, and how's it different than fine tuning?
125
00:06:42.035 --> 00:06:44.405
Sure, yeah. So fine tuning.
126
00:06:44.835 --> 00:06:47.765
What what fine tuning is gonna do is is, like I mentioned,
127
00:06:47.765 --> 00:06:50.085
it's gonna give you the ability to kind of get certain
128
00:06:50.685 --> 00:06:52.885
expected formats back or styles back.
129
00:06:52.915 --> 00:06:55.525
What, what fine tuning is not gonna do is you're not gonna
130
00:06:55.525 --> 00:06:58.645
train it on a whole domain that doesn't already exist within
131
00:06:58.875 --> 00:07:01.165
whatever's been trained on.
132
00:07:01.165 --> 00:07:02.685
So that's where RAG comes into play,
133
00:07:02.685 --> 00:07:04.365
and that's where context stuffing comes into play.
134
00:07:04.565 --> 00:07:05.885
'cause you're giving it that new
135
00:07:05.885 --> 00:07:06.925
information and facts.
136
00:07:07.275 --> 00:07:10.725
Some ways to look at fine tuning is you can give it all the
137
00:07:11.205 --> 00:07:13.845
lyrics to your favorite artist, your musical artist,
138
00:07:14.425 --> 00:07:17.565
and you can then turn around and use that fine tuned model,
139
00:07:17.945 --> 00:07:21.045
and you could have it generate responses, not even songs,
140
00:07:21.065 --> 00:07:23.325
but generate responses in the style of that artist.
141
00:07:23.665 --> 00:07:25.045
But what you can't do is turn around
142
00:07:25.045 --> 00:07:28.285
and ask it about the songs in that artist's, um, catalog
143
00:07:28.705 --> 00:07:31.045
or, uh, recite lyrics back or things like that.
144
00:07:31.045 --> 00:07:32.765
That's not what fine-tuning is gonna give you,
145
00:07:33.025 --> 00:07:35.725
but that's something where if you give it context stuffing,
146
00:07:36.065 --> 00:07:38.045
um, if you give it additional information along
147
00:07:38.045 --> 00:07:39.485
with the prompt, you can do that.
148
00:07:39.905 --> 00:07:41.245
And so that, that's what, that's really
149
00:07:41.245 --> 00:07:42.645
where we're gonna find those differences.
150
00:07:42.785 --> 00:07:44.565
Now, if we take your example with the War
151
00:07:44.565 --> 00:07:46.325
and Peace, right, with the context stuffing,
152
00:07:46.825 --> 00:07:49.445
you passed in a lot of tokens in order
153
00:07:49.465 --> 00:07:52.085
to get a couple names out of a document, right?
154
00:07:52.385 --> 00:07:56.045
And so, what is RAG gonna do? Retrieval augmented generation.
155
00:07:56.185 --> 00:07:59.125
And so what that's gonna do is instead
156
00:07:59.125 --> 00:08:01.325
of just giving it the entirety of War
157
00:08:01.325 --> 00:08:02.885
and Peace, you're actually gonna have
158
00:08:02.885 --> 00:08:05.205
that already set up in an indexed database
159
00:08:05.555 --> 00:08:08.285
where it's all gonna be chunked out into nice little pieces.
160
00:08:08.625 --> 00:08:11.085
And what that RAG does is it goes
161
00:08:11.085 --> 00:08:12.525
and identifies the chunks.
162
00:08:12.525 --> 00:08:14.925
It's gonna go and find the ones that are pertinent
163
00:08:14.925 --> 00:08:16.005
to the question at hand,
164
00:08:16.265 --> 00:08:17.765
and then only attach those
165
00:08:17.785 --> 00:08:19.125
to the prompt that you're sending up.
166
00:08:19.145 --> 00:08:21.365
So it reduces, it makes things a lot more efficient
167
00:08:21.435 --> 00:08:23.445
because when you get into large tokens,
168
00:08:23.445 --> 00:08:24.925
even though you have some
169
00:08:24.925 --> 00:08:26.405
of the things out there like Gemini
170
00:08:26.405 --> 00:08:28.605
that are boasting like the millions of tokens,
171
00:08:28.605 --> 00:08:31.805
context windows and things, you're still not, at least,
172
00:08:31.865 --> 00:08:33.925
at least in the current stages of things there,
173
00:08:33.925 --> 00:08:35.285
you're still gonna get a loss
174
00:08:35.285 --> 00:08:36.885
of quality the more you put in there.
175
00:08:37.065 --> 00:08:39.325
And there's also the, uh, things you wanna think about
176
00:08:39.325 --> 00:08:40.685
with cost and efficiency there too.
177
00:08:40.685 --> 00:08:42.965
So RAG really helps with that.
178
00:08:43.265 --> 00:08:45.005
Now, maybe in a couple years down the road,
179
00:08:45.245 --> 00:08:46.845
RAG is gonna become less and less relevant
180
00:08:46.845 --> 00:08:48.485
as these models get more powerful
181
00:08:48.485 --> 00:08:50.405
and cheaper, and you can just throw everything at them.
182
00:08:50.705 --> 00:08:53.325
But right now, RAG is really giving you that benefit,
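A toy sketch of that retrieval step, for readers who want to see it concretely. Real systems use embedding similarity over a vector index; plain word overlap stands in for that here, and the chunk size and text are invented:

```python
# Toy RAG retrieval: split the document into chunks, score each chunk
# against the question, and attach only the top-k chunks to the prompt.
# Word overlap stands in for embedding similarity here.
def chunk(text, size=20):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks, question, k=2):
    q_words = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:k]

doc = ("Pierre Bezukhov inherits a fortune. " * 50
       + "Natasha Rostova attends her first ball. " * 50)
top = retrieve(chunk(doc), "What happens to Natasha Rostova")
# Only these top chunks, not the whole novel, go into the prompt.
```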
183
00:08:54.505 --> 00:08:55.505
Right? And, and you
184
00:08:55.505 --> 00:08:58.765
do pay for the tokens that you supply.
185
00:08:59.105 --> 00:09:03.685
So, um, putting an entire novel into a prompt rather than
186
00:09:03.685 --> 00:09:06.285
just the relevant, uh, information, if you're doing
187
00:09:06.285 --> 00:09:08.565
that over and over again, it, it can definitely get,
188
00:09:08.745 --> 00:09:09.965
uh, expensive.
189
00:09:10.745 --> 00:09:14.845
Um, so let's dive into fine tuning
190
00:09:14.985 --> 00:09:17.685
and then let's dive into rag, uh, a bit later.
191
00:09:18.305 --> 00:09:21.445
Um, so for, for fine tuning, um,
192
00:09:22.505 --> 00:09:25.455
where might this work really well, uh, and
193
00:09:25.515 --> 00:09:27.495
and what are some common misconceptions
194
00:09:27.495 --> 00:09:28.855
about, about fine tuning?
195
00:09:29.415 --> 00:09:34.335
I, I hear, um, when I read about, um,
196
00:09:35.275 --> 00:09:37.135
you know, models being customized
197
00:09:37.315 --> 00:09:41.575
or, um, AI being customized for, uh, individuals
198
00:09:41.575 --> 00:09:42.655
or for corporations,
199
00:09:43.205 --> 00:09:45.255
fine tuning is always what you hear about.
200
00:09:45.325 --> 00:09:46.775
It's, it's the buzzword for
201
00:09:46.995 --> 00:09:48.735
how do you get AI to work for you.
202
00:09:49.395 --> 00:09:52.375
So, um, being that, uh, you
203
00:09:52.375 --> 00:09:55.055
and I have, uh, uh, tried and,
204
00:09:55.075 --> 00:09:56.975
and not done very well at, at, uh,
205
00:09:56.975 --> 00:10:00.335
getting information we want as a result of fine tuning, uh,
206
00:10:00.405 --> 00:10:03.015
what are some misconceptions about how fine tuning works
207
00:10:03.075 --> 00:10:05.535
and, and what, what it can really produce for you?
208
00:10:06.475 --> 00:10:08.935
So, fine tuning is really gonna come in
209
00:10:08.935 --> 00:10:09.975
handy in scenarios.
210
00:10:09.975 --> 00:10:13.255
Like if you, if you find yourself constantly providing one
211
00:10:13.255 --> 00:10:16.575
or multi-shot prompts, um, uh, or one
212
00:10:16.575 --> 00:10:18.495
or multi-shot context into your prompts,
213
00:10:18.725 --> 00:10:21.215
fine tuning is probably gonna help you there.
214
00:10:21.315 --> 00:10:23.055
Now, what's a one shot or a multi-shot?
215
00:10:23.155 --> 00:10:25.775
So think about it like if I, if I have a question, right?
216
00:10:25.775 --> 00:10:28.375
And I wanna say, just generate an item, alright,
217
00:10:28.395 --> 00:10:31.375
generate me an item about any kind of domain, alright?
218
00:10:31.755 --> 00:10:33.255
And so when it generates that item,
219
00:10:34.085 --> 00:10:35.615
it's gonna give you a random format
220
00:10:35.675 --> 00:10:37.135
unless you make things more specific,
221
00:10:37.195 --> 00:10:38.775
you might even say generate a multiple
222
00:10:38.775 --> 00:10:39.815
choice item, all right?
223
00:10:39.915 --> 00:10:42.335
And so maybe it'll give you your, uh, your options
224
00:10:42.355 --> 00:10:44.655
as 1, 2, 3, 4, or A, B, C, D, right?
225
00:10:44.995 --> 00:10:47.895
And so in order to kind of tune that, what you do is you,
226
00:10:47.895 --> 00:10:49.735
you provide it one shot
227
00:10:49.835 --> 00:10:51.415
or multiple shots, right?
228
00:10:51.415 --> 00:10:53.295
And so each shot is gonna be a context,
229
00:10:53.435 --> 00:10:54.695
and what you're doing is, okay,
230
00:10:55.775 --> 00:10:57.775
generate me a multiple choice question here.
231
00:10:57.795 --> 00:10:59.695
Here's an example of a multiple choice question.
232
00:10:59.695 --> 00:11:01.775
Here's another example of a multiple-choice question, right?
233
00:11:01.775 --> 00:11:03.575
So you're providing an additional context,
234
00:11:04.075 --> 00:11:05.255
and then that'll work great.
235
00:11:05.315 --> 00:11:06.975
The, the generative model will see it
236
00:11:06.995 --> 00:11:08.135
and it'll say, oh,
237
00:11:08.135 --> 00:11:09.415
you want this back in this format,
238
00:11:09.675 --> 00:11:10.935
let me go ahead and address that.
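For readers who want to see the shape of this, here's a minimal sketch of a few-shot ("multi-shot") prompt in the OpenAI chat-messages format; the helper name and the example items are invented for illustration:

```python
# Minimal sketch of a few-shot ("multi-shot") prompt in the OpenAI
# chat-messages format. build_few_shot_prompt and the example items
# are invented for illustration.
def build_few_shot_prompt(task, shots):
    """Return a chat message list: system prompt, one user/assistant
    pair per example shot, then the real request."""
    messages = [{"role": "system",
                 "content": "You write multiple-choice exam items."}]
    for shot in shots:
        messages.append({"role": "user", "content": task})
        messages.append({"role": "assistant", "content": shot})
    messages.append({"role": "user", "content": task})  # the real request
    return messages

shots = [
    "Q: What is 2 + 2?\nA) 3  B) 4  C) 5  D) 6\nAnswer: B",
    "Q: Which planet is closest to the Sun?\n"
    "A) Venus  B) Earth  C) Mercury  D) Mars\nAnswer: C",
]
messages = build_few_shot_prompt("Generate a multiple-choice item.", shots)
# Every shot rides along on every request, costing tokens each time --
# the repetition that fine-tuning is meant to bake into the model instead.
```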
239
00:11:11.275 --> 00:11:13.375
But if you're just doing it repetitively over
240
00:11:13.375 --> 00:11:16.375
and over again, well now you're using your tokens in order
241
00:11:16.395 --> 00:11:17.535
to send it to the model.
242
00:11:17.755 --> 00:11:20.055
And, um, it's not as efficient and,
243
00:11:20.355 --> 00:11:22.295
you're probably not gonna get
244
00:11:22.295 --> 00:11:23.495
that much latency, but
245
00:11:23.495 --> 00:11:25.175
you might get a little latency there too.
246
00:11:25.235 --> 00:11:27.935
And so where fine tuning can come into play is you can
247
00:11:28.135 --> 00:11:31.175
actually tune the generative model, um,
248
00:11:31.275 --> 00:11:33.135
and give it some examples of your items.
249
00:11:33.195 --> 00:11:35.015
And so that way the next time you do ask it
250
00:11:35.015 --> 00:11:37.335
to generate an item, the idea is that you don't have
251
00:11:37.335 --> 00:11:39.775
to provide it all of those contexts in addition
252
00:11:39.775 --> 00:11:41.215
to your prompt, you can just
253
00:11:41.435 --> 00:11:43.255
ask the prompt and then get that back,
254
00:11:44.485 --> 00:11:45.485
Right? So you said,
255
00:11:45.485 --> 00:11:47.395
um, when we were talking about prepping
256
00:11:47.395 --> 00:11:50.115
for this webinar yesterday, uh, you said something
257
00:11:50.115 --> 00:11:51.955
that was really interesting that
258
00:11:52.545 --> 00:11:56.205
fine tuning a model is really like the ability
259
00:11:56.345 --> 00:12:00.325
to give it a thousand examples every time without having
260
00:12:00.325 --> 00:12:01.565
to supply them in the prompt.
261
00:12:01.825 --> 00:12:04.845
So you go through the, the process of, of doing
262
00:12:04.845 --> 00:12:06.965
that fine tuning once with a thousand examples,
263
00:12:07.585 --> 00:12:11.525
and from then on when you query that model,
264
00:12:12.065 --> 00:12:14.365
it knows about those thousand examples and it,
265
00:12:14.365 --> 00:12:15.685
and it will use those in,
266
00:12:15.685 --> 00:12:17.765
in generating the proper response. Is that right?
267
00:12:18.105 --> 00:12:19.405
Yep, yep.
268
00:12:20.235 --> 00:12:24.905
Okay, perfect. Um, how about as far as,
269
00:12:25.245 --> 00:12:29.345
uh, using it to, to get additional data, uh,
270
00:12:30.015 --> 00:12:31.145
into the model?
271
00:12:31.465 --> 00:12:33.265
I, I think you, you touched on this,
272
00:12:33.405 --> 00:12:37.625
but, um, let's be, uh, a little bit more, uh,
273
00:12:38.865 --> 00:12:41.505
explicit about, you know, what we've seen as, as far
274
00:12:41.505 --> 00:12:46.265
as using fine tuning to add additional data to,
275
00:12:46.765 --> 00:12:48.385
uh, to a model,
276
00:12:49.725 --> 00:12:50.725
Right? So,
277
00:12:50.725 --> 00:12:54.075
well, if, if we try to use it for, let's say, let's,
278
00:12:54.075 --> 00:12:56.155
let's just stick with item generation, right?
279
00:12:56.455 --> 00:12:59.555
And so maybe I wanna feed it a whole bunch of, uh, cases,
280
00:13:00.095 --> 00:13:01.755
um, that I, that I have, right?
281
00:13:01.815 --> 00:13:03.555
So let's just say a whole bunch of medical cases,
282
00:13:03.615 --> 00:13:05.395
and I wanted to generate items about these
283
00:13:05.395 --> 00:13:06.555
medical cases, okay?
284
00:13:06.935 --> 00:13:09.035
Uh, let's just keep it simple. Multiple choice items.
285
00:13:09.115 --> 00:13:10.475
I wanted to generate these items.
286
00:13:10.985 --> 00:13:14.235
Well, what it's going to do well is it's going to,
287
00:13:14.615 --> 00:13:17.395
if I fine tune it on a whole bunch of my medical cases,
288
00:13:17.585 --> 00:13:19.915
what it's gonna do well is it's gonna recognize the
289
00:13:19.915 --> 00:13:21.955
terminologies and the way I'm using certain words
290
00:13:21.975 --> 00:13:23.635
and the styles that I'm putting things together
291
00:13:23.975 --> 00:13:26.275
and kind of the structure of, of sentences.
292
00:13:26.415 --> 00:13:28.555
And it will gimme items that kind of match that.
293
00:13:28.905 --> 00:13:31.675
What it's not going to do, though, it is not going
294
00:13:31.675 --> 00:13:33.595
to be able to reference a certain case
295
00:13:33.855 --> 00:13:36.275
and, uh, ask me a specific item about that.
296
00:13:36.535 --> 00:13:38.755
For that you're going to wanna look more at like a RAG
297
00:13:38.995 --> 00:13:40.715
approach, um, that we mentioned earlier, context
298
00:13:40.715 --> 00:13:41.755
stuffing, something like that.
299
00:13:42.175 --> 00:13:44.035
So, so that, that's where it's not going
300
00:13:44.175 --> 00:13:45.635
to kind of do too well.
301
00:13:45.635 --> 00:13:48.515
It's, it's gonna give you more of that style, that syntax.
302
00:13:48.515 --> 00:13:51.555
Another, another example that comes to mind is, um,
303
00:13:51.585 --> 00:13:53.555
this was more in the early days, I don't hit this
304
00:13:53.755 --> 00:13:55.995
too much anymore, but back when I was first playing
305
00:13:55.995 --> 00:13:59.275
with the GPT-3 model, that's
306
00:13:59.275 --> 00:14:00.435
before they even had function calling,
307
00:14:00.435 --> 00:14:03.235
where you can get a more structured JSON approach back.
308
00:14:03.415 --> 00:14:06.435
So what I was doing was I was trying to, um,
309
00:14:06.785 --> 00:14:09.075
have the model recognize that the user wanted
310
00:14:09.075 --> 00:14:12.955
to do an action that I needed to call a function for, right?
311
00:14:12.975 --> 00:14:14.595
So, so now they have this all baked in
312
00:14:14.595 --> 00:14:16.475
and it, it just keeps getting better and better.
313
00:14:16.575 --> 00:14:20.715
But what I was facing is every time that I asked it
314
00:14:20.715 --> 00:14:25.275
to, um, send me back a JSON structure, so I really wanted
315
00:14:25.275 --> 00:14:26.755
it back in a certain syntax, right?
316
00:14:26.835 --> 00:14:28.275
I wanted key value pairs
317
00:14:28.375 --> 00:14:29.835
and I wanted them in a certain order,
318
00:14:29.895 --> 00:14:32.035
and I wanted it to match the JSON structure.
319
00:14:32.135 --> 00:14:33.995
So little curly brackets at the beginning and,
320
00:14:33.995 --> 00:14:35.635
and all kind of set up with the quotes everywhere.
321
00:14:35.825 --> 00:14:38.475
What I was finding, in certain cases when I asked the
322
00:14:38.635 --> 00:14:41.595
question a certain way, it would gimme back a structure,
323
00:14:41.695 --> 00:14:43.235
but it wasn't syntactically correct,
324
00:14:43.285 --> 00:14:45.915
which just caused me a whole bunch of problems downstream.
325
00:14:46.415 --> 00:14:48.555
So fine tuning can help me there,
326
00:14:48.555 --> 00:14:50.995
because now again, you're not gonna have the
327
00:14:51.030 --> 00:14:52.455
problem with JSON these days with the models.
328
00:14:52.555 --> 00:14:55.735
But back then fine tuning, what that helps you do is say,
329
00:14:55.735 --> 00:14:57.855
Hey, when I ask you a question, I want you
330
00:14:57.855 --> 00:14:59.375
to return the response in this way, right?
331
00:14:59.375 --> 00:15:01.055
You're giving it, you're giving it a sample
332
00:15:01.315 --> 00:15:03.255
and then a response for it to train on.
333
00:15:03.355 --> 00:15:05.375
And that helps you kind of tighten up those edge cases
334
00:15:05.375 --> 00:15:07.055
where maybe it wasn't giving you that back.
335
00:15:07.055 --> 00:15:09.615
So, so like styles and those types of things.
336
00:15:09.715 --> 00:15:11.415
And so, so those are ways where it can kind
337
00:15:11.415 --> 00:15:12.855
of help you there and not help you.
338
00:15:13.525 --> 00:15:17.935
Perfect. Um, let's get into a demo of, uh, fine tuning.
339
00:15:18.515 --> 00:15:21.895
Um, so just to, uh, set the stage here, uh, what,
340
00:15:21.895 --> 00:15:24.735
what we're going to do is we're gonna ask, uh, AI
341
00:15:24.755 --> 00:15:27.735
to generate us a thousand test questions.
342
00:15:28.385 --> 00:15:30.895
We're then going to review those test questions
343
00:15:30.915 --> 00:15:32.255
programmatically, uh,
344
00:15:32.255 --> 00:15:33.615
and we're gonna discard those
345
00:15:33.615 --> 00:15:35.535
that we don't feel are long enough.
346
00:15:36.115 --> 00:15:38.935
Uh, once we get, uh, all of the test questions
347
00:15:38.935 --> 00:15:41.255
that we do think are long enough, we're going
348
00:15:41.255 --> 00:15:42.455
to generate a training file.
349
00:15:42.665 --> 00:15:46.375
We're going to use that to, uh, fine tune a model,
350
00:15:47.325 --> 00:15:50.705
and then we're going to test that fine tuned model again,
351
00:15:50.925 --> 00:15:53.105
ask for another, uh, thousand questions
352
00:15:53.525 --> 00:15:55.025
and we'll see, uh, whether
353
00:15:55.025 --> 00:15:56.025
or not the length
354
00:15:56.325 --> 00:15:59.905
of those questions is now longer, since we have examples
355
00:16:00.055 --> 00:16:02.465
that we're providing of longer questions.
356
00:16:03.965 --> 00:16:07.025
The interesting thing here is that we are not going
357
00:16:07.025 --> 00:16:11.985
to instruct the AI that we want our questions to be longer.
358
00:16:12.455 --> 00:16:15.065
All we're going to do is ask for a thousand questions,
359
00:16:15.495 --> 00:16:17.785
discard the ones that are not of a certain length,
360
00:16:18.125 --> 00:16:20.065
and we're gonna train on, on the longer ones.
361
00:16:20.065 --> 00:16:23.385
And, and when we generate questions, again, we should see
362
00:16:23.695 --> 00:16:27.305
that we're now getting, uh, longer test questions back.
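The filtering step in that plan can be sketched in a few lines; the field names and the length threshold here are invented for illustration:

```python
# Sketch of the demo's filtering step: keep only generated questions
# above a minimum length, so only long questions become training data.
# Field names and the 120-character threshold are invented.
def filter_questions(rows, min_chars=120):
    return [r for r in rows if len(r["question"]) >= min_chars]

rows = [
    {"topic": "history", "question": "Too short?"},
    {"topic": "science",
     "question": "A much longer, more detailed question " * 5},
]
kept = filter_questions(rows)  # only the long science question survives
```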
363
00:16:27.485 --> 00:16:30.625
So, uh, Chris, I'll, I'll turn it over to you for the demo.
364
00:16:31.295 --> 00:16:32.945
Okay? Sure. All right.
365
00:16:32.945 --> 00:16:35.705
So we're gonna start out, um, by first getting our,
366
00:16:35.765 --> 00:16:36.825
our base set of data.
367
00:16:37.005 --> 00:16:38.905
All right? We're gonna, we wanna, like Kyle said, we want
368
00:16:38.905 --> 00:16:40.785
to prompt GPT for a thousand questions.
369
00:16:40.945 --> 00:16:42.105
'cause I'm not gonna sit here and
370
00:16:42.105 --> 00:16:43.505
type out a thousand questions.
371
00:16:43.755 --> 00:16:45.505
We're not gonna give it any instructions
372
00:16:45.605 --> 00:16:46.945
or anything along those lines.
373
00:16:47.085 --> 00:16:49.305
So what we're doing here is, uh,
374
00:16:49.305 --> 00:16:50.945
we're just gonna do something pretty simple.
375
00:16:50.945 --> 00:16:52.905
We're gonna call the chat completions endpoint.
376
00:16:53.215 --> 00:16:55.945
I've generated a, uh, a list here of a couple topics
377
00:16:55.945 --> 00:16:57.425
that I want to generate questions on.
378
00:16:57.885 --> 00:16:59.385
And we're going to go ahead
379
00:16:59.485 --> 00:17:03.265
and generate these questions 10 at a time using the GPT-3.5
380
00:17:03.265 --> 00:17:04.385
Turbo model.
381
00:17:04.385 --> 00:17:06.865
The refresh is November 6th, alright?
382
00:17:07.285 --> 00:17:09.065
And, uh, there's just a little max tokens value.
383
00:17:09.065 --> 00:17:12.105
So this is basic; if you've done any type of API call,
384
00:17:12.105 --> 00:17:14.585
this is very vanilla, nothing crazy going on here.
385
00:17:14.585 --> 00:17:15.785
And so we're gonna generate a question,
386
00:17:16.085 --> 00:17:18.745
and so we're gonna generate 10 questions at a time.
387
00:17:18.745 --> 00:17:20.145
That's what that n value is.
388
00:17:20.145 --> 00:17:22.745
So every time I send a request to the GPT endpoint,
389
00:17:23.045 --> 00:17:25.625
I'm gonna say, give me 10 questions using this model,
390
00:17:25.965 --> 00:17:28.065
and then I'm going to just do this 10 times
391
00:17:28.165 --> 00:17:30.545
and I'm going to end up with my end result there.
392
00:17:30.545 --> 00:17:32.425
We're gonna end up with a file that we're gonna write out.
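The loop Chris describes might look something like this. The model snapshot name and prompt wording are assumptions, and the actual POST through the OpenAI client is only indicated in a comment:

```python
# Sketch of the generation loop: request 10 completions per call via the
# `n` parameter, 10 calls total. The model snapshot name and prompt
# wording are assumptions; the network call itself is only indicated.
def build_request(topic, n=10, model="gpt-3.5-turbo-1106"):
    return {
        "model": model,
        "n": n,               # 10 questions back per request
        "max_tokens": 200,
        "messages": [{"role": "user",
                      "content": f"Write one exam question about {topic}."}],
    }

payloads = [build_request("history") for _ in range(10)]
# Each payload would go to the chat completions endpoint, e.g.
# client.chat.completions.create(**payload), and the returned choices
# get appended to the CSV.
```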
393
00:17:33.085 --> 00:17:34.505
All right? So I'm gonna go ahead
394
00:17:34.505 --> 00:17:35.665
and just kind of start this off
395
00:17:36.405 --> 00:17:38.985
and we can see that it's gonna generate these questions.
396
00:17:40.655 --> 00:17:42.115
All right? So here's my little prompt.
397
00:17:42.455 --> 00:17:45.595
I'm gonna say, okay, I want a thousand, um, questions here.
398
00:17:45.805 --> 00:17:47.435
Write the file out to my desktop.
399
00:17:47.655 --> 00:17:49.165
And so for the purposes of the demo,
400
00:17:49.255 --> 00:17:51.245
we're first gonna write it out to a CSV, so
401
00:17:51.245 --> 00:17:52.845
that way we can compare and look at these things.
402
00:17:52.865 --> 00:17:55.005
But there's no reason I couldn't just do this all in one
403
00:17:55.005 --> 00:17:56.325
step with my JSONL file.
404
00:17:56.665 --> 00:17:58.525
All right? So I'm gonna kick this off,
405
00:17:58.545 --> 00:18:00.405
and so just to see that it's actually going
406
00:18:00.465 --> 00:18:02.565
and it's doing it in real time, we'll just look at it
407
00:18:02.565 --> 00:18:04.005
with a little proxy debugger here
408
00:18:04.665 --> 00:18:08.445
and just see that it is making calls out to OpenAI.
409
00:18:09.235 --> 00:18:10.925
Windows is trying to snap me.
410
00:18:11.265 --> 00:18:12.405
All right, nonstop.
411
00:18:14.125 --> 00:18:15.705
All right, so we're just gonna scale that down.
412
00:18:15.805 --> 00:18:17.745
All right, so you can see right here I'm capturing,
413
00:18:17.745 --> 00:18:19.265
so I'm making my API calls.
414
00:18:19.605 --> 00:18:20.985
So within this API call,
415
00:18:21.045 --> 00:18:23.225
you could see it's just generated a question about history,
416
00:18:23.525 --> 00:18:25.985
and then it's coming back with all of these responses.
417
00:18:25.985 --> 00:18:28.505
And so it's giving me 10 questions every single time.
418
00:18:29.545 --> 00:18:31.205
All right? So we're just kind of capturing those
419
00:18:31.545 --> 00:18:35.405
and we're logging those into a CSV file that we're going
420
00:18:35.405 --> 00:18:36.525
to have on the desktop.
421
00:18:36.785 --> 00:18:38.805
And then from that CSV file, what we're going
422
00:18:38.805 --> 00:18:40.885
to do is we're going to then generate what's called
423
00:18:41.005 --> 00:18:42.045
a JSONL file.
424
00:18:42.345 --> 00:18:45.685
All right? So I have an example of the JSONL file here.
425
00:18:46.025 --> 00:18:47.965
And so the way it works with the, uh,
426
00:18:47.965 --> 00:18:51.565
more modern GPT models is that it's really using a chat structure.
427
00:18:51.945 --> 00:18:53.765
Now, you can either do a single
428
00:18:53.795 --> 00:18:55.005
turn or a multi turn.
429
00:18:55.105 --> 00:18:57.445
So here you're gonna see that we have a multi turn example.
430
00:18:57.985 --> 00:19:00.405
So this is just a series of all the questions
431
00:19:00.405 --> 00:19:02.205
that we're going to pass up to it.
432
00:19:02.205 --> 00:19:03.125
So while it's running, and then
433
00:19:03.125 --> 00:19:04.605
I'll generate that JSONL file.
434
00:19:04.605 --> 00:19:07.165
But what we do is, so we got our thousand questions,
435
00:19:07.165 --> 00:19:09.485
which we'll look at when it's done processing, um,
436
00:19:09.485 --> 00:19:10.845
that's gonna take probably another minute.
437
00:19:11.185 --> 00:19:13.125
But the idea here is then we want to take
438
00:19:13.125 --> 00:19:16.005
that thousand questions and we wanna remove everything.
439
00:19:16.065 --> 00:19:17.805
So just for the purposes of this example,
440
00:19:17.805 --> 00:19:19.325
we're gonna remove everything
441
00:19:19.325 --> 00:19:22.485
that's less than 105 characters long
442
00:19:22.485 --> 00:19:25.285
because we want to see if we can train the model to,
443
00:19:25.505 --> 00:19:28.285
to get the style of generating questions
444
00:19:28.285 --> 00:19:31.325
that are more than 105 characters without us having to instruct it
445
00:19:31.325 --> 00:19:32.845
or do anything along those lines.
446
00:19:33.225 --> 00:19:34.805
So out of those thousand questions,
447
00:19:34.975 --> 00:19:36.125
we're gonna weed out everything
448
00:19:36.125 --> 00:19:37.605
that's less than 105 characters.
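The filtering step he describes is simple to sketch: keep only the generated questions at or above the 105-character threshold, and drop the rest. Function name and threshold default are illustrative.

```python
def filter_by_length(questions, min_chars=105):
    # Drop everything under the threshold; what's left becomes
    # the training set that demonstrates the longer question style.
    return [q for q in questions if len(q) >= min_chars]
```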
449
00:19:37.705 --> 00:19:40.405
And then we're going to generate our training file.
450
00:19:40.425 --> 00:19:42.285
In this case, it's called a JSON l file.
451
00:19:42.285 --> 00:19:43.645
So it's in the JSON structure,
452
00:19:44.065 --> 00:19:46.405
but it's using a, a strict message format.
453
00:19:46.585 --> 00:19:48.525
So you can see right here I have a message array.
454
00:19:48.945 --> 00:19:51.925
And so in there, I send it a message
455
00:19:52.025 --> 00:19:53.125
as a system instruction.
456
00:19:53.125 --> 00:19:54.565
So I, I have that in my instruction,
457
00:19:54.865 --> 00:19:56.885
and then I have a user message that says,
458
00:19:57.035 --> 00:19:58.565
okay, here's my question.
459
00:19:59.265 --> 00:20:03.005
And then I have my, um, uh, I say, generate me a question,
460
00:20:03.025 --> 00:20:06.045
and then I have, as the assistant generating a question
461
00:20:06.345 --> 00:20:07.765
that's a thousand characters.
462
00:20:08.615 --> 00:20:10.395
All right? So that, that's my JSONL file.
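The training records he walks through look roughly like this. The system and user wording below is an assumption, not the demo's actual text, but the `messages` array with system/user/assistant roles matches OpenAI's documented chat-format fine-tuning structure.

```python
import json

# The instruction wording here is illustrative; the messages array with
# system/user/assistant roles is the documented chat-format structure
# for OpenAI fine-tuning files.
def make_record(question_text):
    return {
        "messages": [
            {"role": "system", "content": "You generate exam questions."},
            {"role": "user", "content": "Generate a question."},
            {"role": "assistant", "content": question_text},
        ]
    }

def write_jsonl(path, questions):
    # One JSON object per line -- that is all "JSONL" means.
    with open(path, "w") as f:
        for q in questions:
            f.write(json.dumps(make_record(q)) + "\n")
```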
463
00:20:10.455 --> 00:20:12.635
Now what I, um, so actually I'll hold
464
00:20:12.635 --> 00:20:14.355
until we get into the next piece there.
465
00:20:14.375 --> 00:20:16.595
All right, so now my file's created successfully.
466
00:20:16.935 --> 00:20:20.435
And so now I can go ahead and create a JSONL file from that.
467
00:20:20.495 --> 00:20:22.835
So if we go and we look at our CSV file here,
468
00:20:22.835 --> 00:20:27.665
that was generated, so I dropped it right on my desktop.
469
00:20:27.725 --> 00:20:31.725
So if I open that up, we'll actually see now we have
470
00:20:32.855 --> 00:20:37.105
a CSV file filled
471
00:20:37.105 --> 00:20:38.865
with 1000 questions.
472
00:20:39.175 --> 00:20:41.545
Alright? Some of these are really short, some
473
00:20:41.545 --> 00:20:42.825
of 'em are on the longer side.
474
00:20:43.045 --> 00:20:44.745
All right, but you see we have a thousand.
475
00:20:44.805 --> 00:20:47.525
So now what I'm going to do is now I'm going
476
00:20:47.525 --> 00:20:48.525
to take this document
477
00:20:48.945 --> 00:20:52.285
and I'm going to turn it into my JSONL file here.
478
00:20:52.545 --> 00:20:55.285
And by doing that, I should end up with something closer
479
00:20:55.425 --> 00:20:57.965
to 200 or some kind of subset of that.
480
00:20:57.985 --> 00:20:59.405
I'm not gonna have a thousand, I'm going
481
00:20:59.405 --> 00:21:00.485
to only keep the ones
482
00:21:00.485 --> 00:21:02.125
that are longer than 105 characters
483
00:21:02.125 --> 00:21:04.405
because that's the behavior that we're going for here.
484
00:21:05.545 --> 00:21:08.475
All right? Okay.
485
00:21:08.475 --> 00:21:10.915
So I'm going to generate my JSONL file,
486
00:21:11.535 --> 00:21:13.435
and then, so the JSONL file is going
487
00:21:13.435 --> 00:21:15.115
to look exactly like we had it.
488
00:21:15.575 --> 00:21:19.775
So that should be done now. Yep.
489
00:21:23.475 --> 00:21:25.175
All right. So you see we have a subset here
490
00:21:25.175 --> 00:21:26.575
of 530 questions.
491
00:21:26.995 --> 00:21:30.375
All right? So now what I can do is I can go into OpenAI.
492
00:21:30.375 --> 00:21:32.015
So I can do this through an API,
493
00:21:32.195 --> 00:21:34.535
but to make things a little more, um, user-friendly here,
494
00:21:34.555 --> 00:21:36.615
I'm just going to go through their fine tuning playground
495
00:21:36.615 --> 00:21:38.255
and I'm gonna start up a fine tuning job.
496
00:21:38.565 --> 00:21:40.575
Alright? So again, this could all be done
497
00:21:40.575 --> 00:21:42.695
through system services API calls,
498
00:21:42.715 --> 00:21:44.295
but for the purposes of just demonstrating,
499
00:21:44.295 --> 00:21:46.295
I'm just doing this through their, uh, GUI here,
500
00:21:46.295 --> 00:21:47.455
their playground that they have.
501
00:21:47.915 --> 00:21:49.775
So I'm gonna start up a new fine tuning job.
502
00:21:49.775 --> 00:21:51.335
So I'm in their fine tuning playground.
503
00:21:51.715 --> 00:21:53.695
So what I wanna do is select my base model.
504
00:21:53.715 --> 00:21:55.615
So you see they have a couple models to choose from.
505
00:21:55.635 --> 00:21:57.415
So GPT-4o you need
506
00:21:57.415 --> 00:21:59.135
to request access for at this point in time.
507
00:21:59.195 --> 00:22:01.415
So I'm just going to do a GPT-3.5 Turbo
508
00:22:01.595 --> 00:22:03.255
1106 job with that model.
509
00:22:03.255 --> 00:22:06.135
Then you upload your JSONL file, your training document.
510
00:22:06.635 --> 00:22:08.935
So I'm going to go ahead and grab that,
511
00:22:08.955 --> 00:22:10.735
and I'm just going to drop that in here.
512
00:22:12.355 --> 00:22:14.015
All right. And then validation data.
513
00:22:14.115 --> 00:22:16.815
So the validation data, what I could do is I'm,
514
00:22:16.815 --> 00:22:18.375
I'm not gonna do it in this demo, um,
515
00:22:18.615 --> 00:22:20.495
'cause I've, I've already run it on the backend anyway,
516
00:22:20.515 --> 00:22:24.175
but so what the validation data is, you can take a subset
517
00:22:24.395 --> 00:22:25.455
of your training data
518
00:22:25.835 --> 00:22:28.415
and then provide that as a validation file.
519
00:22:28.435 --> 00:22:30.695
And so what that'll do is every time
520
00:22:30.695 --> 00:22:33.135
that the job finishes running its first pass
521
00:22:33.135 --> 00:22:35.415
through your data, it'll run a validation check.
522
00:22:35.475 --> 00:22:36.935
And so it'll take those samples
523
00:22:37.315 --> 00:22:39.175
and so it'll run through the validation.
524
00:22:39.175 --> 00:22:41.745
And what that means is like, okay, I'm being,
525
00:22:41.805 --> 00:22:43.585
I'm gonna generate, in this case, it's going
526
00:22:43.585 --> 00:22:44.825
to generate a question
527
00:22:44.925 --> 00:22:47.105
and then it's gonna check that file that I gave it
528
00:22:47.125 --> 00:22:49.665
and see if it's in line with that style
529
00:22:49.885 --> 00:22:51.305
and everything that I generated.
530
00:22:51.305 --> 00:22:52.545
And if it's not, it will adjust
531
00:22:52.605 --> 00:22:54.185
its loss and those types of things,
532
00:22:54.365 --> 00:22:55.545
its weights, accordingly, to try
533
00:22:55.545 --> 00:22:57.105
to get it closer and then run it again.
534
00:22:57.445 --> 00:22:59.425
You don't have to use it. It will, it will still,
535
00:22:59.525 --> 00:23:00.905
uh, complete the job without it.
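One way to produce the validation file he describes is to hold out a fraction of the training records before uploading. A sketch, with an arbitrary split ratio:

```python
import random

def split_train_validation(records, validation_fraction=0.1, seed=42):
    """Shuffle once, then carve off a held-out slice as the validation file."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * validation_fraction)
    # Training set first, validation set second.
    return shuffled[cut:], shuffled[:cut]
```

Fixing the seed keeps the split reproducible, so the same comparison can be rerun against the same held-out examples.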
536
00:23:01.325 --> 00:23:04.225
The suffix here, this is just something to let you know,
537
00:23:04.445 --> 00:23:06.585
um, that this is your, your model here.
538
00:23:06.585 --> 00:23:11.385
So I'm gonna say greater than, um, 105, uh, demo purposes.
539
00:23:12.415 --> 00:23:15.395
All right? And then down here we have some hyper parameters
540
00:23:15.395 --> 00:23:17.795
such as batch size, learning rate multiplier, number
541
00:23:17.795 --> 00:23:19.755
of epochs, that type of thing, like
542
00:23:19.755 --> 00:23:22.035
how many times you're gonna run through these things, try
543
00:23:22.035 --> 00:23:24.115
to limit it, how frequently it adjusts itself.
544
00:23:24.375 --> 00:23:26.955
So these are the things where really, if you're going
545
00:23:26.955 --> 00:23:29.395
to do something like this, these are the things you want
546
00:23:29.395 --> 00:23:31.555
to make sure that you understand and you know how to use.
547
00:23:31.695 --> 00:23:33.195
I'm not gonna get down into the weeds of this
548
00:23:33.435 --> 00:23:34.555
'cause we only have a little bit of time,
549
00:23:34.775 --> 00:23:36.475
but these are the type of things that, um,
550
00:23:36.775 --> 00:23:38.875
you really wanna bring that knowledge when you're going
551
00:23:38.875 --> 00:23:40.515
to go and try to fine tune something.
552
00:23:40.675 --> 00:23:42.675
'cause they can make a difference with your output.
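For the API route he mentions earlier, the job creation could be sketched like this. This builds only the request payload; the model name mirrors the demo, the file ID and suffix values are hypothetical, and hyperparameters default to "auto" as in the demo.

```python
def build_finetune_job(training_file_id, suffix,
                       n_epochs="auto", batch_size="auto",
                       learning_rate_multiplier="auto"):
    # Mirrors the playground form: base model, uploaded training file,
    # an identifying suffix, and the hyperparameters (left on "auto"
    # in the demo, but exposed here so they can be tuned).
    return {
        "model": "gpt-3.5-turbo-1106",
        "training_file": training_file_id,
        "suffix": suffix,
        "hyperparameters": {
            "n_epochs": n_epochs,
            "batch_size": batch_size,
            "learning_rate_multiplier": learning_rate_multiplier,
        },
    }

# With the official Python client, the payload would be passed along the
# lines of: client.fine_tuning.jobs.create(**build_finetune_job(...)).
```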
553
00:23:42.735 --> 00:23:44.595
So for right now though, I'm just gonna leave everything
554
00:23:44.595 --> 00:23:48.085
as auto and I'm gonna create, so once I create this job,
555
00:23:48.085 --> 00:23:50.485
what's gonna happen is it's going to go ahead
556
00:23:50.485 --> 00:23:53.605
and set everything off. It's gonna validate my JSONL file
557
00:23:53.985 --> 00:23:55.405
and make sure that everything's in place.
558
00:23:55.425 --> 00:23:57.685
And if it is, it's gonna start the job. And as
559
00:24:03.545 --> 00:24:05.385
a result here though, is that you're gonna end up
560
00:24:05.975 --> 00:24:08.045
with a new fine tuned model.
561
00:24:15.405 --> 00:24:17.385
We can see that the job ran successfully.
562
00:24:17.945 --> 00:24:19.545
I actually did two jobs yesterday.
563
00:24:19.585 --> 00:24:21.825
I ran one where I told it to do 10 epochs,
564
00:24:21.825 --> 00:24:24.025
and then the first one I did it three epochs.
565
00:24:24.025 --> 00:24:25.625
So three pass throughs and 10 pass throughs.
566
00:24:25.745 --> 00:24:26.785
'cause I wanted to see if there was
567
00:24:26.785 --> 00:24:27.825
a difference in the value there.
568
00:24:28.285 --> 00:24:29.865
And so what it does, yeah,
569
00:24:30.265 --> 00:24:33.945
I just wanna, um, intervene real quick.
570
00:24:34.225 --> 00:24:35.825
I think you keep freezing a little bit,
571
00:24:36.045 --> 00:24:38.705
so I don't know if we missed anything too important there.
572
00:24:38.925 --> 00:24:40.705
Oh, uh, 'cause my, uh, yeah, So
573
00:24:41.085 --> 00:24:44.065
My VPN just went out Just, just really quickly.
574
00:24:44.325 --> 00:24:48.785
Um, it was just about, um, that we're, uh, using a,
575
00:24:49.565 --> 00:24:52.945
uh, a model that we trained yesterday, uh,
576
00:24:52.965 --> 00:24:56.265
and that we upped the epochs a bit so that we could get, uh,
577
00:24:56.285 --> 00:24:59.545
better results based on our, uh, unique scenario.
578
00:25:00.395 --> 00:25:03.385
Chris, I, you, you look better, uh, since yeah,
579
00:25:03.395 --> 00:25:04.545
since Amanda it,
580
00:25:04.995 --> 00:25:05.995
So go ahead. It was the VPN.
581
00:25:05.995 --> 00:25:08.505
All right. All right.
582
00:25:08.505 --> 00:25:10.225
Okay, so I'm just gonna stay off the VPN.
583
00:25:10.335 --> 00:25:13.775
Okay, so, um, so you can see right here, so I,
584
00:25:13.775 --> 00:25:14.855
I've kicked off my job.
585
00:25:15.095 --> 00:25:16.655
I, I adjusted the fine tuning parameter.
586
00:25:16.655 --> 00:25:18.575
So again, so if any, so I'll start again.
587
00:25:18.635 --> 00:25:20.655
So I, I chose my base model.
588
00:25:20.965 --> 00:25:22.095
I'll just run through real quick.
589
00:25:22.415 --> 00:25:24.535
I uploaded my training document right here.
590
00:25:25.305 --> 00:25:27.485
Um, there's my validation that I spoke about.
591
00:25:28.435 --> 00:25:30.835
I, I can name it, I can add in a little, uh, character,
592
00:25:30.995 --> 00:25:31.995
a little string that lets me
593
00:25:31.995 --> 00:25:33.035
know that this is gonna be on my model.
594
00:25:33.055 --> 00:25:34.675
And then here are the hyper parameters down here.
595
00:25:34.675 --> 00:25:37.035
Then you create the job. So once you do those things,
596
00:25:37.055 --> 00:25:38.755
the job gets off and it starts running.
597
00:25:38.855 --> 00:25:40.275
It checks to make sure everything's good.
598
00:25:40.495 --> 00:25:42.155
And if it is, and then it starts running.
599
00:25:42.855 --> 00:25:45.155
So we could see right here, the jobs that I ran yesterday,
600
00:25:45.155 --> 00:25:47.075
again, I ran one for 10 epochs
601
00:25:47.075 --> 00:25:49.315
and one for, uh, three epochs just to kind
602
00:25:49.315 --> 00:25:52.195
of get a different, um, see, see what would happen.
603
00:25:52.655 --> 00:25:55.555
And so it used the GPT-3.5 Turbo model,
604
00:25:55.855 --> 00:25:58.835
and then the GPT, uh, this was my output model.
605
00:25:58.855 --> 00:26:00.075
So this is my fine tuned model.
606
00:26:00.135 --> 00:26:01.475
You can see OpenAI always starts
607
00:26:01.475 --> 00:26:03.195
with "ft", then the base model.
608
00:26:03.695 --> 00:26:05.955
Um, and then it adds in your, uh,
609
00:26:06.145 --> 00:26:07.395
project that you're using.
610
00:26:07.455 --> 00:26:08.875
So I'm using ITS project
611
00:26:09.175 --> 00:26:11.955
and then, uh, my little, uh, suffix that I had.
612
00:26:11.955 --> 00:26:15.075
And then at a, uh, then, uh, uh, an identifier at the end.
613
00:26:15.815 --> 00:26:17.275
So now I'm not gonna sit here
614
00:26:17.275 --> 00:26:18.395
and wait for this job to finish
615
00:26:18.395 --> 00:26:20.435
because depending on what you told it to do,
616
00:26:20.455 --> 00:26:22.355
it could take 15 minutes, it could take an hour,
617
00:26:22.355 --> 00:26:23.755
it could take a couple hours depending on
618
00:26:23.755 --> 00:26:24.835
how much training data you give
619
00:26:24.835 --> 00:26:26.795
and what the batch sizes are
620
00:26:26.915 --> 00:26:28.675
and all those hyper parameters that you gave it.
621
00:26:28.675 --> 00:26:29.875
So we're gonna let that thing run.
622
00:26:30.295 --> 00:26:32.195
But in the meantime, we're just going
623
00:26:32.195 --> 00:26:34.195
to look at the data that it generated.
624
00:26:34.215 --> 00:26:38.905
And so yesterday I ended up with three, three files here.
625
00:26:38.965 --> 00:26:42.585
So what we did is we generated the questions
626
00:26:43.505 --> 00:26:45.765
and then,
627
00:26:45.765 --> 00:26:47.925
after we did that, we created our training file.
628
00:26:48.015 --> 00:26:49.165
After we had the training file,
629
00:26:49.165 --> 00:26:50.485
we train, we fine tuned a model.
630
00:26:50.865 --> 00:26:52.245
And then what I would do is
631
00:26:52.245 --> 00:26:54.885
after I got the fine tuned model, I came back in here
632
00:26:55.225 --> 00:26:57.965
and I adjusted my, uh, question helper here that I have
633
00:26:58.465 --> 00:27:01.285
to go ahead and use the new model.
634
00:27:01.625 --> 00:27:02.645
So I then asked it
635
00:27:02.645 --> 00:27:06.045
to make a thousand questions using my fine tuned model.
636
00:27:06.155 --> 00:27:08.925
Alright? And so then it created another CSV file.
637
00:27:09.025 --> 00:27:11.205
And so then it created a thousand questions.
638
00:27:11.205 --> 00:27:12.845
And what we did was just kind of check
639
00:27:13.585 --> 00:27:15.485
did we see any difference in the length of the items?
640
00:27:15.625 --> 00:27:17.925
Now again, we're not giving it any instructions about the
641
00:27:17.925 --> 00:27:20.725
length,
642
00:27:20.935 --> 00:27:23.805
we're just fine tuning it on the styles of the questions
643
00:27:23.805 --> 00:27:26.045
that we want to generate and see if it picks up on that.
644
00:27:26.905 --> 00:27:28.925
So the results that we ended up with
645
00:27:28.925 --> 00:27:33.225
after those three test runs are right here.
646
00:27:35.105 --> 00:27:36.485
All right, so you see in the beginning,
647
00:27:36.485 --> 00:27:37.725
this was our original one.
648
00:27:38.105 --> 00:27:39.445
The, the smallest question
649
00:27:39.445 --> 00:27:41.445
that was generated was 34 characters.
650
00:27:41.585 --> 00:27:43.445
The largest question was 192.
651
00:27:43.505 --> 00:27:45.645
We had an average character count of 93,
652
00:27:46.065 --> 00:27:48.445
and the number of items that were less than, uh,
653
00:27:48.525 --> 00:27:51.045
105 characters was 706.
654
00:27:51.145 --> 00:27:53.645
So, of the thousand items that I generated just
655
00:27:53.645 --> 00:27:57.525
with the GPT-3.5 model, 706 of them were
656
00:27:57.525 --> 00:27:58.725
below 105 characters.
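The comparison numbers he reads off (minimum, maximum, average character count, and items under the threshold) are straightforward to compute from the generated CSV. A sketch:

```python
def length_stats(questions, threshold=105):
    """Summarize question lengths the way the demo's comparison table does."""
    lengths = [len(q) for q in questions]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "avg": sum(lengths) // len(lengths),  # integer average, as displayed
        "under_threshold": sum(1 for n in lengths if n < threshold),
    }
```

Running this on each batch of a thousand generated questions gives the before/after comparison across the base model and the fine-tuned runs.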
657
00:27:59.385 --> 00:28:02.405
So then I created a fine tune model with three epochs,
658
00:28:02.665 --> 00:28:05.085
and then I generated another thousand questions using
659
00:28:05.085 --> 00:28:06.165
that fine tune model.
660
00:28:06.545 --> 00:28:08.965
The results that I saw was, we did see a difference there.
661
00:28:09.105 --> 00:28:10.845
Um, mainly right here, the number
662
00:28:10.845 --> 00:28:13.565
of items less than 105 characters, it reduced.
663
00:28:14.105 --> 00:28:18.125
All right? So, um, 242 or something like that, alright?
664
00:28:18.125 --> 00:28:19.805
And my average character count went up
665
00:28:19.945 --> 00:28:21.565
and then I just tried it one more time.
666
00:28:21.905 --> 00:28:25.325
So you can actually fine tune, um, on a fine tune model.
667
00:28:25.385 --> 00:28:27.885
In this case, what I did was I just ran my fine tuning job
668
00:28:27.885 --> 00:28:29.565
against my base dataset twice, once
669
00:28:29.565 --> 00:28:31.125
with three epochs and once with 10 epochs.
670
00:28:31.305 --> 00:28:34.925
Once I did 10 pass throughs, we had 317.
671
00:28:35.425 --> 00:28:36.685
So this is interesting,
672
00:28:36.705 --> 00:28:39.845
but it also raises a question about overfitting.
673
00:28:39.995 --> 00:28:43.805
Alright, so overfitting is when you get the model
674
00:28:43.825 --> 00:28:45.565
to be really good at a certain task,
675
00:28:45.705 --> 00:28:47.405
but now you've adjusted the weights
676
00:28:47.405 --> 00:28:49.605
and balances so much that it's not gonna be
677
00:28:49.605 --> 00:28:51.365
so good at generating questions that aren't
678
00:28:51.365 --> 00:28:52.685
for this specific task.
679
00:28:53.225 --> 00:28:56.405
So given that I ran 10 epochs,
680
00:28:56.525 --> 00:28:58.685
I would be really suspicious about overfitting
681
00:28:58.685 --> 00:29:00.765
and what kind of items it could generate otherwise.
682
00:29:01.185 --> 00:29:04.405
But for the purposes of just this plain, uh, demonstration,
683
00:29:04.665 --> 00:29:06.965
it was interesting to see how it kind
684
00:29:06.965 --> 00:29:08.765
of got more in alignment with what we were expecting
685
00:29:09.275 --> 00:29:11.845
without me giving any additional prompts,
686
00:29:11.845 --> 00:29:13.525
context stuffing or anything like that.
687
00:29:13.665 --> 00:29:16.085
No one-shot, few-shot learnings to the model.
688
00:29:17.825 --> 00:29:22.405
So that, um, that the fact that it can, um,
689
00:29:23.035 --> 00:29:27.045
kind of infer how you're trying to train it, uh, is kind
690
00:29:27.045 --> 00:29:28.125
of a double-edged sword, right?
691
00:29:28.125 --> 00:29:31.725
Because we didn't say anything about using, you know,
692
00:29:31.825 --> 00:29:36.445
larger items, uh, when it's, uh, responding, uh, it,
693
00:29:36.505 --> 00:29:38.005
it just knows to do that, right?
694
00:29:38.005 --> 00:29:39.725
Because we gave all those examples,
695
00:29:39.825 --> 00:29:43.365
but there could easily be other characteristics
696
00:29:43.545 --> 00:29:44.845
of those items that we
697
00:29:45.125 --> 00:29:48.085
provided that are not necessarily obvious to us
698
00:29:48.555 --> 00:29:52.165
that we could, if we continue to, to use that training data,
699
00:29:52.585 --> 00:29:54.565
uh, it could start, you know, using those
700
00:29:54.745 --> 00:29:56.685
as characteristics in, in the items
701
00:29:56.685 --> 00:29:57.765
that it generates as well.
702
00:29:58.465 --> 00:30:01.505
Yep. Chris, we,
703
00:30:02.085 --> 00:30:03.745
Oh, you do have a question in the chat?
704
00:30:04.125 --> 00:30:05.665
So it says, curious to know,
705
00:30:05.775 --> 00:30:07.505
what are the biggest challenges you faced
706
00:30:07.505 --> 00:30:09.025
during the fine tuning process?
707
00:30:12.745 --> 00:30:14.845
Um, I think it was a lot of trial
708
00:30:14.845 --> 00:30:16.925
and error, just coming into it fresh.
709
00:30:17.025 --> 00:30:19.725
We tried a lot of the things we read about;
710
00:30:19.725 --> 00:30:22.405
they didn't really seem to pan out for us,
711
00:30:22.505 --> 00:30:24.285
and I wasn't sure if that was just due to lack
712
00:30:24.285 --> 00:30:26.005
of knowledge on our part or, um,
713
00:30:26.785 --> 00:30:29.525
or, uh, just it not, not working as expected.
714
00:30:29.665 --> 00:30:31.925
But one of the things that comes to mind, Kyle, is those,
715
00:30:31.945 --> 00:30:34.765
uh, those anti-weights that we were trying to do.
716
00:30:35.225 --> 00:30:37.285
And so what we were trying to do there is generate a
717
00:30:37.445 --> 00:30:39.645
training file where we would basically say,
718
00:30:39.955 --> 00:30:41.005
this is a generate.
719
00:30:41.185 --> 00:30:42.725
So I had my system instruction,
720
00:30:42.725 --> 00:30:44.125
which is you generate questions,
721
00:30:44.245 --> 00:30:45.965
I asked it to generate a question as the user,
722
00:30:46.265 --> 00:30:48.645
and then I had my training file
723
00:30:48.705 --> 00:30:50.445
say, okay, here's your question.
724
00:30:50.865 --> 00:30:52.605
And then I tried doing positive
725
00:30:52.605 --> 00:30:54.045
and negative reinforcement after that.
726
00:30:54.185 --> 00:30:56.045
So I would say, this is a bad item
727
00:30:56.045 --> 00:30:58.405
because it is less than 105 characters,
728
00:30:58.425 --> 00:31:00.365
or this is a good item because it is more than
729
00:31:00.365 --> 00:31:01.405
105 characters.
730
00:31:01.865 --> 00:31:03.485
And then, um, and,
731
00:31:03.745 --> 00:31:05.605
and we didn't see any difference with that really.
732
00:31:05.625 --> 00:31:08.965
It was so insignificant
733
00:31:08.965 --> 00:31:11.085
that it just made me think it was coincidental rather than,
734
00:31:11.105 --> 00:31:12.925
um, actual cause and effect there.
735
00:31:13.385 --> 00:31:15.845
And, and where, where it gets challenging with that,
736
00:31:15.875 --> 00:31:18.205
it's like, okay, is it really not that or is it
737
00:31:18.205 --> 00:31:20.085
because you need to tweak your hyper parameters
738
00:31:20.105 --> 00:31:21.885
and do you need to do this thing a hundred times,
739
00:31:22.385 --> 00:31:25.085
10 different ways in order to get what your end result is?
740
00:31:25.425 --> 00:31:28.445
And so it's really just building that knowledge set
741
00:31:28.445 --> 00:31:30.965
that's required to really build a good,
742
00:31:31.625 --> 00:31:32.725
uh, fine tuned model.
743
00:31:32.825 --> 00:31:35.125
And Kyle brought up a great example there
744
00:31:35.125 --> 00:31:37.285
because you gotta also be careful,
745
00:31:37.355 --> 00:31:39.605
like you're not only working towards your task,
746
00:31:39.905 --> 00:31:41.885
you want to make sure you don't break all the other tasks
747
00:31:41.885 --> 00:31:43.005
that you're not focusing on,
748
00:31:45.125 --> 00:31:46.125
Right? Um,
749
00:31:46.125 --> 00:31:48.515
and, and I would say the, the other, um,
750
00:31:49.455 --> 00:31:50.915
the other challenge that we face,
751
00:31:50.915 --> 00:31:53.475
and I think you kind of alluded to it, is, uh,
752
00:31:53.505 --> 00:31:56.475
when we are messing with this stuff, you know, as soon
753
00:31:56.475 --> 00:32:00.235
as OpenAI puts it out, um, you know, we're,
754
00:32:00.785 --> 00:32:02.675
they're learning too, right? Like this,
755
00:32:02.675 --> 00:32:04.875
this stuff is not in its final form.
756
00:32:05.215 --> 00:32:08.795
Uh, and it's quite possible that we're, uh, seeing things
757
00:32:09.305 --> 00:32:12.315
that are bugs on, you know, open AI's end
758
00:32:12.315 --> 00:32:13.595
and we don't know about it.
759
00:32:13.775 --> 00:32:16.995
We think that we are doing, uh, something wrong on,
760
00:32:17.095 --> 00:32:20.635
on our end, or maybe there's documentation about the way
761
00:32:20.635 --> 00:32:22.595
that things should work, but it's not,
762
00:32:22.855 --> 00:32:24.195
you know, quite there yet.
763
00:32:25.135 --> 00:32:27.355
Um, we have, uh, another question.
764
00:32:27.775 --> 00:32:30.235
Uh, how would you measure the success of a,
765
00:32:30.255 --> 00:32:31.635
of a fine tuned model,
766
00:32:34.325 --> 00:32:35.325
Man? Um, so
767
00:32:35.325 --> 00:32:37.025
I think the validation files
768
00:32:37.025 --> 00:32:38.185
really come into play there.
769
00:32:38.245 --> 00:32:39.685
So by having your test set,
770
00:32:39.785 --> 00:32:42.605
you're having a static comparison against everything, right?
771
00:32:43.005 --> 00:32:44.245
Whenever, whenever you do a comparison,
772
00:32:44.305 --> 00:32:45.765
you always wanna make sure that you're,
773
00:32:45.785 --> 00:32:48.845
you're comparing your results to the same expected result.
774
00:32:49.105 --> 00:32:50.925
And so by having that validation file
775
00:32:51.225 --> 00:32:54.285
and looking at the, uh, loss function
776
00:32:54.305 --> 00:32:55.645
and seeing what kind of values you get
777
00:32:55.645 --> 00:32:57.005
there, I think that's gonna be good.
778
00:32:57.005 --> 00:33:00.045
But there's also the, your, your angle, your end users, um,
779
00:33:00.145 --> 00:33:01.405
are, are they seeing the benefit
780
00:33:01.475 --> 00:33:02.725
that you intended to have there?
781
00:33:02.745 --> 00:33:04.965
So I think there's several different metrics, both, um,
782
00:33:04.965 --> 00:33:06.365
that can be done through calculations
783
00:33:06.585 --> 00:33:08.085
and also through user feedback.
784
00:33:09.815 --> 00:33:13.515
Uh, so we've got only about, uh, 13 minutes left.
785
00:33:13.595 --> 00:33:15.755
I wanna make sure that we have, uh, plenty of time
786
00:33:15.895 --> 00:33:17.155
for, for RAG.
787
00:33:17.735 --> 00:33:20.115
Um, so let's get started there.
788
00:33:20.735 --> 00:33:24.035
Um, I think we wanted to start by defining, uh,
789
00:33:24.515 --> 00:33:28.355
a few concepts, Chris, uh, embeddings chunking strategies,
790
00:33:28.495 --> 00:33:30.875
uh, semantic search versus keyword search,
791
00:33:31.885 --> 00:33:32.885
Right? Yeah. So,
792
00:33:32.885 --> 00:33:35.675
so going back to, like, RAG, what is RAG,
793
00:33:35.675 --> 00:33:36.715
what do we want to do there?
794
00:33:36.775 --> 00:33:39.875
So with RAG, retrieval-augmented generation, what we want
795
00:33:39.875 --> 00:33:41.195
to do is we want to go
796
00:33:41.455 --> 00:33:45.355
and choose only the data that we want to include
797
00:33:45.355 --> 00:33:48.835
with our prompt as when we ask the generative model.
798
00:33:48.855 --> 00:33:51.435
So when we ask the generative model something, we only want
799
00:33:51.435 --> 00:33:53.275
to provide just the minimal amount of prompt
800
00:33:53.275 --> 00:33:55.155
with the most pertinent information, so
801
00:33:55.155 --> 00:33:56.355
that way we're most likely to
802
00:33:56.355 --> 00:33:57.475
get the result that's intended.
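That "choose only the data we want to include" step is typically a nearest-neighbor search over embeddings. A minimal sketch with plain cosine similarity (production systems use a vector index rather than a linear scan; function names and vectors here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k_chunks(query_vec, chunks, chunk_vecs, k=3):
    # Rank every stored chunk by similarity to the query embedding and
    # keep only the most relevant ones to include with the prompt.
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Only the top-k chunks are stuffed into the prompt, which keeps the context minimal and pertinent as described above.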
803
00:33:57.935 --> 00:33:59.635
So how do you do that? How do you get from
804
00:33:59.635 --> 00:34:00.675
point A to point B?
805
00:34:01.255 --> 00:34:03.795
So to start with, um, Kyle, Kyle
806
00:34:03.795 --> 00:34:07.435
and I's first, uh, uh, kind of getting our feet wet
807
00:34:07.435 --> 00:34:09.595
with this thing was with the, uh, ATP chatbot
808
00:34:09.595 --> 00:34:12.195
that we put together for, uh, the ATP conference.
809
00:34:12.255 --> 00:34:15.235
And so beforehand what we wanted to do was we kind
810
00:34:15.235 --> 00:34:16.955
of rolled our own RAG solution just
811
00:34:16.955 --> 00:34:18.235
to kind of work our way through it.
812
00:34:18.255 --> 00:34:20.995
And so Amanda had put together what's called
813
00:34:20.995 --> 00:34:22.115
an ATP playbook.
814
00:34:22.255 --> 00:34:24.395
And so it has a lot of general information in it
815
00:34:24.395 --> 00:34:27.595
and a lot of really good ITS specific information in it
816
00:34:27.595 --> 00:34:28.755
that, that we all needed to use.
817
00:34:29.055 --> 00:34:31.995
Um, it's a big document, great document, big document.
818
00:34:31.995 --> 00:34:33.395
And so what we want to do is say, well,
819
00:34:33.615 --> 00:34:35.155
can we take this document along
820
00:34:35.155 --> 00:34:39.195
with the ATP session schedule and then do RAG on it
821
00:34:39.195 --> 00:34:42.395
and create a little Teams ATP chatbot in order to do it?
822
00:34:42.495 --> 00:34:44.915
And so to set about doing that, the first thing we had
823
00:34:44.915 --> 00:34:46.515
to do was, well, how do we index our data?
824
00:34:46.535 --> 00:34:49.395
How do, how do we make it so GPT can just read this
825
00:34:49.695 --> 00:34:52.515
so we're not sending the model the whole PDF
826
00:34:52.755 --> 00:34:53.835
document every single time.
827
00:34:54.455 --> 00:34:56.515
So what we needed to do was we needed
828
00:34:56.515 --> 00:34:58.155
to ingest this document in a way
829
00:34:58.155 --> 00:34:59.515
that could be consumed by the model.
830
00:35:00.415 --> 00:35:03.835
So the first thing that we did there was we broke it out.
831
00:35:03.835 --> 00:35:06.715
We came up with a quote unquote chunking strategy. Alright?
832
00:35:06.975 --> 00:35:09.515
And so a chunking strategy, essentially at the end
833
00:35:09.515 --> 00:35:11.795
of the day, what you're trying to do there is you're trying
834
00:35:11.795 --> 00:35:15.115
to take your big PDF document, your big Excel spreadsheet
835
00:35:15.295 --> 00:35:18.675
and break it up into pieces of information that make sense
836
00:35:19.135 --> 00:35:20.435
as a, as a standalone,
837
00:35:20.435 --> 00:35:22.115
they have their own semantic meaning to them.
838
00:35:22.375 --> 00:35:24.275
And that way, it sets you up to be able
839
00:35:24.275 --> 00:35:25.875
to send these things to the provider.
840
00:35:26.535 --> 00:35:31.035
Um, so I'm gonna share my screen again here. Alright?
841
00:35:31.035 --> 00:35:34.155
And so I just have a little tiny example of it
842
00:35:34.915 --> 00:35:36.215
in an Excel spreadsheet.
843
00:35:36.365 --> 00:35:37.565
What we did though was, uh,
844
00:35:37.565 --> 00:35:39.685
we ended up using a PostgreSQL database
845
00:35:39.925 --> 00:35:42.645
'cause we really wanted to use, um, some native vector, um,
846
00:35:42.965 --> 00:35:45.685
functionality that you could figure out in SQL,
847
00:35:45.685 --> 00:35:47.165
but it was just easier through Postgres.
848
00:35:47.165 --> 00:35:49.165
So the first thing I did was I went
849
00:35:49.165 --> 00:35:51.085
through Amanda's document and I chunked it up.
850
00:35:51.125 --> 00:35:52.765
Hey, yeah, you're
851
00:35:52.765 --> 00:35:53.765
Not sharing your screen yet. We
852
00:35:53.765 --> 00:35:55.125
can't see it yet. If you are
853
00:35:55.665 --> 00:35:58.005
Oh, 'cause I didn't hit the share button.
854
00:35:58.155 --> 00:36:01.445
Yeah. All right. Okay. You see it now? Yes. All right.
855
00:36:01.585 --> 00:36:03.005
Can you see my screen? All right. Okay.
856
00:36:03.385 --> 00:36:05.485
So what we did was we chunked it up into a
857
00:36:05.485 --> 00:36:06.525
lot of different pieces here.
858
00:36:06.525 --> 00:36:07.965
All right, so I'm just showing you six records
859
00:36:08.245 --> 00:36:10.085
'cause I didn't wanna show all the beautiful stuff Amanda
860
00:36:10.105 --> 00:36:11.805
had in there 'cause probably a lot
861
00:36:11.805 --> 00:36:12.805
of it is stuff we don't wanna show,
862
00:36:12.825 --> 00:36:13.845
but I just showed some,
863
00:36:13.955 --> 00:36:15.805
some basic information to give you an idea.
864
00:36:15.865 --> 00:36:19.005
So the first thing I did was chunk it. Alright, okay, cool.
865
00:36:19.125 --> 00:36:22.365
I have a database of all these strings. Now what do I do?
866
00:36:22.425 --> 00:36:24.365
How, how do I know which ones are relevant
867
00:36:24.365 --> 00:36:25.525
to the question being asked?
868
00:36:25.555 --> 00:36:28.085
Alright, so that's where embeddings come into play.
869
00:36:28.305 --> 00:36:30.605
So an embedding, think about it,
870
00:36:30.605 --> 00:36:32.725
it's really taking that string of text
871
00:36:32.825 --> 00:36:35.125
and it's making it so that,
872
00:36:35.125 --> 00:36:37.525
at a really high level, it's putting it in computer language.
873
00:36:37.665 --> 00:36:40.245
So, uh, one way to think about it is it's like kind
874
00:36:40.245 --> 00:36:43.165
of coordinates to all the text;
874
00:36:43.555 --> 00:36:46.925
it provides semantic meaning in number form, alright?
876
00:36:46.925 --> 00:36:49.165
And so it's coordinates to that semantic value
877
00:36:49.465 --> 00:36:51.685
and then that way you can actually compare things
878
00:36:51.705 --> 00:36:53.805
and do like a cosine similarity lookup,
879
00:36:53.805 --> 00:36:54.965
which I'll get to in a minute here.
880
00:36:54.965 --> 00:36:57.605
So the next thing I did was I used an embedding model.
881
00:36:57.765 --> 00:37:01.455
I think I might've used, um, Ada, possibly ada-002,
882
00:37:01.515 --> 00:37:03.135
the OpenAI embedding model.
883
00:37:03.195 --> 00:37:05.575
And so for each one of these strings in my database,
884
00:37:05.935 --> 00:37:08.095
I sent it up to the embedding endpoint.
885
00:37:08.095 --> 00:37:10.455
Alright, what does an embedding endpoint do?
886
00:37:10.485 --> 00:37:11.575
Well, it takes your string
887
00:37:11.795 --> 00:37:14.455
and it generates an embedding, which is pretty much a series
888
00:37:14.455 --> 00:37:17.775
of numbers that provide a semantic, uh, context
889
00:37:18.075 --> 00:37:19.615
around your string that you sent it.
890
00:37:19.615 --> 00:37:21.695
And so now I ended up with a database
891
00:37:21.695 --> 00:37:22.855
that had two columns here,
892
00:37:22.855 --> 00:37:25.015
a database table with two columns: one for my text
893
00:37:25.275 --> 00:37:26.615
and one for my embedding, right?
894
00:37:26.835 --> 00:37:28.895
You don't need to know how these embeddings work
895
00:37:28.915 --> 00:37:29.975
to use them, alright?
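[The embedding call described here can be sketched in Python. This is an illustration rather than the demo's actual code: `client` is assumed to be an OpenAI-SDK-style client object, and the model name is the one the speaker said he might have used.]

```python
def get_embedding(client, text, model="text-embedding-ada-002"):
    """Send one string to the provider's embedding endpoint and return
    its embedding: a long list of floats that acts like coordinates
    for the string's semantic meaning."""
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
```

[Each chunked string from the database table gets run through this once, and the resulting vector is stored alongside the text.]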
896
00:37:30.315 --> 00:37:31.895
But what we do now is, okay,
897
00:37:31.895 --> 00:37:34.215
so now I have these number values and I have a string.
898
00:37:34.725 --> 00:37:36.815
Okay, cool. Alright, so now I have a database.
899
00:37:36.915 --> 00:37:39.975
So how, again, how do I know, how does that help me get
900
00:37:39.975 --> 00:37:42.335
to the point where I can include these pieces of information
901
00:37:42.755 --> 00:37:45.175
to the model and and include these with my prompt?
902
00:37:45.205 --> 00:37:47.375
Alright, so let's say like, where,
903
00:37:47.425 --> 00:37:48.895
where are we staying at the hotel?
904
00:37:48.915 --> 00:37:50.975
You could see right here, Anaheim Marriott.
905
00:37:51.155 --> 00:37:53.815
How does it know, out of these five pieces of text, that
906
00:37:53.815 --> 00:37:55.015
that's the one that I need?
907
00:37:55.475 --> 00:37:57.455
That's where these lookups come in. All right?
908
00:37:57.675 --> 00:37:58.735
And so what Kyle
909
00:37:58.735 --> 00:38:02.055
and I did, we did a very, very basic one, um,
910
00:38:02.055 --> 00:38:04.135
that really showed its true colors when we compared it
911
00:38:04.135 --> 00:38:05.415
against some of the other more,
912
00:38:05.435 --> 00:38:06.735
uh, provider friendly models.
913
00:38:07.035 --> 00:38:09.455
Um, so we did a basic cosine similarity.
914
00:38:09.455 --> 00:38:12.175
So the way it works is, so let's say I take my prompt.
915
00:38:12.445 --> 00:38:14.735
What hotel am I staying at for ATP?
916
00:38:15.275 --> 00:38:18.255
The first thing I do is, in my chat system,
917
00:38:18.315 --> 00:38:19.655
The way the chat bot, the first thing
918
00:38:19.655 --> 00:38:21.495
that chat bot does is it takes that question
919
00:38:21.955 --> 00:38:24.655
and doesn't try to answer it, it doesn't try to do anything.
920
00:38:24.755 --> 00:38:28.535
We take that string, where am I staying at ATP, send it up
921
00:38:28.535 --> 00:38:29.575
to the embedding endpoint
922
00:38:29.595 --> 00:38:31.735
and it gets an embedding back for that string.
923
00:38:32.155 --> 00:38:35.255
So now I have an embedding. That embedding,
924
00:38:35.365 --> 00:38:38.765
we then use that to do a cosine similarity lookup across
925
00:38:38.985 --> 00:38:40.365
everything in the database.
926
00:38:40.365 --> 00:38:42.965
Alright? And so what it's gonna do is it's gonna take
927
00:38:42.965 --> 00:38:45.045
that string for where am I staying at ATP,
928
00:38:45.265 --> 00:38:47.005
and it's going to find, it's going
929
00:38:47.005 --> 00:38:50.245
to order everything in my database, um,
930
00:38:50.555 --> 00:38:53.685
by closeness to that semantic meaning.
931
00:38:53.685 --> 00:38:57.085
Alright? So ideally this row right here is gonna be
932
00:38:57.085 --> 00:38:58.125
at the top one, right?
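[The Postgres-side lookup might have looked something like this, assuming the pgvector extension was the "native vector functionality" mentioned earlier. The table and column names are made up for illustration; the demo's real schema isn't shown in the talk.]

```python
# Illustrative schema: one row per chunk, with its text and embedding.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS playbook_chunks (
    id        serial PRIMARY KEY,
    chunk     text NOT NULL,   -- one chunk of the ATP playbook
    embedding vector(1536)     -- ada-002 embeddings are 1536-dimensional
);
"""

def similarity_query(top_k):
    """Rank every stored chunk by cosine distance (pgvector's <=>
    operator) to the question's embedding, keeping only the closest
    top_k rows to send along with the prompt."""
    return (
        "SELECT chunk FROM playbook_chunks "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {int(top_k)};"
    )
```

[The query is executed with the question's embedding bound to the `%s` placeholder, so the database does the ranking instead of application code.]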
933
00:38:58.505 --> 00:39:00.605
So what we've talked about here is kind of a, a,
934
00:39:00.925 --> 00:39:02.685
a lightweight chunking strategy, all right?
935
00:39:02.745 --> 00:39:05.965
And so again, our chunking strategy was also poor in
936
00:39:05.965 --> 00:39:07.205
that we didn't have any overlap.
937
00:39:07.205 --> 00:39:09.725
When you define a chunking strategy, you wanna
938
00:39:09.725 --> 00:39:11.805
chunk it up into text, but you also wanna do a
939
00:39:11.925 --> 00:39:13.005
thing that's called overlap.
940
00:39:13.275 --> 00:39:16.785
What overlap does is it says, so let's say my chunk,
941
00:39:16.985 --> 00:39:19.225
I wanna define it as 800 tokens, alright?
942
00:39:19.405 --> 00:39:22.105
And then I define my overlap as 400 tokens.
943
00:39:22.365 --> 00:39:25.185
So when I create a chunk about the Anaheim Marriott Hotel,
944
00:39:25.185 --> 00:39:27.705
about where we're staying, when I get to my next piece
945
00:39:27.705 --> 00:39:29.825
of information that's chunked, I'm going
946
00:39:29.825 --> 00:39:32.705
to include the last 400 tokens of this one
947
00:39:33.085 --> 00:39:36.585
and include it as the first 400 tokens of my next chunk.
948
00:39:36.735 --> 00:39:38.745
Alright? So you're getting some overlap,
949
00:39:38.745 --> 00:39:39.985
you're getting some relationships,
950
00:39:40.205 --> 00:39:42.145
and it helps with that semantic lookup
951
00:39:42.145 --> 00:39:44.145
that you're gonna be doing in your semantic rankings.
952
00:39:44.145 --> 00:39:45.625
And so that's a chunking strategy.
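[The chunking-with-overlap idea described above can be sketched like this. Sizes are in characters as a simple stand-in for the token counts used in practice; this is an illustration, not the demo's actual code.]

```python
def chunk_with_overlap(text, chunk_size=800, overlap=400):
    """Split text into chunks where each chunk repeats the tail of the
    previous one, so sentences that straddle a boundary still land
    together in some chunk and keep their semantic meaning."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window slides each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

[With a chunk size of 800 and an overlap of 400, the last 400 tokens of each chunk become the first 400 tokens of the next, exactly as described above.]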
953
00:39:45.685 --> 00:39:48.425
The last piece of the chunking strategy is, well,
954
00:39:48.425 --> 00:39:50.305
how many chunks do you want to include, right?
955
00:39:50.485 --> 00:39:53.185
So just because I order all these things by order
956
00:39:53.205 --> 00:39:56.425
of relevance, I'm still not gonna send them all up.
957
00:39:56.485 --> 00:39:59.105
And so you gotta draw your line, where's your hard line?
958
00:39:59.105 --> 00:40:01.225
And so in our case, I think we ended up with like 10
959
00:40:01.225 --> 00:40:02.945
or 15 chunks that we wanted to send.
960
00:40:03.005 --> 00:40:06.585
So imagine I had a table here of like 500 rows,
961
00:40:06.585 --> 00:40:10.545
or a thousand. I think we have like 3,000 rows.
962
00:40:10.545 --> 00:40:11.705
Let's just say 3000 rows.
963
00:40:11.855 --> 00:40:14.145
Well, I gotta find the best 20 out
964
00:40:14.145 --> 00:40:15.905
of those 3,000 that I'm gonna send the model,
965
00:40:15.905 --> 00:40:17.465
because I don't wanna overload it.
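[Picking the best handful of rows out of a few thousand is just a cosine-similarity ranking plus a cutoff. A minimal sketch in plain Python, with made-up rows standing in for the real database:]

```python
import math

def cosine_similarity(a, b):
    """Closeness of two embeddings by angle: near 1.0 means the same
    semantic direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(question_embedding, rows, k=20):
    """rows holds (chunk_text, embedding) pairs, i.e. the two database
    columns.  Rank all of them against the question's embedding and
    return only the k closest chunks to send along with the prompt."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(question_embedding, row[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]
```

[Everything below the cutoff never leaves the database, which is how the prompt stays small no matter how big the playbook gets.]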
966
00:40:17.465 --> 00:40:20.185
Alright? So we did our cosine similarity
967
00:40:20.245 --> 00:40:21.265
and then we sent it up
968
00:40:21.365 --> 00:40:23.345
and we just crossed our fingers that the piece
969
00:40:23.345 --> 00:40:25.145
of information about Anaheim was in it,
970
00:40:25.245 --> 00:40:26.545
and then it would answer it,
971
00:40:26.805 --> 00:40:29.865
and then, um, more often than not, it worked really well.
972
00:40:30.075 --> 00:40:31.225
Hey, Chris, where are we?
973
00:40:31.375 --> 00:40:35.065
Yeah, he's got four minutes. Just FYI, man. All right. Okay.
974
00:40:35.065 --> 00:40:38.425
Alright, so, so, so that, that's the idea, all right?
975
00:40:38.685 --> 00:40:41.545
Um, so now how, how can you do that a little better, right?
976
00:40:41.545 --> 00:40:43.145
So that, that's a lot of work.
977
00:40:43.245 --> 00:40:44.705
You gotta come up with a chunking strategy.
978
00:40:44.845 --> 00:40:46.585
You gotta fine tune, you gotta make sure it's good.
979
00:40:47.145 --> 00:40:49.265
Providers have already figured that out. Alright?
980
00:40:49.365 --> 00:40:52.265
So there's other options. Something
981
00:40:52.265 --> 00:40:54.465
that we've been looking at now, uh, that we're integrated
982
00:40:54.465 --> 00:40:56.945
with is, uh, OpenAI, uh, vector stores.
983
00:40:57.625 --> 00:40:59.785
AWS has a thing called Knowledge Bases. Alright?
984
00:41:00.005 --> 00:41:02.105
And so what they're doing with these vector stores
985
00:41:02.125 --> 00:41:04.185
and these knowledge bases is they're
986
00:41:04.185 --> 00:41:05.345
doing all this work for you.
987
00:41:05.345 --> 00:41:07.385
They're doing the ingestion part. Alright?
988
00:41:07.725 --> 00:41:10.905
So what you do is, um, so we have another example here
989
00:41:10.905 --> 00:41:12.025
of it being integrated.
990
00:41:12.885 --> 00:41:14.825
And so with, so if you take, again, staying
991
00:41:14.825 --> 00:41:17.985
with the OpenAI example, all right, well now I need
992
00:41:17.985 --> 00:41:19.145
to be on the VPN again.
993
00:41:19.405 --> 00:41:20.405
All right?
994
00:41:28.415 --> 00:41:30.795
So just interrupt me when you're back on the VPN Chris.
995
00:41:31.295 --> 00:41:32.955
Um, so what we've seen here is kind
996
00:41:32.955 --> 00:41:35.365
of a homegrown solution, uh, to RAG.
997
00:41:35.545 --> 00:41:37.205
We built a database.
998
00:41:37.465 --> 00:41:40.045
We, uh, took all of our documents
999
00:41:40.045 --> 00:41:41.645
and we split 'em up into chunks.
1000
00:41:41.665 --> 00:41:42.965
We added 'em to the database.
1001
00:41:43.385 --> 00:41:46.525
The idea was when a question comes in, we'll see which
1002
00:41:46.525 --> 00:41:48.845
of those chunks is closest to the question, uh,
1003
00:41:48.865 --> 00:41:50.125
by comparing embeddings.
1004
00:41:50.185 --> 00:41:52.485
And then, uh, we'll take those rows
1005
00:41:52.945 --> 00:41:55.685
and we'll supply them as, as part of the prompt.
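[That last step, supplying the retrieved rows as part of the prompt, can be sketched like this. The template wording is illustrative, not the demo's exact prompt:]

```python
def build_augmented_prompt(question, chunks):
    """Prepend the retrieved chunks as context so the model answers
    from our documents rather than from its training data alone."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

[The model never sees the whole PDF, only the question plus the handful of chunks that ranked closest to it.]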
1006
00:41:56.755 --> 00:41:59.805
What we're gonna look at next is, uh,
1007
00:42:00.185 --> 00:42:02.045
OpenAI doing all of that for us.
1008
00:42:02.275 --> 00:42:04.005
All we have to do is supply the documents.
1009
00:42:04.005 --> 00:42:05.085
So we're gonna upload those
1010
00:42:05.585 --> 00:42:07.645
and then we're essentially gonna ask a question.
1011
00:42:08.195 --> 00:42:11.325
It's going to find the relevant information, it's going
1012
00:42:11.325 --> 00:42:13.965
to append it to, uh, the prompt that we send.
1013
00:42:14.145 --> 00:42:15.485
And it's really that easy.
1014
00:42:16.185 --> 00:42:18.565
Yep. Go ahead Chris. Yep. No, that's a great setup.
1015
00:42:18.785 --> 00:42:20.365
So in order to use the, um,
1016
00:42:20.505 --> 00:42:22.245
so the vector stores are only available
1017
00:42:22.245 --> 00:42:23.285
through the Assistants.
1018
00:42:23.285 --> 00:42:24.965
So these agents that OpenAI has.
1019
00:42:24.985 --> 00:42:28.005
So an agent, without kind of getting down into the weeds,
1020
00:42:28.065 --> 00:42:29.565
is just, it's a way that's going
1021
00:42:29.565 --> 00:42:31.765
to process your messages in a stateful sense.
1022
00:42:31.785 --> 00:42:34.405
So it keeps a history, it manages these threads.
1023
00:42:34.405 --> 00:42:36.845
And so I'm just gonna set up a, a little assistant here
1024
00:42:37.625 --> 00:42:39.285
to process my vector store
1025
00:42:39.285 --> 00:42:40.565
and I'm gonna give it some instructions.
1026
00:42:40.805 --> 00:42:43.405
Honestly, you are a helpful assistant
1027
00:42:43.945 --> 00:42:45.685
and then you could, you could give it any kind
1028
00:42:45.685 --> 00:42:47.845
of instructions here, whatever you want it to do,
1029
00:42:47.945 --> 00:42:51.125
and say that it really loves dogs, all right?
1030
00:42:51.745 --> 00:42:54.045
And so then I could choose what kind of style that I want it
1031
00:42:54.045 --> 00:42:57.875
to work in, and then I can go ahead and choose my model
1032
00:42:57.935 --> 00:42:59.075
and create the assistant.
1033
00:42:59.175 --> 00:43:01.635
All right? So now what I've done is I actually went ahead
1034
00:43:01.635 --> 00:43:02.675
and I created a whole bunch
1035
00:43:02.675 --> 00:43:05.395
of documentation about a product that I made up, right?
1036
00:43:05.395 --> 00:43:08.275
Because to really drive this point home, I don't want
1037
00:43:08.275 --> 00:43:10.315
to use something that's already available on the internet
1038
00:43:10.315 --> 00:43:11.475
that the model's been trained on.
1039
00:43:11.495 --> 00:43:13.835
So I came up with a new product called Send It Later.
1040
00:43:14.185 --> 00:43:17.035
The idea behind the product is it allows you
1041
00:43:17.035 --> 00:43:18.555
to schedule a message that could be sent
1042
00:43:18.555 --> 00:43:19.715
later on any platform.
1043
00:43:19.715 --> 00:43:20.995
It could be cross-platform, multi-
1044
00:43:20.995 --> 00:43:22.075
user, those type of things.
1045
00:43:22.135 --> 00:43:23.795
And so, how would I use this?
1046
00:43:23.825 --> 00:43:25.835
Well, let's say like Kyle's birthday's tomorrow
1047
00:43:25.935 --> 00:43:26.995
and I'm gonna be busy tomorrow.
1048
00:43:27.095 --> 00:43:28.195
I know I'm gonna forget about it.
1049
00:43:28.195 --> 00:43:30.235
So I'm just gonna write my message now, schedule it
1050
00:43:30.235 --> 00:43:31.675
to be sent tomorrow, and then Kyle's gonna be
1051
00:43:31.675 --> 00:43:32.755
happy that I thought about 'em.
1052
00:43:32.755 --> 00:43:34.395
Alright? So that's the idea behind the product.
1053
00:43:34.455 --> 00:43:36.115
So I created a bunch of financial statements,
1054
00:43:36.115 --> 00:43:37.635
product summaries, those type of things,
1055
00:43:37.975 --> 00:43:39.995
and I added them to a vector store.
1056
00:43:39.995 --> 00:43:42.795
Alright? So I created my Chris test bot
1057
00:43:43.295 --> 00:43:45.755
and just to show you, um, I'm gonna ask it about,
1058
00:43:45.825 --> 00:43:48.395
tell me about Send It Later, right?
1059
00:43:48.735 --> 00:43:52.275
And most likely, being that it's AI, it's gonna try
1060
00:43:52.275 --> 00:43:53.435
and talk its way out of it.
1061
00:43:53.465 --> 00:43:54.675
It's gonna come up with something
1062
00:43:54.675 --> 00:43:56.075
that doesn't make any sense, right?
1063
00:43:56.375 --> 00:43:59.155
And so it's coming up with a service that's kind of similar,
1064
00:43:59.295 --> 00:44:01.875
but it's not really what mine is, alright?
1065
00:44:01.905 --> 00:44:04.195
It's not matching the documentation that I gave it.
1066
00:44:04.415 --> 00:44:07.795
So what I'm gonna do now is I'm gonna give my assistant all
1067
00:44:07.795 --> 00:44:09.835
of that RAG, everything that we talked about,
1068
00:44:09.935 --> 00:44:11.835
except I'm doing it through OpenAI,
1069
00:44:12.055 --> 00:44:14.675
who does it a lot better, who's using semantic ranking
1070
00:44:15.225 --> 00:44:18.475
keyword search, a lot of things, to get in place a whole lot better
1071
00:44:18.935 --> 00:44:20.835
top-10 results than what I was getting.
1072
00:44:20.975 --> 00:44:23.675
All right, so I have already set this up. I, yeah,
1073
00:44:24.415 --> 00:44:25.415
We are at time.
1074
00:44:26.175 --> 00:44:28.755
Um, we are at 1:45, so just
1075
00:44:28.835 --> 00:44:29.835
I have 1:43.
1076
00:44:30.065 --> 00:44:31.435
Alright? Oh, okay.
1077
00:44:31.495 --> 00:44:33.115
Am I early? Go ahead, keep going.
1078
00:44:33.145 --> 00:44:35.355
Okay, if you think so. So we, we set up,
1079
00:44:35.355 --> 00:44:36.475
we set up this vector store.
1080
00:44:36.555 --> 00:44:38.595
I called it Send It Later, and I gave it four documents.
1081
00:44:38.595 --> 00:44:40.315
It ingested it, it did the embeddings,
1082
00:44:40.455 --> 00:44:42.355
and it did all of that processing for me.
1083
00:44:42.375 --> 00:44:43.675
So now I'm just gonna go ahead
1084
00:44:43.675 --> 00:44:45.555
and I'm gonna hook it up to my assistant here.
1085
00:44:45.895 --> 00:44:47.195
All right. So I have my Chris test bot,
1086
00:44:47.195 --> 00:44:48.395
I hook it up to my assistant.
1087
00:44:48.655 --> 00:44:50.035
So now when I come back
1088
00:44:50.055 --> 00:44:52.315
and I ask it a question about it, I could say,
1089
00:44:52.315 --> 00:44:53.995
tell me about Send It Later.
1090
00:44:54.415 --> 00:44:57.395
So now what it's gonna do is it's gonna do that rag for me.
1091
00:44:57.425 --> 00:44:58.875
It's gonna go and it's actually gonna
1092
00:44:58.875 --> 00:44:59.995
read through the Vector store.
1093
00:45:00.095 --> 00:45:02.075
And you can see now it's actually giving me the real
1094
00:45:02.075 --> 00:45:03.195
information that I gave it
1095
00:45:03.455 --> 00:45:05.715
and it's citing the documents that it came from.
1096
00:45:05.865 --> 00:45:07.875
Alright, so what's a real, what's a,
1097
00:45:08.075 --> 00:45:09.835
what's a test industry application of this?
1098
00:45:09.945 --> 00:45:11.675
Well, I can go ahead at this point,
1099
00:45:11.885 --> 00:45:13.275
let's just say I took a whole bunch
1100
00:45:13.275 --> 00:45:15.475
and I, I wanted to create a quiz about how to use it.
1101
00:45:15.535 --> 00:45:20.215
So I could say, create a five-question, multiple-choice quiz
1102
00:45:21.475 --> 00:45:24.225
about how to use send it later, right?
1103
00:45:24.485 --> 00:45:28.385
So this is, this is something I could not do against a
1104
00:45:28.505 --> 00:45:29.625
standard GPT-4 model.
1105
00:45:29.685 --> 00:45:31.225
It doesn't know about it. But now
1106
00:45:31.225 --> 00:45:34.745
that it has this vector store that OpenAI did completely
1107
00:45:34.845 --> 00:45:36.585
for me, I just gave it the documents.
1108
00:45:36.805 --> 00:45:39.265
Now I got a five question test that's actually on
1109
00:45:39.785 --> 00:45:41.945
documentation and material that are relevant.
1110
00:45:42.685 --> 00:45:47.685
1:44.
1111
00:45:51.735 --> 00:45:52.505
Well done, Chris.
1112
00:45:52.575 --> 00:45:55.305
Wonderful. I wanna thank everyone for being here today.
1113
00:45:55.325 --> 00:45:57.345
We will share the recording with you via email.
1114
00:45:57.685 --> 00:45:59.385
Uh, there will be a survey that pops up.
1115
00:45:59.445 --> 00:46:00.465
Let us know what you thought.
1116
00:46:00.485 --> 00:46:02.545
If there's something specific you'd like to see next time,
1117
00:46:02.545 --> 00:46:03.745
just tell us, uh,
1118
00:46:03.745 --> 00:46:06.545
and you can find our webinars at testsys.com/webinars.
1119
00:46:06.725 --> 00:46:08.025
So we will see you soon.
1120
00:46:08.025 --> 00:46:09.425
Thank you again for being here,
1121
00:46:09.425 --> 00:46:10.665
and thank you to Chris and Kyle.
1122
00:46:12.315 --> 00:46:13.655
Bye everyone. Bye.