All Webinars | L.A.B.S. #7

AI in Practice: Part 2 | What Fine-Tuning an AI Model Really Means

Explore the essentials of fine-tuning AI models, clear up common misconceptions, and discover practical real-world applications. 

Level: Advanced

Witness the enhanced performance of a fine-tuned model and get introduced to RAG and its significance.

Interested in partnering on a webinar? Share your ideas at webinars@testsys.com. 

1
00:00:05.395 --> 00:00:05.745
Great.

2
00:00:05.845 --> 00:00:08.545
Hi, everyone. Happy afternoon.

3
00:00:09.285 --> 00:00:11.305
We will get started in just a moment here.

4
00:00:11.305 --> 00:00:12.665
We're letting people into the room.

5
00:00:47.045 --> 00:00:49.375
Welcome everyone. We'll get started in just a moment.

6
00:01:15.465 --> 00:01:17.835
Welcome everyone. Just a moment. We'll get started.

7
00:01:18.025 --> 00:01:20.155
Just giving people time to come in

8
00:01:20.175 --> 00:01:22.595
and grab their lunch and join us.

9
00:01:46.005 --> 00:01:48.435
We're gonna go ahead and get started. So, hi everyone.

10
00:01:48.435 --> 00:01:49.875
Thank you for joining us today.

11
00:01:49.985 --> 00:01:53.475
This is part two of our ITS Summer Demo Days series,

12
00:01:53.895 --> 00:01:54.955
AI in Practice.

13
00:01:55.535 --> 00:01:58.515
I'm Amanda Crowley, the Director of Marketing here at ITS.

14
00:01:58.935 --> 00:02:00.835
Uh, I'll be your host for the series.

15
00:02:01.495 --> 00:02:04.115
Uh, two housekeeping things before we get started.

16
00:02:04.335 --> 00:02:06.395
The first is we'll be using the Q&A

17
00:02:06.395 --> 00:02:08.995
feature, which is located at the bottom of Zoom.

18
00:02:09.375 --> 00:02:12.275
If you have comments or questions, anything you wanna ask

19
00:02:12.855 --> 00:02:15.315
our, um, two presenters, just put them there

20
00:02:15.375 --> 00:02:17.315
and we're gonna be answering them in real time.

21
00:02:17.935 --> 00:02:19.595
Uh, second to that,

22
00:02:19.595 --> 00:02:20.755
the webinar will be recorded.

23
00:02:21.375 --> 00:02:24.195
So, uh, we will share the link with you afterwards.

24
00:02:24.575 --> 00:02:25.995
So if you happen to have to drop

25
00:02:26.015 --> 00:02:27.515
or you wanna share this with your colleagues,

26
00:02:27.515 --> 00:02:29.315
that will definitely be available to you.

27
00:02:30.135 --> 00:02:32.635
So thank you so much for being here with us today.

28
00:02:33.215 --> 00:02:34.715
Um, we have Chris Glacken.

29
00:02:34.845 --> 00:02:35.875
Chris is our Director

30
00:02:35.875 --> 00:02:38.035
of Innovative Technologies here at ITS,

31
00:02:38.455 --> 00:02:40.275
and joining him is Kyle Miller.

32
00:02:40.705 --> 00:02:43.875
Kyle is our manager of Item Workshop, uh,

33
00:02:43.875 --> 00:02:45.675
which is the ITS Item Bank.

34
00:02:46.055 --> 00:02:47.475
So thank you again. And with that,

35
00:02:47.505 --> 00:02:48.515
I'll turn it over to them.

36
00:02:50.145 --> 00:02:51.845
Thanks, Amanda. Hey, everyone.

37
00:02:52.465 --> 00:02:57.005
Um, so this, uh, webinar, uh, is titled, um,

38
00:02:57.115 --> 00:03:00.005
what does fine tuning, uh, really mean?

39
00:03:00.065 --> 00:03:02.285
Uh, we're gonna get into a few more things, uh,

40
00:03:02.285 --> 00:03:03.445
than just fine tuning.

41
00:03:04.075 --> 00:03:08.125
What we'd really like to cover is, if you get

42
00:03:08.665 --> 00:03:11.605
an AI in your workplace,

43
00:03:11.665 --> 00:03:13.165
you start using it, um,

44
00:03:13.425 --> 00:03:17.565
and you find that it doesn't really, uh, meet all

45
00:03:17.645 --> 00:03:21.085
of your needs, um, what are your options, uh, for,

46
00:03:21.425 --> 00:03:22.805
uh, customization?

47
00:03:23.145 --> 00:03:26.885
So, um, Chris, why don't we start with, um,

48
00:03:27.325 --> 00:03:30.445
I think we have kind of four options for customization.

49
00:03:30.865 --> 00:03:34.845
Um, why don't we start with, uh, training, uh, a base model

50
00:03:35.305 --> 00:03:37.605
and, uh, then fine tuning that model.

51
00:03:37.785 --> 00:03:42.045
So, um, if you could, uh, kind of describe, um,

52
00:03:42.315 --> 00:03:44.165
what it is to train a model, what it is

53
00:03:44.165 --> 00:03:45.285
to fine tune a model,

54
00:03:45.505 --> 00:03:47.445
and, um, what the differences are,

55
00:03:47.545 --> 00:03:48.805
you know, between those two things.

56
00:03:50.055 --> 00:03:52.185
Sure. So, pre-training

57
00:03:52.185 --> 00:03:55.345
or training a model is something that if you're looking

58
00:03:55.345 --> 00:03:57.305
to do it, you're probably not on this call.

59
00:03:57.445 --> 00:03:59.545
Um, it, it requires a lot of resources

60
00:03:59.645 --> 00:04:00.665
and a lot of knowledge.

61
00:04:00.725 --> 00:04:02.785
Um, it requires a lot of money and a lot of tech.

62
00:04:02.805 --> 00:04:04.985
So if you think about pre-training, um,

63
00:04:05.605 --> 00:04:07.345
the first model that typically comes

64
00:04:07.345 --> 00:04:08.585
to mind is GPT, right?

65
00:04:08.665 --> 00:04:10.545
GPT-4, the generative pre-trained transformer.

66
00:04:10.605 --> 00:04:12.425
So it's something that's trained on the

67
00:04:12.745 --> 00:04:14.305
whole internet, right?

68
00:04:14.325 --> 00:04:17.305
So it has all that documentation, all those teams

69
00:04:17.325 --> 00:04:18.585
behind it, all of that.

70
00:04:18.605 --> 00:04:20.705
And so to try to take on a task like that yourself,

71
00:04:20.705 --> 00:04:23.025
that's a pretty big ask,

72
00:04:23.045 --> 00:04:24.185
and not a lot of, um,

73
00:04:24.185 --> 00:04:26.985
organizations are even gonna have the documentation

74
00:04:26.985 --> 00:04:28.905
and text available to support something like that.

75
00:04:29.165 --> 00:04:31.265
So that's where fine tuning comes into play.

76
00:04:31.335 --> 00:04:33.545
What fine tuning is gonna do is take one

77
00:04:33.545 --> 00:04:35.865
of those foundational models, um, GPT-3

78
00:04:35.865 --> 00:04:37.745
that I even see they're offering GPT-4

79
00:04:37.745 --> 00:04:39.185
and GPT-4o right now.

80
00:04:39.445 --> 00:04:41.385
But basically what you're gonna do is take one of those, um,

81
00:04:41.405 --> 00:04:45.545
models, foundational models, so GPT-4, GPT-3.5 Turbo.

82
00:04:45.645 --> 00:04:47.585
You're gonna take those and you're gonna fine tune it

83
00:04:47.585 --> 00:04:48.745
and to fine tune it, what

84
00:04:48.745 --> 00:04:51.305
that means is you're really just adjusting the weights

85
00:04:51.405 --> 00:04:55.145
to kind of address some nuances and get some certain styles

86
00:04:55.145 --> 00:04:56.825
and expectations back, certain

87
00:04:56.825 --> 00:04:58.145
formatting that you want back.

88
00:04:58.365 --> 00:05:00.065
So it's a whole lot more lightweight,

89
00:05:00.085 --> 00:05:01.985
it still requires a good bit of knowledge.

90
00:05:02.285 --> 00:05:04.065
Um, so you can't just blindly do it.

91
00:05:04.065 --> 00:05:06.345
It's gonna take certainly a bit of trial and error.

92
00:05:06.565 --> 00:05:09.625
Um, but it is, it's something that's more realistic

93
00:05:09.725 --> 00:05:12.505
for organizations looking to get a more, um,

94
00:05:13.095 --> 00:05:15.305
nuanced response back from the generative model,

95
00:05:17.045 --> 00:05:18.045
Right? So, um,

96
00:05:18.045 --> 00:05:20.875
training a, a foundational model, we,

97
00:05:20.975 --> 00:05:24.195
we probably don't need to, uh, talk about that, uh, anymore

98
00:05:24.255 --> 00:05:28.035
or dive any deeper, um, uh, that, like you said,

99
00:05:28.155 --> 00:05:31.355
that's reserved for the, the corporations that have millions

100
00:05:31.355 --> 00:05:32.835
of dollars to throw at AI.

101
00:05:33.415 --> 00:05:37.515
Um, so if fine tuning is not really working for you,

102
00:05:37.575 --> 00:05:40.355
you need something more than just getting a particular

103
00:05:40.525 --> 00:05:42.275
style, uh, out of an AI.

104
00:05:42.275 --> 00:05:45.795
You need to, uh, give it, uh, new facts, uh, new

105
00:05:46.305 --> 00:05:47.355
knowledge bases.

106
00:05:47.935 --> 00:05:51.915
Um, we have, uh, retrieval augmented generation,

107
00:05:52.335 --> 00:05:55.235
and we have, um, context stuffing.

108
00:05:55.535 --> 00:05:59.125
Uh, so, uh, let's talk about, let's talk about those next,

109
00:05:59.345 --> 00:06:00.525
um, context

110
00:06:00.565 --> 00:06:02.965
stuffing, I think, is pretty easy.

111
00:06:03.145 --> 00:06:05.605
We can, um, we can just define that and then,

112
00:06:05.785 --> 00:06:07.205
and then move past it.

113
00:06:07.585 --> 00:06:11.405
Um, that's really, uh, just taking an entire document

114
00:06:11.545 --> 00:06:14.525
and putting it in your prompt along with the question, uh,

115
00:06:14.525 --> 00:06:16.805
that you have about that document, right?

116
00:06:16.905 --> 00:06:21.005
So if I wanted to know, uh, the names of the characters in,

117
00:06:21.105 --> 00:06:23.205
uh, the novel War and Peace, right?

118
00:06:23.725 --> 00:06:25.045
I would just attach War

119
00:06:25.045 --> 00:06:26.605
and Peace to my, uh,

120
00:06:27.185 --> 00:06:29.965
to my prompt, send the question, and it would answer, right?
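
The context stuffing pattern just described can be sketched as follows. This is a minimal illustration, not the presenters' actual code; the messages shape follows OpenAI's chat convention, and the document string is a stand-in.

```python
# Context stuffing: the entire document rides along in the prompt, and the
# model answers from that pasted-in text.

def build_stuffed_prompt(document: str, question: str) -> list[dict]:
    """Stuff the whole document plus the question into one chat payload."""
    return [
        {"role": "system", "content": "Answer using only the supplied document."},
        # Every token of the document is sent (and billed) on each request.
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: {question}"},
    ]

messages = build_stuffed_prompt(
    document="(imagine the full text of War and Peace here)",
    question="What are the names of the main characters?",
)
```

Note that every token of the stuffed document travels with every request, which is where the cost concern with this approach comes from.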

121
00:06:29.985 --> 00:06:33.445
So, um, let's talk about RAG, though: retrieval

122
00:06:33.565 --> 00:06:34.725
augmented generation.

123
00:06:35.265 --> 00:06:39.205
Um, how is that different, uh, from, from context stuffing

124
00:06:39.205 --> 00:06:41.045
and, and how's it different than fine tuning?

125
00:06:42.035 --> 00:06:44.405
Sure, yeah. So fine tuning.

126
00:06:44.835 --> 00:06:47.765
What fine tuning is gonna do, like I mentioned:

127
00:06:47.765 --> 00:06:50.085
it's gonna give you the ability to kind of get certain

128
00:06:50.685 --> 00:06:52.885
expected formats back or styles back.

129
00:06:52.915 --> 00:06:55.525
What fine tuning is not gonna do is you're not gonna

130
00:06:55.525 --> 00:06:58.645
train it on a whole domain that doesn't already exist within

131
00:06:58.875 --> 00:07:01.165
whatever it's been trained on.

132
00:07:01.165 --> 00:07:02.685
So that's where RAG comes into play,

133
00:07:02.685 --> 00:07:04.365
and that's where context stuffing comes into play.

134
00:07:04.565 --> 00:07:05.885
'cause you're giving it that new

135
00:07:05.885 --> 00:07:06.925
information and facts.

136
00:07:07.275 --> 00:07:10.725
Some ways to look at fine tuning is you can give it all the

137
00:07:11.205 --> 00:07:13.845
lyrics to your favorite artist, your musical artist,

138
00:07:14.425 --> 00:07:17.565
and you can then turn around and use that fine tuned model,

139
00:07:17.945 --> 00:07:21.045
and you could have it generate responses, not even songs,

140
00:07:21.065 --> 00:07:23.325
but generate responses in the style of that artist.

141
00:07:23.665 --> 00:07:25.045
But what you can't do is turn around

142
00:07:25.045 --> 00:07:28.285
and ask it about the songs in that artist's, um, catalog

143
00:07:28.705 --> 00:07:31.045
or, uh, recite lyrics back or things like that.

144
00:07:31.045 --> 00:07:32.765
That's not what fine tuning is gonna give you,

145
00:07:33.025 --> 00:07:35.725
but that's something where if you give it context stuffing,

146
00:07:36.065 --> 00:07:38.045
um, if you give it additional information along

147
00:07:38.045 --> 00:07:39.485
with the prompt, you can do that.

148
00:07:39.905 --> 00:07:41.245
And so that's really

149
00:07:41.245 --> 00:07:42.645
where we're gonna find those differences.

150
00:07:42.785 --> 00:07:44.565
Now, if we take your example with War

151
00:07:44.565 --> 00:07:46.325
and Peace, right, with the context stuffing,

152
00:07:46.825 --> 00:07:49.445
you passed in a lot of tokens in order

153
00:07:49.465 --> 00:07:52.085
to get a couple names out of a document, right?

154
00:07:52.385 --> 00:07:56.045
So what is RAG, retrieval augmented generation, gonna do?

155
00:07:56.185 --> 00:07:59.125
And so what that's gonna do is instead

156
00:07:59.125 --> 00:08:01.325
of just giving it the entirety of War

157
00:08:01.325 --> 00:08:02.885
and Peace, you're actually gonna have

158
00:08:02.885 --> 00:08:05.205
that already set up in an index database

159
00:08:05.555 --> 00:08:08.285
where it's gonna be chunked out into nice little pieces.

160
00:08:08.625 --> 00:08:11.085
And what that RAG does is it goes

161
00:08:11.085 --> 00:08:12.525
and identifies the chunks.

162
00:08:12.525 --> 00:08:14.925
It's gonna go and find the ones that are pertinent

163
00:08:14.925 --> 00:08:16.005
to the question at hand,

164
00:08:16.265 --> 00:08:17.765
and then only attach those

165
00:08:17.785 --> 00:08:19.125
to the prompt that you're sending up.

166
00:08:19.145 --> 00:08:21.365
So it makes things a lot more efficient

167
00:08:21.435 --> 00:08:23.445
because when you get into large tokens,

168
00:08:23.445 --> 00:08:24.925
even though you have some

169
00:08:24.925 --> 00:08:26.405
of the things out there like Gemini

170
00:08:26.405 --> 00:08:28.605
that are boasting, like, million-token

171
00:08:28.605 --> 00:08:31.805
context windows and things,

172
00:08:31.865 --> 00:08:33.925
at least in the current stages of things there,

173
00:08:33.925 --> 00:08:35.285
you're still gonna get a loss

174
00:08:35.285 --> 00:08:36.885
of quality the more you put in there.

175
00:08:37.065 --> 00:08:39.325
And there's also the, uh, things you wanna think about

176
00:08:39.325 --> 00:08:40.685
with cost and efficiency there too.

177
00:08:40.685 --> 00:08:42.965
So RAG really helps with that.

178
00:08:43.265 --> 00:08:45.005
Now, maybe in a couple years down the road,

179
00:08:45.245 --> 00:08:46.845
RAG is gonna become less and less relevant

180
00:08:46.845 --> 00:08:48.485
as these models get more powerful

181
00:08:48.485 --> 00:08:50.405
and cheaper, and you can just throw everything in there.

182
00:08:50.705 --> 00:08:53.325
But right now, RAG is really giving you that benefit.
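
The chunk-and-retrieve flow described above can be sketched with a toy retriever. A real RAG setup uses embeddings and a vector index; plain word overlap is used here only so the sketch stays self-contained and runnable, and the document text is invented.

```python
# RAG sketch: chunk the document once, retrieve only the relevant pieces
# per question, and attach just those to the prompt instead of the whole text.

def chunk_text(text: str, size: int = 8) -> list[str]:
    """Split a document into fixed-size word chunks (the indexing step)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Score chunks by word overlap with the question; keep the top k."""
    q = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]

doc = ("Pierre Bezukhov inherits a fortune. Natasha Rostova loves to dance. "
       "Prince Andrei Bolkonsky goes to war. The French army invades Russia.")
chunks = chunk_text(doc)
relevant = retrieve(chunks, "Who is Natasha Rostova?")
# Only the matching chunks travel with the prompt, not the whole novel.
prompt = "Context:\n" + "\n".join(relevant) + "\n\nQuestion: Who is Natasha Rostova?"
```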

183
00:08:54.505 --> 00:08:55.505
Right? And you

184
00:08:55.505 --> 00:08:58.765
do pay for the tokens that you supply.

185
00:08:59.105 --> 00:09:03.685
So, um, putting an entire novel into a prompt rather than

186
00:09:03.685 --> 00:09:06.285
just the relevant, uh, information, if you're doing

187
00:09:06.285 --> 00:09:08.565
that over and over again, it can definitely get,

188
00:09:08.745 --> 00:09:09.965
uh, expensive.

189
00:09:10.745 --> 00:09:14.845
Um, so let's dive into fine tuning

190
00:09:14.985 --> 00:09:17.685
and then let's dive into rag, uh, a bit later.

191
00:09:18.305 --> 00:09:21.445
Um, so for, for fine tuning, um,

192
00:09:22.505 --> 00:09:25.455
where might this work really well, uh, and

193
00:09:25.515 --> 00:09:27.495
and what are some common misconceptions

194
00:09:27.495 --> 00:09:28.855
about, about fine tuning?

195
00:09:29.415 --> 00:09:34.335
I, I hear, um, when I read about, um,

196
00:09:35.275 --> 00:09:37.135
you know, models being customized

197
00:09:37.315 --> 00:09:41.575
or, um, AI being customized for, uh, individuals

198
00:09:41.575 --> 00:09:42.655
or for corporations,

199
00:09:43.205 --> 00:09:45.255
fine tuning is always what you hear about.

200
00:09:45.325 --> 00:09:46.775
It's the buzzword for

201
00:09:46.995 --> 00:09:48.735
how do you get AI to work for you.

202
00:09:49.395 --> 00:09:52.375
So, um, being that, uh, you

203
00:09:52.375 --> 00:09:55.055
and I have, uh, tried and,

204
00:09:55.075 --> 00:09:56.975
not done very well at, uh,

205
00:09:56.975 --> 00:10:00.335
getting information we want as a result of fine tuning, uh,

206
00:10:00.405 --> 00:10:03.015
what are some misconceptions about how fine tuning works

207
00:10:03.075 --> 00:10:05.535
and, and what, what it can really produce for you?

208
00:10:06.475 --> 00:10:08.935
So, fine tuning is really gonna come in

209
00:10:08.935 --> 00:10:09.975
handy in scenarios.

210
00:10:09.975 --> 00:10:13.255
Like if you find yourself constantly providing one-

211
00:10:13.255 --> 00:10:16.575
or multi-shot prompts, um, or one-

212
00:10:16.575 --> 00:10:18.495
or multi-shot context into your prompts,

213
00:10:18.725 --> 00:10:21.215
fine tuning is probably gonna help you there.

214
00:10:21.315 --> 00:10:23.055
Now, what's a one shot or a multi-shot?

215
00:10:23.155 --> 00:10:25.775
So think about it like if I, if I have a question, right?

216
00:10:25.775 --> 00:10:28.375
And I wanna say, just generate an item, alright,

217
00:10:28.395 --> 00:10:31.375
generate me an item about any kind of domain, alright?

218
00:10:31.755 --> 00:10:33.255
And so when it generates that item,

219
00:10:34.085 --> 00:10:35.615
it's gonna give you a random format

220
00:10:35.675 --> 00:10:37.135
unless you make things more specific,

221
00:10:37.195 --> 00:10:38.775
you might even say generate a multiple

222
00:10:38.775 --> 00:10:39.815
choice item, all right?

223
00:10:39.915 --> 00:10:42.335
And so maybe it'll give you your, uh, your options

224
00:10:42.355 --> 00:10:44.655
as 1, 2, 3, 4, or A, B, C, D, right?

225
00:10:44.995 --> 00:10:47.895
And so in order to kind of tune that, what you do is

226
00:10:47.895 --> 00:10:49.735
you provide one or more shots,

227
00:10:49.835 --> 00:10:51.415
single-shot or multi-shot, right?

228
00:10:51.415 --> 00:10:53.295
And so each shot is gonna be a context,

229
00:10:53.435 --> 00:10:54.695
and what you're doing is, okay,

230
00:10:55.775 --> 00:10:57.775
generate me a multiple choice question here.

231
00:10:57.795 --> 00:10:59.695
Here's an example of a multiple choice question.

232
00:10:59.695 --> 00:11:01.775
Here's another example of a multiple choice question, right?

233
00:11:01.775 --> 00:11:03.575
So you're providing additional context,

234
00:11:04.075 --> 00:11:05.255
and then that'll work great.

235
00:11:05.315 --> 00:11:06.975
The generative model will see it

236
00:11:06.995 --> 00:11:08.135
and it'll say, oh,

237
00:11:08.135 --> 00:11:09.415
you want this back in this format,

238
00:11:09.675 --> 00:11:10.935
let me go ahead and address that.

239
00:11:11.275 --> 00:11:13.375
But if you're just doing it repetitively over

240
00:11:13.375 --> 00:11:16.375
and over again, well now you're using your tokens in order

241
00:11:16.395 --> 00:11:17.535
to send it to the model.

242
00:11:17.755 --> 00:11:20.055
And, um, it's not as efficient, and

243
00:11:20.355 --> 00:11:22.295
you're probably not gonna get

244
00:11:22.295 --> 00:11:23.495
that much latency, but

245
00:11:23.495 --> 00:11:25.175
you might get a little latency there too.

246
00:11:25.235 --> 00:11:27.935
And so where fine tuning can come into play is you can

247
00:11:28.135 --> 00:11:31.175
actually tune the generative model, um,

248
00:11:31.275 --> 00:11:33.135
and give it some examples of your items.

249
00:11:33.195 --> 00:11:35.015
And so that way the next time you do ask it

250
00:11:35.015 --> 00:11:37.335
to generate an item, the idea is that you don't have

251
00:11:37.335 --> 00:11:39.775
to provide it all of those contexts in addition

252
00:11:39.775 --> 00:11:41.215
to your prompt, you can just

253
00:11:41.435 --> 00:11:43.255
ask the prompt and then get that back,
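
The one-shot/multi-shot pattern being described can be sketched as a chat payload in which each shot is an example user/assistant turn placed ahead of the real request. The example items below are invented for illustration.

```python
# Few-shot prompting sketch: each "shot" is a worked example the model sees
# before the real task, so it mimics the demonstrated format.

SHOTS = [
    ("Write a multiple choice item about geography.",
     "Q: What is the capital of France?\nA. Paris\nB. Lyon\nC. Nice\nD. Lille"),
    ("Write a multiple choice item about biology.",
     "Q: Which organelle produces ATP?\nA. Mitochondrion\nB. Ribosome\nC. Nucleus\nD. Golgi body"),
]

def build_multishot_messages(task: str) -> list[dict]:
    """Prepend each shot as a user/assistant turn, then append the real task."""
    messages = [{"role": "system", "content": "You write exam items."}]
    for prompt, answer in SHOTS:
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages

msgs = build_multishot_messages("Write a multiple choice item about history.")
```

Fine tuning bakes examples like these into the model itself, so they no longer have to be re-sent, and re-paid for, with every request.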

254
00:11:44.485 --> 00:11:45.485
Right? So you said,

255
00:11:45.485 --> 00:11:47.395
um, when we were talking about prepping

256
00:11:47.395 --> 00:11:50.115
for this webinar yesterday, uh, you said something

257
00:11:50.115 --> 00:11:51.955
that was really interesting that

258
00:11:52.545 --> 00:11:56.205
fine tuning a model is really like the ability

259
00:11:56.345 --> 00:12:00.325
to give it a thousand examples every time without having

260
00:12:00.325 --> 00:12:01.565
to supply them in the prompt.

261
00:12:01.825 --> 00:12:04.845
So you go through the, the process of, of doing

262
00:12:04.845 --> 00:12:06.965
that fine tuning once with a thousand examples,

263
00:12:07.585 --> 00:12:11.525
and from then on when you query that model,

264
00:12:12.065 --> 00:12:14.365
it knows about those thousand examples,

265
00:12:14.365 --> 00:12:15.685
and it will use those

266
00:12:15.685 --> 00:12:17.765
in generating the proper response. Is that right?

267
00:12:18.105 --> 00:12:19.405
Yep, yep.

268
00:12:20.235 --> 00:12:24.905
Okay, perfect. Um, how about as far as,

269
00:12:25.245 --> 00:12:29.345
uh, using it to, to get additional data, uh,

270
00:12:30.015 --> 00:12:31.145
into the model?

271
00:12:31.465 --> 00:12:33.265
I, I think you, you touched on this,

272
00:12:33.405 --> 00:12:37.625
but, um, let's be, uh, a little bit more, uh,

273
00:12:38.865 --> 00:12:41.505
explicit about, you know, what we've seen as far

274
00:12:41.505 --> 00:12:46.265
as using fine tuning to add additional data to,

275
00:12:46.765 --> 00:12:48.385
uh, to a model,

276
00:12:49.725 --> 00:12:50.725
Right? So,

277
00:12:50.725 --> 00:12:54.075
well, if we try to use it for, let's say,

278
00:12:54.075 --> 00:12:56.155
let's just stick with item generation, right?

279
00:12:56.455 --> 00:12:59.555
And so maybe I wanna feed it a whole bunch of, uh, cases,

280
00:13:00.095 --> 00:13:01.755
um, that I have, right?

281
00:13:01.815 --> 00:13:03.555
So let's just say a whole bunch of medical cases,

282
00:13:03.615 --> 00:13:05.395
and I wanted to generate items about these

283
00:13:05.395 --> 00:13:06.555
medical cases, okay?

284
00:13:06.935 --> 00:13:09.035
Uh, let's just keep it simple. Multiple choice items.

285
00:13:09.115 --> 00:13:10.475
I wanted to generate these items.

286
00:13:10.985 --> 00:13:14.235
Well,

287
00:13:14.615 --> 00:13:17.395
if I fine tune it on a whole bunch of my medical cases,

288
00:13:17.585 --> 00:13:19.915
what it's gonna do well is it's gonna recognize the

289
00:13:19.915 --> 00:13:21.955
terminologies and the way I'm using certain words

290
00:13:21.975 --> 00:13:23.635
and the styles that I'm putting things together

291
00:13:23.975 --> 00:13:26.275
and kind of the structure of, of sentences.

292
00:13:26.415 --> 00:13:28.555
And it will gimme items that kind of match that.

293
00:13:28.905 --> 00:13:31.675
What it's not going to do, though, is it's not going

294
00:13:31.675 --> 00:13:33.595
to be able to reference a certain case

295
00:13:33.855 --> 00:13:36.275
and, uh, generate a specific item about that.

296
00:13:36.535 --> 00:13:38.755
For that you're going to wanna look more at, like, a RAG

297
00:13:38.995 --> 00:13:40.715
approach, um, that we mentioned earlier, or context

298
00:13:40.715 --> 00:13:41.755
stuffing, something like that.

299
00:13:42.175 --> 00:13:44.035
So that's where it's not going

300
00:13:44.175 --> 00:13:45.635
to kind of do too well.

301
00:13:45.635 --> 00:13:48.515
It's gonna give you more of that style, that syntax.

302
00:13:48.515 --> 00:13:51.555
Another example that comes to mind is, um,

303
00:13:51.585 --> 00:13:53.555
this was more in the early days.

304
00:13:53.755 --> 00:13:55.995
I don't hit this too much, but back when I was first playing

305
00:13:55.995 --> 00:13:59.275
with the GPT-3 model, that's

306
00:13:59.275 --> 00:14:00.435
before they even had function calling,

307
00:14:00.435 --> 00:14:03.235
where you can get a more structured JSON approach back.

308
00:14:03.415 --> 00:14:06.435
So what I was doing was I was trying to, um,

309
00:14:06.785 --> 00:14:09.075
have the model recognize that the user wanted

310
00:14:09.075 --> 00:14:12.955
to do an action that I needed to call a function for, right?

311
00:14:12.975 --> 00:14:14.595
So now they have this all baked in

312
00:14:14.595 --> 00:14:16.475
and it, it just keeps getting better and better.

313
00:14:16.575 --> 00:14:20.715
But what I was facing is every time that I asked it

314
00:14:20.715 --> 00:14:25.275
to, um, send me back a JSON structure, so I really wanted

315
00:14:25.275 --> 00:14:26.755
a certain syntax back, right?

316
00:14:26.835 --> 00:14:28.275
I wanted key value pairs

317
00:14:28.375 --> 00:14:29.835
and I wanted them in a certain order,

318
00:14:29.895 --> 00:14:32.035
and I wanted it to match the JSON structure.

319
00:14:32.135 --> 00:14:33.995
So little curly brackets at the beginning

320
00:14:33.995 --> 00:14:35.635
and all kind of set up with the quotes everywhere.

321
00:14:35.825 --> 00:14:38.475
What I was finding is, in certain cases when I asked the

322
00:14:38.635 --> 00:14:41.595
question a certain way, it would gimme back a structure,

323
00:14:41.695 --> 00:14:43.235
but it wasn't syntactically correct,

324
00:14:43.285 --> 00:14:45.915
which just caused me a whole bunch of problems downstream.

325
00:14:46.415 --> 00:14:48.555
So fine tuning can help me there,

326
00:14:48.555 --> 00:14:50.995
because now, again, you're not gonna have that

327
00:14:51.030 --> 00:14:52.455
problem with JSON these days with the models.

328
00:14:52.555 --> 00:14:55.735
But back then fine tuning, what that helps you do is say,

329
00:14:55.735 --> 00:14:57.855
Hey, when I ask you a question, I want you

330
00:14:57.855 --> 00:14:59.375
to return the response in this way, right?

331
00:14:59.375 --> 00:15:01.055
You're giving it, you're giving it a sample

332
00:15:01.315 --> 00:15:03.255
and then a response for it to train on.

333
00:15:03.355 --> 00:15:05.375
And that helps you kind of tighten up those edge cases

334
00:15:05.375 --> 00:15:07.055
where maybe it wasn't giving you that back.

335
00:15:07.055 --> 00:15:09.615
So, so like styles and those types of things.

336
00:15:09.715 --> 00:15:11.415
And so those are ways where it can kind

337
00:15:11.415 --> 00:15:12.855
of help you there and not help you.
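
A hypothetical training pair of the kind described, a prompt plus the exact JSON-shaped completion you want back, might look like this. The field names and the `schedule_meeting` action are invented for illustration, not taken from the demo.

```python
# Invented training pair: the completion the model should learn to produce
# is itself a strict JSON string.
import json

training_pair = {
    "messages": [
        {"role": "user", "content": "Schedule a meeting with Sam at 3pm."},
        {"role": "assistant", "content": json.dumps(
            {"action": "schedule_meeting", "person": "Sam", "time": "15:00"}
        )},
    ]
}

# The assistant turn must parse as valid JSON; that syntactic validity is
# exactly what the fine-tune in the anecdote was trying to enforce.
parsed = json.loads(training_pair["messages"][1]["content"])
```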

338
00:15:13.525 --> 00:15:17.935
Perfect. Um, let's get into a demo of, uh, fine tuning.

339
00:15:18.515 --> 00:15:21.895
Um, so just to, uh, set the stage here, uh, what,

340
00:15:21.895 --> 00:15:24.735
what we're going to do is we're gonna ask, uh, AI

341
00:15:24.755 --> 00:15:27.735
to generate us a thousand test questions.

342
00:15:28.385 --> 00:15:30.895
We're then going to review those test questions

343
00:15:30.915 --> 00:15:32.255
programmatically, uh,

344
00:15:32.255 --> 00:15:33.615
and we're gonna discard those

345
00:15:33.615 --> 00:15:35.535
that we don't feel are long enough.

346
00:15:36.115 --> 00:15:38.935
Uh, once we get, uh, all of the test questions

347
00:15:38.935 --> 00:15:41.255
that we do think are long enough, we're going

348
00:15:41.255 --> 00:15:42.455
to generate a training file.

349
00:15:42.665 --> 00:15:46.375
We're going to use that to, uh, fine tune a model,

350
00:15:47.325 --> 00:15:50.705
and then we're going to test that fine tuned model again,

351
00:15:50.925 --> 00:15:53.105
ask for another, uh, thousand questions

352
00:15:53.525 --> 00:15:55.025
and we'll see, uh, whether

353
00:15:55.025 --> 00:15:56.025
or not the length

354
00:15:56.325 --> 00:15:59.905
of those questions is now longer, since we have examples

355
00:16:00.055 --> 00:16:02.465
that we're providing of longer questions.

356
00:16:03.965 --> 00:16:07.025
The interesting thing here is that we are not going

357
00:16:07.025 --> 00:16:11.985
to instruct the AI that we want our questions to be longer.

358
00:16:12.455 --> 00:16:15.065
All we're going to do is ask for a thousand questions,

359
00:16:15.495 --> 00:16:17.785
discard the ones that are not of a certain length,

360
00:16:18.125 --> 00:16:20.065
and we're gonna train on, on the longer ones.

361
00:16:20.065 --> 00:16:23.385
And when we generate questions again, we should see

362
00:16:23.695 --> 00:16:27.305
that we're now getting, uh, longer test questions back.
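
The discard step in that plan can be sketched as a simple word-count filter. The threshold below is an arbitrary illustrative choice, not the cutoff used in the live demo, and the sample questions are invented.

```python
# Review step sketch: keep only generated questions above a word-count
# threshold; the survivors become the fine-tuning examples.

MIN_WORDS = 12  # illustrative cutoff, not the demo's actual value

def keep_long_questions(questions: list[str], min_words: int = MIN_WORDS) -> list[str]:
    """Return only questions long enough to serve as training examples."""
    return [q for q in questions if len(q.split()) >= min_words]

generated = [
    "Who won?",  # too short: discarded
    "In what year did the French Revolution begin, and which monarch "
    "was ruling France at that time?",
]
survivors = keep_long_questions(generated)
```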

363
00:16:27.485 --> 00:16:30.625
So, uh, Chris, I'll turn it over to you for the demo.

364
00:16:31.295 --> 00:16:32.945
Okay, sure. All right.

365
00:16:32.945 --> 00:16:35.705
So we're gonna start out, um, by first getting our,

366
00:16:35.765 --> 00:16:36.825
our base set of data.

367
00:16:37.005 --> 00:16:38.905
All right? Like Kyle said, we want

368
00:16:38.905 --> 00:16:40.785
to prompt GPT for a thousand questions.

369
00:16:40.945 --> 00:16:42.105
'cause I'm not gonna sit here and

370
00:16:42.105 --> 00:16:43.505
type out a thousand questions.

371
00:16:43.755 --> 00:16:45.505
We're not gonna give it any instructions

372
00:16:45.605 --> 00:16:46.945
or anything along those lines.

373
00:16:47.085 --> 00:16:49.305
So what we're doing here is, uh,

374
00:16:49.305 --> 00:16:50.945
we're just gonna do something pretty simple.

375
00:16:50.945 --> 00:16:52.905
We're gonna call the chat completions endpoint.

376
00:16:53.215 --> 00:16:55.945
I've generated a, uh, a list here of a couple topics

377
00:16:55.945 --> 00:16:57.425
that I want to generate questions on.

378
00:16:57.885 --> 00:16:59.385
And we're going to go ahead

379
00:16:59.485 --> 00:17:03.265
and generate these questions 10 at a time using the GPT-3.5

380
00:17:03.265 --> 00:17:04.385
Turbo model.

381
00:17:04.385 --> 00:17:06.865
The one refreshed November 6th, alright?

382
00:17:07.285 --> 00:17:09.065
And, uh, there's just a little max tokens setting.

383
00:17:09.065 --> 00:17:12.105
So this is pretty basic; if you've done any type of API call,

384
00:17:12.105 --> 00:17:14.585
this is very vanilla, nothing crazy going on here.

385
00:17:14.585 --> 00:17:15.785
And so we're gonna generate questions,

386
00:17:16.085 --> 00:17:18.745
and so we're gonna generate 10 questions at a time.

387
00:17:18.745 --> 00:17:20.145
That's what that n value is.

388
00:17:20.145 --> 00:17:22.745
So every time I send a request to the GPT endpoint,

389
00:17:23.045 --> 00:17:25.625
I'm gonna say, give me 10 questions using this model,

390
00:17:25.965 --> 00:17:28.065
and then I'm going to just do this 10 times

391
00:17:28.165 --> 00:17:30.545
and I'm going to end up with my end result there.

392
00:17:30.545 --> 00:17:32.425
We're gonna end up with a file that we're gonna write out.
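
The batch loop being described can be sketched as follows. The live demo calls OpenAI's chat completions endpoint with n=10 completions per request; here a stand-in generator function takes the endpoint's place so the loop structure stays runnable without an API key.

```python
# Generation loop sketch: request questions in batches of 10 until we have
# a thousand. fetch_batch is a stand-in for one chat-completions request.

BATCH = 10
TARGET = 1000

def fetch_batch(topic: str, n: int = BATCH) -> list[str]:
    """Stand-in for one API request that returns n completions at once."""
    return [f"Placeholder question {i} about {topic}?" for i in range(n)]

questions: list[str] = []
while len(questions) < TARGET:
    # Each iteration mirrors one request to the endpoint.
    questions.extend(fetch_batch("history"))
```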

393
00:17:33.085 --> 00:17:34.505
All right? So I'm gonna go ahead

394
00:17:34.505 --> 00:17:35.665
and just kind of start this off

395
00:17:36.405 --> 00:17:38.985
and we can see that it's gonna generate these questions.

396
00:17:40.655 --> 00:17:42.115
All right? So here's my little prompt.

397
00:17:42.455 --> 00:17:45.595
I'm gonna say, okay, I want a thousand, um, questions here.

398
00:17:45.805 --> 00:17:47.435
Write the file out to my desktop.

399
00:17:47.655 --> 00:17:49.165
And so for the purposes of the demo,

400
00:17:49.255 --> 00:17:51.245
we're first gonna write it out to a CSV, so

401
00:17:51.245 --> 00:17:52.845
that way we can compare and look at these things.

402
00:17:52.865 --> 00:17:55.005
But there's no reason I couldn't just do this all in one

403
00:17:55.005 --> 00:17:56.325
step with my JSONL file.

404
00:17:56.665 --> 00:17:58.525
All right? So I'm gonna kick this off,

405
00:17:58.545 --> 00:18:00.405
and so just to see that it's actually going

406
00:18:00.465 --> 00:18:02.565
and it's doing it in real time, we'll just look at it

407
00:18:02.565 --> 00:18:04.005
with a little proxy debugger here

408
00:18:04.665 --> 00:18:08.445
and just see that it is making calls out to OpenAI.

409
00:18:09.235 --> 00:18:10.925
Windows is trying to snap me.

410
00:18:11.265 --> 00:18:12.405
All right, nonstop.

411
00:18:14.125 --> 00:18:15.705
All right, so we're just gonna scale that down.

412
00:18:15.805 --> 00:18:17.745
All right, so you can see right here I'm capturing,

413
00:18:17.745 --> 00:18:19.265
so I'm making my API calls.

414
00:18:19.605 --> 00:18:20.985
So within this API call,

415
00:18:21.045 --> 00:18:23.225
you could see it's just generated a question about history,

416
00:18:23.525 --> 00:18:25.985
and then it's coming back with all of these responses.

417
00:18:25.985 --> 00:18:28.505
And so it's giving me 10 questions every single time.

418
00:18:29.545 --> 00:18:31.205
All right? So we're just kind of capturing those

419
00:18:31.545 --> 00:18:35.405
and we're logging those into a CSV file that we're going

420
00:18:35.405 --> 00:18:36.525
to have on the desktop.

421
00:18:36.785 --> 00:18:38.805
And then from that CSV file, what we're going

422
00:18:38.805 --> 00:18:40.885
to do is we're going to then generate what's called

423
00:18:41.005 --> 00:18:42.045
a JSON L file.

424
00:18:42.345 --> 00:18:45.685
All right? So I have an example of the JSON L file here.

425
00:18:46.025 --> 00:18:47.965
And so the way it works with the, uh,

426
00:18:47.965 --> 00:18:51.565
more modern GPT models is it's really using a chat structure.

427
00:18:51.945 --> 00:18:53.765
Now, you can either do a single

428
00:18:53.795 --> 00:18:55.005
turn or a multi turn.

429
00:18:55.105 --> 00:18:57.445
So here you're gonna see that we have a multi turn example.

430
00:18:57.985 --> 00:19:00.405
So this is just a series of all the questions

431
00:19:00.405 --> 00:19:02.205
that we're going to pass up to it.

432
00:19:02.205 --> 00:19:03.125
So while it's running, and then

433
00:19:03.125 --> 00:19:04.605
I'll generate that JSON L file.

434
00:19:04.605 --> 00:19:07.165
But what we do is, so we got our thousand questions,

435
00:19:07.165 --> 00:19:09.485
which we'll look at when it's done processing, um,

436
00:19:09.485 --> 00:19:10.845
that's gonna take probably another minute.

437
00:19:11.185 --> 00:19:13.125
But the idea here is then we want to take

438
00:19:13.125 --> 00:19:16.005
that thousand questions and we wanna remove everything.

439
00:19:16.065 --> 00:19:17.805
So just for the purposes of this example,

440
00:19:17.805 --> 00:19:19.325
we're gonna remove everything

441
00:19:19.325 --> 00:19:22.485
that's less than 105 characters long

442
00:19:22.485 --> 00:19:25.285
because we want to see if we can train the model to,

443
00:19:25.505 --> 00:19:28.285
to get the style of generating questions

444
00:19:28.285 --> 00:19:31.325
that are more than 105 without us having to instruct it

445
00:19:31.325 --> 00:19:32.845
or do anything along those lines.

446
00:19:33.225 --> 00:19:34.805
So out of those thousand questions,

447
00:19:34.975 --> 00:19:36.125
we're gonna weed out everything

448
00:19:36.125 --> 00:19:37.605
that's less than 105 characters.
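That weeding step is essentially a one-line filter. A minimal sketch — the sample questions are invented, and 105 is the demo's cutoff:

```python
MIN_LEN = 105  # the demo's character cutoff

# Hypothetical stand-ins for rows read from the generated CSV.
questions = [
    "Who wrote Hamlet?",  # too short -- gets weeded out
    "Explain how the invention of the printing press changed the spread "
    "of scientific knowledge across early modern Europe.",  # long enough -- kept
]

# Keep only the questions longer than the cutoff; these become training rows.
kept = [q for q in questions if len(q) > MIN_LEN]
```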

449
00:19:37.705 --> 00:19:40.405
And then we're going to generate our training file.

450
00:19:40.425 --> 00:19:42.285
In this case, it's called a JSON l file.

451
00:19:42.285 --> 00:19:43.645
So it's in the JSON structure,

452
00:19:44.065 --> 00:19:46.405
but it's using a, a strict message format.

453
00:19:46.585 --> 00:19:48.525
So you can see right here I have a message array.

454
00:19:48.945 --> 00:19:51.925
And so in there, I send it a message

455
00:19:52.025 --> 00:19:53.125
as a system instruction.

456
00:19:53.125 --> 00:19:54.565
So I, I have that in my instruction,

457
00:19:54.865 --> 00:19:56.885
and then I have a user message that says,

458
00:19:57.035 --> 00:19:58.565
okay, here's my question.

459
00:19:59.265 --> 00:20:03.005
And then I have my, um, uh, I say, generate me a question,

460
00:20:03.025 --> 00:20:06.045
and then I have, as the assistant generating a question

461
00:20:06.345 --> 00:20:07.765
that's a thousand characters.

462
00:20:08.615 --> 00:20:10.395
All right? So that, that's my JSON L file.
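For reference, one line of a training file like the one being described looks roughly like this — a single-turn example with the chat message array (the content strings here are invented, not the demo's actual data):

```python
import json

# One training example per line: a system instruction, the user request,
# and the assistant reply whose style we want the model to imitate.
example = {
    "messages": [
        {"role": "system", "content": "You generate exam questions."},
        {"role": "user", "content": "Generate a question."},
        {"role": "assistant",
         "content": "Describe the economic and political factors that "
                    "contributed to the decline of the Roman Empire."},
    ]
}

# A .jsonl file is simply one JSON object like this per line.
jsonl_line = json.dumps(example)
```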

463
00:20:10.455 --> 00:20:12.635
Now what I, um, so actually I'll hold

464
00:20:12.635 --> 00:20:14.355
until we get into the next piece there.

465
00:20:14.375 --> 00:20:16.595
All right, so now my file's created successfully.

466
00:20:16.935 --> 00:20:20.435
And so now I can go ahead and create a JSON L file from that.

467
00:20:20.495 --> 00:20:22.835
So if we go and we look at our CSV file here,

468
00:20:22.835 --> 00:20:27.665
that was generated, so I dropped it right on my desktop.

469
00:20:27.725 --> 00:20:31.725
So if I open that up, we'll actually see now we have

470
00:20:32.855 --> 00:20:37.105
a CSV file filled

471
00:20:37.105 --> 00:20:38.865
with 1000 questions.

472
00:20:39.175 --> 00:20:41.545
Alright? Some of these are really short, some

473
00:20:41.545 --> 00:20:42.825
of 'em are on the longer side.

474
00:20:43.045 --> 00:20:44.745
All right, but you see we have a thousand.

475
00:20:44.805 --> 00:20:47.525
So now what I'm going to do is now I'm going

476
00:20:47.525 --> 00:20:48.525
to take this document

477
00:20:48.945 --> 00:20:52.285
and I'm going to turn it into my JSON L file here.

478
00:20:52.545 --> 00:20:55.285
And by doing that, I should end up with something closer

479
00:20:55.425 --> 00:20:57.965
to 200 or some kind of subset of that.

480
00:20:57.985 --> 00:20:59.405
I'm not gonna have a thousand, I'm going

481
00:20:59.405 --> 00:21:00.485
to only keep the ones

482
00:21:00.485 --> 00:21:02.125
that are longer than 105 characters

483
00:21:02.125 --> 00:21:04.405
because that's the behavior that we're going for here.

484
00:21:05.545 --> 00:21:08.475
All right? Okay.

485
00:21:08.475 --> 00:21:10.915
So I'm going to generate my JSON l file,

486
00:21:11.535 --> 00:21:13.435
and then, so the JSON L file is going

487
00:21:13.435 --> 00:21:15.115
to look exactly like we had it.

488
00:21:15.575 --> 00:21:19.775
So that should be done now. Yep.

489
00:21:23.475 --> 00:21:25.175
All right. So you see we have a subset here

490
00:21:25.175 --> 00:21:26.575
of 530 questions.

491
00:21:26.995 --> 00:21:30.375
All right? So now what I can do is I can go into OpenAI.

492
00:21:30.375 --> 00:21:32.015
So I can do this through an API,

493
00:21:32.195 --> 00:21:34.535
but to make things a little more, um, user-friendly here,

494
00:21:34.555 --> 00:21:36.615
I'm just going to go through their fine tuning playground

495
00:21:36.615 --> 00:21:38.255
and I'm gonna start up a fine tuning job.

496
00:21:38.565 --> 00:21:40.575
Alright? So again, this could all be done

497
00:21:40.575 --> 00:21:42.695
through a system services API calls,

498
00:21:42.715 --> 00:21:44.295
but for the purposes of just demonstrating,

499
00:21:44.295 --> 00:21:46.295
I'm just doing this through their, uh, GUI here,

500
00:21:46.295 --> 00:21:47.455
their playground that they have.

501
00:21:47.915 --> 00:21:49.775
So I'm gonna start up a new fine tuning job.

502
00:21:49.775 --> 00:21:51.335
So I'm in their fine tuning playground.

503
00:21:51.715 --> 00:21:53.695
So what I wanna do is select my base model.

504
00:21:53.715 --> 00:21:55.615
So you see they have a couple models to choose from.

505
00:21:55.635 --> 00:21:57.415
So GPT-4o you need

506
00:21:57.415 --> 00:21:59.135
to request access for at this point in time.

507
00:21:59.195 --> 00:22:01.415
So I'm just going to do a GPT-3.5 Turbo

508
00:22:01.595 --> 00:22:03.255
1106 job with that model.

509
00:22:03.255 --> 00:22:06.135
Then you upload your JSON L file, your training document.

510
00:22:06.635 --> 00:22:08.935
So I'm going to go ahead and grab that,

511
00:22:08.955 --> 00:22:10.735
and I'm just going to drop that in here.

512
00:22:12.355 --> 00:22:14.015
All right. And then validation data.

513
00:22:14.115 --> 00:22:16.815
So the validation data, what I could do is I'm,

514
00:22:16.815 --> 00:22:18.375
I'm not gonna do it in this demo, um,

515
00:22:18.615 --> 00:22:20.495
'cause I've, I've already run it on the backend anyway,

516
00:22:20.515 --> 00:22:24.175
but so what the validation data is, you can take a subset

517
00:22:24.395 --> 00:22:25.455
of your training data

518
00:22:25.835 --> 00:22:28.415
and then provide that as a validation file.

519
00:22:28.435 --> 00:22:30.695
And so what that'll do is every time

520
00:22:30.695 --> 00:22:33.135
that the job finishes running its first pass through

521
00:22:33.135 --> 00:22:35.415
your data, it'll run a validation check.

522
00:22:35.475 --> 00:22:36.935
And so it'll take those samples

523
00:22:37.315 --> 00:22:39.175
and so it'll run through the validation.

524
00:22:39.175 --> 00:22:41.745
And what that means is like, okay, I'm being,

525
00:22:41.805 --> 00:22:43.585
I'm gonna generate, in this case, it's going

526
00:22:43.585 --> 00:22:44.825
to generate a question

527
00:22:44.925 --> 00:22:47.105
and then it's gonna check that file that I gave it

528
00:22:47.125 --> 00:22:49.665
and see if it's in line with that style

529
00:22:49.885 --> 00:22:51.305
and everything that I generated.

530
00:22:51.305 --> 00:22:52.545
And if it's not, it will adjust its losses

531
00:22:52.605 --> 00:22:54.185
and those types of things, its weights accordingly,

532
00:22:54.365 --> 00:22:55.545
try to get it closer

533
00:22:55.545 --> 00:22:57.105
and then run it again.

534
00:22:57.445 --> 00:22:59.425
You don't have to use it. It will, it will still,

535
00:22:59.525 --> 00:23:00.905
uh, complete the job without it.

536
00:23:01.325 --> 00:23:04.225
The suffix here, this is just something to let you know,

537
00:23:04.445 --> 00:23:06.585
um, that this is your, your model here.

538
00:23:06.585 --> 00:23:11.385
So I'm gonna say greater than, um, 105, uh, demo purposes.

539
00:23:12.415 --> 00:23:15.395
All right? And then down here we have some hyper parameters

540
00:23:15.395 --> 00:23:17.795
such as batch size, learning rate multiplier, number

541
00:23:17.795 --> 00:23:19.755
of epochs, like

542
00:23:19.755 --> 00:23:22.035
how many times you're gonna run through these things, try

543
00:23:22.035 --> 00:23:24.115
to limit it, how frequently it adjusts itself.

544
00:23:24.375 --> 00:23:26.955
So these are the things where really, if you're going

545
00:23:26.955 --> 00:23:29.395
to do something like this, these are the things you want

546
00:23:29.395 --> 00:23:31.555
to make sure that you understand and you know how to use.

547
00:23:31.695 --> 00:23:33.195
I'm not gonna get down into the weeds of this

548
00:23:33.435 --> 00:23:34.555
'cause we only have a little bit of time,

549
00:23:34.775 --> 00:23:36.475
but these are the type of things that, um,

550
00:23:36.775 --> 00:23:38.875
you really wanna bring that knowledge when you're going

551
00:23:38.875 --> 00:23:40.515
to go and try to fine tune something.

552
00:23:40.675 --> 00:23:42.675
'cause they can make a difference with your output.

553
00:23:42.735 --> 00:23:44.595
So for right now though, I'm just gonna leave everything

554
00:23:44.595 --> 00:23:48.085
as auto and I'm gonna create, so once I create this job,

555
00:23:48.085 --> 00:23:50.485
what's gonna happen is it's going to go ahead

556
00:23:50.485 --> 00:23:53.605
and set everything off. It's gonna validate my JSON L file

557
00:23:53.985 --> 00:23:55.405
and make sure that everything's in place.

558
00:23:55.425 --> 00:23:57.685
And if it is, it's gonna start the job. And as

559
00:24:03.545 --> 00:24:05.385
a result here though, is that you're gonna end up

560
00:24:05.975 --> 00:24:08.045
with a new fine tuned model.
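The same job the playground form kicks off can also be created through the API. This sketch only assembles the request payload rather than sending it (which would need the OpenAI SDK and an API key); the file ID, suffix, and epoch count are hypothetical:

```python
# Fields mirroring the playground form: base model, training file,
# optional validation file, suffix, and hyperparameters.
job_request = {
    "model": "gpt-3.5-turbo-1106",       # base model chosen in the demo
    "training_file": "file-abc123",      # hypothetical uploaded-file ID
    "validation_file": None,             # optional held-out subset
    "suffix": "gt-105-demo",             # shows up in the fine-tuned model's name
    "hyperparameters": {"n_epochs": 3},  # the demo compared 3 vs 10; "auto" also works
}

# The resulting model name starts with "ft:" plus the base model,
# then the project and suffix, then an identifier at the end.
expected_prefix = "ft:" + job_request["model"]
```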

561
00:24:15.405 --> 00:24:17.385
We can see that the job ran successfully.

562
00:24:17.945 --> 00:24:19.545
I actually did two jobs yesterday.

563
00:24:19.585 --> 00:24:21.825
I ran one where I told it to do 10 epochs,

564
00:24:21.825 --> 00:24:24.025
and then the first one I did it three epochs.

565
00:24:24.025 --> 00:24:25.625
So three pass throughs and 10 pass throughs.

566
00:24:25.745 --> 00:24:26.785
'cause I wanted to see if there was

567
00:24:26.785 --> 00:24:27.825
a difference in the value there.

568
00:24:28.285 --> 00:24:29.865
And so what it does, yeah,

569
00:24:30.265 --> 00:24:33.945
I just wanna, um, intervene real quick.

570
00:24:34.225 --> 00:24:35.825
I think you keep freezing a little bit,

571
00:24:36.045 --> 00:24:38.705
so I don't know if we missed anything too important there.

572
00:24:38.925 --> 00:24:40.705
Oh, uh, 'cause my, uh, yeah, So

573
00:24:41.085 --> 00:24:44.065
My VPN just went out Just, just really quickly.

574
00:24:44.325 --> 00:24:48.785
Um, it was just about, um, that we're, uh, using a,

575
00:24:49.565 --> 00:24:52.945
uh, a model that we trained yesterday, uh,

576
00:24:52.965 --> 00:24:56.265
and that we upped the epochs a bit so that we could get, uh,

577
00:24:56.285 --> 00:24:59.545
better results based on our, uh, unique scenario.

578
00:25:00.395 --> 00:25:03.385
Chris, I, you, you look better, uh, since yeah,

579
00:25:03.395 --> 00:25:04.545
since Amanda it,

580
00:25:04.995 --> 00:25:05.995
So go ahead. It was the VPN.

581
00:25:05.995 --> 00:25:08.505
All right. All right.

582
00:25:08.505 --> 00:25:10.225
Okay, so I'm just gonna stay off the VPN.

583
00:25:10.335 --> 00:25:13.775
Okay, so, um, so you can see right here, so I,

584
00:25:13.775 --> 00:25:14.855
I've kicked off my job.

585
00:25:15.095 --> 00:25:16.655
I, I adjusted the fine tuning parameter.

586
00:25:16.655 --> 00:25:18.575
So again, in case anyone missed it, I'll start again.

587
00:25:18.635 --> 00:25:20.655
So I, I chose my base model.

588
00:25:20.965 --> 00:25:22.095
I'll just run through real quick.

589
00:25:22.415 --> 00:25:24.535
I uploaded my training document right here.

590
00:25:25.305 --> 00:25:27.485
Um, there's my validation that I spoke about.

591
00:25:28.435 --> 00:25:30.835
I, I can name it, I can add in a little, uh, character,

592
00:25:30.995 --> 00:25:31.995
a little string that lets me

593
00:25:31.995 --> 00:25:33.035
know that this is gonna be on my model.

594
00:25:33.055 --> 00:25:34.675
And then here are the hyper parameters down here.

595
00:25:34.675 --> 00:25:37.035
Then you create the job. So once you do those things,

596
00:25:37.055 --> 00:25:38.755
the job gets off and it starts running.

597
00:25:38.855 --> 00:25:40.275
It checks to make sure everything's good.

598
00:25:40.495 --> 00:25:42.155
And if it is, and then it starts running.

599
00:25:42.855 --> 00:25:45.155
So we could see right here, the jobs that I ran yesterday,

600
00:25:45.155 --> 00:25:47.075
again, I ran one for 10 epochs

601
00:25:47.075 --> 00:25:49.315
and one for, uh, three epochs just to kind

602
00:25:49.315 --> 00:25:52.195
of get a different, um, see, see what would happen.

603
00:25:52.655 --> 00:25:55.555
And so it used the GPT-3.5 Turbo model,

604
00:25:55.855 --> 00:25:58.835
and then the GPT, uh, this was my output and model.

605
00:25:58.855 --> 00:26:00.075
So this is my fine tuned model.

606
00:26:00.135 --> 00:26:01.475
You can see OpenAI always starts

607
00:26:01.475 --> 00:26:03.195
with ft, then the base model.

608
00:26:03.695 --> 00:26:05.955
Um, and then it adds in your, uh,

609
00:26:06.145 --> 00:26:07.395
project that you're using.

610
00:26:07.455 --> 00:26:08.875
So I'm using ITS project

611
00:26:09.175 --> 00:26:11.955
and then, uh, my little, uh, suffix that I had.

612
00:26:11.955 --> 00:26:15.075
And then, uh, an identifier at the end.

613
00:26:15.815 --> 00:26:17.275
So now I'm not gonna sit here

614
00:26:17.275 --> 00:26:18.395
and wait for this job to finish

615
00:26:18.395 --> 00:26:20.435
because depending on what you told it to do,

616
00:26:20.455 --> 00:26:22.355
it could take 15 minutes, it could take an hour,

617
00:26:22.355 --> 00:26:23.755
it could take a couple hours depending on

618
00:26:23.755 --> 00:26:24.835
how much training data you give

619
00:26:24.835 --> 00:26:26.795
and what the batch sizes are

620
00:26:26.915 --> 00:26:28.675
and all those hyper parameters that you gave it.

621
00:26:28.675 --> 00:26:29.875
So we're gonna let that thing run.

622
00:26:30.295 --> 00:26:32.195
But in the meantime, we're just going

623
00:26:32.195 --> 00:26:34.195
to look at the data that it generated.

624
00:26:34.215 --> 00:26:38.905
And so yesterday I ended up with three, three files here.

625
00:26:38.965 --> 00:26:42.585
So what we did is we generated the questions

626
00:26:43.505 --> 00:26:45.765
and then

627
00:26:45.765 --> 00:26:47.925
after we did that, we created our training file.

628
00:26:48.015 --> 00:26:49.165
After we had the training file,

629
00:26:49.165 --> 00:26:50.485
we fine tuned a model.

630
00:26:50.865 --> 00:26:52.245
And then what I would do is

631
00:26:52.245 --> 00:26:54.885
after I got the fine tuned model, I came back in here

632
00:26:55.225 --> 00:26:57.965
and I adjusted my, uh, question helper here that I have

633
00:26:58.465 --> 00:27:01.285
to go ahead and use the new model.

634
00:27:01.625 --> 00:27:02.645
So I then asked it

635
00:27:02.645 --> 00:27:06.045
to make a thousand questions using my fine tuned model.

636
00:27:06.155 --> 00:27:08.925
Alright? And so then it created another CSV file.

637
00:27:09.025 --> 00:27:11.205
And so then it created a thousand questions.

638
00:27:11.205 --> 00:27:12.845
And what we did was just kind of check

639
00:27:13.585 --> 00:27:15.485
did we see any difference in the length of the items?

640
00:27:15.625 --> 00:27:17.925
Now again, we're not giving it any instructions about the

641
00:27:17.925 --> 00:27:20.725
length,

642
00:27:20.935 --> 00:27:23.805
we're just fine tuning it on the styles of the questions

643
00:27:23.805 --> 00:27:26.045
that we want to generate and see if it picks up on that.

644
00:27:26.905 --> 00:27:28.925
So the results that we ended up with

645
00:27:28.925 --> 00:27:33.225
after those three test runs are right here.

646
00:27:35.105 --> 00:27:36.485
All right, so you see in the beginning,

647
00:27:36.485 --> 00:27:37.725
this was our original one.

648
00:27:38.105 --> 00:27:39.445
The, the smallest question

649
00:27:39.445 --> 00:27:41.445
that was generated was 34 characters.

650
00:27:41.585 --> 00:27:43.445
The largest question was 192.

651
00:27:43.505 --> 00:27:45.645
We had an average character count of 93,

652
00:27:46.065 --> 00:27:48.445
and the number of items that were less than, uh,

653
00:27:48.525 --> 00:27:51.045
105 characters was 706.

654
00:27:51.145 --> 00:27:53.645
So, of the items that I generated just

655
00:27:53.645 --> 00:27:57.525
with the GPT-3.5 model, 706 of them were

656
00:27:57.525 --> 00:27:58.725
below 105 characters.
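Those summary numbers — shortest, longest, average character count, and how many fall under the cutoff — are easy to recompute from the CSV. A sketch over made-up character counts:

```python
# Hypothetical character counts for a batch of generated questions.
lengths = [34, 93, 118, 192, 72, 140]

shortest = min(lengths)
longest = max(lengths)
average = sum(lengths) / len(lengths)
below_cutoff = sum(1 for n in lengths if n < 105)  # the demo's 105-char line
```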

657
00:27:59.385 --> 00:28:02.405
So then I created a fine tune model with three epochs,

658
00:28:02.665 --> 00:28:05.085
and then I generated another thousand questions using

659
00:28:05.085 --> 00:28:06.165
that fine tune model.

660
00:28:06.545 --> 00:28:08.965
The results that I saw was, we did see a difference there.

661
00:28:09.105 --> 00:28:10.845
Um, mainly right here, the number

662
00:28:10.845 --> 00:28:13.565
of items less than 105 characters, it reduced.

663
00:28:14.105 --> 00:28:18.125
All right? So, um, 242 or something like that, alright?

664
00:28:18.125 --> 00:28:19.805
And my average character count went up

665
00:28:19.945 --> 00:28:21.565
and then I just tried it one more time.

666
00:28:21.905 --> 00:28:25.325
So you can actually fine tune, um, on a fine tune model.

667
00:28:25.385 --> 00:28:27.885
In this case, what I did was I just ran my fine tuning job

668
00:28:27.885 --> 00:28:29.565
against my base dataset twice, once

669
00:28:29.565 --> 00:28:31.125
with three epochs and once with 10 epochs.

670
00:28:31.305 --> 00:28:34.925
Once I did 10 pass throughs, we had 317.

671
00:28:35.425 --> 00:28:36.685
So this is interesting,

672
00:28:36.705 --> 00:28:39.845
but it also raises a question about overfitting.

673
00:28:39.995 --> 00:28:43.805
Alright, so overfitting is when you get the model

674
00:28:43.825 --> 00:28:45.565
to be really good at a certain task,

675
00:28:45.705 --> 00:28:47.405
but now you've adjusted the weights

676
00:28:47.405 --> 00:28:49.605
and balances so much that it's not gonna be

677
00:28:49.605 --> 00:28:51.365
so good at generating questions that aren't

678
00:28:51.365 --> 00:28:52.685
for this specific task.

679
00:28:53.225 --> 00:28:56.405
So the fact that I ran 10 epochs,

680
00:28:56.525 --> 00:28:58.685
I would be really suspicious about overfitting

681
00:28:58.685 --> 00:29:00.765
and what kind of items it could generate otherwise.

682
00:29:01.185 --> 00:29:04.405
But for the purposes of just this plain, uh, demonstration,

683
00:29:04.665 --> 00:29:06.965
it was interesting to see how it kind

684
00:29:06.965 --> 00:29:08.765
of got more in alignment with what we were expecting

685
00:29:09.275 --> 00:29:11.845
without me giving any additional prompts,

686
00:29:11.845 --> 00:29:13.525
context stuffing or anything like that.

687
00:29:13.665 --> 00:29:16.085
One shot, few shot learnings to the, to the model.

688
00:29:17.825 --> 00:29:22.405
So, um, the fact that it can, um,

689
00:29:23.035 --> 00:29:27.045
kind of infer how you're trying to train it, uh, is kind

690
00:29:27.045 --> 00:29:28.125
of a double-edged sword, right?

691
00:29:28.125 --> 00:29:31.725
Because we didn't say anything about using, you know,

692
00:29:31.825 --> 00:29:36.445
larger items, uh, when it's, uh, responding, uh, it,

693
00:29:36.505 --> 00:29:38.005
it just knows to do that, right?

694
00:29:38.005 --> 00:29:39.725
Because we gave all those examples,

695
00:29:39.825 --> 00:29:43.365
but there could easily be other characteristics

696
00:29:43.545 --> 00:29:44.845
of those items that we

697
00:29:45.125 --> 00:29:48.085
provided that are not necessarily obvious to us

698
00:29:48.555 --> 00:29:52.165
that, if we continue to use that training data,

699
00:29:52.585 --> 00:29:54.565
uh, it could start, you know, using those

700
00:29:54.745 --> 00:29:56.685
as characteristics in, in the items

701
00:29:56.685 --> 00:29:57.765
that it generates as well.

702
00:29:58.465 --> 00:30:01.505
Yep. Chris, we,

703
00:30:02.085 --> 00:30:03.745
Oh, you do have a question in the chat?

704
00:30:04.125 --> 00:30:05.665
So it says, curious to know,

705
00:30:05.775 --> 00:30:07.505
what are the biggest challenges you faced

706
00:30:07.505 --> 00:30:09.025
during the fine tuning process?

707
00:30:12.745 --> 00:30:14.845
Um, I think it was a lot of trial

708
00:30:14.845 --> 00:30:16.925
and error, just coming into it fresh.

709
00:30:17.025 --> 00:30:19.725
We tried a lot of the things we read about,

710
00:30:19.725 --> 00:30:22.405
and they didn't really seem to pan out for us,

711
00:30:22.505 --> 00:30:24.285
and I wasn't sure if that was just due to lack

712
00:30:24.285 --> 00:30:26.005
of knowledge on our part or, um,

713
00:30:26.785 --> 00:30:29.525
or, uh, just it not, not working as expected.

714
00:30:29.665 --> 00:30:31.925
But one of the things that comes to mind, Kyle, is those,

715
00:30:31.945 --> 00:30:34.765
uh, those anti-weights that we were trying to do.

716
00:30:35.225 --> 00:30:37.285
And so what we were trying to do there is generate a

717
00:30:37.445 --> 00:30:39.645
training file where we would basically say,

718
00:30:39.955 --> 00:30:41.005
this is a generate.

719
00:30:41.185 --> 00:30:42.725
So I had my system instruction,

720
00:30:42.725 --> 00:30:44.125
which is you generate questions,

721
00:30:44.245 --> 00:30:45.965
I asked the generated question as a user,

722
00:30:46.265 --> 00:30:48.645
and then I gave, then I had it, my training file.

723
00:30:48.705 --> 00:30:50.445
Say, okay, here's your question.

724
00:30:50.865 --> 00:30:52.605
And then I tried doing positive

725
00:30:52.605 --> 00:30:54.045
and negative reinforcement after that.

726
00:30:54.185 --> 00:30:56.045
So I would say, this is a bad item

727
00:30:56.045 --> 00:30:58.405
because it is less than 105 characters,

728
00:30:58.425 --> 00:31:00.365
or this is a good item because it is more than

729
00:31:00.365 --> 00:31:01.405
105 characters.

730
00:31:01.865 --> 00:31:03.485
And then, um, and,

731
00:31:03.745 --> 00:31:05.605
and we didn't see any difference with that really.

732
00:31:05.625 --> 00:31:08.965
It was so insignificant

733
00:31:08.965 --> 00:31:11.085
that it just made me think it was coincidental rather than,

734
00:31:11.105 --> 00:31:12.925
um, an actual cause and effect there.

735
00:31:13.385 --> 00:31:15.845
And, and where, where it gets challenging with that,

736
00:31:15.875 --> 00:31:18.205
it's like, okay, is it really not that or is it

737
00:31:18.205 --> 00:31:20.085
because you need to tweak your hyper parameters

738
00:31:20.105 --> 00:31:21.885
and do you need to do this thing a hundred times,

739
00:31:22.385 --> 00:31:25.085
10 different ways in order to get what your end result is?

740
00:31:25.425 --> 00:31:28.445
And so it's really just building that knowledge set

741
00:31:28.445 --> 00:31:30.965
that's required to really build a good,

742
00:31:31.625 --> 00:31:32.725
uh, fine tuned model.

743
00:31:32.825 --> 00:31:35.125
And Kyle brought up a great example there

744
00:31:35.125 --> 00:31:37.285
because you gotta also be careful,

745
00:31:37.355 --> 00:31:39.605
like you're not only working towards your task,

746
00:31:39.905 --> 00:31:41.885
you want to make sure you don't break all the other tasks

747
00:31:41.885 --> 00:31:43.005
that you're not focusing on,

748
00:31:45.125 --> 00:31:46.125
Right? Um,

749
00:31:46.125 --> 00:31:48.515
and, and I would say the, the other, um,

750
00:31:49.455 --> 00:31:50.915
the other challenge that we face,

751
00:31:50.915 --> 00:31:53.475
and I think you kind of alluded to it, is, uh,

752
00:31:53.505 --> 00:31:56.475
when we are messing with this stuff, you know, as soon

753
00:31:56.475 --> 00:32:00.235
as OpenAI puts it out, um, you know, we're,

754
00:32:00.785 --> 00:32:02.675
they're learning too, right? Like this,

755
00:32:02.675 --> 00:32:04.875
this stuff is not in its final form.

756
00:32:05.215 --> 00:32:08.795
Uh, and it's quite possible that we're, uh, seeing things

757
00:32:09.305 --> 00:32:12.315
that are bugs on, you know, open AI's end

758
00:32:12.315 --> 00:32:13.595
and we don't know about it.

759
00:32:13.775 --> 00:32:16.995
We think that we are doing, uh, something wrong on,

760
00:32:17.095 --> 00:32:20.635
on our end, or maybe there's documentation about the way

761
00:32:20.635 --> 00:32:22.595
that things should work, but it's not,

762
00:32:22.855 --> 00:32:24.195
you know, quite there yet.

763
00:32:25.135 --> 00:32:27.355
Um, we have, uh, another question.

764
00:32:27.775 --> 00:32:30.235
Uh, how would you measure the success of a,

765
00:32:30.255 --> 00:32:31.635
of a fine tuned model,

766
00:32:34.325 --> 00:32:35.325
Man? Um, so

767
00:32:35.325 --> 00:32:37.025
I think the validation files

768
00:32:37.025 --> 00:32:38.185
really come into play there.

769
00:32:38.245 --> 00:32:39.685
So by having your test set,

770
00:32:39.785 --> 00:32:42.605
you're having a static comparison against everything, right?

771
00:32:43.005 --> 00:32:44.245
Whenever, whenever you do a comparison,

772
00:32:44.305 --> 00:32:45.765
you always wanna make sure that you're,

773
00:32:45.785 --> 00:32:48.845
you're comparing your results to the same expected result.

774
00:32:49.105 --> 00:32:50.925
And so by having that validation file

775
00:32:51.225 --> 00:32:54.285
and looking at the, uh, loss function

776
00:32:54.305 --> 00:32:55.645
and seeing what kind of values you get

777
00:32:55.645 --> 00:32:57.005
there, I think that's gonna be good.

778
00:32:57.005 --> 00:33:00.045
But there's also your end users' angle, um,

779
00:33:00.145 --> 00:33:01.405
are they seeing the benefit

780
00:33:01.475 --> 00:33:02.725
that you intended to have there?

781
00:33:02.745 --> 00:33:04.965
So I think there's several different metrics, both, um,

782
00:33:04.965 --> 00:33:06.365
that can be done through calculations

783
00:33:06.585 --> 00:33:08.085
and also through user feedback.

784
00:33:09.815 --> 00:33:13.515
Uh, so we've got only about, uh, 13 minutes left.

785
00:33:13.595 --> 00:33:15.755
I wanna make sure that we have, uh, plenty of time

786
00:33:15.895 --> 00:33:17.155
for, for RAG.

787
00:33:17.735 --> 00:33:20.115
Um, so let's get started there.

788
00:33:20.735 --> 00:33:24.035
Um, I think we wanted to start by defining, uh,

789
00:33:24.515 --> 00:33:28.355
a few concepts, Chris. Uh, embeddings, chunking strategies,

790
00:33:28.495 --> 00:33:30.875
uh, semantic search versus keyword search,

791
00:33:31.885 --> 00:33:32.885
Right? Yeah. So,

792
00:33:32.885 --> 00:33:35.675
so going back to, like, RAG. What is RAG,

793
00:33:35.675 --> 00:33:36.715
what do we want to do there?

794
00:33:36.775 --> 00:33:39.875
So with RAG, retrieval-augmented generation, what we want

795
00:33:39.875 --> 00:33:41.195
to do is we want to go

796
00:33:41.455 --> 00:33:45.355
and choose only the data that we want to include

797
00:33:45.355 --> 00:33:48.835
with our prompt when we ask the generative model.

798
00:33:48.855 --> 00:33:51.435
So when we ask the generative model something, we only want

799
00:33:51.435 --> 00:33:53.275
to provide just the minimal amount of prompt

800
00:33:53.275 --> 00:33:55.155
with the most pertinent information, so

801
00:33:55.155 --> 00:33:56.355
that way we're most likely to

802
00:33:56.355 --> 00:33:57.475
get the result that's intended.

803
00:33:57.935 --> 00:33:59.635
So how do you do that? How do you get from

804
00:33:59.635 --> 00:34:00.675
point A to point B?
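As a preview of where this is heading: the usual path is to embed the question, compare it against pre-embedded chunks, and stuff only the best match into the prompt. A toy sketch with invented 3-dimensional vectors — real embeddings come from an embedding model and have hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical chunks and their (made-up) embedding vectors.
chunks = {
    "Session schedule: the keynote starts at 9am.": [0.9, 0.1, 0.0],
    "Expense policy: keep all conference receipts.": [0.1, 0.8, 0.2],
}
question_vec = [0.85, 0.15, 0.05]  # pretend embedding of "When is the keynote?"

# Retrieve the most similar chunk and stuff only it into the prompt.
best_chunk = max(chunks, key=lambda c: cosine(question_vec, chunks[c]))
prompt = (
    f"Answer using only this context:\n{best_chunk}\n\n"
    "Question: When is the keynote?"
)
```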

805
00:34:01.255 --> 00:34:03.795
So to start with, um, Kyle's

806
00:34:03.795 --> 00:34:07.435
and my first, uh, kind of getting our feet wet

807
00:34:07.435 --> 00:34:09.595
with this thing was with the, uh, ATP chatbot

808
00:34:09.595 --> 00:34:12.195
that we put together for, uh, the ATP conference.

809
00:34:12.255 --> 00:34:15.235
And so beforehand what we wanted to do was we kind

810
00:34:15.235 --> 00:34:16.955
of rolled our own rag solution just

811
00:34:16.955 --> 00:34:18.235
to kind of work our way through it.

812
00:34:18.255 --> 00:34:20.995
And so Amanda had put together what's called

813
00:34:20.995 --> 00:34:22.115
an ATP playbook.

814
00:34:22.255 --> 00:34:24.395
And so it has a lot of general information in it

815
00:34:24.395 --> 00:34:27.595
and a lot of really good ITS specific information in it

816
00:34:27.595 --> 00:34:28.755
that, that we all needed to use.

817
00:34:29.055 --> 00:34:31.995
Um, it's a big document, great document, big document.

818
00:34:31.995 --> 00:34:33.395
And so what we want to do is say, well,

819
00:34:33.615 --> 00:34:35.155
can we take this document along

820
00:34:35.155 --> 00:34:39.195
with the ATP session schedule and then do RAG on it

821
00:34:39.195 --> 00:34:42.395
and create a little Teams ATP chatbot in order to do it?

822
00:34:42.495 --> 00:34:44.915
And so to set about doing that, the first thing we had

823
00:34:44.915 --> 00:34:46.515
to do was, well, how do we index our data?

824
00:34:46.535 --> 00:34:49.395
How do, how do we make it so GPT can just read this

825
00:34:49.695 --> 00:34:52.515
so we're not sending the model the whole PDF

826
00:34:52.755 --> 00:34:53.835
document every single time.

827
00:34:54.455 --> 00:34:56.515
So what we needed to do was we needed

828
00:34:56.515 --> 00:34:58.155
to ingest this document in a way

829
00:34:58.155 --> 00:34:59.515
that could be consumed by the model.

830
00:35:00.415 --> 00:35:03.835
So the first thing that we did there was we broke it out.

831
00:35:03.835 --> 00:35:06.715
We came up with a quote unquote chunking strategy. Alright?

832
00:35:06.975 --> 00:35:09.515
And so a chunking strategy, essentially at the end

833
00:35:09.515 --> 00:35:11.795
of the day, what you're trying to do there is you're trying

834
00:35:11.795 --> 00:35:15.115
to take your big PDF document, your big Excel spreadsheet

835
00:35:15.295 --> 00:35:18.675
and break it up into pieces of information that make sense

836
00:35:19.135 --> 00:35:20.435
as a, as a standalone,

837
00:35:20.435 --> 00:35:22.115
they have their own semantic meaning to them.

838
00:35:22.375 --> 00:35:24.275
And that way you like it sets you up to be able

839
00:35:24.275 --> 00:35:25.875
to send these things to the provider.

840
00:35:26.535 --> 00:35:31.035
Um, so I'm gonna share my screen again here. Alright?

841
00:35:31.035 --> 00:35:34.155
And so I just have a little tiny example of it

842
00:35:34.915 --> 00:35:36.215
in an Excel spreadsheet.

843
00:35:36.365 --> 00:35:37.565
What we did though was, uh,

844
00:35:37.565 --> 00:35:39.685
we ended up using a PostgreSQL database

845
00:35:39.925 --> 00:35:42.645
'cause we really wanted to use, um, some native vector, um,

846
00:35:42.965 --> 00:35:45.685
functionality that you could, you could figure out in SQL,

847
00:35:45.685 --> 00:35:47.165
but it was just easier through Postgres.
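
[Editor's note: a hedged sketch of the kind of PostgreSQL setup being described, using the pgvector extension's `vector` column type and its `<=>` cosine-distance operator. The table and column names are made up for illustration, not taken from the talk.]

```python
# Editor's sketch: the SQL a pgvector-backed RAG table might use.
# Table and column names are illustrative, not from the talk.
create_table = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id        serial PRIMARY KEY,
    content   text,
    embedding vector(1536)   -- one slot per number in the ada-002 embedding
);
"""

# <=> is pgvector's cosine-distance operator: smaller means closer,
# so ordering ascending puts the most relevant chunks first.
nearest_chunks = """
SELECT content
FROM chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT 20;
"""
```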

848
00:35:47.165 --> 00:35:49.165
So the first thing I did was I went

849
00:35:49.165 --> 00:35:51.085
through Amanda's document and I chucked it up.

850
00:35:51.125 --> 00:35:52.765
I chunked it up. Hey, yeah, you're

851
00:35:52.765 --> 00:35:53.765
Not sharing your screen yet. We

852
00:35:53.765 --> 00:35:55.125
can't see it yet. If you are

853
00:35:55.665 --> 00:35:58.005
It def Oh, I, 'cause I didn't hit the share button.

854
00:35:58.155 --> 00:36:01.445
Yeah. All right. Okay. You see it now? Yes. All right.

855
00:36:01.585 --> 00:36:03.005
Can you see my screen? All right. Okay.

856
00:36:03.385 --> 00:36:05.485
So what we did was we chunked it up into a

857
00:36:05.485 --> 00:36:06.525
lot of different pieces here.

858
00:36:06.525 --> 00:36:07.965
All right, so I'm just showing you six records

859
00:36:08.245 --> 00:36:10.085
'cause I didn't wanna show all the beautiful stuff Amanda

860
00:36:10.105 --> 00:36:11.805
had in there 'cause probably a lot

861
00:36:11.805 --> 00:36:12.805
of it is stuff we don't wanna show,

862
00:36:12.825 --> 00:36:13.845
but I just showed some,

863
00:36:13.955 --> 00:36:15.805
some basic information to give you an idea.

864
00:36:15.865 --> 00:36:19.005
So the first thing I did was chunk it. Alright, okay, cool.

865
00:36:19.125 --> 00:36:22.365
I have a database of all these strings. Now what do I do?

866
00:36:22.425 --> 00:36:24.365
How, how do I know which ones are relevant

867
00:36:24.365 --> 00:36:25.525
to the question being asked?

868
00:36:25.555 --> 00:36:28.085
Alright, so that's where embeddings come into play.

869
00:36:28.305 --> 00:36:30.605
So, an embedding, think about it, it's,

870
00:36:30.605 --> 00:36:32.725
so it's really taking that string of text

871
00:36:32.825 --> 00:36:35.125
and it's making it so that it's, it's,

872
00:36:35.125 --> 00:36:37.525
it's putting it in computer language at a really high level.

873
00:36:37.665 --> 00:36:40.245
So, uh, one way to think about it is it's like kind

874
00:36:40.245 --> 00:36:43.165
of coordinates to, to all the texts, all the,

875
00:36:43.555 --> 00:36:46.925
it's like provides semantic meaning in number form, alright?

876
00:36:46.925 --> 00:36:49.165
And so it's coordinates to that semantic value

877
00:36:49.465 --> 00:36:51.685
and then that way you can actually compare things

878
00:36:51.705 --> 00:36:53.805
and do like a cosine similarity lookup,

879
00:36:53.805 --> 00:36:54.965
which I'll get to in a minute here.

880
00:36:54.965 --> 00:36:57.605
So the next thing I did was I used an embedding model.

881
00:36:57.765 --> 00:37:01.455
I think I might've used, um, ada, possibly ada-002,

882
00:37:01.515 --> 00:37:03.135
the OpenAI embedding model.

883
00:37:03.195 --> 00:37:05.575
And so for each one of these strings in my database,

884
00:37:05.935 --> 00:37:08.095
I sent it up to the embedding endpoint.

885
00:37:08.095 --> 00:37:10.455
Alright, what does an embedding endpoint do?

886
00:37:10.485 --> 00:37:11.575
Well, it takes your string

887
00:37:11.795 --> 00:37:14.455
and it generates an embedding, which is pretty much a series

888
00:37:14.455 --> 00:37:17.775
of numbers that provide a semantic, uh, context

889
00:37:18.075 --> 00:37:19.615
around your string that you sent it.
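
[Editor's note: a rough sketch of what that embedding call looks like. The request and response shapes below mirror the OpenAI embeddings endpoint, but treat the exact field names as an assumption; no network call is made here.]

```python
# Editor's sketch (assumption: field names mirror the OpenAI
# embeddings endpoint; this only builds the payload shapes).
request = {
    "model": "text-embedding-ada-002",   # the "ada" model Chris mentions
    "input": "Where am I staying at ATP?",
}

# The response carries one embedding per input string: a long list
# of floats encoding the string's semantic meaning.
response = {
    "data": [{"embedding": [0.012, -0.034, 0.101]}],  # truncated; ada-002 returns 1536 numbers
}

embedding = response["data"][0]["embedding"]
```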

890
00:37:19.615 --> 00:37:21.695
And so now I ended up with a database

891
00:37:21.695 --> 00:37:22.855
that had two columns here,

892
00:37:22.855 --> 00:37:25.015
a database table with two columns, one for my text

893
00:37:25.275 --> 00:37:26.615
and one for my embedding, right?

894
00:37:26.835 --> 00:37:28.895
You don't need to know how these embeddings work

895
00:37:28.915 --> 00:37:29.975
to use them, alright?

896
00:37:30.315 --> 00:37:31.895
But what we do now is, okay,

897
00:37:31.895 --> 00:37:34.215
so now I have these number values and I have a string.

898
00:37:34.725 --> 00:37:36.815
Okay, cool. Alright, so now I have a database.

899
00:37:36.915 --> 00:37:39.975
So how, again, how do I know, how does that help me get

900
00:37:39.975 --> 00:37:42.335
to the point where I can include these pieces of information

901
00:37:42.755 --> 00:37:45.175
to the model and include these with my prompt?

902
00:37:45.205 --> 00:37:47.375
Alright, so let's say like, where,

903
00:37:47.425 --> 00:37:48.895
where are we staying at the hotel?

904
00:37:48.915 --> 00:37:50.975
You could see right here, Anaheim Marriott.

905
00:37:51.155 --> 00:37:53.815
How does it know out of these five pieces of text, that's

906
00:37:53.815 --> 00:37:55.015
that's the one that I need?

907
00:37:55.475 --> 00:37:57.455
That's where these lookups come in. All right?

908
00:37:57.675 --> 00:37:58.735
And so what Kyle

909
00:37:58.735 --> 00:38:02.055
and I did, we did a very, very basic one, um,

910
00:38:02.055 --> 00:38:04.135
that really showed its true colors when we compared it

911
00:38:04.135 --> 00:38:05.415
against some of the other more,

912
00:38:05.435 --> 00:38:06.735
uh, provider friendly models.

913
00:38:07.035 --> 00:38:09.455
Um, so we did a basic cosine similarity.

914
00:38:09.455 --> 00:38:12.175
So the way it works is, so let's say I take my prompt.

915
00:38:12.445 --> 00:38:14.735
What hotel am I staying at for ATP?

916
00:38:15.275 --> 00:38:18.255
The first thing I do is, in my chat system,

917
00:38:18.315 --> 00:38:19.655
The way the chat bot, the first thing

918
00:38:19.655 --> 00:38:21.495
that chat bot does is it takes that question

919
00:38:21.955 --> 00:38:24.655
and doesn't try to answer it, it doesn't try to do anything.

920
00:38:24.755 --> 00:38:28.535
We take that string, where am I staying at ATP, send it up

921
00:38:28.535 --> 00:38:29.575
to the embedding endpoint

922
00:38:29.595 --> 00:38:31.735
and it gets an embedding back for that string.

923
00:38:32.155 --> 00:38:35.255
So now I call up and I have an embedding. That embedding,

924
00:38:35.365 --> 00:38:38.765
we then use that to do a cosine similarity lookup across

925
00:38:38.985 --> 00:38:40.365
all everything in the database.

926
00:38:40.365 --> 00:38:42.965
Alright? And so what it's gonna do is it's gonna take

927
00:38:42.965 --> 00:38:45.045
that string for where am I staying at ATP,

928
00:38:45.265 --> 00:38:47.005
and it's going to find, it's going

929
00:38:47.005 --> 00:38:50.245
to order everything in my database that is, um,

930
00:38:50.555 --> 00:38:53.685
in order of closeness to that semantic meaning.

931
00:38:53.685 --> 00:38:57.085
Alright? So ideally this row right here is gonna be

932
00:38:57.085 --> 00:38:58.125
at the top one, right?
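
[Editor's note: the cosine-similarity lookup Chris describes, sketched in plain Python. The tiny three-number vectors are made-up stand-ins for real embeddings.]

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the (text, embedding) rows in the database.
rows = [
    ("The conference hotel is the Anaheim Marriott.", [0.9, 0.1, 0.0]),
    ("Session schedule: keynotes run each morning.",  [0.1, 0.9, 0.1]),
    ("Booth staffing rotates every two hours.",       [0.0, 0.2, 0.9]),
]

# Embed the question (here a made-up vector that points near the
# hotel row), then order every row by closeness to it.
question_embedding = [0.85, 0.15, 0.05]
ranked = sorted(rows,
                key=lambda row: cosine_similarity(question_embedding, row[1]),
                reverse=True)

print(ranked[0][0])  # the Anaheim Marriott chunk comes out on top
```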

933
00:38:58.505 --> 00:39:00.605
So what we've talked about here is kind of a, a,

934
00:39:00.925 --> 00:39:02.685
a lightweight chunking strategy, all right?

935
00:39:02.745 --> 00:39:05.965
And so again, our chunking strategy was also poor in the

936
00:39:05.965 --> 00:39:07.205
fact that we didn't have any overlap.

937
00:39:07.205 --> 00:39:09.725
When you define a chunking strategy, you don't just wanna

938
00:39:09.725 --> 00:39:11.805
chunk it up into text, but you also wanna do a

939
00:39:11.925 --> 00:39:13.005
thing that's called overlap.

940
00:39:13.275 --> 00:39:16.785
What overlap does is it says, so let's say my chunk,

941
00:39:16.985 --> 00:39:19.225
I wanna define it as 800 tokens, alright?

942
00:39:19.405 --> 00:39:22.105
And then I define my overlap as 400 tokens.

943
00:39:22.365 --> 00:39:25.185
So when I create a chunk about the Anaheim Marriott Hotel,

944
00:39:25.185 --> 00:39:27.705
about where we're staying, when I get to my next piece

945
00:39:27.705 --> 00:39:29.825
of information that's chunked, I'm going

946
00:39:29.825 --> 00:39:32.705
to include the last 400 tokens of this one

947
00:39:33.085 --> 00:39:36.585
and include it as the first 400 tokens of my next chunk.

948
00:39:36.735 --> 00:39:38.745
Alright? So you're getting some overlap,

949
00:39:38.745 --> 00:39:39.985
you're getting some relationships,

950
00:39:40.205 --> 00:39:42.145
and it helps with that semantic lookup

951
00:39:42.145 --> 00:39:44.145
that you're gonna be doing in your semantic rankings.

952
00:39:44.145 --> 00:39:45.625
And so that's a chunking strategy.
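
[Editor's note: the 800-token chunk with 400-token overlap scheme from the talk, sketched below. As a simplification, words stand in for tokens; real pipelines count model tokens, not words.]

```python
def chunk_with_overlap(tokens, chunk_size=800, overlap=400):
    """Slide through the tokens so each chunk repeats the last
    `overlap` tokens of the previous chunk, as described in the talk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Words stand in for tokens here (an editor's simplification).
words = [f"w{i}" for i in range(2000)]
chunks = chunk_with_overlap(words)

# Each chunk's first 400 "tokens" repeat the previous chunk's last 400.
print(chunks[1][:400] == chunks[0][-400:])  # prints True
```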

953
00:39:45.685 --> 00:39:48.425
The last piece of the chunking strategy is, well,

954
00:39:48.425 --> 00:39:50.305
how many chunks do you want to include, right?

955
00:39:50.485 --> 00:39:53.185
So just because I order all these things by order

956
00:39:53.205 --> 00:39:56.425
of relevance, I'm still not gonna send them all up.

957
00:39:56.485 --> 00:39:59.105
And so you gotta draw your line, where's your hard line?

958
00:39:59.105 --> 00:40:01.225
And so in our case, I think we ended up with like 10

959
00:40:01.225 --> 00:40:02.945
or 15 chunks that we wanted to send.

960
00:40:03.005 --> 00:40:06.585
So imagine I had a table here of like 500 rows,

961
00:40:06.585 --> 00:40:10.545
or a thousand, I think we have like 3,000 rows.

962
00:40:10.545 --> 00:40:11.705
Let's just say 3000 rows.

963
00:40:11.855 --> 00:40:14.145
Well, I gotta find the best 20 out

964
00:40:14.145 --> 00:40:15.905
of those 3000 I'm gonna send the model

965
00:40:15.905 --> 00:40:17.465
because I don't wanna overload it.

966
00:40:17.465 --> 00:40:20.185
Alright? So we do it, we did our cosine similarity

967
00:40:20.245 --> 00:40:21.265
and then we sent it up

968
00:40:21.365 --> 00:40:23.345
and we just crossed our fingers that the piece

969
00:40:23.345 --> 00:40:25.145
of information about Anaheim was in it,

970
00:40:25.245 --> 00:40:26.545
and then it would answer it,

971
00:40:26.805 --> 00:40:29.865
and then it, um, more often than not, it worked really well.

972
00:40:30.075 --> 00:40:31.225
Where, Hey, Chris, where are we?

973
00:40:31.375 --> 00:40:35.065
Yeah, he's got four minutes. Just FYI, man. All right. Okay.

974
00:40:35.065 --> 00:40:38.425
Alright, so, so, so that, that's the idea, all right?

975
00:40:38.685 --> 00:40:41.545
Um, so now how, how can you do that a little better, right?

976
00:40:41.545 --> 00:40:43.145
So that, that's a lot of work.

977
00:40:43.245 --> 00:40:44.705
You gotta come with a chunking strategy.

978
00:40:44.845 --> 00:40:46.585
You gotta fine tune, you gotta make sure it's good.

979
00:40:47.145 --> 00:40:49.265
Providers have already figured that out. Alright?

980
00:40:49.365 --> 00:40:52.265
So there's other, so something we, something

981
00:40:52.265 --> 00:40:54.465
that we've been looking at now, uh, that we're integrated

982
00:40:54.465 --> 00:40:56.945
with is, uh, OpenAI, uh, vector stores.

983
00:40:57.625 --> 00:40:59.785
AWS has a thing called knowledge bases. Alright?

984
00:41:00.005 --> 00:41:02.105
And so what they're doing with these vector stores

985
00:41:02.125 --> 00:41:04.185
and these knowledge bases is they're

986
00:41:04.185 --> 00:41:05.345
doing all this work for you.

987
00:41:05.345 --> 00:41:07.385
They're doing the ingestion part. Alright?

988
00:41:07.725 --> 00:41:10.905
So what you do is, um, so we have another example here

989
00:41:10.905 --> 00:41:12.025
of it being integrated.

990
00:41:12.885 --> 00:41:14.825
And so with, so if you take, again, staying

991
00:41:14.825 --> 00:41:17.985
with the open AI example, all right, well now I need

992
00:41:17.985 --> 00:41:19.145
to be on the VPN again.

993
00:41:19.405 --> 00:41:20.405
All right?

994
00:41:28.415 --> 00:41:30.795
So just interrupt me when you're back on the VPN Chris.

995
00:41:31.295 --> 00:41:32.955
Um, so what we've seen here is kind

996
00:41:32.955 --> 00:41:35.365
of a homegrown solution, uh, to RAG.

997
00:41:35.545 --> 00:41:37.205
We built a database.

998
00:41:37.465 --> 00:41:40.045
We, uh, took all of our documents

999
00:41:40.045 --> 00:41:41.645
and we split 'em up into chunks.

1000
00:41:41.665 --> 00:41:42.965
We added 'em to the database.

1001
00:41:43.385 --> 00:41:46.525
The idea was when a question comes in, we'll see which

1002
00:41:46.525 --> 00:41:48.845
of those chunks is closest to the question, uh,

1003
00:41:48.865 --> 00:41:50.125
by comparing embeddings.

1004
00:41:50.185 --> 00:41:52.485
And then, uh, we'll take those rows

1005
00:41:52.945 --> 00:41:55.685
and we'll supply them as, as part of the prompt.
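
[Editor's note: a minimal sketch of that last step, folding the retrieved chunks into the prompt that gets sent to the model. The message format mirrors chat-style APIs, but the wording and names are made up for illustration.]

```python
def build_prompt(question, retrieved_chunks):
    """Fold the top-ranked chunks into a chat-style prompt so the
    model answers from the supplied context instead of guessing."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return [
        {"role": "system",
         "content": "Answer using only the context below.\n" + context},
        {"role": "user", "content": question},
    ]

messages = build_prompt(
    "What hotel am I staying at for ATP?",
    ["The conference hotel is the Anaheim Marriott."],
)
print(messages[0]["content"])  # system message now carries the hotel chunk
```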

1006
00:41:56.755 --> 00:41:59.805
What we're gonna look at next is, uh,

1007
00:42:00.185 --> 00:42:02.045
OpenAI doing all of that for us.

1008
00:42:02.275 --> 00:42:04.005
All we have to do is supply the documents.

1009
00:42:04.005 --> 00:42:05.085
So we're gonna upload those

1010
00:42:05.585 --> 00:42:07.645
and then we're essentially gonna ask a question.

1011
00:42:08.195 --> 00:42:11.325
It's going to find the relevant information, it's going

1012
00:42:11.325 --> 00:42:13.965
to append it to, uh, the prompt that we send.

1013
00:42:14.145 --> 00:42:15.485
And it's really that easy.

1014
00:42:16.185 --> 00:42:18.565
Yep. Go ahead Chris. Yep. No, that's a great setup.

1015
00:42:18.785 --> 00:42:20.365
So in order to use the, um,

1016
00:42:20.505 --> 00:42:22.245
so the vector stores are only available

1017
00:42:22.245 --> 00:42:23.285
through the Assistants API.

1018
00:42:23.285 --> 00:42:24.965
So these agents that OpenAI has.

1019
00:42:24.985 --> 00:42:28.005
So an agent without kind of getting down to the weeds,

1020
00:42:28.065 --> 00:42:29.565
is just, it's a way that's going

1021
00:42:29.565 --> 00:42:31.765
to process your messages in a stateful sense.

1022
00:42:31.785 --> 00:42:34.405
So it keeps a history, it manages these threads.

1023
00:42:34.405 --> 00:42:36.845
And so I'm just gonna set up a, a little assistant here

1024
00:42:37.625 --> 00:42:39.285
to process my vector store

1025
00:42:39.285 --> 00:42:40.565
and I'm gonna give it some instructions.

1026
00:42:40.805 --> 00:42:43.405
Honestly, you are a helpful assistant

1027
00:42:43.945 --> 00:42:45.685
and then you could, you could give it any kind

1028
00:42:45.685 --> 00:42:47.845
of instructions here, whether whatever you want to do

1029
00:42:47.945 --> 00:42:51.125
and say that really loves dogs, all right?

1030
00:42:51.745 --> 00:42:54.045
And so then I could choose what kind of style that I want it

1031
00:42:54.045 --> 00:42:57.875
to work in, and then I can go ahead and choose my model

1032
00:42:57.935 --> 00:42:59.075
and create the assistant.
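
[Editor's note: roughly what the console clicks above build for you, written out as configuration. The field names follow the shape of OpenAI's Assistants file-search tooling, but take them as an assumption and check the current API reference; the file names and vector store ID are hypothetical.]

```python
# Editor's sketch of the configuration the console assembles.
# Field names follow the shape of OpenAI's Assistants / file_search
# tooling (an assumption); file names and IDs are made up.
vector_store = {
    "name": "Send It Later",
    "files": ["product_summary.pdf", "financial_statements.pdf"],  # illustrative
}

assistant = {
    "model": "gpt-4o",                     # illustrative model choice
    "instructions": "You are a helpful assistant that really loves dogs.",
    "tools": [{"type": "file_search"}],    # turns on retrieval over the store
    "tool_resources": {
        "file_search": {"vector_store_ids": ["vs_send_it_later"]}  # hypothetical ID
    },
}
```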

1033
00:42:59.175 --> 00:43:01.635
All right? So now what I've done is I actually went ahead

1034
00:43:01.635 --> 00:43:02.675
and I created a whole bunch

1035
00:43:02.675 --> 00:43:05.395
of documentation about a product that I made up, right?

1036
00:43:05.395 --> 00:43:08.275
Because to really drive this point home, I don't want

1037
00:43:08.275 --> 00:43:10.315
to use something that's already available on the internet

1038
00:43:10.315 --> 00:43:11.475
that the model's been trained on.

1039
00:43:11.495 --> 00:43:13.835
So I came up with a new product called Send It Later.

1040
00:43:14.185 --> 00:43:17.035
What it does is the idea behind the product is allows you

1041
00:43:17.035 --> 00:43:18.555
to schedule a message that could be sent

1042
00:43:18.555 --> 00:43:19.715
later on any platform.

1043
00:43:19.715 --> 00:43:20.995
It could be cross platform, multi

1044
00:43:20.995 --> 00:43:22.075
users, those type of things.

1045
00:43:22.135 --> 00:43:23.795
And so, wow, would I use this?

1046
00:43:23.825 --> 00:43:25.835
Well, let's say like Kyle's birthday's tomorrow

1047
00:43:25.935 --> 00:43:26.995
and I'm gonna be busy tomorrow.

1048
00:43:27.095 --> 00:43:28.195
I know I'm gonna forget about it.

1049
00:43:28.195 --> 00:43:30.235
So I'm just gonna write my message now, schedule it

1050
00:43:30.235 --> 00:43:31.675
to be sent tomorrow, and then Kyle's gonna be

1051
00:43:31.675 --> 00:43:32.755
happy that I thought about 'em.

1052
00:43:32.755 --> 00:43:34.395
Alright? So that's the idea behind the product.

1053
00:43:34.455 --> 00:43:36.115
So I created a bunch of financial statements,

1054
00:43:36.115 --> 00:43:37.635
product summaries, those type of things,

1055
00:43:37.975 --> 00:43:39.995
and I added them to a vector store.

1056
00:43:39.995 --> 00:43:42.795
Alright? So I created my Chris test bot

1057
00:43:43.295 --> 00:43:45.755
and just to show you, um, I'm gonna ask it about,

1058
00:43:45.825 --> 00:43:48.395
tell me about Send It later, right?

1059
00:43:48.735 --> 00:43:52.275
And most likely, being that it's AI, it's gonna try

1060
00:43:52.275 --> 00:43:53.435
and talk its way out of it.

1061
00:43:53.465 --> 00:43:54.675
It's gonna come up with something

1062
00:43:54.675 --> 00:43:56.075
that doesn't make any sense, right?

1063
00:43:56.375 --> 00:43:59.155
And so it's coming up with a service that's kind of similar,

1064
00:43:59.295 --> 00:44:01.875
but it's not really what mine is, alright?

1065
00:44:01.905 --> 00:44:04.195
It's not matching the documentation that I gave it.

1066
00:44:04.415 --> 00:44:07.795
So what I'm gonna do now is I'm gonna give my assistant all

1067
00:44:07.795 --> 00:44:09.835
of that RAG, everything that we talked about,

1068
00:44:09.935 --> 00:44:11.835
except I'm doing it through OpenAI,

1069
00:44:12.055 --> 00:44:14.675
who does it a lot better, who's using semantic ranking

1070
00:44:15.225 --> 00:44:18.475
keyword search, a lot of things, to get in place a whole lot better

1071
00:44:18.935 --> 00:44:20.835
top 10 results than what I was getting.

1072
00:44:20.975 --> 00:44:23.675
All right, so I have already set this up. I, yeah,

1073
00:44:24.415 --> 00:44:25.415
We are at time.

1074
00:44:26.175 --> 00:44:28.755
Um, we are at 1:45, so just

1075
00:44:28.835 --> 00:44:29.835
I have 1:43.

1076
00:44:30.065 --> 00:44:31.435
Alright? Oh, okay.

1077
00:44:31.495 --> 00:44:33.115
Am I early? Go ahead, keep going.

1078
00:44:33.145 --> 00:44:35.355
Okay, if you think so. So we, we set up,

1079
00:44:35.355 --> 00:44:36.475
we set up this vector store.

1080
00:44:36.555 --> 00:44:38.595
I called it Send It later, and I gave it four documents.

1081
00:44:38.595 --> 00:44:40.315
It ingested it, it did the embeddings,

1082
00:44:40.455 --> 00:44:42.355
and it did all of that information for me.

1083
00:44:42.375 --> 00:44:43.675
So now I'm just gonna go ahead

1084
00:44:43.675 --> 00:44:45.555
and I'm gonna hook it up to my assistant here.

1085
00:44:45.895 --> 00:44:47.195
All right. So I have my Chris test bot,

1086
00:44:47.195 --> 00:44:48.395
I hook it up to my assistant.

1087
00:44:48.655 --> 00:44:50.035
So now when I come back

1088
00:44:50.055 --> 00:44:52.315
and I ask it a question about it, I could say,

1089
00:44:52.315 --> 00:44:53.995
tell me about Send it later.

1090
00:44:54.415 --> 00:44:57.395
So now what it's gonna do is it's gonna do that RAG for me.

1091
00:44:57.425 --> 00:44:58.875
It's gonna go and it's actually gonna

1092
00:44:58.875 --> 00:44:59.995
read through the Vector store.

1093
00:45:00.095 --> 00:45:02.075
And you can see now it's actually giving me the real

1094
00:45:02.075 --> 00:45:03.195
information that I gave it

1095
00:45:03.455 --> 00:45:05.715
and it's citing the documents that it came from.

1096
00:45:05.865 --> 00:45:07.875
Alright, so what's a real, what's a,

1097
00:45:08.075 --> 00:45:09.835
what's a test industry application of this?

1098
00:45:09.945 --> 00:45:11.675
Well, I can go ahead at this point,

1099
00:45:11.885 --> 00:45:13.275
let's just say I took a whole bunch

1100
00:45:13.275 --> 00:45:15.475
and I, I wanted to create a quiz about how to use it.

1101
00:45:15.535 --> 00:45:20.215
So I could say, create a five question, multiple choice quiz

1102
00:45:21.475 --> 00:45:24.225
about how to use send it later, right?

1103
00:45:24.485 --> 00:45:28.385
So this is, this is something I could not do against a

1104
00:45:28.505 --> 00:45:29.625
standard GPT-4 model.

1105
00:45:29.685 --> 00:45:31.225
It doesn't know about it. But now

1106
00:45:31.225 --> 00:45:34.745
that it has this vector store that OpenAI did completely

1107
00:45:34.845 --> 00:45:36.585
for me, I just gave it the documents.

1108
00:45:36.805 --> 00:45:39.265
Now I got a five question test that's actually on

1109
00:45:39.785 --> 00:45:41.945
documentation and material that are relevant.

1110
00:45:42.685 --> 00:45:47.685
1:44.

1111
00:45:51.735 --> 00:45:52.505
Well done Chris.

1112
00:45:52.575 --> 00:45:55.305
Wonderful. I wanna thank everyone for being here today.

1113
00:45:55.325 --> 00:45:57.345
We will share the recording with you via email.

1114
00:45:57.685 --> 00:45:59.385
Uh, there will be a survey that pops up.

1115
00:45:59.445 --> 00:46:00.465
Let us know what you thought.

1116
00:46:00.485 --> 00:46:02.545
If there's something specific you'd like to see next time,

1117
00:46:02.545 --> 00:46:03.745
just tell us, uh,

1118
00:46:03.745 --> 00:46:06.545
and you can find our webinars at testsys.com/webinars.

1119
00:46:06.725 --> 00:46:08.025
So we will see you soon.

1120
00:46:08.025 --> 00:46:09.425
Thank you again for being here,

1121
00:46:09.425 --> 00:46:10.665
and thank you to Chris and Kyle.

1122
00:46:12.315 --> 00:46:13.655
Bye everyone. Bye.