muellerzr (HF staff) committed
Commit c43c604 • 1 Parent(s): b0d1496

Add notebook

Files changed (1)
  1. Accelerate.ipynb +612 -0
Accelerate.ipynb ADDED
@@ -0,0 +1,612 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "ff5c7a97-02d5-4aea-8bd5-59be5e62bf01",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "title: \"Accelerate, Three Powerful Sublibraries for PyTorch\"\n",
+ "author: \"Zachary Mueller\"\n",
+ "format:\n",
+ "  revealjs:\n",
+ "    theme: moon\n",
+ "    fig-format: png\n",
+ "---"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "id": "f2333422",
+ "metadata": {},
+ "source": [
+ "## Test Gradio {background-iframe=\"https://muellerzr-accelerate-presentation.hf.space\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45e61402-f734-4500-8eb6-fcdd6f17a0d4",
+ "metadata": {},
+ "source": [
+ "## Who am I?\n",
+ "\n",
+ "- Zachary Mueller\n",
+ "- Deep Learning Software Engineer at 🤗\n",
+ "- API design geek"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f9864d2-5787-4af3-a08d-b372e5851a0f",
+ "metadata": {},
+ "source": [
+ "## What is 🤗 Accelerate?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "166b148a-e2f0-46b0-bc61-ac6e81da5ac5",
+ "metadata": {},
+ "source": [
+ "```{mermaid}\n",
+ "%%| fig-height: 6\n",
+ "graph LR\n",
+ "  A{\"🤗 Accelerate#32;\"}\n",
+ "  A --> B[\"Launching<br>Interface#32;\"]\n",
+ "  A --> C[\"Training Library#32;\"]\n",
+ "  A --> D[\"Big Model<br>Inference#32;\"]\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "84d6fd12-18cd-4448-9123-821133673b95",
+ "metadata": {},
+ "source": [
+ "# A Launching Interface\n",
+ "\n",
+ "Can't I just use `python do_the_thing.py`?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5488645-daa3-4353-be9f-7af765a52666",
+ "metadata": {},
+ "source": [
+ "## A Launching Interface\n",
+ "\n",
+ "Launching scripts in different environments is complicated:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ce856633-1909-4f18-9610-e934194dd584",
+ "metadata": {},
+ "source": [
+ "- ```bash\n",
+ "python script.py\n",
+ "```\n",
+ "\n",
+ "- ```bash\n",
+ "torchrun --nnodes=1 --nproc_per_node=2 script.py\n",
+ "```\n",
+ "\n",
+ "- ```bash\n",
+ "deepspeed --num_gpus=2 script.py\n",
+ "```\n",
+ "\n",
+ "And more!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e6414d0-f8f8-4bd2-b06f-fe7f848320f1",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## A Launching Interface\n",
+ "\n",
+ "But it doesn't have to be:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5dfd30c0-7240-4a13-9b51-061c4762b37e",
+ "metadata": {},
+ "source": [
+ "```bash\n",
+ "accelerate launch script.py\n",
+ "```\n",
+ "\n",
+ "A single command to launch with `DeepSpeed`, Fully Sharded Data Parallelism, across single and multiple CPUs and GPUs, and on TPUs[^1] too!\n",
+ "\n",
+ "[^1]: Without needing to modify your code and create a `_mp_fn`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0760c9a-4307-4143-9adc-bf1ce2ed4460",
+ "metadata": {},
+ "source": [
+ "## A Launching Interface\n",
+ "\n",
+ "Generate a device-specific configuration through `accelerate config`:\n",
+ "\n",
+ "![](CLI.gif)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0f1dc7a-ec43-48ba-b0a0-1331981733d0",
+ "metadata": {},
+ "source": [
+ "## A Launching Interface\n",
+ "\n",
+ "Or don't! `accelerate config` doesn't *have* to be run:\n",
+ "\n",
+ "```bash\n",
+ "torchrun --nnodes=1 --nproc_per_node=2 script.py\n",
+ "accelerate launch --multi_gpu --num_processes=2 script.py\n",
+ "```\n",
+ "\n",
+ "A quick default configuration can be made too:\n",
+ "\n",
+ "```bash\n",
+ "accelerate config default\n",
+ "```"
+ ]
+ },
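+ {
+ "cell_type": "markdown",
+ "id": "3c1d2f40-91aa-4d8e-b67c-5a10d2f4e9b1",
+ "metadata": {},
+ "source": [
+ "## A Launching Interface\n",
+ "\n",
+ "The same default can be written from Python as well. A minimal sketch, assuming a recent `accelerate` release that ships `write_basic_config`:\n",
+ "\n",
+ "```python\n",
+ "from accelerate.utils import write_basic_config\n",
+ "\n",
+ "# Writes the same single-machine default that `accelerate config default` creates\n",
+ "write_basic_config(mixed_precision=\"fp16\")\n",
+ "```"
+ ]
+ },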
+ {
+ "cell_type": "markdown",
+ "id": "ff8d2c3d-5a08-4e5b-9896-1a0bcb77b5a6",
+ "metadata": {},
+ "source": [
+ "## A Launching Interface"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a395af44-96f8-4f3a-ac47-3f65a6062d24",
+ "metadata": {},
+ "source": [
+ "With the `notebook_launcher` it's also possible to launch code directly from your Jupyter environment!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "99b14b46-6be5-4ef4-a3ee-82876b1d7802",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "from accelerate import notebook_launcher\n",
+ "notebook_launcher(\n",
+ "    training_loop_function,\n",
+ "    args,\n",
+ "    num_processes=2\n",
+ ")\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a50e27a7-4235-4695-bf99-59c0f3d0e451",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "Launching training on 2 GPUs.\n",
+ "epoch 0: 88.12\n",
+ "epoch 1: 91.73\n",
+ "epoch 2: 92.58\n",
+ "epoch 3: 93.90\n",
+ "epoch 4: 94.71\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2db4e66d-d8b0-4f3f-9236-e86c1c3ea5d2",
+ "metadata": {},
+ "source": [
+ "# A Training Library\n",
+ "\n",
+ "Okay, will `accelerate launch` make `do_the_thing.py` use all my GPUs magically?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1cd093ef-d3ce-4ea4-89a1-be145fbe5cc0",
+ "metadata": {},
+ "source": [
+ "## A Training Library\n",
+ "\n",
+ "- We just showed that it's possible to use `accelerate launch` to *launch* a Python script in various distributed environments\n",
+ "- This does *not* mean that the script will just \"use\" that compute and run on it efficiently\n",
+ "- Training on different compute setups often means changing *many* lines of code for each one\n",
+ "- 🤗 `accelerate` solves this by ensuring the same code can be run on a single CPU or GPU, on multiple GPUs, and on TPUs!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0b12eb9-feeb-4040-a784-8e78966165be",
+ "metadata": {},
+ "source": [
+ "## A Training Library\n",
+ "\n",
+ "\n",
+ "```{.python}\n",
+ "for batch in dataloader:\n",
+ "    optimizer.zero_grad()\n",
+ "    inputs, targets = batch\n",
+ "    inputs = inputs.to(device)\n",
+ "    targets = targets.to(device)\n",
+ "    outputs = model(inputs)\n",
+ "    loss = loss_function(outputs, targets)\n",
+ "    loss.backward()\n",
+ "    optimizer.step()\n",
+ "    scheduler.step()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bbb72602-f86f-42f6-ab44-05fbd0dfcecd",
+ "metadata": {},
+ "source": [
+ "## A Training Library {.smaller}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b5f90b84-fff5-4c14-bde7-d1efbcc37781",
+ "metadata": {},
+ "source": [
+ ":::: {.columns}\n",
+ "::: {.column width=\"43%\"}\n",
+ "<br><br><br>\n",
+ "```{.python code-line-numbers=\"5-6,9\"}\n",
+ "# For alignment purposes\n",
+ "for batch in dataloader:\n",
+ "    optimizer.zero_grad()\n",
+ "    inputs, targets = batch\n",
+ "    inputs = inputs.to(device)\n",
+ "    targets = targets.to(device)\n",
+ "    outputs = model(inputs)\n",
+ "    loss = loss_function(outputs, targets)\n",
+ "    loss.backward()\n",
+ "    optimizer.step()\n",
+ "    scheduler.step()\n",
+ "```\n",
+ ":::\n",
+ "::: {.column width=\"57%\"}\n",
+ "```{.python code-line-numbers=\"1-7,12-13,16\"}\n",
+ "from accelerate import Accelerator\n",
+ "accelerator = Accelerator()\n",
+ "dataloader, model, optimizer, scheduler = (\n",
+ "    accelerator.prepare(\n",
+ "        dataloader, model, optimizer, scheduler\n",
+ "    )\n",
+ ")\n",
+ "\n",
+ "for batch in dataloader:\n",
+ "    optimizer.zero_grad()\n",
+ "    inputs, targets = batch\n",
+ "    # inputs = inputs.to(device)\n",
+ "    # targets = targets.to(device)\n",
+ "    outputs = model(inputs)\n",
+ "    loss = loss_function(outputs, targets)\n",
+ "    accelerator.backward(loss) # loss.backward()\n",
+ "    optimizer.step()\n",
+ "    scheduler.step()\n",
+ "```\n",
+ ":::\n",
+ "\n",
+ "::::"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "60c90913-2542-4b1d-8121-b2228c8a2ef7",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## A Training Library\n",
+ "\n",
+ "What all happened in `Accelerator.prepare`?\n",
+ "\n",
+ "::: {.incremental}\n",
+ "1. `Accelerator` looked at the configuration\n",
+ "2. The `dataloader` was converted into one that can dispatch each batch onto a separate GPU\n",
+ "3. The `model` was wrapped with the appropriate DDP wrapper from either `torch.distributed` or `torch_xla`\n",
+ "4. The `optimizer` and `scheduler` were both converted into an `AcceleratedOptimizer` and an `AcceleratedScheduler`, which know how to handle any distributed scenario\n",
+ ":::\n",
+ "\n",
+ "A rough plain-PyTorch sketch of steps 2 and 3 is shown on the next slide."
+ ]
+ },
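+ {
+ "cell_type": "markdown",
+ "id": "9b7e4c2a-6f3d-4b1e-8a5c-2d9f0e7b6c4a",
+ "metadata": {},
+ "source": [
+ "## A Training Library\n",
+ "\n",
+ "Roughly, steps 2 and 3 amount to the following plain PyTorch. An illustrative sketch only, *not* 🤗 Accelerate's actual implementation; `model`, `dataset`, and `local_rank` are assumed to already exist:\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "from torch.nn.parallel import DistributedDataParallel as DDP\n",
+ "from torch.utils.data import DataLoader, DistributedSampler\n",
+ "\n",
+ "device = torch.device(\"cuda\", local_rank)\n",
+ "# Step 2: each process sees a different shard of the data\n",
+ "dataloader = DataLoader(dataset, sampler=DistributedSampler(dataset))\n",
+ "# Step 3: wrap the model so gradients sync across processes\n",
+ "model = DDP(model.to(device), device_ids=[local_rank])\n",
+ "```"
+ ]
+ },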
+ {
+ "cell_type": "markdown",
+ "id": "59400a16-bce7-4a0a-8548-effd3c4c6cae",
+ "metadata": {},
+ "source": [
+ "## A Training Library, Mixed Precision\n",
+ "\n",
+ "🤗 `accelerate` also supports *automatic mixed precision*.\n",
+ "\n",
+ "Through a single flag to the `Accelerator` object, the mixed precision of your choosing (such as `bf16` or `fp16`) will be applied, including when calling `accelerator.backward()`:\n",
+ "\n",
+ "```{.python code-line-numbers=\"2,9\"}\n",
+ "from accelerate import Accelerator\n",
+ "accelerator = Accelerator(mixed_precision=\"fp16\")\n",
+ "...\n",
+ "for batch in dataloader:\n",
+ "    optimizer.zero_grad()\n",
+ "    inputs, targets = batch\n",
+ "    outputs = model(inputs)\n",
+ "    loss = loss_function(outputs, targets)\n",
+ "    accelerator.backward(loss)\n",
+ "    optimizer.step()\n",
+ "    scheduler.step()\n",
+ "```"
+ ]
+ },
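+ {
+ "cell_type": "markdown",
+ "id": "d4a8f1e6-2b3c-4d5e-9f7a-8c6b5a4d3e2f",
+ "metadata": {},
+ "source": [
+ "## A Training Library, Mixed Precision\n",
+ "\n",
+ "One subtlety: under `fp16` the gradients are scaled, so they must be unscaled before clipping. `accelerator.clip_grad_norm_` handles that for you; a sketch of the pattern:\n",
+ "\n",
+ "```python\n",
+ "accelerator.backward(loss)\n",
+ "# Only clip when gradients are actually being synced and applied\n",
+ "if accelerator.sync_gradients:\n",
+ "    accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)\n",
+ "optimizer.step()\n",
+ "```"
+ ]
+ },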
+ {
+ "cell_type": "markdown",
+ "id": "fde7ae10-4fbd-4e25-8f5d-9d47c849966d",
+ "metadata": {},
+ "source": [
+ "## A Training Library, Gradient Accumulation\n",
+ "\n",
+ "Gradient accumulation in distributed setups often needs extra care to ensure gradients are synced only when they need to be and that the backward pass stays computationally efficient.\n",
+ "\n",
+ "🤗 `accelerate` handles this for you:\n",
+ "\n",
+ "```{.python code-line-numbers=\"2,5\"}\n",
+ "from accelerate import Accelerator\n",
+ "accelerator = Accelerator(gradient_accumulation_steps=4)\n",
+ "...\n",
+ "for batch in dataloader:\n",
+ "    with accelerator.accumulate(model):\n",
+ "        optimizer.zero_grad()\n",
+ "        inputs, targets = batch\n",
+ "        outputs = model(inputs)\n",
+ "        loss = loss_function(outputs, targets)\n",
+ "        accelerator.backward(loss)\n",
+ "        optimizer.step()\n",
+ "        scheduler.step()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "13f2d1e7-1e50-4a28-b7b4-55e09e15c176",
+ "metadata": {},
+ "source": [
+ "## A Training Library, Gradient Accumulation\n",
+ "\n",
+ "```{.python code-line-numbers=\"5-7,10,11,12,15\"}\n",
+ "ddp_model, dataloader = accelerator.prepare(model, dataloader)\n",
+ "\n",
+ "for index, batch in enumerate(dataloader):\n",
+ "    inputs, targets = batch\n",
+ "    if (index + 1) % 4 != 0 and index != (len(dataloader) - 1):\n",
+ "        # Gradients don't sync\n",
+ "        with accelerator.no_sync(model):\n",
+ "            outputs = ddp_model(inputs)\n",
+ "            loss = loss_func(outputs, targets)\n",
+ "            accelerator.backward(loss)\n",
+ "    else:\n",
+ "        # Gradients finally sync\n",
+ "        outputs = ddp_model(inputs)\n",
+ "        loss = loss_func(outputs, targets)\n",
+ "        accelerator.backward(loss)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "93575b12-8000-4e8c-81fb-74af415fd76b",
+ "metadata": {},
+ "source": [
+ "# Big Model Inference\n",
+ "\n",
+ "Stable Diffusion taking the world by storm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b3026c5d-c051-4eac-a4be-af6559294225",
+ "metadata": {},
+ "source": [
+ "## Bigger Models == Higher Compute\n",
+ "\n",
+ "As more large models were released, Hugging Face quickly realized there had to be a way to continue our decentralization of machine learning and let the day-to-day programmer leverage these big models.\n",
+ "\n",
+ "Born out of this effort by Sylvain Gugger:\n",
+ "\n",
+ "🤗 Accelerate: Big Model Inference."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "303925bf-ce22-4e71-a239-69eb419d54d3",
+ "metadata": {},
+ "source": [
+ "## The Basic Premise\n",
+ "\n",
+ "::: {.incremental}\n",
+ "* In PyTorch, there exists the `meta` device.\n",
+ "\n",
+ "* It has a super small footprint, letting huge models load quickly by not materializing their weights immediately.\n",
+ "\n",
+ "* As an input gets passed through each layer, we can load and unload *parts* of the PyTorch model quickly, so that only a small portion of the big model is loaded at any one time.\n",
+ "\n",
+ "* The end result? Stable Diffusion v1 can be run in < 800MB of vRAM (a tiny `meta` device demo is on the next slide)\n",
+ ":::"
+ ]
+ },
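+ {
+ "cell_type": "markdown",
+ "id": "7e5c3a9b-1d2f-4e6a-b8c4-0f9e8d7c6b5a",
+ "metadata": {},
+ "source": [
+ "## The Basic Premise\n",
+ "\n",
+ "A tiny plain-PyTorch demo of the `meta` device (assuming PyTorch 2.0+ for the device context manager):\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "from torch import nn\n",
+ "\n",
+ "# Shapes and dtypes are recorded, but no storage is allocated\n",
+ "with torch.device(\"meta\"):\n",
+ "    layer = nn.Linear(50_000, 50_000)\n",
+ "\n",
+ "print(layer.weight.device)  # meta\n",
+ "# Roughly 10GB that was never actually allocated:\n",
+ "print(layer.weight.element_size() * layer.weight.nelement())\n",
+ "```"
+ ]
+ },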
+ {
+ "cell_type": "markdown",
+ "id": "c6eef166-c64b-4229-9575-b197c3c03c59",
+ "metadata": {},
+ "source": [
+ "## The Code\n",
+ "\n",
+ "Generally you start with something like this:\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "\n",
+ "my_model = ModelClass(...)\n",
+ "state_dict = torch.load(checkpoint_file)\n",
+ "my_model.load_state_dict(state_dict)\n",
+ "```\n",
+ "\n",
+ "But this has issues:\n",
+ "\n",
+ "1. The full version of the model is loaded on line `3`\n",
+ "2. Another full copy of the weights is loaded into memory on line `4`\n",
+ "\n",
+ "If a 6 *billion* parameter model is being loaded, each copy of the weights takes 24GB in fp32 (6B parameters × 4 bytes), so 48GB of RAM is needed"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "53651488-7303-4aa3-83bb-ea7331938a01",
+ "metadata": {},
+ "source": [
+ "## Empty Model Weights\n",
+ "\n",
+ "We can fix step 1 by loading in an empty model skeleton at first:\n",
+ "\n",
+ "```{.python code-line-numbers=\"1,3-4\"}\n",
+ "from accelerate import init_empty_weights\n",
+ "\n",
+ "with init_empty_weights():\n",
+ "    my_model = ModelClass(...)\n",
+ "state_dict = torch.load(checkpoint_file)\n",
+ "my_model.load_state_dict(state_dict)\n",
+ "```\n",
+ "\n",
+ "::: {.callout-important appearance=\"default\"}\n",
+ "## This code will not run\n",
+ "It is likely that just calling `my_model(x)` will fail, as not all tensor operations are supported on the `meta` device.\n",
+ ":::"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "94a2b99a-b154-4cc3-93fd-431ba78ecfdf",
+ "metadata": {},
+ "source": [
+ "## Sharded Checkpoints - The Concept\n",
+ "\n",
+ "The next step is to have \"Sharded Checkpoints\" saved for your model.\n",
+ "\n",
+ "These are smaller chunks of your model weights that can be brought into memory at any particular time.\n",
+ "\n",
+ "This reduces the amount of memory step 2 takes up, since we can load in a \"chunk\" of the model at a time and then swap it out for a new chunk through PyTorch hooks."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "11a55882-8bab-4d6b-b8ca-bfc886351156",
+ "metadata": {},
+ "source": [
+ "## Sharded Checkpoints - The Code\n",
+ "\n",
+ "```{.python code-line-numbers=\"1,6-8\"}\n",
+ "from accelerate import init_empty_weights, load_checkpoint_and_dispatch\n",
+ "\n",
+ "with init_empty_weights():\n",
+ "    my_model = ModelClass(...)\n",
+ "\n",
+ "my_model = load_checkpoint_and_dispatch(\n",
+ "    my_model, \"sharded-weights\", device_map=\"auto\"\n",
+ ")\n",
+ "```\n",
+ "`device_map=\"auto\"` tells 🤗 Accelerate to determine where to put each layer of the model, in order:\n",
+ "\n",
+ "1. Maximum space on the GPU(s)\n",
+ "2. Maximum space on the CPU(s)\n",
+ "3. Utilize disk space through memory-mapped tensors"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6796c0ac-77e4-4f88-b01a-25f428b29a87",
+ "metadata": {},
+ "source": [
+ "## Big Model Inference Put Together\n",
+ "\n",
+ "```{.python}\n",
+ "from accelerate import init_empty_weights, load_checkpoint_and_dispatch\n",
+ "\n",
+ "with init_empty_weights():\n",
+ "    my_model = ModelClass(...)\n",
+ "\n",
+ "my_model = load_checkpoint_and_dispatch(\n",
+ "    my_model, \"sharded-weights\", device_map=\"auto\"\n",
+ ")\n",
+ "my_model.eval()\n",
+ "\n",
+ "for batch in dataloader:\n",
+ "    output = my_model(batch)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f5122b2-f4fe-4237-aff2-d2a69f85b692",
+ "metadata": {},
+ "source": [
+ "# Thanks for Listening!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52f29e81-2e55-42d0-8e9d-83e692714909",
+ "metadata": {},
+ "source": [
+ "## Some Handy Resources\n",
+ "\n",
+ "- [🤗 Accelerate documentation](https://hf.co/docs/accelerate)\n",
+ "- [Launching distributed code](https://huggingface.co/docs/accelerate/basic_tutorials/launch)\n",
+ "- [Distributed code and Jupyter Notebooks](https://huggingface.co/docs/accelerate/basic_tutorials/notebook)\n",
+ "- [Migrating to 🤗 Accelerate easily](https://huggingface.co/docs/accelerate/basic_tutorials/migration)\n",
+ "- [Big Model Inference tutorial](https://huggingface.co/docs/accelerate/usage_guides/big_modeling)\n",
+ "- [DeepSpeed and 🤗 Accelerate](https://huggingface.co/docs/accelerate/usage_guides/deepspeed)\n",
+ "- [Fully Sharded Data Parallelism and 🤗 Accelerate](https://huggingface.co/docs/accelerate/usage_guides/fsdp)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b9f6a92d-1275-470b-aa27-ff2be450d616",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }