Isaak Carter Augustus committed
Commit f2da02c
1 Parent(s): 22daade

Upload 3 files

first_working_creation_with_custom_encoders.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a018435767944a31ac38c17dfadd98f681af86227195db88c1763dcd3786ee9
+ size 2468592399
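
Note: the three lines above are a Git LFS pointer, not the checkpoint itself; the roughly 2.5 GB weights file is materialized by `git lfs pull`. A minimal inspection sketch in PyTorch follows. It assumes the file was written with `torch.save`; whether it pickles the full module or a plain state_dict is not stated in this commit, so the code handles both (loading a pickled module also requires the defining classes to be importable).

    import torch

    # Load on CPU; map_location avoids requiring the original GPU setup.
    ckpt = torch.load(
        "first_working_creation_with_custom_encoders.pth",
        map_location="cpu",
    )

    # Either a full module or a name -> tensor mapping; list the entries,
    # mirroring the josie_dict.txt dump below.
    state = ckpt.state_dict() if hasattr(ckpt, "state_dict") else ckpt
    for name, tensor in state.items():
        print(name, tuple(tensor.shape))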
josie_architecture.txt ADDED
@@ -0,0 +1,1002 @@
+ JOSIE(
+   (encoder): Encoder(
+     (modality_preprocessors): ModuleDict(
+       (vision): RGBDTPreprocessor(
+         (cls_token): tensor((1, 1, 768), requires_grad=False)
+         (rgbt_stem): PatchEmbedGeneric(
+           (proj): Sequential(
+             (0): PadIm2Video()
+             (1): Conv3d(3, 768, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
+           )
+         )
+         (pos_embedding_helper): SpatioTemporalPosEmbeddingHelper(
+           (pos_embed): tensor((1, 7681, 768), requires_grad=False)
+         )
+       )
+       (audio): AudioPreprocessor(
+         (cls_token): tensor((1, 1, 768), requires_grad=False)
+         (rgbt_stem): PatchEmbedGeneric(
+           (proj): Conv2d(1, 768, kernel_size=(16, 16), stride=(10, 10), bias=False)
+           (norm_layer): RMSNorm()
+         )
+         (pos_embedding_helper): SpatioTemporalPosEmbeddingHelper(
+           (pos_embed): tensor((1, 229, 768), requires_grad=False)
+         )
+       )
+       (depth): RGBDTPreprocessor(
+         (cls_token): tensor((1, 1, 384), requires_grad=False)
+         (depth_stem): PatchEmbedGeneric(
+           (proj): Conv2d(1, 384, kernel_size=(16, 16), stride=(16, 16), bias=False)
+           (norm_layer): RMSNorm()
+         )
+         (pos_embedding_helper): SpatioTemporalPosEmbeddingHelper(
+           (pos_embed): tensor((1, 197, 384), requires_grad=False)
+         )
+       )
+       (thermal): ThermalPreprocessor(
+         (cls_token): tensor((1, 1, 768), requires_grad=False)
+         (rgbt_stem): PatchEmbedGeneric(
+           (proj): Conv2d(1, 768, kernel_size=(16, 16), stride=(16, 16), bias=False)
+           (norm_layer): RMSNorm()
+         )
+         (pos_embedding_helper): SpatioTemporalPosEmbeddingHelper(
+           (pos_embed): tensor((1, 197, 768), requires_grad=False)
+         )
+       )
+     )
+     (modality_transformers): ModuleDict(
+       (vision): EncoderTransformer(
+         (pre_transformer_layer): Sequential(
+           (0): RMSNorm()
+           (1): EinOpsRearrange()
+         )
+         (post_transformer_layer): EinOpsRearrange()
+         (blocks): ModuleList(
+           (0-11): 12 x EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): Identity()
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+         )
+       )
+       (audio): EncoderTransformer(
+         (pre_transformer_layer): Sequential(
+           (0): RMSNorm()
+           (1): EinOpsRearrange()
+         )
+         (post_transformer_layer): EinOpsRearrange()
+         (blocks): ModuleList(
+           (0): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): Identity()
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (1): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.009)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (2): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.018)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (3): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.027)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (4): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.036)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (5): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.045)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (6): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.055)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (7): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.064)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (8): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.073)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (9): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.082)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (10): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.091)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+           (11): EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): DropPath(drop_prob=0.100)
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+         )
+       )
+       (depth): EncoderTransformer(
+         (pre_transformer_layer): Sequential(
+           (0): RMSNorm()
+           (1): EinOpsRearrange()
+         )
+         (post_transformer_layer): EinOpsRearrange()
+         (blocks): ModuleList(
+           (0-5): 6 x EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=384, out_features=384, bias=True)
+             )
+             (drop_path): Identity()
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=384, out_features=256, bias=False)
+               (w2): Linear(in_features=256, out_features=384, bias=False)
+               (w3): Linear(in_features=384, out_features=256, bias=False)
+             )
+           )
+         )
+       )
+       (thermal): EncoderTransformer(
+         (pre_transformer_layer): Sequential(
+           (0): RMSNorm()
+           (1): EinOpsRearrange()
+         )
+         (post_transformer_layer): EinOpsRearrange()
+         (blocks): ModuleList(
+           (0-5): 6 x EncoderTransformerBlock(
+             (attn): MultiheadAttention(
+               (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+             )
+             (drop_path): Identity()
+             (norm1): RMSNorm()
+             (norm2): RMSNorm()
+             (mlp): MLP(
+               (w1): Linear(in_features=768, out_features=512, bias=False)
+               (w2): Linear(in_features=512, out_features=768, bias=False)
+               (w3): Linear(in_features=768, out_features=512, bias=False)
+             )
+           )
+         )
+       )
+     )
+     (modality_heads): ModuleDict(
+       (vision): Sequential(
+         (0): RMSNorm()
+         (1): SelectElement()
+         (2): Linear(in_features=768, out_features=1024, bias=False)
+       )
+       (audio): Sequential(
+         (0): RMSNorm()
+         (1): SelectElement()
+         (2): Linear(in_features=768, out_features=1024, bias=False)
+       )
+       (depth): Sequential(
+         (0): RMSNorm()
+         (1): SelectElement()
+         (2): Linear(in_features=384, out_features=1024, bias=False)
+       )
+       (thermal): Sequential(
+         (0): RMSNorm()
+         (1): SelectElement()
+         (2): Linear(in_features=768, out_features=1024, bias=False)
+       )
+     )
+   )
+   (reasoner): Qwen2ForCausalLM(
+     (model): Qwen2Model(
+       (embed_tokens): Embedding(151936, 896)
+       (layers): ModuleList(
+         (0-23): 24 x Qwen2DecoderLayer(
+           (self_attn): Qwen2Attention(
+             (q_proj): Linear(in_features=896, out_features=896, bias=True)
+             (k_proj): Linear(in_features=896, out_features=128, bias=True)
+             (v_proj): Linear(in_features=896, out_features=128, bias=True)
+             (o_proj): Linear(in_features=896, out_features=896, bias=False)
+             (rotary_emb): Qwen2RotaryEmbedding()
+           )
+           (mlp): Qwen2MLP(
+             (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
+             (up_proj): Linear(in_features=896, out_features=4864, bias=False)
+             (down_proj): Linear(in_features=4864, out_features=896, bias=False)
+             (act_fn): SiLU()
+           )
+           (input_layernorm): Qwen2RMSNorm()
+           (post_attention_layernorm): Qwen2RMSNorm()
+         )
+       )
+       (norm): Qwen2RMSNorm()
+     )
+     (lm_head): Linear(in_features=896, out_features=151936, bias=False)
+   )
+   (input_projetor): Linear(in_features=1024, out_features=896, bias=True)
+ )
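
Two details of the printout are worth pinning down. Each encoder block's MLP has three bias-free projections (w1 and w3: model dim to hidden, w2: hidden back to model dim), the layout of a SwiGLU-style feed-forward; and the final (input_projetor) maps the 1024-dimensional modality-head outputs into the 896-dimensional embedding space of the Qwen2 reasoner, so encoder features can be spliced into its token stream. A minimal sketch of that MLP follows; the printout shows only the layers, so the SwiGLU forward pass is an assumption based on the w1/w2/w3 naming:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MLP(nn.Module):
        """SwiGLU-style FFN matching the w1/w2/w3 shapes in the printout."""

        def __init__(self, dim: int = 768, hidden_dim: int = 512):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
            self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # silu(w1(x)) gates w3(x); w2 projects back to the model dim.
            return self.w2(F.silu(self.w1(x)) * self.w3(x))

    x = torch.randn(2, 10, 768)
    print(MLP()(x).shape)  # torch.Size([2, 10, 768])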
josie_dict.txt ADDED
@@ -0,0 +1,693 @@
1
+ Model's state_dict:
2
+ encoder.modality_preprocessors.vision.cls_token torch.Size([1, 1, 768])
3
+ encoder.modality_preprocessors.vision.rgbt_stem.proj.1.weight torch.Size([768, 3, 2, 14, 14])
4
+ encoder.modality_preprocessors.vision.pos_embedding_helper.pos_embed torch.Size([1, 7681, 768])
5
+ encoder.modality_preprocessors.audio.cls_token torch.Size([1, 1, 768])
6
+ encoder.modality_preprocessors.audio.rgbt_stem.proj.weight torch.Size([768, 1, 16, 16])
7
+ encoder.modality_preprocessors.audio.rgbt_stem.norm_layer.weight torch.Size([768])
8
+ encoder.modality_preprocessors.audio.pos_embedding_helper.pos_embed torch.Size([1, 229, 768])
9
+ encoder.modality_preprocessors.depth.cls_token torch.Size([1, 1, 384])
10
+ encoder.modality_preprocessors.depth.depth_stem.proj.weight torch.Size([384, 1, 16, 16])
11
+ encoder.modality_preprocessors.depth.depth_stem.norm_layer.weight torch.Size([384])
12
+ encoder.modality_preprocessors.depth.pos_embedding_helper.pos_embed torch.Size([1, 197, 384])
13
+ encoder.modality_preprocessors.thermal.cls_token torch.Size([1, 1, 768])
14
+ encoder.modality_preprocessors.thermal.rgbt_stem.proj.weight torch.Size([768, 1, 16, 16])
15
+ encoder.modality_preprocessors.thermal.rgbt_stem.norm_layer.weight torch.Size([768])
16
+ encoder.modality_preprocessors.thermal.pos_embedding_helper.pos_embed torch.Size([1, 197, 768])
17
+ encoder.modality_transformers.vision.pre_transformer_layer.0.weight torch.Size([768])
18
+ encoder.modality_transformers.vision.blocks.0.attn.in_proj_weight torch.Size([2304, 768])
19
+ encoder.modality_transformers.vision.blocks.0.attn.in_proj_bias torch.Size([2304])
20
+ encoder.modality_transformers.vision.blocks.0.attn.out_proj.weight torch.Size([768, 768])
21
+ encoder.modality_transformers.vision.blocks.0.attn.out_proj.bias torch.Size([768])
22
+ encoder.modality_transformers.vision.blocks.0.norm1.weight torch.Size([768])
23
+ encoder.modality_transformers.vision.blocks.0.norm2.weight torch.Size([768])
24
+ encoder.modality_transformers.vision.blocks.0.mlp.w1.weight torch.Size([512, 768])
25
+ encoder.modality_transformers.vision.blocks.0.mlp.w2.weight torch.Size([768, 512])
26
+ encoder.modality_transformers.vision.blocks.0.mlp.w3.weight torch.Size([512, 768])
27
+ encoder.modality_transformers.vision.blocks.1.attn.in_proj_weight torch.Size([2304, 768])
28
+ encoder.modality_transformers.vision.blocks.1.attn.in_proj_bias torch.Size([2304])
29
+ encoder.modality_transformers.vision.blocks.1.attn.out_proj.weight torch.Size([768, 768])
30
+ encoder.modality_transformers.vision.blocks.1.attn.out_proj.bias torch.Size([768])
31
+ encoder.modality_transformers.vision.blocks.1.norm1.weight torch.Size([768])
32
+ encoder.modality_transformers.vision.blocks.1.norm2.weight torch.Size([768])
33
+ encoder.modality_transformers.vision.blocks.1.mlp.w1.weight torch.Size([512, 768])
34
+ encoder.modality_transformers.vision.blocks.1.mlp.w2.weight torch.Size([768, 512])
35
+ encoder.modality_transformers.vision.blocks.1.mlp.w3.weight torch.Size([512, 768])
36
+ encoder.modality_transformers.vision.blocks.2.attn.in_proj_weight torch.Size([2304, 768])
37
+ encoder.modality_transformers.vision.blocks.2.attn.in_proj_bias torch.Size([2304])
38
+ encoder.modality_transformers.vision.blocks.2.attn.out_proj.weight torch.Size([768, 768])
39
+ encoder.modality_transformers.vision.blocks.2.attn.out_proj.bias torch.Size([768])
40
+ encoder.modality_transformers.vision.blocks.2.norm1.weight torch.Size([768])
41
+ encoder.modality_transformers.vision.blocks.2.norm2.weight torch.Size([768])
42
+ encoder.modality_transformers.vision.blocks.2.mlp.w1.weight torch.Size([512, 768])
43
+ encoder.modality_transformers.vision.blocks.2.mlp.w2.weight torch.Size([768, 512])
44
+ encoder.modality_transformers.vision.blocks.2.mlp.w3.weight torch.Size([512, 768])
45
+ encoder.modality_transformers.vision.blocks.3.attn.in_proj_weight torch.Size([2304, 768])
46
+ encoder.modality_transformers.vision.blocks.3.attn.in_proj_bias torch.Size([2304])
47
+ encoder.modality_transformers.vision.blocks.3.attn.out_proj.weight torch.Size([768, 768])
48
+ encoder.modality_transformers.vision.blocks.3.attn.out_proj.bias torch.Size([768])
49
+ encoder.modality_transformers.vision.blocks.3.norm1.weight torch.Size([768])
50
+ encoder.modality_transformers.vision.blocks.3.norm2.weight torch.Size([768])
51
+ encoder.modality_transformers.vision.blocks.3.mlp.w1.weight torch.Size([512, 768])
52
+ encoder.modality_transformers.vision.blocks.3.mlp.w2.weight torch.Size([768, 512])
53
+ encoder.modality_transformers.vision.blocks.3.mlp.w3.weight torch.Size([512, 768])
54
+ encoder.modality_transformers.vision.blocks.4.attn.in_proj_weight torch.Size([2304, 768])
55
+ encoder.modality_transformers.vision.blocks.4.attn.in_proj_bias torch.Size([2304])
56
+ encoder.modality_transformers.vision.blocks.4.attn.out_proj.weight torch.Size([768, 768])
57
+ encoder.modality_transformers.vision.blocks.4.attn.out_proj.bias torch.Size([768])
58
+ encoder.modality_transformers.vision.blocks.4.norm1.weight torch.Size([768])
59
+ encoder.modality_transformers.vision.blocks.4.norm2.weight torch.Size([768])
60
+ encoder.modality_transformers.vision.blocks.4.mlp.w1.weight torch.Size([512, 768])
61
+ encoder.modality_transformers.vision.blocks.4.mlp.w2.weight torch.Size([768, 512])
62
+ encoder.modality_transformers.vision.blocks.4.mlp.w3.weight torch.Size([512, 768])
63
+ encoder.modality_transformers.vision.blocks.5.attn.in_proj_weight torch.Size([2304, 768])
64
+ encoder.modality_transformers.vision.blocks.5.attn.in_proj_bias torch.Size([2304])
65
+ encoder.modality_transformers.vision.blocks.5.attn.out_proj.weight torch.Size([768, 768])
66
+ encoder.modality_transformers.vision.blocks.5.attn.out_proj.bias torch.Size([768])
67
+ encoder.modality_transformers.vision.blocks.5.norm1.weight torch.Size([768])
68
+ encoder.modality_transformers.vision.blocks.5.norm2.weight torch.Size([768])
69
+ encoder.modality_transformers.vision.blocks.5.mlp.w1.weight torch.Size([512, 768])
70
+ encoder.modality_transformers.vision.blocks.5.mlp.w2.weight torch.Size([768, 512])
71
+ encoder.modality_transformers.vision.blocks.5.mlp.w3.weight torch.Size([512, 768])
72
+ encoder.modality_transformers.vision.blocks.6.attn.in_proj_weight torch.Size([2304, 768])
73
+ encoder.modality_transformers.vision.blocks.6.attn.in_proj_bias torch.Size([2304])
74
+ encoder.modality_transformers.vision.blocks.6.attn.out_proj.weight torch.Size([768, 768])
75
+ encoder.modality_transformers.vision.blocks.6.attn.out_proj.bias torch.Size([768])
76
+ encoder.modality_transformers.vision.blocks.6.norm1.weight torch.Size([768])
77
+ encoder.modality_transformers.vision.blocks.6.norm2.weight torch.Size([768])
78
+ encoder.modality_transformers.vision.blocks.6.mlp.w1.weight torch.Size([512, 768])
79
+ encoder.modality_transformers.vision.blocks.6.mlp.w2.weight torch.Size([768, 512])
80
+ encoder.modality_transformers.vision.blocks.6.mlp.w3.weight torch.Size([512, 768])
81
+ encoder.modality_transformers.vision.blocks.7.attn.in_proj_weight torch.Size([2304, 768])
82
+ encoder.modality_transformers.vision.blocks.7.attn.in_proj_bias torch.Size([2304])
83
+ encoder.modality_transformers.vision.blocks.7.attn.out_proj.weight torch.Size([768, 768])
84
+ encoder.modality_transformers.vision.blocks.7.attn.out_proj.bias torch.Size([768])
85
+ encoder.modality_transformers.vision.blocks.7.norm1.weight torch.Size([768])
86
+ encoder.modality_transformers.vision.blocks.7.norm2.weight torch.Size([768])
87
+ encoder.modality_transformers.vision.blocks.7.mlp.w1.weight torch.Size([512, 768])
88
+ encoder.modality_transformers.vision.blocks.7.mlp.w2.weight torch.Size([768, 512])
89
+ encoder.modality_transformers.vision.blocks.7.mlp.w3.weight torch.Size([512, 768])
90
+ encoder.modality_transformers.vision.blocks.8.attn.in_proj_weight torch.Size([2304, 768])
91
+ encoder.modality_transformers.vision.blocks.8.attn.in_proj_bias torch.Size([2304])
92
+ encoder.modality_transformers.vision.blocks.8.attn.out_proj.weight torch.Size([768, 768])
93
+ encoder.modality_transformers.vision.blocks.8.attn.out_proj.bias torch.Size([768])
94
+ encoder.modality_transformers.vision.blocks.8.norm1.weight torch.Size([768])
95
+ encoder.modality_transformers.vision.blocks.8.norm2.weight torch.Size([768])
96
+ encoder.modality_transformers.vision.blocks.8.mlp.w1.weight torch.Size([512, 768])
97
+ encoder.modality_transformers.vision.blocks.8.mlp.w2.weight torch.Size([768, 512])
98
+ encoder.modality_transformers.vision.blocks.8.mlp.w3.weight torch.Size([512, 768])
99
+ encoder.modality_transformers.vision.blocks.9.attn.in_proj_weight torch.Size([2304, 768])
100
+ encoder.modality_transformers.vision.blocks.9.attn.in_proj_bias torch.Size([2304])
101
+ encoder.modality_transformers.vision.blocks.9.attn.out_proj.weight torch.Size([768, 768])
102
+ encoder.modality_transformers.vision.blocks.9.attn.out_proj.bias torch.Size([768])
103
+ encoder.modality_transformers.vision.blocks.9.norm1.weight torch.Size([768])
104
+ encoder.modality_transformers.vision.blocks.9.norm2.weight torch.Size([768])
105
+ encoder.modality_transformers.vision.blocks.9.mlp.w1.weight torch.Size([512, 768])
106
+ encoder.modality_transformers.vision.blocks.9.mlp.w2.weight torch.Size([768, 512])
107
+ encoder.modality_transformers.vision.blocks.9.mlp.w3.weight torch.Size([512, 768])
108
+ encoder.modality_transformers.vision.blocks.10.attn.in_proj_weight torch.Size([2304, 768])
109
+ encoder.modality_transformers.vision.blocks.10.attn.in_proj_bias torch.Size([2304])
110
+ encoder.modality_transformers.vision.blocks.10.attn.out_proj.weight torch.Size([768, 768])
111
+ encoder.modality_transformers.vision.blocks.10.attn.out_proj.bias torch.Size([768])
112
+ encoder.modality_transformers.vision.blocks.10.norm1.weight torch.Size([768])
113
+ encoder.modality_transformers.vision.blocks.10.norm2.weight torch.Size([768])
114
+ encoder.modality_transformers.vision.blocks.10.mlp.w1.weight torch.Size([512, 768])
115
+ encoder.modality_transformers.vision.blocks.10.mlp.w2.weight torch.Size([768, 512])
116
+ encoder.modality_transformers.vision.blocks.10.mlp.w3.weight torch.Size([512, 768])
117
+ encoder.modality_transformers.vision.blocks.11.attn.in_proj_weight torch.Size([2304, 768])
118
+ encoder.modality_transformers.vision.blocks.11.attn.in_proj_bias torch.Size([2304])
119
+ encoder.modality_transformers.vision.blocks.11.attn.out_proj.weight torch.Size([768, 768])
120
+ encoder.modality_transformers.vision.blocks.11.attn.out_proj.bias torch.Size([768])
121
+ encoder.modality_transformers.vision.blocks.11.norm1.weight torch.Size([768])
122
+ encoder.modality_transformers.vision.blocks.11.norm2.weight torch.Size([768])
123
+ encoder.modality_transformers.vision.blocks.11.mlp.w1.weight torch.Size([512, 768])
124
+ encoder.modality_transformers.vision.blocks.11.mlp.w2.weight torch.Size([768, 512])
125
+ encoder.modality_transformers.vision.blocks.11.mlp.w3.weight torch.Size([512, 768])
126
+ encoder.modality_transformers.audio.pre_transformer_layer.0.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.0.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.0.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.0.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.0.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.0.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.0.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.0.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.0.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.0.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.0.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.0.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.1.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.1.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.1.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.1.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.1.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.1.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.1.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.1.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.1.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.1.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.1.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.2.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.2.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.2.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.2.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.2.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.2.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.2.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.2.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.2.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.2.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.2.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.3.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.3.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.3.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.3.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.3.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.3.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.3.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.3.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.3.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.3.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.3.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.4.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.4.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.4.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.4.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.4.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.4.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.4.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.4.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.4.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.4.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.4.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.5.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.5.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.5.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.5.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.5.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.5.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.5.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.5.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.5.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.5.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.5.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.6.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.6.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.6.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.6.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.6.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.6.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.6.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.6.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.6.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.6.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.6.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.7.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.7.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.7.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.7.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.7.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.7.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.7.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.7.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.7.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.7.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.7.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.8.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.8.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.8.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.8.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.8.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.8.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.8.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.8.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.8.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.8.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.8.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.9.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.9.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.9.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.9.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.9.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.9.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.9.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.9.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.9.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.9.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.9.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.10.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.10.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.10.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.10.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.10.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.10.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.10.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.10.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.10.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.10.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.10.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.11.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.audio.blocks.11.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.audio.blocks.11.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.11.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.audio.blocks.11.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.audio.blocks.11.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.audio.blocks.11.norm1.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.11.norm2.weight torch.Size([768])
+ encoder.modality_transformers.audio.blocks.11.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.audio.blocks.11.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.audio.blocks.11.mlp.w3.weight torch.Size([512, 768])
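Every audio block (and, below, the depth and thermal blocks) additionally carries attn.bias_k / attn.bias_v of shape [1, 1, dim] next to the fused in_proj parameters. These names are exactly what torch.nn.MultiheadAttention registers when built with add_bias_kv=True, which appends one learned key/value pair to each sequence; the head count is not recoverable from the shapes, so num_heads below is a guess:

import torch.nn as nn

# Reproduces the audio attention parameter set; num_heads=8 is an assumption.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, add_bias_kv=True)
for name, p in attn.named_parameters():
    print(name, tuple(p.shape))
# in_proj_weight  (2304, 768)
# in_proj_bias    (2304,)
# bias_k          (1, 1, 768)
# bias_v          (1, 1, 768)
# out_proj.weight (768, 768)
# out_proj.bias   (768,)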
+ encoder.modality_transformers.depth.pre_transformer_layer.0.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.0.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.0.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.0.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.0.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.0.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.0.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.0.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.0.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.0.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.0.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.0.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.1.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.1.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.1.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.1.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.1.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.1.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.1.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.1.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.1.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.1.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.1.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.2.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.2.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.2.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.2.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.2.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.2.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.2.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.2.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.2.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.2.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.2.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.3.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.3.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.3.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.3.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.3.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.3.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.3.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.3.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.3.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.3.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.3.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.4.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.4.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.4.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.4.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.4.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.4.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.4.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.4.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.4.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.4.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.4.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.5.attn.in_proj_weight torch.Size([1152, 384])
+ encoder.modality_transformers.depth.blocks.5.attn.in_proj_bias torch.Size([1152])
+ encoder.modality_transformers.depth.blocks.5.attn.bias_k torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.5.attn.bias_v torch.Size([1, 1, 384])
+ encoder.modality_transformers.depth.blocks.5.attn.out_proj.weight torch.Size([384, 384])
+ encoder.modality_transformers.depth.blocks.5.attn.out_proj.bias torch.Size([384])
+ encoder.modality_transformers.depth.blocks.5.norm1.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.5.norm2.weight torch.Size([384])
+ encoder.modality_transformers.depth.blocks.5.mlp.w1.weight torch.Size([256, 384])
+ encoder.modality_transformers.depth.blocks.5.mlp.w2.weight torch.Size([384, 256])
+ encoder.modality_transformers.depth.blocks.5.mlp.w3.weight torch.Size([256, 384])
+ encoder.modality_transformers.thermal.pre_transformer_layer.0.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.0.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.0.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.0.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.0.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.0.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.0.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.0.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.0.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.0.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.0.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.0.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.1.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.1.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.1.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.1.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.1.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.1.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.1.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.1.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.1.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.1.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.1.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.2.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.2.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.2.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.2.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.2.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.2.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.2.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.2.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.2.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.2.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.2.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.3.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.3.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.3.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.3.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.3.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.3.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.3.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.3.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.3.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.3.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.3.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.4.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.4.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.4.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.4.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.4.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.4.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.4.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.4.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.4.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.4.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.4.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.5.attn.in_proj_weight torch.Size([2304, 768])
+ encoder.modality_transformers.thermal.blocks.5.attn.in_proj_bias torch.Size([2304])
+ encoder.modality_transformers.thermal.blocks.5.attn.bias_k torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.5.attn.bias_v torch.Size([1, 1, 768])
+ encoder.modality_transformers.thermal.blocks.5.attn.out_proj.weight torch.Size([768, 768])
+ encoder.modality_transformers.thermal.blocks.5.attn.out_proj.bias torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.5.norm1.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.5.norm2.weight torch.Size([768])
+ encoder.modality_transformers.thermal.blocks.5.mlp.w1.weight torch.Size([512, 768])
+ encoder.modality_transformers.thermal.blocks.5.mlp.w2.weight torch.Size([768, 512])
+ encoder.modality_transformers.thermal.blocks.5.mlp.w3.weight torch.Size([512, 768])
+ encoder.modality_heads.vision.0.weight torch.Size([768])
+ encoder.modality_heads.vision.2.weight torch.Size([1024, 768])
+ encoder.modality_heads.audio.0.weight torch.Size([768])
+ encoder.modality_heads.audio.2.weight torch.Size([1024, 768])
+ encoder.modality_heads.depth.0.weight torch.Size([384])
+ encoder.modality_heads.depth.2.weight torch.Size([1024, 384])
+ encoder.modality_heads.thermal.0.weight torch.Size([768])
+ encoder.modality_heads.thermal.2.weight torch.Size([1024, 768])
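Each modality head exposes only two parameters: a single [dim] weight at index 0 (a norm; the architecture dump above uses RMSNorm elsewhere) and a bias-free [1024, dim] weight at index 2, so index 1 must be parameter-free. A plausible reconstruction, with the index-1 module an explicit assumption:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Minimal RMSNorm with one [dim] weight, matching <head>.0.weight.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def make_head(dim: int) -> nn.Sequential:
    return nn.Sequential(
        RMSNorm(dim),                       # .0.weight: [dim]
        nn.Dropout(0.0),                    # .1: parameter-free (assumed)
        nn.Linear(dim, 1024, bias=False),   # .2.weight: [1024, dim]
    )

# All four heads project into the same shared 1024-d embedding space.
heads = nn.ModuleDict({m: make_head(d) for m, d in
                       [("vision", 768), ("audio", 768),
                        ("depth", 384), ("thermal", 768)]})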
+ reasoner.model.embed_tokens.weight torch.Size([151936, 896])
+ reasoner.model.layers.0.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.0.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.0.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.0.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.0.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.0.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.0.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.0.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.0.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.0.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.0.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.0.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.1.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.1.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.1.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.1.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.1.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.1.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.1.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.1.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.1.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.1.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.1.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.1.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.2.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.2.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.2.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.2.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.2.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.2.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.2.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.2.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.2.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.2.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.2.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.2.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.3.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.3.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.3.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.3.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.3.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.3.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.3.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.3.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.3.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.3.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.3.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.3.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.4.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.4.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.4.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.4.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.4.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.4.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.4.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.4.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.4.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.4.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.4.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.4.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.5.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.5.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.5.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.5.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.5.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.5.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.5.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.5.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.5.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.5.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.5.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.5.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.6.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.6.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.6.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.6.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.6.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.6.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.6.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.6.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.6.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.6.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.6.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.6.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.7.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.7.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.7.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.7.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.7.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.7.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.7.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.7.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.7.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.7.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.7.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.7.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.8.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.8.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.8.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.8.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.8.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.8.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.8.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.8.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.8.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.8.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.8.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.8.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.9.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.9.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.9.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.9.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.9.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.9.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.9.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.9.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.9.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.9.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.9.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.9.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.10.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.10.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.10.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.10.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.10.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.10.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.10.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.10.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.10.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.10.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.10.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.10.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.11.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.11.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.11.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.11.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.11.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.11.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.11.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.11.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.11.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.11.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.11.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.11.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.12.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.12.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.12.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.12.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.12.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.12.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.12.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.12.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.12.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.12.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.12.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.12.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.13.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.13.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.13.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.13.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.13.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.13.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.13.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.13.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.13.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.13.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.13.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.13.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.14.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.14.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.14.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.14.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.14.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.14.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.14.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.14.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.14.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.14.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.14.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.14.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.15.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.15.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.15.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.15.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.15.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.15.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.15.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.15.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.15.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.15.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.15.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.15.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.16.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.16.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.16.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.16.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.16.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.16.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.16.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.16.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.16.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.16.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.16.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.16.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.17.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.17.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.17.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.17.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.17.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.17.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.17.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.17.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.17.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.17.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.17.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.17.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.18.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.18.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.18.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.18.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.18.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.18.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.18.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.18.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.18.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.18.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.18.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.18.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.19.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.19.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.19.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.19.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.19.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.19.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.19.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.19.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.19.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.19.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.19.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.19.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.20.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.20.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.20.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.20.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.20.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.20.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.20.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.20.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.20.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.20.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.20.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.20.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.21.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.21.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.21.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.21.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.21.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.21.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.21.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.21.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.21.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.21.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.21.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.21.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.22.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.22.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.22.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.22.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.22.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.22.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.22.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.22.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.22.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.22.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.22.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.22.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.layers.23.self_attn.q_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.23.self_attn.q_proj.bias torch.Size([896])
+ reasoner.model.layers.23.self_attn.k_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.23.self_attn.k_proj.bias torch.Size([128])
+ reasoner.model.layers.23.self_attn.v_proj.weight torch.Size([128, 896])
+ reasoner.model.layers.23.self_attn.v_proj.bias torch.Size([128])
+ reasoner.model.layers.23.self_attn.o_proj.weight torch.Size([896, 896])
+ reasoner.model.layers.23.mlp.gate_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.23.mlp.up_proj.weight torch.Size([4864, 896])
+ reasoner.model.layers.23.mlp.down_proj.weight torch.Size([896, 4864])
+ reasoner.model.layers.23.input_layernorm.weight torch.Size([896])
+ reasoner.model.layers.23.post_attention_layernorm.weight torch.Size([896])
+ reasoner.model.norm.weight torch.Size([896])
+ reasoner.lm_head.weight torch.Size([151936, 896])
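The reasoner shapes pin down the decoder configuration: vocab 151936, hidden size 896, 24 layers, biased q/k/v projections with a bias-free o_proj, grouped-query attention (k_proj/v_proj output 128 = 2 KV heads of size 64 against 14 query heads), and a 4864-wide SwiGLU MLP. These values coincide with the published Qwen2-0.5B configuration, so the reasoner is presumably a Qwen2-family checkpoint; the sketch below rebuilds it with transformers under that assumption:

from transformers import Qwen2Config, Qwen2ForCausalLM

config = Qwen2Config(
    vocab_size=151936,        # embed_tokens / lm_head rows
    hidden_size=896,
    num_hidden_layers=24,
    num_attention_heads=14,   # 14 * 64 = 896 (q_proj)
    num_key_value_heads=2,    # 2 * 64 = 128 (k_proj / v_proj)
    intermediate_size=4864,   # gate_proj / up_proj rows
)
reasoner = Qwen2ForCausalLM(config)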
+ input_projetor.weight torch.Size([896, 1024])
+ input_projetor.bias torch.Size([896])
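Finally, input_projetor (the spelling is as stored in the checkpoint, so state_dict keys must keep it) is a plain Linear gluing the two halves together: it maps the encoder heads' shared 1024-d embedding into the reasoner's 896-d token space. A usage sketch; the batch size is illustrative only:

import torch
import torch.nn as nn

# weight [896, 1024] + bias [896]  =>  nn.Linear(1024, 896)
input_projetor = nn.Linear(1024, 896)

modality_embedding = torch.randn(4, 1024)        # e.g. four encoded inputs
reasoner_tokens = input_projetor(modality_embedding)
print(reasoner_tokens.shape)                     # torch.Size([4, 896])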