Upload folder using huggingface_hub
This view is limited to 50 files because it contains too many changes.
- .gitignore +1 -0
- .ipynb_checkpoints/0,0,0-checkpoint.png +0 -0
- .ipynb_checkpoints/tmp-checkpoint.jpg +0 -0
- 0,0,0.png +0 -0
- CONTRIBUTING.md +106 -0
- LICENSE +201 -0
- MANIFEST.in +3 -0
- README.md +781 -8
- avg_checkpoints.py +152 -0
- benchmark.py +696 -0
- bulk_runner.py +184 -0
- clean_checkpoint.py +115 -0
- convert/convert_from_mxnet.py +107 -0
- convert/convert_nest_flax.py +109 -0
- demo.py +120 -0
- distributed_train.sh +5 -0
- docs/archived_changes.md +406 -0
- docs/changes.md +314 -0
- docs/feature_extraction.md +174 -0
- docs/index.md +80 -0
- docs/javascripts/tables.js +6 -0
- docs/models.md +171 -0
- docs/models/.pages +1 -0
- docs/models/.templates/code_snippets.md +62 -0
- docs/models/.templates/generate_readmes.py +64 -0
- docs/models/.templates/models/adversarial-inception-v3.md +98 -0
- docs/models/.templates/models/advprop.md +457 -0
- docs/models/.templates/models/big-transfer.md +295 -0
- docs/models/.templates/models/csp-darknet.md +81 -0
- docs/models/.templates/models/csp-resnet.md +76 -0
- docs/models/.templates/models/csp-resnext.md +77 -0
- docs/models/.templates/models/densenet.md +305 -0
- docs/models/.templates/models/dla.md +545 -0
- docs/models/.templates/models/dpn.md +256 -0
- docs/models/.templates/models/ecaresnet.md +236 -0
- docs/models/.templates/models/efficientnet-pruned.md +145 -0
- docs/models/.templates/models/efficientnet.md +325 -0
- docs/models/.templates/models/ensemble-adversarial.md +98 -0
- docs/models/.templates/models/ese-vovnet.md +92 -0
- docs/models/.templates/models/fbnet.md +76 -0
- docs/models/.templates/models/gloun-inception-v3.md +78 -0
- docs/models/.templates/models/gloun-resnet.md +504 -0
- docs/models/.templates/models/gloun-resnext.md +142 -0
- docs/models/.templates/models/gloun-senet.md +63 -0
- docs/models/.templates/models/gloun-seresnext.md +136 -0
- docs/models/.templates/models/gloun-xception.md +66 -0
- docs/models/.templates/models/hrnet.md +358 -0
- docs/models/.templates/models/ig-resnext.md +209 -0
- docs/models/.templates/models/inception-resnet-v2.md +72 -0
- docs/models/.templates/models/inception-v3.md +85 -0
.gitignore
ADDED
@@ -0,0 +1 @@
+**/__pycache__/
.ipynb_checkpoints/0,0,0-checkpoint.png
ADDED
.ipynb_checkpoints/tmp-checkpoint.jpg
ADDED
0,0,0.png
ADDED
CONTRIBUTING.md
ADDED
@@ -0,0 +1,106 @@
+*This guideline is very much a work-in-progress.*
+
+Contributions to `timm` for code, documentation, tests are more than welcome!
+
+There haven't been any formal guidelines to date so please bear with me, and feel free to add to this guide.
+
+# Coding style
+
+Code linting and auto-format (black) are not currently in place but open to consideration. In the meantime, the style to follow is (mostly) aligned with Google's guide: https://google.github.io/styleguide/pyguide.html.
+
+A few specific differences from Google style (or black):
+1. Line length is 120 char. Going over is okay in some cases (e.g. I prefer not to break URL across lines).
+2. Hanging indents are always preferred, please avoid aligning arguments with closing brackets or braces.
+
+Example, from Google guide, but this is a NO here:
+```
+# Aligned with opening delimiter.
+foo = long_function_name(var_one, var_two,
+                         var_three, var_four)
+meal = (spam,
+        beans)
+
+# Aligned with opening delimiter in a dictionary.
+foo = {
+    'long_dictionary_key': value1 +
+                           value2,
+    ...
+}
+```
+This is YES:
+
+```
+# 4-space hanging indent; nothing on first line,
+# closing parenthesis on a new line.
+foo = long_function_name(
+    var_one, var_two, var_three,
+    var_four
+)
+meal = (
+    spam,
+    beans,
+)
+
+# 4-space hanging indent in a dictionary.
+foo = {
+    'long_dictionary_key':
+        long_dictionary_value,
+    ...
+}
+```
+
+When there is a discrepancy in a given source file (there are many origins for various bits of code and not all have been updated to what I consider current goal), please follow the style in a given file.
+
+In general, if you add new code, formatting it with black using the following options should result in a style that is compatible with the rest of the code base:
+
+```
+black --skip-string-normalization --line-length 120 <path-to-file>
+```
+
+Avoid formatting code that is unrelated to your PR though.
+
+PRs with pure formatting / style fixes will be accepted but only in isolation from functional changes; it is best to ask before starting such a change.
+
+# Documentation
+
+As with code style, docstring style is based on the Google guide: https://google.github.io/styleguide/pyguide.html
+
+The goal for the code is to eventually move to have all major functions and `__init__` methods use PEP484 type annotations.
+
+When type annotations are used for a function, as per the Google pyguide, they should **NOT** be duplicated in the docstrings, please leave annotations as the one source of truth re typing.
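To illustrate the rule on annotations and docstrings, here is a minimal sketch of a Google-style docstring on an annotated function. The function `scale_size` and its parameters are hypothetical and are not part of `timm`; the point is only that types live in the signature while the docstring describes meaning.

```python
from typing import Tuple


def scale_size(size: Tuple[int, int], factor: float = 1.0) -> Tuple[int, int]:
    """Scale a (height, width) pair by a constant factor.

    Args:
        size: Height and width in pixels.
        factor: Multiplier applied to both dimensions.

    Returns:
        The scaled height and width, rounded to the nearest integer.
    """
    height, width = size
    return round(height * factor), round(width * factor)


print(scale_size((224, 224), 1.5))
```

Note that the `Args` and `Returns` entries carry no type names; a reader or a type checker gets them from the signature alone.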
+
+There are a LOT of gaps in current documentation relative to the functionality in timm, please, document away!
+
+# Installation
+
+Create a Python virtual environment using Python 3.10. Inside the environment, install `torch` and `torchvision` using the instructions matching your system as listed on the [PyTorch website](https://pytorch.org/).
+
+Then install the remaining dependencies:
+
+```
+python -m pip install -r requirements.txt
+python -m pip install -r requirements-dev.txt  # for testing
+python -m pip install -e .
+```
+
+## Unit tests
+
+Run the tests using:
+
+```
+pytest tests/
+```
+
+Since the whole test suite takes a lot of time to run locally (a few hours), you may want to select a subset of tests relating to the changes you made by using the `-k` option of [`pytest`](https://docs.pytest.org/en/7.1.x/example/markers.html#using-k-expr-to-select-tests-based-on-their-name). Moreover, running tests in parallel (in this example 4 processes) with the `-n` option may help:
+
+```
+pytest -k "substring-to-match" -n 4 tests/
+```
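As a hedged illustration of how `-k` selects tests by name, consider the hypothetical test module below. The file name `test_example.py`, the helper `clamp`, and the test names are assumptions made for this sketch and are not files in the `timm` test suite. Running `pytest -k "resize" test_example.py` would collect only the test whose name contains `resize`. The `-n` option comes from the `pytest-xdist` plugin.

```python
# test_example.py (hypothetical file, for illustration only)


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def test_clamp_within_range():
    assert clamp(5, 0, 10) == 5


def test_clamp_resize_bounds():
    # Selected by: pytest -k "resize" test_example.py
    assert clamp(300, 32, 224) == 224
    assert clamp(8, 32, 224) == 32
```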
+
+## Building documentation
+
+Please refer to [this document](https://github.com/huggingface/pytorch-image-models/tree/main/hfdocs).
+
+# Questions
+
+If you have any questions about contribution, where / how to contribute, please ask in the [Discussions](https://github.com/huggingface/pytorch-image-models/discussions/categories/contributing) (there is a `Contributing` topic).
LICENSE
ADDED
@@ -0,0 +1,201 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+"License" shall mean the terms and conditions for use, reproduction,
+and distribution as defined by Sections 1 through 9 of this document.
+
+"Licensor" shall mean the copyright owner or entity authorized by
+the copyright owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all
+other entities that control, are controlled by, or are under common
+control with that entity. For the purposes of this definition,
+"control" means (i) the power, direct or indirect, to cause the
+direction or management of such entity, whether by contract or
+otherwise, or (ii) ownership of fifty percent (50%) or more of the
+outstanding shares, or (iii) beneficial ownership of such entity.
+
+"You" (or "Your") shall mean an individual or Legal Entity
+exercising permissions granted by this License.
+
+"Source" form shall mean the preferred form for making modifications,
+including but not limited to software source code, documentation
+source, and configuration files.
+
+"Object" form shall mean any form resulting from mechanical
+transformation or translation of a Source form, including but
+not limited to compiled object code, generated documentation,
+and conversions to other media types.
+
+"Work" shall mean the work of authorship, whether in Source or
+Object form, made available under the License, as indicated by a
+copyright notice that is included in or attached to the work
+(an example is provided in the Appendix below).
+
+"Derivative Works" shall mean any work, whether in Source or Object
+form, that is based on (or derived from) the Work and for which the
+editorial revisions, annotations, elaborations, or other modifications
+represent, as a whole, an original work of authorship. For the purposes
+of this License, Derivative Works shall not include works that remain
+separable from, or merely link (or bind by name) to the interfaces of,
+the Work and Derivative Works thereof.
+
+"Contribution" shall mean any work of authorship, including
+the original version of the Work and any modifications or additions
+to that Work or Derivative Works thereof, that is intentionally
+submitted to Licensor for inclusion in the Work by the copyright owner
+or by an individual or Legal Entity authorized to submit on behalf of
+the copyright owner. For the purposes of this definition, "submitted"
+means any form of electronic, verbal, or written communication sent
+to the Licensor or its representatives, including but not limited to
+communication on electronic mailing lists, source code control systems,
+and issue tracking systems that are managed by, or on behalf of, the
+Licensor for the purpose of discussing and improving the Work, but
+excluding communication that is conspicuously marked or otherwise
+designated in writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity
+on behalf of whom a Contribution has been received by Licensor and
+subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+this License, each Contributor hereby grants to You a perpetual,
+worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+copyright license to reproduce, prepare Derivative Works of,
+publicly display, publicly perform, sublicense, and distribute the
+Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+this License, each Contributor hereby grants to You a perpetual,
+worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+(except as stated in this section) patent license to make, have made,
+use, offer to sell, sell, import, and otherwise transfer the Work,
+where such license applies only to those patent claims licensable
+by such Contributor that are necessarily infringed by their
+Contribution(s) alone or by combination of their Contribution(s)
+with the Work to which such Contribution(s) was submitted. If You
+institute patent litigation against any entity (including a
+cross-claim or counterclaim in a lawsuit) alleging that the Work
+or a Contribution incorporated within the Work constitutes direct
+or contributory patent infringement, then any patent licenses
+granted to You under this License for that Work shall terminate
+as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+Work or Derivative Works thereof in any medium, with or without
+modifications, and in Source or Object form, provided that You
+meet the following conditions:
+
+(a) You must give any other recipients of the Work or
+Derivative Works a copy of this License; and
+
+(b) You must cause any modified files to carry prominent notices
+stating that You changed the files; and
+
+(c) You must retain, in the Source form of any Derivative Works
+that You distribute, all copyright, patent, trademark, and
+attribution notices from the Source form of the Work,
+excluding those notices that do not pertain to any part of
+the Derivative Works; and
+
+(d) If the Work includes a "NOTICE" text file as part of its
+distribution, then any Derivative Works that You distribute must
+include a readable copy of the attribution notices contained
+within such NOTICE file, excluding those notices that do not
+pertain to any part of the Derivative Works, in at least one
+of the following places: within a NOTICE text file distributed
+as part of the Derivative Works; within the Source form or
+documentation, if provided along with the Derivative Works; or,
+within a display generated by the Derivative Works, if and
+wherever such third-party notices normally appear. The contents
+of the NOTICE file are for informational purposes only and
+do not modify the License. You may add Your own attribution
+notices within Derivative Works that You distribute, alongside
+or as an addendum to the NOTICE text from the Work, provided
+that such additional attribution notices cannot be construed
+as modifying the License.
+
+You may add Your own copyright statement to Your modifications and
+may provide additional or different license terms and conditions
+for use, reproduction, or distribution of Your modifications, or
+for any such Derivative Works as a whole, provided Your use,
+reproduction, and distribution of the Work otherwise complies with
+the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+any Contribution intentionally submitted for inclusion in the Work
+by You to the Licensor shall be under the terms and conditions of
+this License, without any additional terms or conditions.
+Notwithstanding the above, nothing herein shall supersede or modify
+the terms of any separate license agreement you may have executed
+with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+names, trademarks, service marks, or product names of the Licensor,
+except as required for reasonable and customary use in describing the
+origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+agreed to in writing, Licensor provides the Work (and each
+Contributor provides its Contributions) on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+implied, including, without limitation, any warranties or conditions
+of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+PARTICULAR PURPOSE. You are solely responsible for determining the
+appropriateness of using or redistributing the Work and assume any
+risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+whether in tort (including negligence), contract, or otherwise,
+unless required by applicable law (such as deliberate and grossly
+negligent acts) or agreed to in writing, shall any Contributor be
+liable to You for damages, including any direct, indirect, special,
+incidental, or consequential damages of any character arising as a
+result of this License or out of the use or inability to use the
+Work (including but not limited to damages for loss of goodwill,
+work stoppage, computer failure or malfunction, or any and all
+other commercial damages or losses), even if such Contributor
+has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+the Work or Derivative Works thereof, You may choose to offer,
+and charge a fee for, acceptance of support, warranty, indemnity,
+or other liability obligations and/or rights consistent with this
+License. However, in accepting such obligations, You may act only
+on Your own behalf and on Your sole responsibility, not on behalf
+of any other Contributor, and only if You agree to indemnify,
+defend, and hold each Contributor harmless for any liability
+incurred by, or claims asserted against, such Contributor by reason
+of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+To apply the Apache License to your work, attach the following
+boilerplate notice, with the fields enclosed by brackets "{}"
+replaced with your own identifying information. (Don't include
+the brackets!) The text should be enclosed in the appropriate
+comment syntax for the file format. We also recommend that a
+file or class name and description of purpose be included on the
+same "printed page" as the copyright notice for easier
+identification within third-party archives.
+
+Copyright 2019 Ross Wightman
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
MANIFEST.in
ADDED
@@ -0,0 +1,3 @@
+include timm/models/_pruned/*.txt
+include timm/data/_info/*.txt
+include timm/data/_info/*.json
README.md
CHANGED
@@ -1,12 +1,785 @@
 ---
-title:
-
-colorFrom: red
-colorTo: green
+title: cs-mixer
+app_file: demo.py
 sdk: gradio
-sdk_version: 3.
-app_file: app.py
-pinned: false
+sdk_version: 3.37.0
 ---
+# PyTorch Image Models
+- [Sponsors](#sponsors)
+- [What's New](#whats-new)
+- [Introduction](#introduction)
+- [Models](#models)
+- [Features](#features)
+- [Results](#results)
+- [Getting Started (Documentation)](#getting-started-documentation)
+- [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
+- [Awesome PyTorch Resources](#awesome-pytorch-resources)
+- [Licenses](#licenses)
+- [Citing](#citing)
 
-
+## Sponsors
+
+Thanks to the following for hardware support:
+* TPU Research Cloud (TRC) (https://sites.research.google/trc/about/)
+* Nvidia (https://www.nvidia.com/en-us/)
+
+And a big thanks to all GitHub sponsors who helped with some of my costs before I joined Hugging Face.
+
+## What's New
+
+❗Updates after Oct 10, 2022 are available in version >= 0.9❗
+* Many changes since the last 0.6.x stable releases. They were previewed in 0.8.x dev releases but not everyone transitioned.
+* `timm.models.layers` moved to `timm.layers`:
+  * `from timm.models.layers import name` will still work via deprecation mapping (but please transition to `timm.layers`).
+  * `import timm.models.layers.module` or `from timm.models.layers.module import name` needs to be changed now.
+* Builder, helper, non-model modules in `timm.models` have a `_` prefix added, i.e. `timm.models.helpers` -> `timm.models._helpers`. There are temporary deprecation mapping files but those will be removed.
+* All models now support `architecture.pretrained_tag` naming (e.g. `resnet50.rsb_a1`).
+  * The pretrained_tag is the specific weight variant (different head) for the architecture.
+  * Using only `architecture` defaults to the first weights in the default_cfgs for that model architecture.
+* In adding pretrained tags, many model names that existed to differentiate were renamed to use the tag (e.g. `vit_base_patch16_224_in21k` -> `vit_base_patch16_224.augreg_in21k`). There are deprecation mappings for these.
+* A number of models had their checkpoints remapped to match architecture changes needed to better support `features_only=True`; there are `checkpoint_filter_fn` methods in any model module that was remapped. These can be passed to `timm.models.load_checkpoint(..., filter_fn=timm.models.swin_transformer_v2.checkpoint_filter_fn)` to remap your existing checkpoint.
+* The Hugging Face Hub (https://huggingface.co/timm) is now the primary source for `timm` weights. Model cards include links to papers, original source, license.
+* Previous 0.6.x can be cloned from the [0.6.x](https://github.com/rwightman/pytorch-image-models/tree/0.6.x) branch or installed via pip by pinning the version.
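The `architecture.pretrained_tag` naming and the deprecation mappings described in these notes can be pictured with the following minimal sketch. It is plain Python and does not call `timm`; the helper names `split_name_tag` and `resolve_model_name` and the one-entry mapping table are hypothetical, and `timm`'s own implementation may differ.

```python
from typing import Dict, Tuple

# Hypothetical table of old model names mapped to 'architecture.pretrained_tag' names.
# The single entry is the rename mentioned in the release notes.
DEPRECATED_NAMES: Dict[str, str] = {
    'vit_base_patch16_224_in21k': 'vit_base_patch16_224.augreg_in21k',
}


def split_name_tag(model_name: str) -> Tuple[str, str]:
    """Split 'architecture.pretrained_tag' into its two parts.

    An empty tag stands for "use the default weights of the architecture".
    """
    architecture, _, tag = model_name.partition('.')
    return architecture, tag


def resolve_model_name(model_name: str) -> Tuple[str, str]:
    """Apply the deprecation mapping, then split the result."""
    return split_name_tag(DEPRECATED_NAMES.get(model_name, model_name))


print(split_name_tag('resnet50.rsb_a1'))                 # ('resnet50', 'rsb_a1')
print(split_name_tag('resnet50'))                        # ('resnet50', '')
print(resolve_model_name('vit_base_patch16_224_in21k'))  # ('vit_base_patch16_224', 'augreg_in21k')
```

Splitting on the first `.` keeps the architecture name free of weight-specific suffixes, which is what lets one architecture carry several pretrained variants.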
+
+### May 11, 2023
+* `timm` 0.9 released, transition from 0.8.xdev releases
+
+### May 10, 2023
+* Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
+* DINOv2 vit feature backbone weights added thanks to [Leng Yue](https://github.com/leng-yue)
+* FB MAE vit feature backbone weights added
+* OpenCLIP DataComp-XL L/14 feat backbone weights added
+* MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by [Fredo Guan](https://github.com/fffffgggg54)
+* Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
+* Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
+* Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
+* bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, i.e. `bnbadam8bit`
+* Misc cleanup and fixes
+* Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
+
+### April 27, 2023
+* 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
+* Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
+
+### April 21, 2023
+* Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks [Taeksang Kim](https://github.com/voidbag)
+* More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
+* Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
+* Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
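For readers unfamiliar with gradient accumulation, the idea behind `--grad-accum-steps` can be sketched as follows. This is a plain-Python toy and not the code in `train.py`: the one-parameter least-squares model, the helper names and the data are all made up for illustration. Each micro-batch gradient is divided by the number of accumulation steps and summed, and the parameter is updated once per group, which matches a single step on the combined batch when the micro-batches are equal in size.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_mse(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def step_with_accumulation(w: float, micro_batches: List[List[Sample]], lr: float) -> float:
    """Accumulate scaled gradients over all micro-batches, then update once."""
    accum_steps = len(micro_batches)
    accumulated = 0.0
    for batch in micro_batches:
        # Scale each micro-batch gradient so the running sum is an average.
        accumulated += grad_mse(w, batch) / accum_steps
    return w - lr * accumulated


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro = [data[:2], data[2:]]  # two equally sized micro-batches

w_accum = step_with_accumulation(0.0, micro, lr=0.01)
w_full = 0.0 - 0.01 * grad_mse(0.0, data)  # one step on the full batch
print(abs(w_accum - w_full) < 1e-12)
```

The practical benefit is memory: only one micro-batch needs to be resident at a time, while the update behaves like one computed on the larger effective batch.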
+
+### April 12, 2023
+* Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
+* Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
+* Add fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
+* Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT model that was missed.
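An environment-variable switch such as `TIMM_FUSED_ATTN` is typically read and turned into a boolean. The sketch below shows that pattern in plain Python; the helper name `fused_attn_enabled`, the default of `1` and the rule that `0` means disabled are assumptions made for illustration and should be checked against the `timm` source.

```python
import os


def fused_attn_enabled(default: int = 1) -> bool:
    """Return True unless the environment variable disables fused attention.

    Assumed convention (hypothetical): an integer value, with 0 meaning "off".
    """
    raw = os.environ.get('TIMM_FUSED_ATTN', str(default))
    try:
        return int(raw) > 0
    except ValueError:
        # Fall back to the default when the value is not an integer.
        return default > 0


os.environ['TIMM_FUSED_ATTN'] = '0'
print(fused_attn_enabled())  # False

os.environ.pop('TIMM_FUSED_ATTN')
print(fused_attn_enabled())  # True
```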
+
+### April 5, 2023
+* ALL ResNet models pushed to Hugging Face Hub with multi-weight support
+  * All past `timm` trained weights added with recipe based tags to differentiate
+  * All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
+  * Add torchvision v2 recipe weights to existing torchvision originals
+  * See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
+* New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
+  * `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
+  * `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
+  * `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
+  * `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
+
+### March 31, 2023
+* Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
+
+| model |top1 |top5 |img_size|param_count|gmacs |macts |
+|----------------------------------------------------------------------------------------------------------------------|------|------|--------|-----------|------|------|
+| [convnext_xxlarge.clip_laion2b_soup_ft_in1k](https://huggingface.co/timm/convnext_xxlarge.clip_laion2b_soup_ft_in1k) |88.612|98.704|256 |846.47 |198.09|124.45|
+| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 |88.312|98.578|384 |200.13 |101.11|126.74|
+| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 |87.968|98.47 |320 |200.13 |70.21 |88.02 |
+| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 |87.138|98.212|384 |88.59 |45.21 |84.49 |
+| convnext_base.clip_laion2b_augreg_ft_in12k_in1k |86.344|97.97 |256 |88.59 |20.09 |37.55 |
+
+* Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
+
+| model |top1 |top5 |param_count|img_size|
+|----------------------------------------------------|------|------|-----------|--------|
+| [eva02_large_patch14_448.mim_m38m_ft_in22k_in1k](https://huggingface.co/timm/eva02_large_patch14_448.mim_m38m_ft_in1k) |90.054|99.042|305.08 |448 |
+| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k |89.946|99.01 |305.08 |448 |
+| eva_giant_patch14_560.m30m_ft_in22k_in1k |89.792|98.992|1014.45 |560 |
+| eva02_large_patch14_448.mim_in22k_ft_in1k |89.626|98.954|305.08 |448 |
+| eva02_large_patch14_448.mim_m38m_ft_in1k |89.57 |98.918|305.08 |448 |
+| eva_giant_patch14_336.m30m_ft_in22k_in1k |89.56 |98.956|1013.01 |336 |
+| eva_giant_patch14_336.clip_ft_in1k |89.466|98.82 |1013.01 |336 |
+| eva_large_patch14_336.in22k_ft_in22k_in1k |89.214|98.854|304.53 |336 |
+| eva_giant_patch14_224.clip_ft_in1k |88.882|98.678|1012.56 |224 |
+| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k |88.692|98.722|87.12 |448 |
+| eva_large_patch14_336.in22k_ft_in1k |88.652|98.722|304.53 |336 |
+| eva_large_patch14_196.in22k_ft_in22k_in1k |88.592|98.656|304.14 |196 |
+| eva02_base_patch14_448.mim_in22k_ft_in1k |88.23 |98.564|87.12 |448 |
+| eva_large_patch14_196.in22k_ft_in1k |87.934|98.504|304.14 |196 |
+| eva02_small_patch14_336.mim_in22k_ft_in1k |85.74 |97.614|22.13 |336 |
+| eva02_tiny_patch14_336.mim_in22k_ft_in1k |80.658|95.524|5.76 |336 |
+
120 |
+
* Multi-weight and HF hub for DeiT and MLP-Mixer based models
|
121 |
+
|
122 |
+
### March 22, 2023
* More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
* Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
* FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
* RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
* More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  * `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  * `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  * `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  * `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  * `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
* Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
* Minor bug fixes and improvements.

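The model name deprecation + remapping idea from the March 22, 2023 notes can be sketched as a lookup table that warns and returns the new name. This is a minimal illustration, not `timm`'s actual implementation: the `DEPRECATED_NAMES` table, the single mapping in it, and the `resolve_name` helper are all hypothetical.

```python
import warnings

# Hypothetical old -> new mapping, for illustration only.
DEPRECATED_NAMES = {
    "vit_base_patch32_224_clip_laion2b": "vit_base_patch32_clip_224.laion2b",
}


def resolve_name(name):
    """Return the current name for a model, warning if the old one was used."""
    new_name = DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name  # not deprecated, pass through unchanged
    warnings.warn(
        f"{name} is deprecated, use {new_name}", DeprecationWarning, stacklevel=2
    )
    return new_name


print(resolve_name("vit_base_patch32_224_clip_laion2b"))
print(resolve_name("resnet50"))
```

Keeping the mapping in one table means old names keep working while users are nudged toward the `model_arch.pretrained_tag` form.
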
### Feb 26, 2023
* Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see [model card](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup)
* Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
* 0.8.15dev0

### Feb 20, 2023
* Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
* 0.8.13dev0 pypi release for latest changes w/ move to huggingface org

### Feb 16, 2023
* `safetensors` checkpoint support added
* Add ideas from 'Scaling Vision Transformers to 22 Billion Parameters' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
* Add `F.scaled_dot_product_attention` support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
* Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
* Gradient checkpointing works with `features_only=True`

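For readers unfamiliar with the Lion optimizer mentioned in the Feb 16, 2023 notes, the update rule from the linked paper can be sketched in plain Python. This is a toy version over lists of floats, not `timm`'s implementation; the `lion_step` name and the default hyperparameters are assumptions for illustration.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momenta, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update; returns (new_params, new_momenta)."""
    new_params, new_momenta = [], []
    for p, g, m in zip(params, grads, momenta):
        # The step direction is only the sign of an interpolation of momentum and gradient.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay, then a fixed-magnitude step of size lr.
        p = p * (1.0 - lr * weight_decay) - lr * update
        # The stored momentum is an EMA of gradients with a separate beta2.
        m = beta2 * m + (1.0 - beta2) * g
        new_params.append(p)
        new_momenta.append(m)
    return new_params, new_momenta


params, momenta = [1.0, -2.0], [0.0, 0.0]
params, momenta = lion_step(params, [0.5, -0.25], momenta, lr=0.1)
print(params, momenta)
```

Because every parameter moves by exactly `lr` per step, Lion is usually run with a smaller learning rate than AdamW, and it keeps only one state tensor per parameter.
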
### Feb 7, 2023
* New inference benchmark numbers added in [results](results/) folder.
* Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  * `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  * `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
* Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by [Fredo](https://github.com/fffffgggg54).
* Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
* Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  * New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  * Minor updates to EfficientFormer.
  * Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
* Move ImageNet meta-data (synsets, indices) from `/results` to [`timm/data/_info`](timm/data/_info/).
* Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
* Update `inference.py` to use them, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
* Ready for 0.8.10 pypi pre-release (final testing).

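The `--topk 5` option in the `inference.py` command above selects the k most likely classes per image. A rough sketch of that selection step is shown here using only the standard library; the `topk_probs` helper and the three-entry label list are hypothetical, and the real script works on model outputs rather than hand-written logits.

```python
import json
import math


def topk_probs(logits, k):
    """Softmax the logits and return the k most likely (index, prob) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    ranked = sorted(
        enumerate(e / total for e in exps), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


labels = ["tench", "goldfish", "great white shark"]  # hypothetical label list
result = [
    {"index": i, "label": labels[i], "prob": round(p, 4)}
    for i, p in topk_probs([0.2, 2.5, 1.0], k=2)
]
print(json.dumps(result))
```
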
### Jan 20, 2023
* Add two convnext 12k -> 1k fine-tunes at 384x384
  * `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  * `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384

* Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models

|model|top1|top5|samples / sec|Params (M)|GMAC|Act (M)|
|---|---:|---:|---:|---:|---:|---:|
|[maxvit_xlarge_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k)|88.53|98.64|21.76|475.77|534.14|1413.22|
|[maxvit_xlarge_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k)|88.32|98.54|42.53|475.32|292.78|668.76|
|[maxvit_base_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k)|88.20|98.53|50.87|119.88|138.02|703.99|
|[maxvit_large_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k)|88.04|98.40|36.42|212.33|244.75|942.15|
|[maxvit_large_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k)|87.98|98.56|71.75|212.03|132.55|445.84|
|[maxvit_base_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k)|87.92|98.54|104.71|119.65|73.80|332.90|
|[maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k)|87.81|98.37|106.55|116.14|70.97|318.95|
|[maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k)|87.47|98.37|149.49|116.09|72.98|213.74|
|[coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k)|87.39|98.31|160.80|73.88|47.69|209.43|
|[maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k)|86.89|98.02|375.86|116.14|23.15|92.64|
|[maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k)|86.64|98.02|501.03|116.09|24.20|62.77|
|[maxvit_base_tf_512.in1k](https://huggingface.co/timm/maxvit_base_tf_512.in1k)|86.60|97.92|50.75|119.88|138.02|703.99|
|[coatnet_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_2_rw_224.sw_in12k_ft_in1k)|86.57|97.89|631.88|73.87|15.09|49.22|
|[maxvit_large_tf_512.in1k](https://huggingface.co/timm/maxvit_large_tf_512.in1k)|86.52|97.88|36.04|212.33|244.75|942.15|
|[coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k)|86.49|97.90|620.58|73.88|15.18|54.78|
|[maxvit_base_tf_384.in1k](https://huggingface.co/timm/maxvit_base_tf_384.in1k)|86.29|97.80|101.09|119.65|73.80|332.90|
|[maxvit_large_tf_384.in1k](https://huggingface.co/timm/maxvit_large_tf_384.in1k)|86.23|97.69|70.56|212.03|132.55|445.84|
|[maxvit_small_tf_512.in1k](https://huggingface.co/timm/maxvit_small_tf_512.in1k)|86.10|97.76|88.63|69.13|67.26|383.77|
|[maxvit_tiny_tf_512.in1k](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k)|85.67|97.58|144.25|31.05|33.49|257.59|
|[maxvit_small_tf_384.in1k](https://huggingface.co/timm/maxvit_small_tf_384.in1k)|85.54|97.46|188.35|69.02|35.87|183.65|
|[maxvit_tiny_tf_384.in1k](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k)|85.11|97.38|293.46|30.98|17.53|123.42|
|[maxvit_large_tf_224.in1k](https://huggingface.co/timm/maxvit_large_tf_224.in1k)|84.93|96.97|247.71|211.79|43.68|127.35|
|[coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k)|84.90|96.96|1025.45|41.72|8.11|40.13|
|[maxvit_base_tf_224.in1k](https://huggingface.co/timm/maxvit_base_tf_224.in1k)|84.85|96.99|358.25|119.47|24.04|95.01|
|[maxxvit_rmlp_small_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_small_rw_256.sw_in1k)|84.63|97.06|575.53|66.01|14.67|58.38|
|[coatnet_rmlp_2_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in1k)|84.61|96.74|625.81|73.88|15.18|54.78|
|[maxvit_rmlp_small_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_small_rw_224.sw_in1k)|84.49|96.76|693.82|64.90|10.75|49.30|
|[maxvit_small_tf_224.in1k](https://huggingface.co/timm/maxvit_small_tf_224.in1k)|84.43|96.83|647.96|68.93|11.66|53.17|
|[maxvit_rmlp_tiny_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_tiny_rw_256.sw_in1k)|84.23|96.78|807.21|29.15|6.77|46.92|
|[coatnet_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_1_rw_224.sw_in1k)|83.62|96.38|989.59|41.72|8.04|34.60|
|[maxvit_tiny_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_tiny_rw_224.sw_in1k)|83.50|96.50|1100.53|29.06|5.11|33.11|
|[maxvit_tiny_tf_224.in1k](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k)|83.41|96.59|1004.94|30.92|5.60|35.78|
|[coatnet_rmlp_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw_224.sw_in1k)|83.36|96.45|1093.03|41.69|7.85|35.47|
|[maxxvitv2_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvitv2_nano_rw_256.sw_in1k)|83.11|96.33|1276.88|23.70|6.26|23.05|
|[maxxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_nano_rw_256.sw_in1k)|83.03|96.34|1341.24|16.78|4.37|26.05|
|[maxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_nano_rw_256.sw_in1k)|82.96|96.26|1283.24|15.50|4.47|31.92|
|[maxvit_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_nano_rw_256.sw_in1k)|82.93|96.23|1218.17|15.45|4.46|30.28|
|[coatnet_bn_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_bn_0_rw_224.sw_in1k)|82.39|96.19|1600.14|27.44|4.67|22.04|
|[coatnet_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_0_rw_224.sw_in1k)|82.39|95.84|1831.21|27.44|4.43|18.73|
|[coatnet_rmlp_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_nano_rw_224.sw_in1k)|82.05|95.87|2109.09|15.15|2.62|20.34|
|[coatnext_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnext_nano_rw_224.sw_in1k)|81.95|95.92|2525.52|14.70|2.47|12.80|
|[coatnet_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_nano_rw_224.sw_in1k)|81.70|95.64|2344.52|15.14|2.41|15.41|
|[maxvit_rmlp_pico_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_pico_rw_256.sw_in1k)|80.53|95.21|1594.71|7.52|1.85|24.86|

### Jan 11, 2023
* Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  * `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  * `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  * `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288

### Jan 6, 2023
* Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
  * `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  * `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
* Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.

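The `--model-kwargs key=value ...` pass-through described in the Jan 6, 2023 notes can be approximated with a custom `argparse` action. The sketch below is an assumption about how such parsing might look, not the code in `train.py`; the `KeyValueAction` class is hypothetical, and values are interpreted with `ast.literal_eval`, falling back to plain strings.

```python
import argparse
import ast


class KeyValueAction(argparse.Action):
    """Hypothetical action: collect key=value tokens into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        parsed = {}
        for item in values:
            key, sep, raw = item.partition("=")
            if not sep:
                parser.error(f"expected key=value, got {item!r}")
            try:
                parsed[key] = ast.literal_eval(raw)  # ints, floats, tuples, ...
            except (ValueError, SyntaxError):
                parsed[key] = raw  # bare words such as 'silu' stay strings
        setattr(namespace, self.dest, parsed)


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50")
parser.add_argument("--model-kwargs", nargs="*", default={}, action=KeyValueAction)

args = parser.parse_args(
    ["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"]
)
print(args.model_kwargs)
```

The resulting dict could then be splatted into a model factory call, which is what makes rarely used constructor arguments reachable from the command line without adding a dedicated flag for each.
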
### Jan 5, 2023
* ConvNeXt-V2 models and weights added to existing `convnext.py`
  * Paper: [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](http://arxiv.org/abs/2301.00808)
  * Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)

### Dec 23, 2022 🎄☃
* Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
  * NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
* Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
* More model pretrained tags and adjustments, some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
* More ImageNet-12k (subset of 22k) pretrain models popping up:
  * `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  * `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  * `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  * `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288

### Dec 8, 2022
* Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  * original source: https://github.com/baaivision/EVA

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |

### Dec 6, 2022
* Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
  * original source: https://github.com/baaivision/EVA
  * paper: https://arxiv.org/abs/2211.07636

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | [link](https://huggingface.co/BAAI/EVA) |

### Dec 5, 2022

* Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  * vision_transformer, maxvit, convnext are the first three model impl w/ support
  * model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
  * bugs are likely, but I need feedback so please try it out
  * if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
* Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
* Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
* Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k) |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k) |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k) |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k) |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k) |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k) |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k) |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k) |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k) |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k) |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k) |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k) |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k) |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k) |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k) |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k) |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k) |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k) |

* Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
  * There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possible detail missing, but the 21k FT did seem sensitive to small preprocessing

| model | top1 | param_count | gmac | macts | hub |
|:---|---:|---:|---:|---:|:---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |

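The `model_arch.pretrained_tag` naming introduced with the Dec 5, 2022 pre-release is what every model name in the tables above follows. As a small sketch of the convention (the `split_model_name` helper is hypothetical, and `timm`'s own name resolution may handle more cases), a name can be split at the first dot:

```python
from typing import Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split 'arch.tag' into (arch, tag); tag is '' when no tag is given."""
    arch, _, tag = name.partition(".")
    return arch, tag


for name in [
    "vit_base_patch16_clip_224.laion2b_ft_in12k_in1k",
    "maxvit_tiny_tf_224.in1k",
    "resnet50",
]:
    print(split_model_name(name))
```

Read this way, one architecture such as `vit_base_patch16_clip_224` can carry several weight sets (`laion2b_ft_in1k`, `openai_ft_in12k_in1k`, ...) without needing a separate model function for each.
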
### Oct 15, 2022
* Train and validation script enhancements
* Non-GPU (ie CPU) device support
* SLURM compatibility for train script
* HF datasets support (via ReaderHfds)
* TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
* `in_chans != 3` support for scripts / loader
* Adan optimizer
* Can enable per-step LR scheduling via args
* Dataset 'parsers' renamed to 'readers', more descriptive of purpose
* AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
* main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
* master -> main branch rename

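To illustrate what per-step LR scheduling (listed in the Oct 15, 2022 notes) changes compared with per-epoch scheduling, here is a small sketch using a cosine decay. The `cosine_lr` function and the toy epoch and step counts are assumptions for illustration; they are not taken from the `timm` schedulers.

```python
import math


def cosine_lr(progress, base_lr=0.1, min_lr=0.0):
    """Cosine decay; progress runs from 0.0 (start) to 1.0 (end of training)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


epochs, steps_per_epoch = 2, 4
total_steps = epochs * steps_per_epoch

# Per-epoch: the LR is held constant for every step within an epoch.
per_epoch = [cosine_lr(e / epochs) for e in range(epochs) for _ in range(steps_per_epoch)]

# Per-step: the LR is recomputed on every optimizer update.
per_step = [cosine_lr(s / total_steps) for s in range(total_steps)]

for step, (a, b) in enumerate(zip(per_epoch, per_step)):
    print(step, round(a, 4), round(b, 4))
```

With few epochs or very long epochs, the per-step variant avoids large LR jumps at epoch boundaries.
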
### Oct 10, 2022
* More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
  * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
  * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
  * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
  * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
  * `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
* NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.

### Sept 23, 2022
* LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
  * vit_base_patch32_224_clip_laion2b
  * vit_large_patch14_224_clip_laion2b
  * vit_huge_patch14_224_clip_laion2b
  * vit_giant_patch14_224_clip_laion2b

### Sept 7, 2022
* Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
* Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
* Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
  * `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
  * `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
  * `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)

### Aug 29, 2022
* MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
  * `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)

### Aug 26, 2022
* CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
  * both found in [`maxxvit.py`](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py) model def, contains numerous experiments outside scope of original papers
  * an unfinished Tensorflow version from MaxVit authors can be found at https://github.com/google-research/maxvit
* Initial CoAtNet and MaxVit timm pretrained weights (working on more):
  * `coatnet_nano_rw_224` - 81.7 @ 224 (T)
  * `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
  * `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
  * `coatnet_bn_0_rw_224` - 82.4 (T)
  * `maxvit_nano_rw_256` - 82.9 @ 256 (T)
  * `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
  * `coatnet_1_rw_224` - 83.6 @ 224 (G)
  * (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
* GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
* MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
* EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
* PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
* 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)

### Aug 15, 2022
* ConvNeXt atto weights added
  * `convnext_atto` - 75.7 @ 224, 77.0 @ 288
  * `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288

### Aug 5, 2022
* More custom ConvNeXt smaller model defs with weights
  * `convnext_femto` - 77.5 @ 224, 78.7 @ 288
  * `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
  * `convnext_pico` - 79.5 @ 224, 80.4 @ 288
  * `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
  * `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
* Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)

### July 28, 2022
* Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks [Hugo Touvron](https://github.com/TouvronHugo)!

### July 27, 2022
|
416 |
+
* All runtime benchmark and validation result csv files are finally up-to-date!
|
417 |
+
* A few more weights & model defs added:
|
418 |
+
* `darknetaa53` - 79.8 @ 256, 80.5 @ 288
|
419 |
+
* `convnext_nano` - 80.8 @ 224, 81.5 @ 288
|
420 |
+
* `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288
|
421 |
+
* `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288
|
422 |
+
* `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288
|
423 |
+
* `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288
|
424 |
+
* `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320
|
425 |
+
* `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC program!
|
426 |
+
* Add output_stride=8 and 16 support to ConvNeXt (dilation)
|
427 |
+
* deit3 models not being able to resize pos_emb fixed
|
428 |
+
* Version 0.6.7 PyPi release (w/ above bug fixes and new weights since 0.6.5)
|
429 |
+
|
430 |
+
### July 8, 2022
|
431 |
+
More models, more fixes
|
432 |
+
* Official research models (w/ weights) added:
|
433 |
+
* EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
|
434 |
+
* MobileViT-V2 from (https://github.com/apple/ml-cvnets)
|
435 |
+
* DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
|
436 |
+
* My own models:
|
437 |
+
* Small `ResNet` defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
|
438 |
+
* `CspNet` refactored with dataclass config, simplified CrossStage3 (`cs3`) option. These are closer to YOLO-v5+ backbone defs.
|
439 |
+
* More relative position vit fiddling. Two `srelpos` (shared relative position) models trained, and a medium w/ class token.
|
440 |
+
* Add an alternate downsample mode to EdgeNeXt and train a `small` model. Better than original small, but not their new USI trained weights.
|
441 |
+
* My own model weight results (all ImageNet-1k training)
|
442 |
+
* `resnet10t` - 66.5 @ 176, 68.3 @ 224
|
443 |
+
* `resnet14t` - 71.3 @ 176, 72.3 @ 224
|
444 |
+
* `resnetaa50` - 80.6 @ 224, 81.6 @ 288
|
445 |
+
* `darknet53` - 80.0 @ 256, 80.5 @ 288
|
446 |
+
* `cs3darknet_m` - 77.0 @ 256, 77.6 @ 288
|
447 |
+
* `cs3darknet_focus_m` - 76.7 @ 256, 77.3 @ 288
|
448 |
+
* `cs3darknet_l` - 80.4 @ 256, 80.9 @ 288
|
449 |
+
* `cs3darknet_focus_l` - 80.3 @ 256, 80.9 @ 288
|
450 |
+
* `vit_srelpos_small_patch16_224` - 81.1 @ 224, 82.1 @ 320
|
451 |
+
* `vit_srelpos_medium_patch16_224` - 82.3 @ 224, 83.1 @ 320
|
452 |
+
* `vit_relpos_small_patch16_cls_224` - 82.6 @ 224, 83.6 @ 320
|
453 |
+
* `edgenext_small_rw` - 79.6 @ 224, 80.4 @ 320
|
454 |
+
* `cs3`, `darknet`, and `vit_*relpos` weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
|
455 |
+
* Hugging Face Hub support fixes verified, demo notebook TBA
|
456 |
+
* Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
|
457 |
+
* Add support to change image extensions scanned by `timm` datasets/readers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
|
458 |
+
* Default ConvNeXt LayerNorm impl to use `F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2)` via `LayerNorm2d` in all cases.
|
459 |
+
* a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
|
460 |
+
* previous impl exists as `LayerNormExp2d` in `models/layers/norm.py`
|
461 |
+
* Numerous bug fixes
|
462 |
+
* Currently testing for imminent PyPi 0.6.x release
|
463 |
+
* LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
|
464 |
+
* ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...
|
465 |
+
|
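The `LayerNorm2d` default described above (permute to channels-last, apply `F.layer_norm`, permute back) can be sketched roughly as follows. This is a minimal illustration that assumes only PyTorch is installed; it is not claimed to match the exact `timm` implementation, and the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dim of an NCHW tensor (illustrative sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # NCHW -> NHWC so that channels become the last (normalized) dimension
        x = x.permute(0, 2, 3, 1)
        x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        # NHWC -> NCHW to restore the layout expected by conv layers
        return x.permute(0, 3, 1, 2)


x = torch.randn(2, 8, 4, 4)  # batch=2, channels=8, 4x4 spatial (arbitrary sizes)
norm = LayerNorm2d(8)
y = norm(x)
print(y.shape)
```

The permutes only change the view of the tensor, so the normalization statistics are computed per spatial position across channels, which is what ConvNeXt-style blocks expect.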
466 |
+
### May 13, 2022
|
467 |
+
* Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
|
468 |
+
* Some refactoring for existing `timm` Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
|
469 |
+
* More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
|
470 |
+
* `vit_relpos_small_patch16_224` - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
|
471 |
+
* `vit_relpos_medium_patch16_rpn_224` - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
|
472 |
+
* `vit_relpos_medium_patch16_224` - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
|
473 |
+
* `vit_relpos_base_patch16_gapcls_224` - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
|
474 |
+
* Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using in a pre DeiT 'small' model for first ViT impl back in 2020)
|
475 |
+
* Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
|
476 |
+
* Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)
|
477 |
+
|
478 |
+
### May 2, 2022
|
479 |
+
* Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (`vision_transformer_relpos.py`) and Residual Post-Norm branches (from Swin-V2) (`vision_transformer*.py`)
|
480 |
+
* `vit_relpos_base_patch32_plus_rpn_256` - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
|
481 |
+
* `vit_relpos_base_patch16_224` - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
|
482 |
+
* `vit_base_patch16_rpn_224` - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
|
483 |
+
* Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie `How to Train Your ViT`)
|
484 |
+
* `vit_*` models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).
|
485 |
+
|
486 |
+
### April 22, 2022
|
487 |
+
* `timm` models are now officially supported in [fast.ai](https://www.fast.ai/)! Just in time for the new Practical Deep Learning course. `timmdocs` documentation link updated to [timm.fast.ai](http://timm.fast.ai/).
|
488 |
+
* Two more model weights added in the TPU trained [series](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights). Some In22k pretrain still in progress.
|
489 |
+
* `seresnext101d_32x8d` - 83.69 @ 224, 84.35 @ 288
|
490 |
+
* `seresnextaa101d_32x8d` (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288
|
491 |
+
|
492 |
+
### March 23, 2022
|
493 |
+
* Add `ParallelBlock` and `LayerScale` option to base vit models to support model configs in [Three things everyone should know about ViT](https://arxiv.org/abs/2203.09795)
|
494 |
+
* `convnext_tiny_hnf` (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.
|
495 |
+
|
496 |
+
### March 21, 2022
|
497 |
+
* Merge `norm_norm_norm`. **IMPORTANT** this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch [`0.5.x`](https://github.com/rwightman/pytorch-image-models/tree/0.5.x) or a previous 0.5.x release can be used if stability is required.
|
498 |
+
* Significant weights update (all TPU trained) as described in this [release](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights)
|
499 |
+
* `regnety_040` - 82.3 @ 224, 82.96 @ 288
|
500 |
+
* `regnety_064` - 83.0 @ 224, 83.65 @ 288
|
501 |
+
* `regnety_080` - 83.17 @ 224, 83.86 @ 288
|
502 |
+
* `regnetv_040` - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
|
503 |
+
* `regnetv_064` - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
|
504 |
+
* `regnetz_040` - 83.67 @ 256, 84.25 @ 320
|
505 |
+
* `regnetz_040h` - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
|
506 |
+
* `resnetv2_50d_gn` - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
|
507 |
+
* `resnetv2_50d_evos` - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
|
508 |
+
* `regnetz_c16_evos` - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
|
509 |
+
* `regnetz_d8_evos` - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
|
510 |
+
* `xception41p` - 82 @ 299 (timm pre-act)
|
511 |
+
* `xception65` - 83.17 @ 299
|
512 |
+
* `xception65p` - 83.14 @ 299 (timm pre-act)
|
513 |
+
* `resnext101_64x4d` - 82.46 @ 224, 83.16 @ 288
|
514 |
+
* `seresnext101_32x8d` - 83.57 @ 224, 84.27 @ 288
|
515 |
+
* `resnetrs200` - 83.85 @ 256, 84.44 @ 320
|
516 |
+
* HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
|
517 |
+
* SwinTransformer-V2 implementation added. Submitted by [Christoph Reich](https://github.com/ChristophReich1996). Training experiments and model changes by myself are ongoing so expect compat breaks.
|
518 |
+
* Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
|
519 |
+
* MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
|
520 |
+
* PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
|
521 |
+
* VOLO models w/ weights adapted from https://github.com/sail-sg/volo
|
522 |
+
* Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
|
523 |
+
* Enhance support for alternate norm + act ('NormAct') layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
|
524 |
+
* Grouped conv support added to EfficientNet family
|
525 |
+
* Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
|
526 |
+
* Gradient checkpointing support added to many models
|
527 |
+
* `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
|
528 |
+
* All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models. Token selection or pooling is now applied in `forward_head`
|
529 |
+
|
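The 'layer-wise' LR decay mentioned above can be illustrated with a small, framework-free sketch. It assumes parameters have already been assigned to ordered groups (input side first); the function name `layer_decay_scales`, the group names, and the decay value are hypothetical and are not part of the `timm` API.

```python
from typing import Dict, List


def layer_decay_scales(num_groups: int, decay: float = 0.75) -> List[float]:
    """Return one LR multiplier per layer group, ordered from input to output.

    The last group (closest to the head) keeps scale 1.0; each earlier group
    is multiplied by `decay` one more time.
    """
    return [decay ** (num_groups - 1 - i) for i in range(num_groups)]


base_lr = 1e-3
scales = layer_decay_scales(num_groups=4, decay=0.5)

# Hypothetical group names, used only to make the result readable.
group_names = ["stem", "stage1", "stage2", "head"]
lrs: Dict[str, float] = {name: base_lr * s for name, s in zip(group_names, scales)}
print(lrs)
```

In a real training setup, each scale would typically be attached to the matching optimizer parameter group so that earlier layers are updated more conservatively than the head.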
530 |
+
### Feb 2, 2022
|
531 |
+
* [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
|
532 |
+
* I'm currently prepping to merge the `norm_norm_norm` branch back to master (ver 0.6.x) in next week or so.
|
533 |
+
* The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware `pip install git+https://github.com/rwightman/pytorch-image-models` installs!
|
534 |
+
* `0.5.x` releases and a `0.5.x` branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.
|
535 |
+
|
536 |
+
### Jan 14, 2022
|
537 |
+
* Version 0.5.4 w/ release to be pushed to pypi. It's been a while since last pypi update and riskier changes will be merged to main branch soon....
|
538 |
+
* Add ConvNeXt models w/ weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
|
539 |
+
* Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...
|
540 |
+
* `mnasnet_small` - 65.6 top-1
|
541 |
+
* `mobilenetv2_050` - 65.9
|
542 |
+
* `lcnet_100/075/050` - 72.1 / 68.8 / 63.1
|
543 |
+
* `semnasnet_075` - 73
|
544 |
+
* `fbnetv3_b/d/g` - 79.1 / 79.7 / 82.0
|
545 |
+
* TinyNet models added by [rsomani95](https://github.com/rsomani95)
|
546 |
+
* LCNet added via MobileNetV3 architecture
|
547 |
+
|
548 |
+
## Introduction
|
549 |
+
|
550 |
+
Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.
|
551 |
+
|
552 |
+
The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc in the README, documentation, and code docstrings. Please let me know if I missed anything.
|
553 |
+
|
554 |
+
## Models
|
555 |
+
|
556 |
+
All model architecture families include variants with pretrained weights. Some specific model variants have no weights; this is NOT a bug. Help training new or better weights is always appreciated.
|
557 |
+
|
558 |
+
* Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
|
559 |
+
* BEiT - https://arxiv.org/abs/2106.08254
|
560 |
+
* Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
|
561 |
+
* Bottleneck Transformers - https://arxiv.org/abs/2101.11605
|
562 |
+
* CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
|
563 |
+
* CoaT (Co-Scale Conv-Attentional Image Transformers) - https://arxiv.org/abs/2104.06399
|
564 |
+
* CoAtNet (Convolution and Attention) - https://arxiv.org/abs/2106.04803
|
565 |
+
* ConvNeXt - https://arxiv.org/abs/2201.03545
|
566 |
+
* ConvNeXt-V2 - http://arxiv.org/abs/2301.00808
|
567 |
+
* ConViT (Soft Convolutional Inductive Biases Vision Transformers) - https://arxiv.org/abs/2103.10697
|
568 |
+
* CspNet (Cross-Stage Partial Networks) - https://arxiv.org/abs/1911.11929
|
569 |
+
* DeiT - https://arxiv.org/abs/2012.12877
|
570 |
+
* DeiT-III - https://arxiv.org/pdf/2204.07118.pdf
|
571 |
+
* DenseNet - https://arxiv.org/abs/1608.06993
|
572 |
+
* DLA - https://arxiv.org/abs/1707.06484
|
573 |
+
* DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
|
574 |
+
* EdgeNeXt - https://arxiv.org/abs/2206.10589
|
575 |
+
* EfficientFormer - https://arxiv.org/abs/2206.01191
|
576 |
+
* EfficientNet (MBConvNet Family)
|
577 |
+
* EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
|
578 |
+
* EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
|
579 |
+
* EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
|
580 |
+
* EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
|
581 |
+
* EfficientNet V2 - https://arxiv.org/abs/2104.00298
|
582 |
+
* FBNet-C - https://arxiv.org/abs/1812.03443
|
583 |
+
* MixNet - https://arxiv.org/abs/1907.09595
|
584 |
+
* MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
|
585 |
+
* MobileNet-V2 - https://arxiv.org/abs/1801.04381
|
586 |
+
* Single-Path NAS - https://arxiv.org/abs/1904.02877
|
587 |
+
* TinyNet - https://arxiv.org/abs/2010.14819
|
588 |
+
* EVA - https://arxiv.org/abs/2211.07636
|
589 |
+
* EVA-02 - https://arxiv.org/abs/2303.11331
|
590 |
+
* FlexiViT - https://arxiv.org/abs/2212.08013
|
591 |
+
* FocalNet (Focal Modulation Networks) - https://arxiv.org/abs/2203.11926
|
592 |
+
* GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
|
593 |
+
* GhostNet - https://arxiv.org/abs/1911.11907
|
594 |
+
* gMLP - https://arxiv.org/abs/2105.08050
|
595 |
+
* GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
|
596 |
+
* Halo Nets - https://arxiv.org/abs/2103.12731
|
597 |
+
* HRNet - https://arxiv.org/abs/1908.07919
|
598 |
+
* Inception-V3 - https://arxiv.org/abs/1512.00567
|
599 |
+
* Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
|
600 |
+
* Lambda Networks - https://arxiv.org/abs/2102.08602
|
601 |
+
* LeViT (Vision Transformer in ConvNet's Clothing) - https://arxiv.org/abs/2104.01136
|
602 |
+
* MaxViT (Multi-Axis Vision Transformer) - https://arxiv.org/abs/2204.01697
|
603 |
+
* MLP-Mixer - https://arxiv.org/abs/2105.01601
|
604 |
+
* MobileNet-V3 (MBConvNet w/ Efficient Head) - https://arxiv.org/abs/1905.02244
|
605 |
+
* FBNet-V3 - https://arxiv.org/abs/2006.02049
|
606 |
+
* HardCoRe-NAS - https://arxiv.org/abs/2102.11646
|
607 |
+
* LCNet - https://arxiv.org/abs/2109.15099
|
608 |
+
* MobileViT - https://arxiv.org/abs/2110.02178
|
609 |
+
* MobileViT-V2 - https://arxiv.org/abs/2206.02680
|
610 |
+
* MViT-V2 (Improved Multiscale Vision Transformer) - https://arxiv.org/abs/2112.01526
|
611 |
+
* NASNet-A - https://arxiv.org/abs/1707.07012
|
612 |
+
* NesT - https://arxiv.org/abs/2105.12723
|
613 |
+
* NFNet-F - https://arxiv.org/abs/2102.06171
|
614 |
+
* NF-RegNet / NF-ResNet - https://arxiv.org/abs/2101.08692
|
615 |
+
* PNasNet - https://arxiv.org/abs/1712.00559
|
616 |
+
* PoolFormer (MetaFormer) - https://arxiv.org/abs/2111.11418
|
617 |
+
* Pooling-based Vision Transformer (PiT) - https://arxiv.org/abs/2103.16302
|
618 |
+
* PVT-V2 (Improved Pyramid Vision Transformer) - https://arxiv.org/abs/2106.13797
|
619 |
+
* RegNet - https://arxiv.org/abs/2003.13678
|
620 |
+
* RegNetZ - https://arxiv.org/abs/2103.06877
|
621 |
+
* RepVGG - https://arxiv.org/abs/2101.03697
|
622 |
+
* ResMLP - https://arxiv.org/abs/2105.03404
|
623 |
+
* ResNet/ResNeXt
|
624 |
+
* ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
|
625 |
+
* ResNeXt - https://arxiv.org/abs/1611.05431
|
626 |
+
* 'Bag of Tricks' / Gluon C, D, E, S variations - https://arxiv.org/abs/1812.01187
|
627 |
+
* Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 - https://arxiv.org/abs/1805.00932
|
628 |
+
* Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts - https://arxiv.org/abs/1905.00546
|
629 |
+
* ECA-Net (ECAResNet) - https://arxiv.org/abs/1910.03151v4
|
630 |
+
* Squeeze-and-Excitation Networks (SEResNet) - https://arxiv.org/abs/1709.01507
|
631 |
+
* ResNet-RS - https://arxiv.org/abs/2103.07579
|
632 |
+
* Res2Net - https://arxiv.org/abs/1904.01169
|
633 |
+
* ResNeSt - https://arxiv.org/abs/2004.08955
|
634 |
+
* ReXNet - https://arxiv.org/abs/2007.00992
|
635 |
+
* SelecSLS - https://arxiv.org/abs/1907.00837
|
636 |
+
* Selective Kernel Networks - https://arxiv.org/abs/1903.06586
|
637 |
+
* Sequencer2D - https://arxiv.org/abs/2205.01972
|
638 |
+
* Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
|
639 |
+
* Swin Transformer - https://arxiv.org/abs/2103.14030
|
640 |
+
* Swin Transformer V2 - https://arxiv.org/abs/2111.09883
|
641 |
+
* Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
|
642 |
+
* TResNet - https://arxiv.org/abs/2003.13630
|
643 |
+
* Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
|
644 |
+
* Visformer - https://arxiv.org/abs/2104.12533
|
645 |
+
* Vision Transformer - https://arxiv.org/abs/2010.11929
|
646 |
+
* VOLO (Vision Outlooker) - https://arxiv.org/abs/2106.13112
|
647 |
+
* VovNet V2 and V1 - https://arxiv.org/abs/1911.06667
|
648 |
+
* Xception - https://arxiv.org/abs/1610.02357
|
649 |
+
* Xception (Modified Aligned, Gluon) - https://arxiv.org/abs/1802.02611
|
650 |
+
* Xception (Modified Aligned, TF) - https://arxiv.org/abs/1802.02611
|
651 |
+
* XCiT (Cross-Covariance Image Transformers) - https://arxiv.org/abs/2106.09681
|
652 |
+
|
653 |
+
## Features
|
654 |
+
|
655 |
+
Several (less common) features that I often utilize in my projects are included. Many of these additions are the reason why I maintain my own set of models, instead of using others' via pip:
|
656 |
+
|
657 |
+
* All models have a common default configuration interface and API for
|
658 |
+
* accessing/changing the classifier - `get_classifier` and `reset_classifier`
|
659 |
+
* doing a forward pass on just the features - `forward_features` (see [documentation](https://huggingface.co/docs/timm/feature_extraction))
|
660 |
+
* these make it easy to write consistent network wrappers that work with any of the models
|
661 |
+
* All models support multi-scale feature map extraction (feature pyramids) via create_model (see [documentation](https://huggingface.co/docs/timm/feature_extraction))
|
662 |
+
* `create_model(name, features_only=True, out_indices=..., output_stride=...)`
|
663 |
+
* `out_indices` creation arg specifies which feature maps to return, these indices are 0 based and generally correspond to the `C(i + 1)` feature level.
|
664 |
+
* `output_stride` creation arg controls output stride of the network by using dilated convolutions. Most networks are stride 32 by default. Not all networks support this.
|
665 |
+
* feature map channel counts, reduction level (stride) can be queried AFTER model creation via the `.feature_info` member
|
666 |
+
* All models have a consistent pretrained weight loader that adapts last linear if necessary, and from 3 to 1 channel input if desired
|
667 |
+
* High performance [reference training, validation, and inference scripts](https://huggingface.co/docs/timm/training_script) that work in several process/GPU modes:
|
668 |
+
* NVIDIA DDP w/ a single GPU per process, multiple processes with APEX present (AMP mixed-precision optional)
|
669 |
+
* PyTorch DistributedDataParallel w/ multi-gpu, single process (AMP disabled as it crashes when enabled)
|
670 |
+
* PyTorch w/ single GPU single process (AMP optional)
|
671 |
+
* A dynamic global pool implementation that allows selecting from average pooling, max pooling, average + max, or concat([average, max]) at model creation. All global pooling is adaptive average by default and compatible with pretrained weights.
|
672 |
+
* A 'Test Time Pool' wrapper that can wrap any of the included models and usually provides improved performance doing inference with input images larger than the training size. Idea adapted from original DPN implementation when I ported (https://github.com/cypw/DPNs)
|
673 |
+
* Learning rate schedulers
|
674 |
+
* Ideas adopted from
|
675 |
+
* [AllenNLP schedulers](https://github.com/allenai/allennlp/tree/master/allennlp/training/learning_rate_schedulers)
|
676 |
+
* [FAIRseq lr_scheduler](https://github.com/pytorch/fairseq/tree/master/fairseq/optim/lr_scheduler)
|
677 |
+
* SGDR: Stochastic Gradient Descent with Warm Restarts (https://arxiv.org/abs/1608.03983)
|
678 |
+
* Schedulers include `step`, `cosine` w/ restarts, `tanh` w/ restarts, `plateau`
|
679 |
+
* Optimizers:
|
680 |
+
* `rmsprop_tf` adapted from PyTorch RMSProp by myself. Reproduces much improved Tensorflow RMSProp behaviour.
|
681 |
+
* `radam` by [Liyuan Liu](https://github.com/LiyuanLucasLiu/RAdam) (https://arxiv.org/abs/1908.03265)
|
682 |
+
* `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) (https://arxiv.org/abs/1905.11286)
|
683 |
+
* `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) (https://arxiv.org/abs/1907.08610)
|
684 |
+
* `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
|
685 |
+
* `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) (https://arxiv.org/abs/2006.08217)
|
686 |
+
* `adafactor` adapted from [FAIRSeq impl](https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py) (https://arxiv.org/abs/1804.04235)
|
687 |
+
* `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) (https://arxiv.org/abs/2006.00719)
|
688 |
+
* Random Erasing from [Zhun Zhong](https://github.com/zhunzhong07/Random-Erasing/blob/master/transforms.py) (https://arxiv.org/abs/1708.04896)
|
689 |
+
* Mixup (https://arxiv.org/abs/1710.09412)
|
690 |
+
* CutMix (https://arxiv.org/abs/1905.04899)
|
691 |
+
* AutoAugment (https://arxiv.org/abs/1805.09501) and RandAugment (https://arxiv.org/abs/1909.13719) ImageNet configurations modeled after impl for EfficientNet training (https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py)
|
692 |
+
* AugMix w/ JSD loss (https://arxiv.org/abs/1912.02781), JSD w/ clean + augmented mixing support works with AutoAugment and RandAugment as well
|
693 |
+
* SplitBatchNorm - allows splitting batch norm layers between clean and augmented (auxiliary batch norm) data
|
694 |
+
* DropPath aka "Stochastic Depth" (https://arxiv.org/abs/1603.09382)
|
695 |
+
* DropBlock (https://arxiv.org/abs/1810.12890)
|
696 |
+
* Blur Pooling (https://arxiv.org/abs/1904.11486)
|
697 |
+
* Space-to-Depth by [mrT23](https://github.com/mrT23/TResNet/blob/master/src/models/tresnet/layers/space_to_depth.py) (https://arxiv.org/abs/1801.04590) -- original paper?
|
698 |
+
* Adaptive Gradient Clipping (https://arxiv.org/abs/2102.06171, https://github.com/deepmind/deepmind-research/tree/master/nfnets)
|
699 |
+
* An extensive selection of channel and/or spatial attention modules:
|
700 |
+
* Bottleneck Transformer - https://arxiv.org/abs/2101.11605
|
701 |
+
* CBAM - https://arxiv.org/abs/1807.06521
|
702 |
+
* Effective Squeeze-Excitation (ESE) - https://arxiv.org/abs/1911.06667
|
703 |
+
* Efficient Channel Attention (ECA) - https://arxiv.org/abs/1910.03151
|
704 |
+
* Gather-Excite (GE) - https://arxiv.org/abs/1810.12348
|
705 |
+
* Global Context (GC) - https://arxiv.org/abs/1904.11492
|
706 |
+
* Halo - https://arxiv.org/abs/2103.12731
|
707 |
+
* Involution - https://arxiv.org/abs/2103.06255
|
708 |
+
* Lambda Layer - https://arxiv.org/abs/2102.08602
|
709 |
+
* Non-Local (NL) - https://arxiv.org/abs/1711.07971
|
710 |
+
* Squeeze-and-Excitation (SE) - https://arxiv.org/abs/1709.01507
|
711 |
+
* Selective Kernel (SK) - https://arxiv.org/abs/1903.06586
|
712 |
+
* Split (SPLAT) - https://arxiv.org/abs/2004.08955
|
713 |
+
* Shifted Window (SWIN) - https://arxiv.org/abs/2103.14030
|
714 |
+
|
715 |
+
## Results
|
716 |
+
|
717 |
+
Model validation results can be found in the [results tables](results/README.md)
|
718 |
+
|
719 |
+
## Getting Started (Documentation)
|
720 |
+
|
721 |
+
The official documentation can be found at https://huggingface.co/docs/hub/timm. Documentation contributions are welcome.
|
722 |
+
|
723 |
+
[Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055) by [Chris Hughes](https://github.com/Chris-hughes10) is an extensive blog post covering many aspects of `timm` in detail.
|
724 |
+
|
725 |
+
[timmdocs](http://timm.fast.ai/) is an alternate set of documentation for `timm`. A big thanks to [Aman Arora](https://github.com/amaarora) for his efforts creating timmdocs.
|
726 |
+
|
727 |
+
[paperswithcode](https://paperswithcode.com/lib/timm) is a good resource for browsing the models within `timm`.
|
728 |
+
|
729 |
+
## Train, Validation, Inference Scripts
|
730 |
+
|
731 |
+
The root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. They are adaptable for other datasets and use cases with a little hacking. See [documentation](https://huggingface.co/docs/timm/training_script).
|
732 |
+
|
733 |
+
## Awesome PyTorch Resources
|
734 |
+
|
735 |
+
One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.
|
736 |
+
|
737 |
+
### Object Detection, Instance and Semantic Segmentation
|
738 |
+
* Detectron2 - https://github.com/facebookresearch/detectron2
|
739 |
+
* Segmentation Models (Semantic) - https://github.com/qubvel/segmentation_models.pytorch
|
740 |
+
* EfficientDet (Obj Det, Semantic soon) - https://github.com/rwightman/efficientdet-pytorch
|
741 |
+
|
742 |
+
### Computer Vision / Image Augmentation
|
743 |
+
* Albumentations - https://github.com/albumentations-team/albumentations
|
744 |
+
* Kornia - https://github.com/kornia/kornia
|
745 |
+
|
746 |
+
### Knowledge Distillation
|
747 |
+
* RepDistiller - https://github.com/HobbitLong/RepDistiller
|
748 |
+
* torchdistill - https://github.com/yoshitomo-matsubara/torchdistill
|
749 |
+
|
750 |
+
### Metric Learning
|
751 |
+
* PyTorch Metric Learning - https://github.com/KevinMusgrave/pytorch-metric-learning
|
752 |
+
|
753 |
+
### Training / Frameworks
|
754 |
+
* fastai - https://github.com/fastai/fastai
|
755 |
+
|
756 |
+
## Licenses
|
757 |
+
|
758 |
+
### Code
|
759 |
+
The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with licenses here and conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.
|
760 |
+
|
761 |
+
### Pretrained Weights
|
762 |
+
So far all of the pretrained weights available here are pretrained on ImageNet with a select few that have some additional pretraining (see extra note below). ImageNet was released for non-commercial research purposes only (https://image-net.org/download). It's not clear what the implications of that are for the use of pretrained weights from that dataset. Any models I have trained with ImageNet are done for research purposes and one should assume that the original dataset license applies to the weights. It's best to seek legal advice if you intend to use the pretrained weights in a commercial product.
|
763 |
+
|
764 |
+
#### Pretrained on more than ImageNet
|
765 |
+
Several weights included or referenced here were pretrained with proprietary datasets that I do not have access to. These include the Facebook WSL, SSL, SWSL ResNe(Xt) and the Google Noisy Student EfficientNet models. The Facebook models have an explicit non-commercial license (CC-BY-NC 4.0, https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, https://github.com/facebookresearch/WSL-Images). The Google models do not appear to have any restriction beyond the Apache 2.0 license (and ImageNet concerns). In either case, you should contact Facebook or Google with any questions.
|
766 |
+
|
767 |
+
## Citing
|
768 |
+
|
769 |
+
### BibTeX
|
770 |
+
|
771 |
+
```bibtex
|
772 |
+
@misc{rw2019timm,
|
773 |
+
author = {Ross Wightman},
|
774 |
+
title = {PyTorch Image Models},
|
775 |
+
year = {2019},
|
776 |
+
publisher = {GitHub},
|
777 |
+
journal = {GitHub repository},
|
778 |
+
doi = {10.5281/zenodo.4414861},
|
779 |
+
howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
|
780 |
+
}
|
781 |
+
```
|
782 |
+
|
783 |
+
### Latest DOI
|
784 |
+
|
785 |
+
[![DOI](https://zenodo.org/badge/168799526.svg)](https://zenodo.org/badge/latestdoi/168799526)
|
avg_checkpoints.py
ADDED
@@ -0,0 +1,152 @@
1 |
+
#!/usr/bin/env python3
|
2 |
+
""" Checkpoint Averaging Script
|
3 |
+
|
4 |
+
This script averages all model weights for checkpoints in specified path that match
|
5 |
+
the specified filter wildcard. All checkpoints must be from the exact same model.
|
6 |
+
|
7 |
+
For any hope of decent results, the checkpoints should be from the same or child
|
8 |
+
(via resumes) training session. This can be viewed as similar to maintaining running
|
9 |
+
EMA (exponential moving average) of the model weights or performing SWA (stochastic
|
10 |
+
weight averaging), but post-training.
|
11 |
+
|
12 |
+
Hacked together by / Copyright 2020 Ross Wightman (https://github.com/rwightman)
|
13 |
+
"""
|
14 |
+
import torch
|
15 |
+
import argparse
|
16 |
+
import os
|
17 |
+
import glob
|
18 |
+
import hashlib
|
19 |
+
from timm.models import load_state_dict
|
20 |
+
try:
|
21 |
+
import safetensors.torch
|
22 |
+
_has_safetensors = True
|
23 |
+
except ImportError:
|
24 |
+
_has_safetensors = False
|
25 |
+
|
26 |
+
DEFAULT_OUTPUT = "./averaged.pth"
|
27 |
+
DEFAULT_SAFE_OUTPUT = "./averaged.safetensors"
|
28 |
+
|
29 |
+
parser = argparse.ArgumentParser(description='PyTorch Checkpoint Averager')
|
30 |
+
parser.add_argument('--input', default='', type=str, metavar='PATH',
|
31 |
+
help='path to base input folder containing checkpoints')
|
32 |
+
parser.add_argument('--filter', default='*.pth.tar', type=str, metavar='WILDCARD',
|
33 |
+
help='checkpoint filter (path wildcard)')
|
34 |
+
parser.add_argument('--output', default=DEFAULT_OUTPUT, type=str, metavar='PATH',
|
35 |
+
help=f'Output filename. Defaults to {DEFAULT_SAFE_OUTPUT} when passing --safetensors.')
|
36 |
+
parser.add_argument('--no-use-ema', dest='no_use_ema', action='store_true',
|
37 |
+
help='Force not using ema version of weights (if present)')
|
38 |
+
parser.add_argument('--no-sort', dest='no_sort', action='store_true',
|
39 |
+
help='Do not sort and select by checkpoint metric, also makes "n" argument irrelevant')
|
40 |
+
parser.add_argument('-n', type=int, default=10, metavar='N',
|
41 |
+
help='Number of checkpoints to average')
|
42 |
+
parser.add_argument('--safetensors', action='store_true',
|
43 |
+
help='Save weights using safetensors instead of the default torch way (pickle).')
|
44 |
+
|
45 |
+
|
46 |
+
def checkpoint_metric(checkpoint_path):
|
47 |
+
if not checkpoint_path or not os.path.isfile(checkpoint_path):
|
48 |
+
return None
|
49 |
+
print("=> Extracting metric from checkpoint '{}'".format(checkpoint_path))
|
50 |
+
checkpoint = torch.load(checkpoint_path, map_location='cpu')
|
51 |
+
metric = None
|
52 |
+
if 'metric' in checkpoint:
|
53 |
+
metric = checkpoint['metric']
|
54 |
+
elif 'metrics' in checkpoint and 'metric_name' in checkpoint:
|
55 |
+
metrics = checkpoint['metrics']
|
56 |
+
print(metrics)
|
57 |
+
metric = metrics[checkpoint['metric_name']]
|
58 |
+
return metric
|
59 |
+
|
60 |
+
|
61 |
+
def main():
|
62 |
+
args = parser.parse_args()
|
63 |
+
# by default use the EMA weights (if present)
|
64 |
+
args.use_ema = not args.no_use_ema
|
65 |
+
# by default sort by checkpoint metric (if present) and avg top n checkpoints
|
66 |
+
args.sort = not args.no_sort
|
67 |
+
|
68 |
+
if args.safetensors and args.output == DEFAULT_OUTPUT:
|
69 |
+
# Default path changes if using safetensors
|
70 |
+
args.output = DEFAULT_SAFE_OUTPUT
|
71 |
+
|
72 |
+
output, output_ext = os.path.splitext(args.output)
|
73 |
+
if not output_ext:
|
74 |
+
output_ext = ('.safetensors' if args.safetensors else '.pth')
|
75 |
+
output = output + output_ext
|
76 |
+
|
77 |
+
if args.safetensors and not output_ext == ".safetensors":
|
78 |
+
print(
|
79 |
+
"Warning: saving weights as safetensors but output file extension is not "
|
80 |
+
f"set to '.safetensors': {args.output}"
|
81 |
+
)
|
82 |
+
|
83 |
+
if os.path.exists(output):
|
84 |
+
print("Error: Output filename ({}) already exists.".format(output))
|
85 |
+
exit(1)
|
86 |
+
|
87 |
+
pattern = args.input
|
88 |
+
if not args.input.endswith(os.path.sep) and not args.filter.startswith(os.path.sep):
|
89 |
+
pattern += os.path.sep
|
90 |
+
pattern += args.filter
|
91 |
+
checkpoints = glob.glob(pattern, recursive=True)
|
92 |
+
|
93 |
+
if args.sort:
|
94 |
+
checkpoint_metrics = []
|
95 |
+
for c in checkpoints:
|
96 |
+
metric = checkpoint_metric(c)
|
97 |
+
if metric is not None:
|
98 |
+
checkpoint_metrics.append((metric, c))
|
99 |
+
checkpoint_metrics = list(sorted(checkpoint_metrics))
|
100 |
+
checkpoint_metrics = checkpoint_metrics[-args.n:]
|
101 |
+
if checkpoint_metrics:
|
102 |
+
print("Selected checkpoints:")
|
103 |
+
[print(m, c) for m, c in checkpoint_metrics]
|
104 |
+
avg_checkpoints = [c for m, c in checkpoint_metrics]
|
105 |
+
else:
|
106 |
+
avg_checkpoints = checkpoints
|
107 |
+
if avg_checkpoints:
|
108 |
+
print("Selected checkpoints:")
|
109 |
+
[print(c) for c in checkpoints]
|
110 |
+
|
111 |
+
if not avg_checkpoints:
|
112 |
+
print('Error: No checkpoints found to average.')
|
113 |
+
exit(1)
|
114 |
+
|
115 |
+
avg_state_dict = {}
|
116 |
+
avg_counts = {}
|
117 |
+
for c in avg_checkpoints:
|
118 |
+
new_state_dict = load_state_dict(c, args.use_ema)
|
119 |
+
if not new_state_dict:
|
120 |
+
print(f"Error: Checkpoint ({c}) doesn't exist")
|
121 |
+
continue
|
122 |
+
for k, v in new_state_dict.items():
|
123 |
+
if k not in avg_state_dict:
|
124 |
+
avg_state_dict[k] = v.clone().to(dtype=torch.float64)
|
125 |
+
avg_counts[k] = 1
|
126 |
+
else:
|
127 |
+
avg_state_dict[k] += v.to(dtype=torch.float64)
|
128 |
+
avg_counts[k] += 1
|
129 |
+
|
130 |
+
for k, v in avg_state_dict.items():
|
131 |
+
v.div_(avg_counts[k])
|
132 |
+
|
133 |
+
# float32 overflow seems unlikely based on weights seen to date, but who knows
|
134 |
+
float32_info = torch.finfo(torch.float32)
|
135 |
+
final_state_dict = {}
|
136 |
+
for k, v in avg_state_dict.items():
|
137 |
+
v = v.clamp(float32_info.min, float32_info.max)
|
138 |
+
final_state_dict[k] = v.to(dtype=torch.float32)
|
139 |
+
|
140 |
+
if args.safetensors:
|
141 |
+
assert _has_safetensors, "`pip install safetensors` to use .safetensors"
|
142 |
+
safetensors.torch.save_file(final_state_dict, output)
|
143 |
+
else:
|
144 |
+
torch.save(final_state_dict, output)
|
145 |
+
|
146 |
+
with open(output, 'rb') as f:
|
147 |
+
sha_hash = hashlib.sha256(f.read()).hexdigest()
|
148 |
+
print(f"=> Saved state_dict to '{output}', SHA256: {sha_hash}")
|
149 |
+
|
150 |
+
|
151 |
+
if __name__ == '__main__':
|
152 |
+
main()
|
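The selection and averaging steps of avg_checkpoints.py above can be sketched without PyTorch. This is only an illustrative sketch, not the script's code: `average_state_dicts` and `select_top_n` are hypothetical helper names, and plain lists of floats stand in for tensors. It shows the two ideas the script relies on: keep the n checkpoints with the highest metric (which assumes a higher metric is better), then take a key-wise mean using a per-key count, so a key missing from some checkpoints is averaged only over the checkpoints that contain it.

```python
def select_top_n(metric_path_pairs, n):
    """Sort (metric, path) pairs ascending and keep the n highest-metric paths."""
    return [path for _, path in sorted(metric_path_pairs)[-n:]]


def average_state_dicts(state_dicts):
    """Key-wise mean of {name: list_of_floats} dicts (stand-ins for tensors)."""
    sums = {}
    counts = {}
    for state_dict in state_dicts:
        for name, values in state_dict.items():
            if name not in sums:
                # First time this key is seen: start the running sum.
                sums[name] = [float(v) for v in values]
                counts[name] = 1
            else:
                # Accumulate element-wise and track how many checkpoints had the key.
                sums[name] = [a + float(b) for a, b in zip(sums[name], values)]
                counts[name] += 1
    # Divide each running sum by the number of checkpoints that contributed to it.
    return {name: [v / counts[name] for v in vals] for name, vals in sums.items()}


if __name__ == "__main__":
    picked = select_top_n([(0.7, "a.pth"), (0.9, "b.pth"), (0.8, "c.pth")], 2)
    print(picked)
    print(average_state_dicts([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

The real script accumulates in float64 and clamps to the float32 range before casting back; Python floats are already double precision, so the sketch omits that step.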
benchmark.py
ADDED
@@ -0,0 +1,696 @@
1 |
+
#!/usr/bin/env python3
|
2 |
+
""" Model Benchmark Script
|
3 |
+
|
4 |
+
An inference and train step benchmark script for timm models.
|
5 |
+
|
6 |
+
Hacked together by Ross Wightman (https://github.com/rwightman)
|
7 |
+
"""
|
8 |
+
import argparse
|
9 |
+
import csv
|
10 |
+
import json
|
11 |
+
import logging
|
12 |
+
import time
|
13 |
+
from collections import OrderedDict
|
14 |
+
from contextlib import suppress
|
15 |
+
from functools import partial
|
16 |
+
|
17 |
+
import torch
|
18 |
+
import torch.nn as nn
|
19 |
+
import torch.nn.parallel
|
20 |
+
|
21 |
+
from timm.data import resolve_data_config
|
22 |
+
from timm.layers import set_fast_norm
|
23 |
+
from timm.models import create_model, is_model, list_models
|
24 |
+
from timm.optim import create_optimizer_v2
|
25 |
+
from timm.utils import setup_default_logging, set_jit_fuser, decay_batch_step, check_batch_size_retry, ParseKwargs
|
26 |
+
|
27 |
+
has_apex = False
|
28 |
+
try:
|
29 |
+
from apex import amp
|
30 |
+
has_apex = True
|
31 |
+
except ImportError:
|
32 |
+
pass
|
33 |
+
|
34 |
+
has_native_amp = False
|
35 |
+
try:
|
36 |
+
if getattr(torch.cuda.amp, 'autocast') is not None:
|
37 |
+
has_native_amp = True
|
38 |
+
except AttributeError:
|
39 |
+
pass
|
40 |
+
|
41 |
+
try:
|
42 |
+
from deepspeed.profiling.flops_profiler import get_model_profile
|
43 |
+
has_deepspeed_profiling = True
|
44 |
+
except ImportError as e:
|
45 |
+
has_deepspeed_profiling = False
|
46 |
+
|
47 |
+
try:
|
48 |
+
from fvcore.nn import FlopCountAnalysis, flop_count_str, ActivationCountAnalysis
|
49 |
+
has_fvcore_profiling = True
|
50 |
+
except ImportError as e:
|
51 |
+
FlopCountAnalysis = None
|
52 |
+
has_fvcore_profiling = False
|
53 |
+
|
54 |
+
try:
|
55 |
+
from functorch.compile import memory_efficient_fusion
|
56 |
+
has_functorch = True
|
57 |
+
except ImportError as e:
|
58 |
+
has_functorch = False
|
59 |
+
|
60 |
+
has_compile = hasattr(torch, 'compile')
|
61 |
+
|
62 |
+
if torch.cuda.is_available():
|
63 |
+
torch.backends.cuda.matmul.allow_tf32 = True
|
64 |
+
torch.backends.cudnn.benchmark = True
|
65 |
+
_logger = logging.getLogger('benchmark')
|
66 |
+
|
67 |
+
|
68 |
+
parser = argparse.ArgumentParser(description='PyTorch Benchmark')
|
69 |
+
|
70 |
+
# benchmark specific args
|
71 |
+
parser.add_argument('--model-list', metavar='NAME', default='',
|
72 |
+
help='txt file based list of model names to benchmark')
|
73 |
+
parser.add_argument('--bench', default='both', type=str,
|
74 |
+
help="Benchmark mode. One of 'inference', 'train', 'both'. Defaults to 'both'")
|
75 |
+
parser.add_argument('--detail', action='store_true', default=False,
|
76 |
+
help='Provide train fwd/bwd/opt breakdown detail if True. Defaults to False')
|
77 |
+
parser.add_argument('--no-retry', action='store_true', default=False,
|
78 |
+
help='Do not decay batch size and retry on error.')
|
79 |
+
parser.add_argument('--results-file', default='', type=str,
|
80 |
+
help='Output file for benchmark results (summary)')
|
81 |
+
parser.add_argument('--results-format', default='csv', type=str,
|
82 |
+
help='Format for results file one of (csv, json) (default: csv).')
|
83 |
+
parser.add_argument('--num-warm-iter', default=10, type=int,
|
84 |
+
help='Number of warmup iterations (default: 10)')
|
85 |
+
parser.add_argument('--num-bench-iter', default=40, type=int,
|
86 |
+
help='Number of benchmark iterations (default: 40)')
|
87 |
+
parser.add_argument('--device', default='cuda', type=str,
|
88 |
+
help="device to run benchmark on")
|
89 |
+
|
90 |
+
# common inference / train args
|
91 |
+
parser.add_argument('--model', '-m', metavar='NAME', default='resnet50',
|
92 |
+
help='model architecture (default: resnet50)')
|
93 |
+
parser.add_argument('-b', '--batch-size', default=256, type=int,
|
94 |
+
metavar='N', help='mini-batch size (default: 256)')
|
95 |
+
parser.add_argument('--img-size', default=None, type=int,
|
96 |
+
metavar='N', help='Input image dimension, uses model default if empty')
|
97 |
+
parser.add_argument('--input-size', default=None, nargs=3, type=int,
|
98 |
+
metavar='N N N', help='Input all image dimensions (c h w, e.g. --input-size 3 224 224), uses model default if empty')
|
99 |
+
parser.add_argument('--use-train-size', action='store_true', default=False,
|
100 |
+
help='Run inference at train size, not test-input-size if it exists.')
|
101 |
+
parser.add_argument('--num-classes', type=int, default=None,
|
102 |
+
help='Number classes in dataset')
|
103 |
+
parser.add_argument('--gp', default=None, type=str, metavar='POOL',
|
104 |
+
help='Global pool type, one of (fast, avg, max, avgmax, avgmaxc). Model default if None.')
|
105 |
+
parser.add_argument('--channels-last', action='store_true', default=False,
|
106 |
+
help='Use channels_last memory layout')
|
107 |
+
parser.add_argument('--grad-checkpointing', action='store_true', default=False,
|
108 |
+
help='Enable gradient checkpointing through model blocks/stages')
|
109 |
+
parser.add_argument('--amp', action='store_true', default=False,
|
110 |
+
help='use PyTorch Native AMP for mixed precision training. Overrides --precision arg.')
|
111 |
+
parser.add_argument('--amp-dtype', default='float16', type=str,
|
112 |
+
help='lower precision AMP dtype (default: float16). Overrides --precision arg if args.amp True.')
|
113 |
+
parser.add_argument('--precision', default='float32', type=str,
|
114 |
+
help='Numeric precision. One of (amp, amp_bfloat16, float32, float16, bfloat16)')
|
115 |
+
parser.add_argument('--fuser', default='', type=str,
|
116 |
+
help="Select jit fuser. One of ('', 'te', 'old', 'nvfuser')")
|
117 |
+
parser.add_argument('--fast-norm', default=False, action='store_true',
|
118 |
+
help='enable experimental fast-norm')
|
119 |
+
parser.add_argument('--model-kwargs', nargs='*', default={}, action=ParseKwargs)
|
120 |
+
|
121 |
+
# codegen (model compilation) options
|
122 |
+
scripting_group = parser.add_mutually_exclusive_group()
|
123 |
+
scripting_group.add_argument('--torchscript', dest='torchscript', action='store_true',
|
124 |
+
help='convert model to torchscript for inference')
|
125 |
+
scripting_group.add_argument('--torchcompile', nargs='?', type=str, default=None, const='inductor',
|
126 |
+
help="Enable compilation w/ specified backend (default: inductor).")
|
127 |
+
scripting_group.add_argument('--aot-autograd', default=False, action='store_true',
|
128 |
+
help="Enable AOT Autograd optimization.")
|
129 |
+
|
130 |
+
# train optimizer parameters
|
131 |
+
parser.add_argument('--opt', default='sgd', type=str, metavar='OPTIMIZER',
|
132 |
+
help='Optimizer (default: "sgd")')
|
133 |
+
parser.add_argument('--opt-eps', default=None, type=float, metavar='EPSILON',
|
134 |
+
help='Optimizer Epsilon (default: None, use opt default)')
|
135 |
+
parser.add_argument('--opt-betas', default=None, type=float, nargs='+', metavar='BETA',
|
136 |
+
help='Optimizer Betas (default: None, use opt default)')
|
137 |
+
parser.add_argument('--momentum', type=float, default=0.9, metavar='M',
|
138 |
+
help='Optimizer momentum (default: 0.9)')
|
139 |
+
parser.add_argument('--weight-decay', type=float, default=0.0001,
|
140 |
+
help='weight decay (default: 0.0001)')
|
141 |
+
parser.add_argument('--clip-grad', type=float, default=None, metavar='NORM',
|
142 |
+
help='Clip gradient norm (default: None, no clipping)')
|
143 |
+
parser.add_argument('--clip-mode', type=str, default='norm',
|
144 |
+
help='Gradient clipping mode. One of ("norm", "value", "agc")')
|
145 |
+
|
146 |
+
|
147 |
+
# model regularization / loss params that impact model or loss fn
|
148 |
+
parser.add_argument('--smoothing', type=float, default=0.1,
|
149 |
+
help='Label smoothing (default: 0.1)')
|
150 |
+
parser.add_argument('--drop', type=float, default=0.0, metavar='PCT',
|
151 |
+
help='Dropout rate (default: 0.)')
|
152 |
+
parser.add_argument('--drop-path', type=float, default=None, metavar='PCT',
|
153 |
+
help='Drop path rate (default: None)')
|
154 |
+
parser.add_argument('--drop-block', type=float, default=None, metavar='PCT',
|
155 |
+
help='Drop block rate (default: None)')
|
156 |
+
|
157 |
+
|
158 |
+
def timestamp(sync=False):
|
159 |
+
return time.perf_counter()
|
160 |
+
|
161 |
+
|
162 |
+
def cuda_timestamp(sync=False, device=None):
|
163 |
+
if sync:
|
164 |
+
torch.cuda.synchronize(device=device)
|
165 |
+
return time.perf_counter()
|
166 |
+
|
167 |
+
|
168 |
+
def count_params(model: nn.Module):
|
169 |
+
return sum([m.numel() for m in model.parameters()])
|
170 |
+
|
171 |
+
|
172 |
+
def resolve_precision(precision: str):
|
173 |
+
assert precision in ('amp', 'amp_bfloat16', 'float16', 'bfloat16', 'float32')
|
174 |
+
amp_dtype = None # amp disabled
|
175 |
+
model_dtype = torch.float32
|
176 |
+
data_dtype = torch.float32
|
177 |
+
if precision == 'amp':
|
178 |
+
amp_dtype = torch.float16
|
179 |
+
elif precision == 'amp_bfloat16':
|
180 |
+
amp_dtype = torch.bfloat16
|
181 |
+
elif precision == 'float16':
|
182 |
+
model_dtype = torch.float16
|
183 |
+
data_dtype = torch.float16
|
184 |
+
elif precision == 'bfloat16':
|
185 |
+
model_dtype = torch.bfloat16
|
186 |
+
data_dtype = torch.bfloat16
|
187 |
+
return amp_dtype, model_dtype, data_dtype
|
188 |
+
|
189 |
+
|
190 |
+
def profile_deepspeed(model, input_size=(3, 224, 224), batch_size=1, detailed=False):
|
191 |
+
_, macs, _ = get_model_profile(
|
192 |
+
model=model,
|
193 |
+
input_shape=(batch_size,) + input_size, # input shape/resolution
|
194 |
+
print_profile=detailed, # prints the model graph with the measured profile attached to each module
|
195 |
+
detailed=detailed, # print the detailed profile
|
196 |
+
warm_up=10, # the number of warm-ups before measuring the time of each module
|
197 |
+
as_string=False, # print raw numbers (e.g. 1000) or as human-readable strings (e.g. 1k)
|
198 |
+
output_file=None, # path to the output file. If None, the profiler prints to stdout.
|
199 |
+
ignore_modules=None) # the list of modules to ignore in the profiling
|
200 |
+
return macs, 0 # no activation count in DS
|
201 |
+
|
202 |
+
|
203 |
+
def profile_fvcore(model, input_size=(3, 224, 224), batch_size=1, detailed=False, force_cpu=False):
|
204 |
+
if force_cpu:
|
205 |
+
model = model.to('cpu')
|
206 |
+
device, dtype = next(model.parameters()).device, next(model.parameters()).dtype
|
207 |
+
example_input = torch.ones((batch_size,) + input_size, device=device, dtype=dtype)
|
208 |
+
fca = FlopCountAnalysis(model, example_input)
|
209 |
+
aca = ActivationCountAnalysis(model, example_input)
|
210 |
+
if detailed:
|
211 |
+
fcs = flop_count_str(fca)
|
212 |
+
print(fcs)
|
213 |
+
return fca.total(), aca.total()
|
214 |
+
|
215 |
+
|
216 |
+
class BenchmarkRunner:
|
217 |
+
def __init__(
|
218 |
+
self,
|
219 |
+
model_name,
|
220 |
+
detail=False,
|
221 |
+
device='cuda',
|
222 |
+
torchscript=False,
|
223 |
+
torchcompile=None,
|
224 |
+
aot_autograd=False,
|
225 |
+
precision='float32',
|
226 |
+
fuser='',
|
227 |
+
num_warm_iter=10,
|
228 |
+
num_bench_iter=50,
|
229 |
+
use_train_size=False,
|
230 |
+
**kwargs
|
231 |
+
):
|
232 |
+
self.model_name = model_name
|
233 |
+
self.detail = detail
|
234 |
+
self.device = device
|
235 |
+
self.amp_dtype, self.model_dtype, self.data_dtype = resolve_precision(precision)
|
236 |
+
self.channels_last = kwargs.pop('channels_last', False)
|
237 |
+
if self.amp_dtype is not None:
|
238 |
+
self.amp_autocast = partial(torch.cuda.amp.autocast, dtype=self.amp_dtype)
|
239 |
+
else:
|
240 |
+
self.amp_autocast = suppress
|
241 |
+
|
242 |
+
if fuser:
|
243 |
+
set_jit_fuser(fuser)
|
244 |
+
self.model = create_model(
|
245 |
+
model_name,
|
246 |
+
num_classes=kwargs.pop('num_classes', None),
|
247 |
+
in_chans=3,
|
248 |
+
global_pool=kwargs.pop('gp', 'fast'),
|
249 |
+
scriptable=torchscript,
|
250 |
+
drop_rate=kwargs.pop('drop', 0.),
|
251 |
+
drop_path_rate=kwargs.pop('drop_path', None),
|
252 |
+
drop_block_rate=kwargs.pop('drop_block', None),
|
253 |
+
**kwargs.pop('model_kwargs', {}),
|
254 |
+
)
|
255 |
+
self.model.to(
|
256 |
+
device=self.device,
|
257 |
+
dtype=self.model_dtype,
|
258 |
+
memory_format=torch.channels_last if self.channels_last else None)
|
259 |
+
self.num_classes = self.model.num_classes
|
260 |
+
self.param_count = count_params(self.model)
|
261 |
+
_logger.info('Model %s created, param count: %d' % (model_name, self.param_count))
|
262 |
+
|
263 |
+
data_config = resolve_data_config(kwargs, model=self.model, use_test_size=not use_train_size)
|
264 |
+
self.input_size = data_config['input_size']
|
265 |
+
self.batch_size = kwargs.pop('batch_size', 256)
|
266 |
+
|
267 |
+
self.compiled = False
|
268 |
+
if torchscript:
|
269 |
+
self.model = torch.jit.script(self.model)
|
270 |
+
self.compiled = True
|
271 |
+
elif torchcompile:
|
272 |
+
assert has_compile, 'A version of torch w/ torch.compile() is required, possibly a nightly.'
|
273 |
+
torch._dynamo.reset()
|
274 |
+
self.model = torch.compile(self.model, backend=torchcompile)
|
275 |
+
self.compiled = True
|
276 |
+
elif aot_autograd:
|
277 |
+
assert has_functorch, "functorch is needed for --aot-autograd"
|
278 |
+
self.model = memory_efficient_fusion(self.model)
|
279 |
+
self.compiled = True
|
280 |
+
|
281 |
+
self.example_inputs = None
|
282 |
+
self.num_warm_iter = num_warm_iter
|
283 |
+
self.num_bench_iter = num_bench_iter
|
284 |
+
self.log_freq = num_bench_iter // 5
|
285 |
+
if 'cuda' in self.device:
|
286 |
+
self.time_fn = partial(cuda_timestamp, device=self.device)
|
287 |
+
else:
|
288 |
+
self.time_fn = timestamp
|
289 |
+
|
290 |
+
def _init_input(self):
|
291 |
+
self.example_inputs = torch.randn(
|
292 |
+
(self.batch_size,) + self.input_size, device=self.device, dtype=self.data_dtype)
|
293 |
+
if self.channels_last:
|
294 |
+
self.example_inputs = self.example_inputs.contiguous(memory_format=torch.channels_last)
|
295 |
+
|
296 |
+
|
297 |
+
class InferenceBenchmarkRunner(BenchmarkRunner):
|
298 |
+
|
299 |
+
def __init__(
|
300 |
+
self,
|
301 |
+
model_name,
|
302 |
+
device='cuda',
|
303 |
+
torchscript=False,
|
304 |
+
**kwargs
|
305 |
+
):
|
306 |
+
super().__init__(model_name=model_name, device=device, torchscript=torchscript, **kwargs)
|
307 |
+
self.model.eval()
|
308 |
+
|
309 |
+
def run(self):
|
310 |
+
def _step():
|
311 |
+
t_step_start = self.time_fn()
|
312 |
+
with self.amp_autocast():
|
313 |
+
output = self.model(self.example_inputs)
|
314 |
+
t_step_end = self.time_fn(True)
|
315 |
+
return t_step_end - t_step_start
|
316 |
+
|
317 |
+
_logger.info(
|
318 |
+
f'Running inference benchmark on {self.model_name} for {self.num_bench_iter} steps w/ '
|
319 |
+
f'input size {self.input_size} and batch size {self.batch_size}.')
|
320 |
+
|
321 |
+
with torch.no_grad():
|
322 |
+
self._init_input()
|
323 |
+
|
324 |
+
for _ in range(self.num_warm_iter):
|
325 |
+
_step()
|
326 |
+
|
327 |
+
total_step = 0.
|
328 |
+
num_samples = 0
|
329 |
+
t_run_start = self.time_fn()
|
330 |
+
for i in range(self.num_bench_iter):
|
331 |
+
delta_fwd = _step()
|
332 |
+
total_step += delta_fwd
|
333 |
+
num_samples += self.batch_size
|
334 |
+
num_steps = i + 1
|
335 |
+
if num_steps % self.log_freq == 0:
|
336 |
+
_logger.info(
|
337 |
+
f"Infer [{num_steps}/{self.num_bench_iter}]."
|
338 |
+
f" {num_samples / total_step:0.2f} samples/sec."
|
339 |
+
f" {1000 * total_step / num_steps:0.3f} ms/step.")
|
340 |
+
t_run_end = self.time_fn(True)
|
341 |
+
t_run_elapsed = t_run_end - t_run_start
|
342 |
+
|
343 |
+
results = dict(
|
344 |
+
samples_per_sec=round(num_samples / t_run_elapsed, 2),
|
345 |
+
step_time=round(1000 * total_step / self.num_bench_iter, 3),
|
346 |
+
batch_size=self.batch_size,
|
347 |
+
img_size=self.input_size[-1],
|
348 |
+
param_count=round(self.param_count / 1e6, 2),
|
349 |
+
)
|
350 |
+
|
351 |
+
retries = 0 if self.compiled else 2  # skip profiling if model is compiled (torchscript, torch.compile, or AOT Autograd)
|
352 |
+
while retries:
|
353 |
+
retries -= 1
|
354 |
+
try:
|
355 |
+
if has_deepspeed_profiling:
|
356 |
+
macs, _ = profile_deepspeed(self.model, self.input_size)
|
357 |
+
results['gmacs'] = round(macs / 1e9, 2)
|
358 |
+
elif has_fvcore_profiling:
|
359 |
+
macs, activations = profile_fvcore(self.model, self.input_size, force_cpu=not retries)
|
360 |
+
results['gmacs'] = round(macs / 1e9, 2)
|
361 |
+
results['macts'] = round(activations / 1e6, 2)
|
362 |
+
except RuntimeError as e:
|
363 |
+
pass
|
364 |
+
|
365 |
+
_logger.info(
|
366 |
+
f"Inference benchmark of {self.model_name} done. "
|
367 |
+
f"{results['samples_per_sec']:.2f} samples/sec, {results['step_time']:.2f} ms/step")
|
368 |
+
|
369 |
+
return results
|
370 |
+
|
371 |
+
|
372 |
+
class TrainBenchmarkRunner(BenchmarkRunner):
|
373 |
+
|
374 |
+
def __init__(
|
375 |
+
self,
|
376 |
+
model_name,
|
377 |
+
device='cuda',
|
378 |
+
torchscript=False,
|
379 |
+
**kwargs
|
380 |
+
):
|
381 |
+
super().__init__(model_name=model_name, device=device, torchscript=torchscript, **kwargs)
|
382 |
+
self.model.train()
|
383 |
+
|
384 |
+
self.loss = nn.CrossEntropyLoss().to(self.device)
|
385 |
+
self.target_shape = tuple()
|
386 |
+
|
387 |
+
self.optimizer = create_optimizer_v2(
|
388 |
+
self.model,
|
389 |
+
opt=kwargs.pop('opt', 'sgd'),
|
390 |
+
lr=kwargs.pop('lr', 1e-4))
|
391 |
+
|
392 |
+
if kwargs.pop('grad_checkpointing', False):
|
393 |
+
self.model.set_grad_checkpointing()
|
394 |
+
|
395 |
+
def _gen_target(self, batch_size):
|
396 |
+
return torch.empty(
|
397 |
+
(batch_size,) + self.target_shape, device=self.device, dtype=torch.long).random_(self.num_classes)
|
398 |
+
|
399 |
+
def run(self):
|
400 |
+
def _step(detail=False):
|
401 |
+
self.optimizer.zero_grad() # can this be ignored?
|
402 |
+
t_start = self.time_fn()
|
403 |
+
t_fwd_end = t_start
|
404 |
+
t_bwd_end = t_start
|
405 |
+
with self.amp_autocast():
|
406 |
+
output = self.model(self.example_inputs)
|
407 |
+
if isinstance(output, tuple):
|
408 |
+
output = output[0]
|
409 |
+
if detail:
|
410 |
+
t_fwd_end = self.time_fn(True)
|
411 |
+
target = self._gen_target(output.shape[0])
|
412 |
+
self.loss(output, target).backward()
|
413 |
+
if detail:
|
414 |
+
t_bwd_end = self.time_fn(True)
|
415 |
+
self.optimizer.step()
|
416 |
+
t_end = self.time_fn(True)
|
417 |
+
if detail:
|
418 |
+
delta_fwd = t_fwd_end - t_start
|
419 |
+
delta_bwd = t_bwd_end - t_fwd_end
|
420 |
+
delta_opt = t_end - t_bwd_end
|
421 |
+
return delta_fwd, delta_bwd, delta_opt
|
422 |
+
else:
|
423 |
+
delta_step = t_end - t_start
|
424 |
+
return delta_step
|
425 |
+
|
426 |
+
_logger.info(
|
427 |
+
f'Running train benchmark on {self.model_name} for {self.num_bench_iter} steps w/ '
|
428 |
+
f'input size {self.input_size} and batch size {self.batch_size}.')
|
429 |
+
|
430 |
+
self._init_input()
|
431 |
+
|
432 |
+
for _ in range(self.num_warm_iter):
|
433 |
+
_step()
|
434 |
+
|
435 |
+
t_run_start = self.time_fn()
|
436 |
+
if self.detail:
|
437 |
+
total_fwd = 0.
|
438 |
+
total_bwd = 0.
|
439 |
+
total_opt = 0.
|
440 |
+
num_samples = 0
|
441 |
+
for i in range(self.num_bench_iter):
|
442 |
+
delta_fwd, delta_bwd, delta_opt = _step(True)
|
443 |
+
num_samples += self.batch_size
|
444 |
+
total_fwd += delta_fwd
|
445 |
+
total_bwd += delta_bwd
|
446 |
+
total_opt += delta_opt
|
447 |
+
num_steps = (i + 1)
|
448 |
+
if num_steps % self.log_freq == 0:
|
449 |
+
total_step = total_fwd + total_bwd + total_opt
|
450 |
+
_logger.info(
|
451 |
+
f"Train [{num_steps}/{self.num_bench_iter}]."
|
452 |
+
f" {num_samples / total_step:0.2f} samples/sec."
|
453 |
+
f" {1000 * total_fwd / num_steps:0.3f} ms/step fwd,"
|
454 |
+
f" {1000 * total_bwd / num_steps:0.3f} ms/step bwd,"
|
455 |
+
f" {1000 * total_opt / num_steps:0.3f} ms/step opt."
|
456 |
+
)
|
457 |
+
total_step = total_fwd + total_bwd + total_opt
|
458 |
+
t_run_elapsed = self.time_fn() - t_run_start
|
459 |
+
results = dict(
|
460 |
+
samples_per_sec=round(num_samples / t_run_elapsed, 2),
|
461 |
+
step_time=round(1000 * total_step / self.num_bench_iter, 3),
|
462 |
+
fwd_time=round(1000 * total_fwd / self.num_bench_iter, 3),
|
463 |
+
bwd_time=round(1000 * total_bwd / self.num_bench_iter, 3),
|
464 |
+
opt_time=round(1000 * total_opt / self.num_bench_iter, 3),
|
465 |
+
batch_size=self.batch_size,
|
466 |
+
img_size=self.input_size[-1],
|
467 |
+
param_count=round(self.param_count / 1e6, 2),
|
468 |
+
)
|
469 |
+
else:
|
470 |
+
total_step = 0.
|
471 |
+
num_samples = 0
|
472 |
+
for i in range(self.num_bench_iter):
|
473 |
+
delta_step = _step(False)
|
474 |
+
num_samples += self.batch_size
|
475 |
+
total_step += delta_step
|
476 |
+
num_steps = (i + 1)
|
477 |
+
if num_steps % self.log_freq == 0:
|
478 |
+
_logger.info(
|
479 |
+
f"Train [{num_steps}/{self.num_bench_iter}]."
|
480 |
+
f" {num_samples / total_step:0.2f} samples/sec."
|
481 |
+
f" {1000 * total_step / num_steps:0.3f} ms/step.")
|
482 |
+
t_run_elapsed = self.time_fn() - t_run_start
|
483 |
+
results = dict(
|
484 |
+
samples_per_sec=round(num_samples / t_run_elapsed, 2),
|
485 |
+
step_time=round(1000 * total_step / self.num_bench_iter, 3),
|
486 |
+
batch_size=self.batch_size,
|
487 |
+
img_size=self.input_size[-1],
|
488 |
+
param_count=round(self.param_count / 1e6, 2),
|
489 |
+
)
|
490 |
+
|
491 |
+
_logger.info(
|
492 |
+
f"Train benchmark of {self.model_name} done. "
|
493 |
+
f"{results['samples_per_sec']:.2f} samples/sec, {results['step_time']:.2f} ms/step")
|
494 |
+
|
495 |
+
return results
|
496 |
+
|
497 |
+
|
498 |
+
class ProfileRunner(BenchmarkRunner):
|
499 |
+
|
500 |
+
def __init__(self, model_name, device='cuda', profiler='', **kwargs):
|
501 |
+
super().__init__(model_name=model_name, device=device, **kwargs)
|
502 |
+
if not profiler:
|
503 |
+
if has_deepspeed_profiling:
|
504 |
+
profiler = 'deepspeed'
|
505 |
+
elif has_fvcore_profiling:
|
506 |
+
profiler = 'fvcore'
|
507 |
+
assert profiler, "One of deepspeed or fvcore needs to be installed for profiling to work."
|
508 |
+
self.profiler = profiler
|
509 |
+
self.model.eval()
|
510 |
+
|
511 |
+
def run(self):
|
512 |
+
_logger.info(
|
513 |
+
f'Running profiler on {self.model_name} w/ '
|
514 |
+
f'input size {self.input_size} and batch size {self.batch_size}.')
|
515 |
+
|
516 |
+
macs = 0
|
517 |
+
activations = 0
|
518 |
+
if self.profiler == 'deepspeed':
|
519 |
+
macs, _ = profile_deepspeed(self.model, self.input_size, batch_size=self.batch_size, detailed=True)
|
520 |
+
elif self.profiler == 'fvcore':
|
521 |
+
macs, activations = profile_fvcore(self.model, self.input_size, batch_size=self.batch_size, detailed=True)
|
522 |
+
|
523 |
+
results = dict(
|
524 |
+
gmacs=round(macs / 1e9, 2),
|
525 |
+
macts=round(activations / 1e6, 2),
|
526 |
+
batch_size=self.batch_size,
|
527 |
+
img_size=self.input_size[-1],
|
528 |
+
param_count=round(self.param_count / 1e6, 2),
|
529 |
+
)
|
530 |
+
|
531 |
+
_logger.info(
|
532 |
+
f"Profile of {self.model_name} done. "
|
533 |
+
f"{results['gmacs']:.2f} GMACs, {results['param_count']:.2f} M params.")
|
534 |
+
|
535 |
+
return results
|
536 |
+
|
537 |
+
|
538 |
+
def _try_run(
|
539 |
+
model_name,
|
540 |
+
bench_fn,
|
541 |
+
bench_kwargs,
|
542 |
+
initial_batch_size,
|
543 |
+
no_batch_size_retry=False
|
544 |
+
):
|
545 |
+
batch_size = initial_batch_size
|
546 |
+
results = dict()
|
547 |
+
error_str = 'Unknown'
|
548 |
+
while batch_size:
|
549 |
+
try:
|
550 |
+
torch.cuda.empty_cache()
|
551 |
+
bench = bench_fn(model_name=model_name, batch_size=batch_size, **bench_kwargs)
|
552 |
+
results = bench.run()
|
553 |
+
return results
|
554 |
+
except RuntimeError as e:
|
555 |
+
error_str = str(e)
|
556 |
+
_logger.error(f'"{error_str}" while running benchmark.')
|
557 |
+
if not check_batch_size_retry(error_str):
|
558 |
+
_logger.error(f'Unrecoverable error encountered while benchmarking {model_name}, skipping.')
|
559 |
+
break
|
560 |
+
if no_batch_size_retry:
|
561 |
+
break
|
562 |
+
batch_size = decay_batch_step(batch_size)
|
563 |
+
_logger.warning(f'Reducing batch size to {batch_size} for retry.')
|
564 |
+
results['error'] = error_str
|
565 |
+
return results
|
566 |
+
|
567 |
+
|
568 |
+
def benchmark(args):
|
569 |
+
if args.amp:
|
570 |
+
_logger.warning("Overriding precision to 'amp' since --amp flag set.")
|
571 |
+
args.precision = 'amp' if args.amp_dtype == 'float16' else '_'.join(['amp', args.amp_dtype])
|
572 |
+
_logger.info(f'Benchmarking in {args.precision} precision. '
|
573 |
+
f'{"NHWC" if args.channels_last else "NCHW"} layout. '
|
574 |
+
f'torchscript {"enabled" if args.torchscript else "disabled"}')
|
575 |
+
|
576 |
+
bench_kwargs = vars(args).copy()
|
577 |
+
bench_kwargs.pop('amp')
|
578 |
+
model = bench_kwargs.pop('model')
|
579 |
+
batch_size = bench_kwargs.pop('batch_size')
|
580 |
+
|
581 |
+
bench_fns = (InferenceBenchmarkRunner,)
|
582 |
+
prefixes = ('infer',)
|
583 |
+
if args.bench == 'both':
|
584 |
+
bench_fns = (
|
585 |
+
InferenceBenchmarkRunner,
|
586 |
+
TrainBenchmarkRunner
|
587 |
+
)
|
588 |
+
prefixes = ('infer', 'train')
|
589 |
+
elif args.bench == 'train':
|
590 |
+
bench_fns = TrainBenchmarkRunner,
|
591 |
+
prefixes = 'train',
|
592 |
+
elif args.bench.startswith('profile'):
|
593 |
+
# specific profiler used if included in bench mode string, otherwise default to deepspeed, fallback to fvcore
|
594 |
+
if 'deepspeed' in args.bench:
|
595 |
+
assert has_deepspeed_profiling, "deepspeed must be installed to use deepspeed flop counter"
|
596 |
+
bench_kwargs['profiler'] = 'deepspeed'
|
597 |
+
elif 'fvcore' in args.bench:
|
598 |
+
assert has_fvcore_profiling, "fvcore must be installed to use fvcore flop counter"
|
599 |
+
bench_kwargs['profiler'] = 'fvcore'
|
600 |
+
bench_fns = ProfileRunner,
|
601 |
+
batch_size = 1
|
602 |
+
|
603 |
+
model_results = OrderedDict(model=model)
|
604 |
+
for prefix, bench_fn in zip(prefixes, bench_fns):
|
605 |
+
run_results = _try_run(
|
606 |
+
model,
|
607 |
+
bench_fn,
|
608 |
+
bench_kwargs=bench_kwargs,
|
609 |
+
initial_batch_size=batch_size,
|
610 |
+
no_batch_size_retry=args.no_retry,
|
611 |
+
)
|
612 |
+
if prefix and 'error' not in run_results:
|
613 |
+
run_results = {'_'.join([prefix, k]): v for k, v in run_results.items()}
|
614 |
+
model_results.update(run_results)
|
615 |
+
if 'error' in run_results:
|
616 |
+
break
|
617 |
+
if 'error' not in model_results:
|
618 |
+
param_count = model_results.pop('infer_param_count', model_results.pop('train_param_count', 0))
|
619 |
+
model_results.setdefault('param_count', param_count)
|
620 |
+
model_results.pop('train_param_count', 0)
|
621 |
+
return model_results
|
622 |
+
|
623 |
+
|
624 |
+
def main():
|
625 |
+
setup_default_logging()
|
626 |
+
args = parser.parse_args()
|
627 |
+
model_cfgs = []
|
628 |
+
model_names = []
|
629 |
+
|
630 |
+
if args.fast_norm:
|
631 |
+
set_fast_norm()
|
632 |
+
|
633 |
+
if args.model_list:
|
634 |
+
args.model = ''
|
635 |
+
with open(args.model_list) as f:
|
636 |
+
model_names = [line.rstrip() for line in f]
|
637 |
+
model_cfgs = [(n, None) for n in model_names]
|
638 |
+
elif args.model == 'all':
|
639 |
+
# benchmark all models that have pretrained checkpoints
|
640 |
+
args.pretrained = True
|
641 |
+
model_names = list_models(pretrained=True, exclude_filters=['*in21k'])
|
642 |
+
model_cfgs = [(n, None) for n in model_names]
|
643 |
+
elif not is_model(args.model):
|
644 |
+
# model name doesn't exist, try as wildcard filter
|
645 |
+
model_names = list_models(args.model)
|
646 |
+
model_cfgs = [(n, None) for n in model_names]
|
647 |
+
|
648 |
+
if len(model_cfgs):
|
649 |
+
_logger.info('Running bulk benchmark on these models: {}'.format(', '.join(model_names)))
|
650 |
+
results = []
|
651 |
+
try:
|
652 |
+
for m, _ in model_cfgs:
|
653 |
+
if not m:
|
654 |
+
continue
|
655 |
+
args.model = m
|
656 |
+
r = benchmark(args)
|
657 |
+
if r:
|
658 |
+
results.append(r)
|
659 |
+
time.sleep(10)
|
660 |
+
except KeyboardInterrupt as e:
|
661 |
+
pass
|
662 |
+
sort_key = 'infer_samples_per_sec'
|
663 |
+
if 'train' in args.bench:
|
664 |
+
sort_key = 'train_samples_per_sec'
|
665 |
+
elif 'profile' in args.bench:
|
666 |
+
sort_key = 'infer_gmacs'
|
667 |
+
results = filter(lambda x: sort_key in x, results)
|
668 |
+
results = sorted(results, key=lambda x: x[sort_key], reverse=True)
|
669 |
+
else:
|
670 |
+
results = benchmark(args)
|
671 |
+
|
672 |
+
if args.results_file:
|
673 |
+
write_results(args.results_file, results, format=args.results_format)
|
674 |
+
|
675 |
+
# output results in JSON to stdout w/ delimiter for runner script
|
676 |
+
print(f'--result\n{json.dumps(results, indent=4)}')
|
677 |
+
|
678 |
+
|
679 |
+
def write_results(results_file, results, format='csv'):
|
680 |
+
with open(results_file, mode='w') as cf:
|
681 |
+
if format == 'json':
|
682 |
+
json.dump(results, cf, indent=4)
|
683 |
+
else:
|
684 |
+
if not isinstance(results, (list, tuple)):
|
685 |
+
results = [results]
|
686 |
+
if not results:
|
687 |
+
return
|
688 |
+
dw = csv.DictWriter(cf, fieldnames=results[0].keys())
|
689 |
+
dw.writeheader()
|
690 |
+
for r in results:
|
691 |
+
dw.writerow(r)
|
692 |
+
cf.flush()
|
693 |
+
|
694 |
+
|
695 |
+
if __name__ == '__main__':
|
696 |
+
main()
|
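The batch-size retry loop in `_try_run` above can be illustrated without a GPU. The following is a minimal sketch, not the script's own code: `halve_batch_size` and `looks_like_oom` are hypothetical stand-ins for `decay_batch_step` and `check_batch_size_retry` (whose real decay schedule and error matching may differ), and `fake_bench` only simulates a benchmark that runs out of memory above batch size 64.

```python
def halve_batch_size(batch_size):
    """Hypothetical stand-in for decay_batch_step: halve, eventually reaching 0."""
    return batch_size // 2


def looks_like_oom(error_str):
    """Hypothetical stand-in for check_batch_size_retry: retry only on OOM-like errors."""
    return 'out of memory' in error_str.lower()


def try_run(run_fn, initial_batch_size, no_batch_size_retry=False):
    """Call run_fn(batch_size), shrinking the batch size after retryable failures."""
    batch_size = initial_batch_size
    results = {}
    error_str = 'Unknown'
    while batch_size:
        try:
            return run_fn(batch_size)
        except RuntimeError as e:
            error_str = str(e)
            # Stop on errors that a smaller batch cannot fix, or when retry is disabled.
            if not looks_like_oom(error_str) or no_batch_size_retry:
                break
            batch_size = halve_batch_size(batch_size)
    # Reached only when every attempt failed (or the batch size decayed to 0).
    results['error'] = error_str
    return results


def fake_bench(batch_size):
    """Simulated benchmark: fails above batch size 64, otherwise returns fake stats."""
    if batch_size > 64:
        raise RuntimeError('CUDA out of memory (simulated)')
    return {'batch_size': batch_size, 'samples_per_sec': 1000.0}
```

Calling `try_run(fake_bench, 256)` tries 256, then 128, then succeeds at 64. With `no_batch_size_retry=True` the first failure is reported as an `error` entry instead, mirroring how `benchmark()` stops once a run result contains `error`.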
bulk_runner.py
ADDED
@@ -0,0 +1,184 @@
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
""" Bulk Model Script Runner
|
3 |
+
|
4 |
+
Run validation or benchmark script in separate process for each model
|
5 |
+
|
6 |
+
Benchmark all 'vit*' models:
|
7 |
+
python bulk_runner.py --model-list 'vit*' --results-file vit_bench.csv benchmark.py --amp -b 512
|
8 |
+
|
9 |
+
Validate all models:
|
10 |
+
python bulk_runner.py --model-list all --results-file val.csv --pretrained validate.py /imagenet/validation/ --amp -b 512 --retry
|
11 |
+
|
12 |
+
Hacked together by Ross Wightman (https://github.com/rwightman)
|
13 |
+
"""
|
14 |
+
import argparse
|
15 |
+
import os
|
16 |
+
import sys
|
17 |
+
import csv
|
18 |
+
import json
|
19 |
+
import subprocess
|
20 |
+
import time
|
21 |
+
from typing import Callable, List, Tuple, Union
|
22 |
+
|
23 |
+
|
24 |
+
from timm.models import is_model, list_models
|
25 |
+
|
26 |
+
|
27 |
+
parser = argparse.ArgumentParser(description='Per-model process launcher')
|
28 |
+
|
29 |
+
# model and results args
|
30 |
+
parser.add_argument(
|
31 |
+
'--model-list', metavar='NAME', default='',
|
32 |
+
help='"all", a wildcard filter such as "vit*", or a txt file with one model name per line')
|
33 |
+
parser.add_argument(
|
34 |
+
'--results-file', default='', type=str, metavar='FILENAME',
|
35 |
+
help='Output csv file for validation results (summary)')
|
36 |
+
parser.add_argument(
|
37 |
+
'--sort-key', default='', type=str, metavar='COL',
|
38 |
+
help='Specify sort key for results csv')
|
39 |
+
parser.add_argument(
|
40 |
+
"--pretrained", action='store_true',
|
41 |
+
help="only run models with pretrained weights")
|
42 |
+
|
43 |
+
parser.add_argument(
|
44 |
+
"--delay",
|
45 |
+
type=float,
|
46 |
+
default=0,
|
47 |
+
help="Interval, in seconds, to delay between model invocations.",
|
48 |
+
)
|
49 |
+
parser.add_argument(
|
50 |
+
"--start_method", type=str, default="spawn", choices=["spawn", "fork", "forkserver"],
|
51 |
+
help="Multiprocessing start method to use when creating workers.",
|
52 |
+
)
|
53 |
+
parser.add_argument(
|
54 |
+
"--no_python", action='store_true',
|
55 |
+
help="Skip prepending the script with 'python' - just execute it directly. Useful "
|
56 |
+
"when the script is not a Python script.",
|
57 |
+
)
|
58 |
+
parser.add_argument(
|
59 |
+
"-m",
|
60 |
+
"--module", action='store_true',
|
61 |
+
help="Change each process to interpret the launch script as a Python module, executing "
|
62 |
+
"with the same behavior as 'python -m'.",
|
63 |
+
)
|
64 |
+
|
65 |
+
# positional
|
66 |
+
parser.add_argument(
|
67 |
+
"script", type=str,
|
68 |
+
help="Full path to the program/script to be launched for each model config.",
|
69 |
+
)
|
70 |
+
parser.add_argument("script_args", nargs=argparse.REMAINDER)
|
71 |
+
|
72 |
+
|
73 |
+
def cmd_from_args(args) -> Tuple[Union[Callable, str], List[str]]:
|
74 |
+
# Build the executable and the argument list used to launch the script for each model
|
75 |
+
with_python = not args.no_python
|
76 |
+
cmd: Union[Callable, str]
|
77 |
+
cmd_args = []
|
78 |
+
if with_python:
|
79 |
+
cmd = os.getenv("PYTHON_EXEC", sys.executable)
|
80 |
+
cmd_args.append("-u")
|
81 |
+
if args.module:
|
82 |
+
cmd_args.append("-m")
|
83 |
+
cmd_args.append(args.script)
|
84 |
+
else:
|
85 |
+
if args.module:
|
86 |
+
raise ValueError(
|
87 |
+
"Don't use both the '--no_python' flag"
|
88 |
+
" and the '--module' flag at the same time."
|
89 |
+
)
|
90 |
+
cmd = args.script
|
91 |
+
cmd_args.extend(args.script_args)
|
92 |
+
|
93 |
+
return cmd, cmd_args
|
94 |
+
|
95 |
+
|
96 |
+
def main():
|
97 |
+
args = parser.parse_args()
|
98 |
+
cmd, cmd_args = cmd_from_args(args)
|
99 |
+
|
100 |
+
model_cfgs = []
|
101 |
+
model_names = []
|
102 |
+
if args.model_list == 'all':
|
103 |
+
# NOTE should make this config, for validation / benchmark runs the focus is 1k models,
|
104 |
+
# so we filter out 21/22k and some other unusable heads. This will change in the future...
|
105 |
+
exclude_model_filters = ['*in21k', '*in22k', '*dino', '*_22k']
|
106 |
+
model_names = list_models(
|
107 |
+
pretrained=args.pretrained, # only include models w/ pretrained checkpoints if set
|
108 |
+
exclude_filters=exclude_model_filters
|
109 |
+
)
|
110 |
+
model_cfgs = [(n, None) for n in model_names]
|
111 |
+
elif not is_model(args.model_list):
|
112 |
+
# model name doesn't exist, try as wildcard filter
|
113 |
+
model_names = list_models(args.model_list)
|
114 |
+
model_cfgs = [(n, None) for n in model_names]
|
115 |
+
|
116 |
+
if not model_cfgs and os.path.exists(args.model_list):
|
117 |
+
with open(args.model_list) as f:
|
118 |
+
model_names = [line.rstrip() for line in f]
|
119 |
+
model_cfgs = [(n, None) for n in model_names]
|
120 |
+
|
121 |
+
if len(model_cfgs):
|
122 |
+
results_file = args.results_file or './results.csv'
|
123 |
+
results = []
|
124 |
+
errors = []
|
125 |
+
print('Running script on these models: {}'.format(', '.join(model_names)))
|
126 |
+
if not args.sort_key:
|
127 |
+
if 'benchmark' in args.script:
|
128 |
+
if any(['train' in a for a in args.script_args]):
|
129 |
+
sort_key = 'train_samples_per_sec'
|
130 |
+
else:
|
131 |
+
sort_key = 'infer_samples_per_sec'
|
132 |
+
else:
|
133 |
+
sort_key = 'top1'
|
134 |
+
else:
|
135 |
+
sort_key = args.sort_key
|
136 |
+
print(f'Script: {args.script}, Args: {args.script_args}, Sort key: {sort_key}')
|
137 |
+
|
138 |
+
try:
|
139 |
+
for m, _ in model_cfgs:
|
140 |
+
if not m:
|
141 |
+
continue
|
142 |
+
args_str = (cmd, *[str(e) for e in cmd_args], '--model', m)
|
143 |
+
try:
|
144 |
+
o = subprocess.check_output(args=args_str).decode('utf-8').split('--result')[-1]
|
145 |
+
r = json.loads(o)
|
146 |
+
results.append(r)
|
147 |
+
except Exception as e:
|
148 |
+
# FIXME batch_size retry loop is currently done in either validate.py or benchmark.py
|
149 |
+
# for further robustness (but more overhead), we may want to manage that by looping here...
|
150 |
+
errors.append(dict(model=m, error=str(e)))
|
151 |
+
if args.delay:
|
152 |
+
time.sleep(args.delay)
|
153 |
+
except KeyboardInterrupt as e:
|
154 |
+
pass
|
155 |
+
|
156 |
+
errors.extend(list(filter(lambda x: 'error' in x, results)))
|
157 |
+
if errors:
|
158 |
+
print(f'{len(errors)} models had errors during run.')
|
159 |
+
for e in errors:
|
160 |
+
print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
|
161 |
+
results = list(filter(lambda x: 'error' not in x, results))
|
162 |
+
|
163 |
+
no_sortkey = list(filter(lambda x: sort_key not in x, results))
|
164 |
+
if no_sortkey:
|
165 |
+
print(f'{len(no_sortkey)} results missing sort key, skipping sort.')
|
166 |
+
else:
|
167 |
+
results = sorted(results, key=lambda x: x[sort_key], reverse=True)
|
168 |
+
|
169 |
+
if len(results):
|
170 |
+
print(f'{len(results)} models run successfully. Saving results to {results_file}.')
|
171 |
+
write_results(results_file, results)
|
172 |
+
|
173 |
+
|
174 |
+
def write_results(results_file, results):
|
175 |
+
with open(results_file, mode='w') as cf:
|
176 |
+
dw = csv.DictWriter(cf, fieldnames=results[0].keys())
|
177 |
+
dw.writeheader()
|
178 |
+
for r in results:
|
179 |
+
dw.writerow(r)
|
180 |
+
cf.flush()
|
181 |
+
|
182 |
+
|
183 |
+
if __name__ == '__main__':
|
184 |
+
main()
|
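`bulk_runner.py` above recovers each child's result by splitting the captured stdout on the `--result` delimiter that `benchmark.py` prints before its JSON dump. A minimal sketch of that hand-off follows; `parse_result`, `run_and_parse`, and `CHILD_CODE` are hypothetical names used only for illustration, and the child here is a tiny inline script rather than a real benchmark.

```python
import json
import subprocess
import sys

RESULT_DELIMITER = '--result'


def parse_result(stdout_text):
    """Return the JSON payload that follows the last result delimiter."""
    payload = stdout_text.split(RESULT_DELIMITER)[-1]
    # json.loads tolerates the surrounding newlines left over from print().
    return json.loads(payload)


def run_and_parse(cmd):
    """Run one child process and parse the JSON result at the end of its stdout."""
    out = subprocess.check_output(cmd).decode('utf-8')
    return parse_result(out)


# Simulated child script: ordinary log output first, then delimiter + JSON.
CHILD_CODE = (
    "import json\n"
    "print('some log line')\n"
    "print('--result\\n' + json.dumps({'model': 'demo', 'top1': 80.1}))\n"
)

# Example invocation (spawns a child interpreter):
# run_and_parse([sys.executable, '-c', CHILD_CODE])
```

If the child never prints the delimiter, the whole output is handed to `json.loads`, which will usually raise; in the runner above that exception is caught and recorded as an error for the model.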
clean_checkpoint.py
ADDED
@@ -0,0 +1,115 @@
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
""" Checkpoint Cleaning Script
|
3 |
+
|
4 |
+
Takes training checkpoints with GPU tensors, optimizer state, extra dict keys, etc.
|
5 |
+
and outputs a CPU tensor checkpoint with only the `state_dict` along with SHA256
|
6 |
+
calculation for model zoo compatibility.
|
7 |
+
|
8 |
+
Hacked together by / Copyright 2020 Ross Wightman (https://github.com/rwightman)
|
9 |
+
"""
|
10 |
+
import torch
|
11 |
+
import argparse
|
12 |
+
import os
|
13 |
+
import hashlib
|
14 |
+
import shutil
|
15 |
+
import tempfile
|
16 |
+
from timm.models import load_state_dict
|
17 |
+
try:
|
18 |
+
import safetensors.torch
|
19 |
+
_has_safetensors = True
|
20 |
+
except ImportError:
|
21 |
+
_has_safetensors = False
|
22 |
+
|
23 |
+
parser = argparse.ArgumentParser(description='PyTorch Checkpoint Cleaner')
|
24 |
+
parser.add_argument('--checkpoint', default='', type=str, metavar='PATH',
|
25 |
+
help='path to latest checkpoint (default: none)')
|
26 |
+
parser.add_argument('--output', default='', type=str, metavar='PATH',
|
27 |
+
help='output path')
|
28 |
+
parser.add_argument('--no-use-ema', dest='no_use_ema', action='store_true',
|
29 |
+
help='do not use the EMA version of weights, even if present')
|
30 |
+
parser.add_argument('--no-hash', dest='no_hash', action='store_true',
|
31 |
+
help='no hash in output filename')
|
32 |
+
parser.add_argument('--clean-aux-bn', dest='clean_aux_bn', action='store_true',
|
33 |
+
help='remove auxiliary batch norm layers (from SplitBN training) from checkpoint')
|
34 |
+
parser.add_argument('--safetensors', action='store_true',
|
35 |
+
help='Save weights using safetensors instead of the default torch way (pickle).')
|
36 |
+
|
37 |
+
|
38 |
+
def main():
|
39 |
+
args = parser.parse_args()
|
40 |
+
|
41 |
+
if os.path.exists(args.output):
|
42 |
+
print("Error: Output filename ({}) already exists.".format(args.output))
|
43 |
+
exit(1)
|
44 |
+
|
45 |
+
clean_checkpoint(
|
46 |
+
args.checkpoint,
|
47 |
+
args.output,
|
48 |
+
not args.no_use_ema,
|
49 |
+
args.no_hash,
|
50 |
+
args.clean_aux_bn,
|
51 |
+
safe_serialization=args.safetensors,
|
52 |
+
)
|
53 |
+
|
54 |
+
|
55 |
+
def clean_checkpoint(
|
56 |
+
checkpoint,
|
57 |
+
output,
|
58 |
+
use_ema=True,
|
59 |
+
no_hash=False,
|
60 |
+
clean_aux_bn=False,
|
61 |
+
safe_serialization: bool=False,
|
62 |
+
):
|
63 |
+
# Load an existing checkpoint to CPU, strip everything but the state_dict and re-save
|
64 |
+
if checkpoint and os.path.isfile(checkpoint):
|
65 |
+
print("=> Loading checkpoint '{}'".format(checkpoint))
|
66 |
+
state_dict = load_state_dict(checkpoint, use_ema=use_ema)
|
67 |
+
new_state_dict = {}
|
68 |
+
for k, v in state_dict.items():
|
69 |
+
if clean_aux_bn and 'aux_bn' in k:
|
70 |
+
# If all aux_bn keys are removed, the SplitBN layers will end up as normal and
|
71 |
+
# load with the unmodified model using BatchNorm2d.
|
72 |
+
continue
|
73 |
+
name = k[7:] if k.startswith('module.') else k
|
74 |
+
new_state_dict[name] = v
|
75 |
+
print("=> Loaded state_dict from '{}'".format(checkpoint))
|
76 |
+
|
77 |
+
ext = ''
|
78 |
+
if output:
|
79 |
+
checkpoint_root, checkpoint_base = os.path.split(output)
|
80 |
+
checkpoint_base, ext = os.path.splitext(checkpoint_base)
|
81 |
+
else:
|
82 |
+
checkpoint_root = ''
|
83 |
+
checkpoint_base = os.path.split(checkpoint)[1]
|
84 |
+
checkpoint_base = os.path.splitext(checkpoint_base)[0]
|
85 |
+
|
86 |
+
temp_filename = '__' + checkpoint_base
|
87 |
+
if safe_serialization:
|
88 |
+
assert _has_safetensors, "`pip install safetensors` to use .safetensors"
|
89 |
+
safetensors.torch.save_file(new_state_dict, temp_filename)
|
90 |
+
else:
|
91 |
+
torch.save(new_state_dict, temp_filename)
|
92 |
+
|
93 |
+
with open(temp_filename, 'rb') as f:
|
94 |
+
sha_hash = hashlib.sha256(f.read()).hexdigest()
|
95 |
+
|
96 |
+
if ext:
|
97 |
+
final_ext = ext
|
98 |
+
else:
|
99 |
+
final_ext = ('.safetensors' if safe_serialization else '.pth')
|
100 |
+
|
101 |
+
if no_hash:
|
102 |
+
final_filename = checkpoint_base + final_ext
|
103 |
+
else:
|
104 |
+
final_filename = '-'.join([checkpoint_base, sha_hash[:8]]) + final_ext
|
105 |
+
|
106 |
+
shutil.move(temp_filename, os.path.join(checkpoint_root, final_filename))
|
107 |
+
print("=> Saved state_dict to '{}', SHA256: {}".format(final_filename, sha_hash))
|
108 |
+
return final_filename
|
109 |
+
else:
|
110 |
+
print("Error: Checkpoint ({}) doesn't exist".format(checkpoint))
|
111 |
+
return ''
|
112 |
+
|
113 |
+
|
114 |
+
if __name__ == '__main__':
|
115 |
+
main()
|
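The naming scheme used by `clean_checkpoint.py` above (write to a temporary file, hash the bytes, then embed the first 8 hex characters of the SHA256 in the final filename) can be sketched with the standard library alone. This is an illustration under assumptions: `save_with_hash` and `verify_hash` are hypothetical helpers, and raw bytes stand in for a serialized `state_dict`.

```python
import hashlib
import os
import shutil
import tempfile


def save_with_hash(payload: bytes, out_dir: str, base: str, ext: str = '.pth') -> str:
    """Write payload to a temp file, then move it to '<base>-<sha256[:8]><ext>'."""
    temp_path = os.path.join(out_dir, '__' + base)
    with open(temp_path, 'wb') as f:
        f.write(payload)
    # Hash the bytes exactly as they were written to disk.
    with open(temp_path, 'rb') as f:
        sha_hash = hashlib.sha256(f.read()).hexdigest()
    final_name = '-'.join([base, sha_hash[:8]]) + ext
    shutil.move(temp_path, os.path.join(out_dir, final_name))
    return final_name


def verify_hash(path: str) -> bool:
    """Check that the hash prefix in the filename matches the file contents."""
    stem = os.path.splitext(os.path.basename(path))[0]
    expected = stem.rsplit('-', 1)[-1]
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest().startswith(expected)
```

A downloader can then call something like `verify_hash` to detect a truncated or corrupted file, which is presumably what the "model zoo compatibility" note in the docstring refers to.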
convert/convert_from_mxnet.py
ADDED
@@ -0,0 +1,107 @@
|
1 |
+
import argparse
|
2 |
+
import hashlib
|
3 |
+
import os
|
4 |
+
|
5 |
+
import mxnet as mx
|
6 |
+
import gluoncv
|
7 |
+
import torch
|
8 |
+
from timm import create_model
|
9 |
+
|
10 |
+
parser = argparse.ArgumentParser(description='Convert from MXNet')
|
11 |
+
parser.add_argument('--model', default='all', type=str, metavar='MODEL',
|
12 |
+
help='Name of model to convert (default: "all")')
|
13 |
+
|
14 |
+
|
15 |
+
def convert(mxnet_name, torch_name):
|
16 |
+
# download and load the pre-trained model
|
17 |
+
net = gluoncv.model_zoo.get_model(mxnet_name, pretrained=True)
|
18 |
+
|
19 |
+
# create corresponding torch model
|
20 |
+
torch_net = create_model(torch_name)
|
21 |
+
|
22 |
+
mxp = [(k, v) for k, v in net.collect_params().items() if 'running' not in k]
|
23 |
+
torchp = list(torch_net.named_parameters())
|
24 |
+
torch_params = {}
|
25 |
+
|
26 |
+
# convert parameters
|
27 |
+
# NOTE: we are relying on the fact that the order of parameters
|
28 |
+
# is usually exactly the same between these models, thus no key name mapping
|
29 |
+
# is necessary. Asserts will trip if this is not the case.
|
30 |
+
for (tn, tv), (mn, mv) in zip(torchp, mxp):
|
31 |
+
m_split = mn.split('_')
|
32 |
+
t_split = tn.split('.')
|
33 |
+
print(t_split, m_split)
|
34 |
+
print(tv.shape, mv.shape)
|
35 |
+
|
36 |
+
# ensure the ordering of BN params matches, since their shapes alone cannot tell them apart
|
37 |
+
if m_split[-1] == 'gamma':
|
38 |
+
assert t_split[-1] == 'weight'
|
39 |
+
if m_split[-1] == 'beta':
|
40 |
+
assert t_split[-1] == 'bias'
|
41 |
+
|
42 |
+
# ensure shapes match
|
43 |
+
assert all(t == m for t, m in zip(tv.shape, mv.shape))
|
44 |
+
|
45 |
+
torch_tensor = torch.from_numpy(mv.data().asnumpy())
|
46 |
+
torch_params[tn] = torch_tensor
|
47 |
+
|
48 |
+
# convert buffers (batch norm running stats)
|
49 |
+
mxb = [(k, v) for k, v in net.collect_params().items() if any(x in k for x in ['running_mean', 'running_var'])]
|
50 |
+
torchb = [(k, v) for k, v in torch_net.named_buffers() if 'num_batches' not in k]
|
51 |
+
for (tn, tv), (mn, mv) in zip(torchb, mxb):
|
52 |
+
print(tn, mn)
|
53 |
+
print(tv.shape, mv.shape)
|
54 |
+
|
55 |
+
# ensure the ordering of BN running stats matches, since their shapes alone cannot tell them apart
|
56 |
+
if 'running_var' in tn:
|
57 |
+
assert 'running_var' in mn
|
58 |
+
if 'running_mean' in tn:
|
59 |
+
assert 'running_mean' in mn
|
60 |
+
|
61 |
+
torch_tensor = torch.from_numpy(mv.data().asnumpy())
|
62 |
+
torch_params[tn] = torch_tensor
|
63 |
+
|
64 |
+
torch_net.load_state_dict(torch_params)
|
65 |
+
torch_filename = './%s.pth' % torch_name
|
66 |
+
torch.save(torch_net.state_dict(), torch_filename)
|
67 |
+
with open(torch_filename, 'rb') as f:
|
68 |
+
sha_hash = hashlib.sha256(f.read()).hexdigest()
|
69 |
+
final_filename = os.path.splitext(torch_filename)[0] + '-' + sha_hash[:8] + '.pth'
|
70 |
+
os.rename(torch_filename, final_filename)
|
71 |
+
print("=> Saved converted model to '{}', SHA256: {}".format(final_filename, sha_hash))
|
72 |
+
|
73 |
+
|
74 |
+
def map_mx_to_torch_model(mx_name):
|
75 |
+
torch_name = mx_name.lower()
|
76 |
+
if torch_name.startswith('se_'):
|
77 |
+
torch_name = torch_name.replace('se_', 'se')
|
78 |
+
elif torch_name.startswith('senet_'):
|
79 |
+
torch_name = torch_name.replace('senet_', 'senet')
|
80 |
+
elif torch_name.startswith('inceptionv3'):
|
81 |
+
torch_name = torch_name.replace('inceptionv3', 'inception_v3')
|
82 |
+
torch_name = 'gluon_' + torch_name
|
83 |
+
return torch_name
|
84 |
+
|
85 |
+
|
86 |
+
ALL = ['resnet18_v1b', 'resnet34_v1b', 'resnet50_v1b', 'resnet101_v1b', 'resnet152_v1b',
|
87 |
+
'resnet50_v1c', 'resnet101_v1c', 'resnet152_v1c', 'resnet50_v1d', 'resnet101_v1d', 'resnet152_v1d',
|
88 |
+
#'resnet50_v1e', 'resnet101_v1e', 'resnet152_v1e',
|
89 |
+
'resnet50_v1s', 'resnet101_v1s', 'resnet152_v1s', 'resnext50_32x4d', 'resnext101_32x4d', 'resnext101_64x4d',
|
90 |
+
'se_resnext50_32x4d', 'se_resnext101_32x4d', 'se_resnext101_64x4d', 'senet_154', 'inceptionv3']
|
91 |
+
|
92 |
+
|
93 |
+
def main():
|
94 |
+
args = parser.parse_args()
|
95 |
+
|
96 |
+
if not args.model or args.model == 'all':
|
97 |
+
for mx_model in ALL:
|
98 |
+
torch_model = map_mx_to_torch_model(mx_model)
|
99 |
+
convert(mx_model, torch_model)
|
100 |
+
else:
|
101 |
+
mx_model = args.model
|
102 |
+
torch_model = map_mx_to_torch_model(mx_model)
|
103 |
+
convert(mx_model, torch_model)
|
104 |
+
|
105 |
+
|
106 |
+
if __name__ == '__main__':
|
107 |
+
main()
|
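`convert()` above pairs parameters purely by position and relies on assertions to catch a mismatch. A minimal sketch of that idea on plain `(name, shape)` tuples follows. `pair_params_by_order` is a hypothetical helper, the parameter names in the usage note are made up for illustration, and the length check and exact shape comparison are slightly stricter than the script's `zip`-based asserts.

```python
def pair_params_by_order(torch_params, mx_params):
    """Pair (name, shape) entries positionally and sanity-check each pair."""
    # Added assumption: refuse to pair lists of different length (zip would truncate).
    if len(torch_params) != len(mx_params):
        raise ValueError('parameter counts differ')
    mapping = {}
    for (tn, tshape), (mn, mshape) in zip(torch_params, mx_params):
        m_last = mn.split('_')[-1]
        t_last = tn.split('.')[-1]
        # BN gamma/beta have identical shapes, so check the names instead.
        if m_last == 'gamma' and t_last != 'weight':
            raise ValueError(f'{mn} paired with {tn}')
        if m_last == 'beta' and t_last != 'bias':
            raise ValueError(f'{mn} paired with {tn}')
        if tuple(tshape) != tuple(mshape):
            raise ValueError(f'shape mismatch for {tn} and {mn}')
        mapping[tn] = mn
    return mapping
```

For example, pairing `[('conv1.weight', (64, 3, 7, 7)), ('bn1.weight', (64,)), ('bn1.bias', (64,))]` with MXNet-style entries ending in `weight`, `gamma`, `beta` succeeds, while swapping the `gamma` and `beta` entries raises even though every shape still matches.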
convert/convert_nest_flax.py
ADDED
@@ -0,0 +1,109 @@
|
1 |
+
"""
|
2 |
+
Convert weights from https://github.com/google-research/nested-transformer
|
3 |
+
NOTE: You'll need https://github.com/google/CommonLoopUtils, not included in requirements.txt
|
4 |
+
"""
|
5 |
+
|
6 |
+
import sys
|
7 |
+
|
8 |
+
import numpy as np
|
9 |
+
import torch
|
10 |
+
|
11 |
+
from clu import checkpoint
|
12 |
+
|
13 |
+
|
14 |
+
arch_depths = {
|
15 |
+
'nest_base': [2, 2, 20],
|
16 |
+
'nest_small': [2, 2, 20],
|
17 |
+
'nest_tiny': [2, 2, 8],
|
18 |
+
}
|
19 |
+
|
20 |
+
|
21 |
+
def convert_nest(checkpoint_path, arch):
|
22 |
+
"""
|
23 |
+
Expects path to checkpoint which is a dir containing 4 files like in each of these folders
|
24 |
+
- https://console.cloud.google.com/storage/browser/gresearch/nest-checkpoints
|
25 |
+
`arch` is needed to look up the per-level transformer depths (see `arch_depths`)
|
26 |
+
Returns a state dict that can be used with `torch.nn.Module.load_state_dict`
|
27 |
+
Hint: Follow timm.models.nest.Nest.__init__ and
|
28 |
+
https://github.com/google-research/nested-transformer/blob/main/models/nest_net.py
|
29 |
+
"""
|
30 |
+
assert arch in ['nest_base', 'nest_small', 'nest_tiny'], "Your `arch` is not supported"
|
31 |
+
|
32 |
+
flax_dict = checkpoint.load_state_dict(checkpoint_path)['optimizer']['target']
|
33 |
+
state_dict = {}
|
34 |
+
|
35 |
+
# Patch embedding
|
36 |
+
state_dict['patch_embed.proj.weight'] = torch.tensor(
|
37 |
+
flax_dict['PatchEmbedding_0']['Conv_0']['kernel']).permute(3, 2, 0, 1)
|
38 |
+
state_dict['patch_embed.proj.bias'] = torch.tensor(flax_dict['PatchEmbedding_0']['Conv_0']['bias'])
|
39 |
+
|
40 |
+
# Positional embeddings
|
41 |
+
posemb_keys = [k for k in flax_dict.keys() if k.startswith('PositionEmbedding')]
|
42 |
+
for i, k in enumerate(posemb_keys):
|
43 |
+
state_dict[f'levels.{i}.pos_embed'] = torch.tensor(flax_dict[k]['pos_embedding'])
|
44 |
+
|
45 |
+
# Transformer encoders
|
46 |
+
depths = arch_depths[arch]
|
47 |
+
for level in range(len(depths)):
|
48 |
+
for layer in range(depths[level]):
|
49 |
+
global_layer_ix = sum(depths[:level]) + layer
|
50 |
+
# Norms
|
51 |
+
for i in range(2):
|
52 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.norm{i+1}.weight'] = torch.tensor(
|
53 |
+
flax_dict[f'EncoderNDBlock_{global_layer_ix}'][f'LayerNorm_{i}']['scale'])
|
54 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.norm{i+1}.bias'] = torch.tensor(
|
55 |
+
flax_dict[f'EncoderNDBlock_{global_layer_ix}'][f'LayerNorm_{i}']['bias'])
|
56 |
+
# Attention qkv
|
57 |
+
w_q = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_0']['kernel']
|
58 |
+
w_kv = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_1']['kernel']
|
59 |
+
# Pay attention to dims here (maybe get pen and paper)
|
60 |
+
w_kv = np.concatenate(np.split(w_kv, 2, -1), 1)
|
61 |
+
w_qkv = np.concatenate([w_q, w_kv], 1)
|
62 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.qkv.weight'] = torch.tensor(w_qkv).flatten(1).permute(1,0)
|
63 |
+
b_q = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_0']['bias']
|
64 |
+
b_kv = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_1']['bias']
|
65 |
+
# Pay attention to dims here (maybe get pen and paper)
|
66 |
+
b_kv = np.concatenate(np.split(b_kv, 2, -1), 0)
|
67 |
+
b_qkv = np.concatenate([b_q, b_kv], 0)
|
68 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.qkv.bias'] = torch.tensor(b_qkv).reshape(-1)
|
69 |
+
# Attention proj
|
70 |
+
w_proj = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['proj_kernel']
|
71 |
+
w_proj = torch.tensor(w_proj).permute(2, 1, 0).flatten(1)
|
72 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.proj.weight'] = w_proj
|
73 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.proj.bias'] = torch.tensor(
|
74 |
+
flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['bias'])
|
75 |
+
# MLP
|
76 |
+
for i in range(2):
|
77 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.mlp.fc{i+1}.weight'] = torch.tensor(
|
78 |
+
flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MlpBlock_0'][f'Dense_{i}']['kernel']).permute(1, 0)
|
79 |
+
state_dict[f'levels.{level}.transformer_encoder.{layer}.mlp.fc{i+1}.bias'] = torch.tensor(
|
80 |
+
flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MlpBlock_0'][f'Dense_{i}']['bias'])
|
81 |
+
|
82 |
+
# Block aggregations (ConvPool)
|
83 |
+
for level in range(1, len(depths)):
|
84 |
+
# Convs
|
85 |
+
state_dict[f'levels.{level}.pool.conv.weight'] = torch.tensor(
|
86 |
+
flax_dict[f'ConvPool_{level-1}']['Conv_0']['kernel']).permute(3, 2, 0, 1)
|
87 |
+
state_dict[f'levels.{level}.pool.conv.bias'] = torch.tensor(
|
88 |
+
flax_dict[f'ConvPool_{level-1}']['Conv_0']['bias'])
|
89 |
+
# Norms
|
90 |
+
state_dict[f'levels.{level}.pool.norm.weight'] = torch.tensor(
|
91 |
+
flax_dict[f'ConvPool_{level-1}']['LayerNorm_0']['scale'])
|
92 |
+
state_dict[f'levels.{level}.pool.norm.bias'] = torch.tensor(
|
93 |
+
flax_dict[f'ConvPool_{level-1}']['LayerNorm_0']['bias'])
|
94 |
+
|
95 |
+
# Final norm
|
96 |
+
state_dict[f'norm.weight'] = torch.tensor(flax_dict['LayerNorm_0']['scale'])
|
97 |
+
state_dict[f'norm.bias'] = torch.tensor(flax_dict['LayerNorm_0']['bias'])
|
98 |
+
|
99 |
+
# Classifier
|
100 |
+
state_dict['head.weight'] = torch.tensor(flax_dict['Dense_0']['kernel']).permute(1, 0)
|
101 |
+
state_dict['head.bias'] = torch.tensor(flax_dict['Dense_0']['bias'])
|
102 |
+
|
103 |
+
return state_dict
|
104 |
+
|
105 |
+
|
106 |
+
if __name__ == '__main__':
|
107 |
+
variant = sys.argv[1] # base, small, or tiny
|
108 |
+
state_dict = convert_nest(f'./nest-{variant[0]}_imagenet', f'nest_{variant}')
|
109 |
+
torch.save(state_dict, f'./jx_nest_{variant}.pth')
|
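In `convert_nest` above, the Flax checkpoint numbers its `EncoderNDBlock_{ix}` entries with one flat index, while the PyTorch keys are addressed by `(level, layer)`. The sketch below isolates that index arithmetic; `flat_block_indices` is a hypothetical helper, and the depths `[2, 2, 8]` are the `nest_tiny` entry from `arch_depths`.

```python
def flat_block_indices(depths):
    """Map (level, layer) to the flat block index used in the Flax checkpoint keys."""
    mapping = {}
    for level in range(len(depths)):
        for layer in range(depths[level]):
            # Blocks of all earlier levels come first, then this level's layers.
            mapping[(level, layer)] = sum(depths[:level]) + layer
    return mapping


# nest_tiny: 2 + 2 + 8 = 12 encoder blocks in total.
tiny = flat_block_indices([2, 2, 8])
```

Here `tiny[(2, 0)]` is 4, so `levels.2.transformer_encoder.0.*` is filled from `EncoderNDBlock_4`.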
demo.py
ADDED
@@ -0,0 +1,120 @@
|
1 |
+
|
2 |
+
import os, cv2, time, math
|
3 |
+
print("=> Loading libraries...")
|
4 |
+
start = time.time()
|
5 |
+
|
6 |
+
import requests, torch
|
7 |
+
import gradio as gr
|
8 |
+
from torchvision import transforms
|
9 |
+
from datasets import load_dataset
|
10 |
+
from timm.data import create_transform
|
11 |
+
from timm.models import create_model, load_checkpoint
|
12 |
+
from pytorch_grad_cam import GradCAM
|
13 |
+
from pytorch_grad_cam.utils.image import show_cam_on_image
|
14 |
+
|
15 |
+
|
16 |
+
print(f"=> Libraries loaded in {time.time()- start:.2f} sec(s).")
|
17 |
+
print("=> Loading model...")
|
18 |
+
start = time.time()
|
19 |
+
|
20 |
+
size = "b"
|
21 |
+
img_size = 224
|
22 |
+
crop_pct = 0.9
|
23 |
+
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
|
24 |
+
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
|
25 |
+
|
26 |
+
model = create_model(f"tpmlp_{size}").cuda()
|
27 |
+
load_checkpoint(model, f"../tpmlp_{size}.pth.tar", True)
|
28 |
+
model.eval()
|
29 |
+
|
30 |
+
response = requests.get("https://git.io/JJkYN")
|
31 |
+
labels = response.text.split("\n")
|
32 |
+
|
33 |
+
augs = create_transform(
|
34 |
+
input_size=(3, 224, 224),
|
35 |
+
is_training=False,
|
36 |
+
use_prefetcher=False,
|
37 |
+
crop_pct=0.9,
|
38 |
+
)
|
39 |
+
|
40 |
+
|
41 |
+
scale_size = math.floor(img_size / crop_pct)
|
42 |
+
resize = transforms.Compose([
|
43 |
+
transforms.Resize(scale_size),
|
44 |
+
transforms.CenterCrop(img_size),
|
45 |
+
transforms.ToTensor()
|
46 |
+
])
|
47 |
+
normalize = transforms.Normalize(mean=torch.tensor(IMAGENET_DEFAULT_MEAN), std=torch.tensor(IMAGENET_DEFAULT_STD))
|
48 |
+
|
49 |
+
def transform(img):
|
50 |
+
img = resize(img.convert("RGB"))
|
51 |
+
tensor = normalize(img)
|
52 |
+
return img, tensor
|
53 |
+
|
54 |
+
def predict(inp):
|
55 |
+
img, inp = transform(inp)
|
56 |
+
inp = inp.unsqueeze(0)
|
57 |
+
with GradCAM(model=model, target_layers=[model.layers[3]], use_cuda=True) as cam:
|
58 |
+
grayscale_cam, probs = cam(input_tensor=inp, aug_smooth=False, eigen_smooth=False, return_probs=True)
|
59 |
+
|
60 |
+
# Here grayscale_cam has only one image in the batch
|
61 |
+
grayscale_cam = grayscale_cam[0, :]
|
62 |
+
probs = probs[0, :]
|
63 |
+
|
64 |
+
cam_image = show_cam_on_image(img.permute(1, 2, 0).detach().cpu().numpy(), grayscale_cam, use_rgb=True, image_weight=0.5, colormap=cv2.COLORMAP_TWILIGHT_SHIFTED)
|
65 |
+
confidences = {labels[i]: float(probs[i]) for i in range(1000)}
|
66 |
+
return confidences, cam_image
|
67 |
+
|
68 |
+
print(f"=> Model (tpmlp_{size}) loaded in {time.time()- start:.2f} sec(s).")
|
69 |
+
|
70 |
+
if not os.path.isdir("../example-imgs"):
|
71 |
+
os.mkdir("../example-imgs")
|
72 |
+
|
73 |
+
print("=> Loading examples.")
|
74 |
+
indices = [
|
75 |
+
0, # Coucal
|
76 |
+
2, # Volcano
|
77 |
+
7, # Sombrero
|
78 |
+
9, # Balance beam
|
79 |
+
10, # Sulphur-crested cockatoo
|
80 |
+
11, # Shower cap
|
81 |
+
12, # Petri dish INCORRECTLY CLASSIFIED as lens
|
82 |
+
14, # Angora rabbit
|
83 |
+
]
|
84 |
+
ds = load_dataset("imagenet-1k", split="validation", streaming=True)
|
85 |
+
examples = []; idx = 0
|
86 |
+
start = time.time()
|
87 |
+
for data in ds:
|
88 |
+
if idx in indices:
|
89 |
+
data['image'].save(f"../example-imgs/{idx}.png")
|
90 |
+
idx += 1
|
91 |
+
if idx > max(indices):
|
92 |
+
break
|
93 |
+
del ds
|
94 |
+
print(f"=> Examples loaded in {time.time()- start:.2f} sec(s).")
|
95 |
+
|
96 |
+
# demo = gr.Interface(
|
97 |
+
# fn=predict,
|
98 |
+
# inputs=gr.inputs.Image(type="pil"),
|
99 |
+
# outputs=[gr.outputs.Label(num_top_classes=4), gr.outputs.Image(type="numpy")],
|
100 |
+
# examples=[f"../example-imgs/{idx}.png" for idx in indices],
|
101 |
+
# )
|
102 |
+
|
103 |
+
|
104 |
+
with gr.Blocks(theme=gr.themes.Monochrome(font=[gr.themes.GoogleFont("DM Sans"), "sans-serif"])) as demo:
|
105 |
+
gr.HTML("""
|
106 |
+
<h1 align="center">Interactive Demo</h1>
|
107 |
+
<h2 align="center">CS-Mixer: A Cross-Scale Vision MLP Model with Spatial–Channel Mixing</h2>
|
108 |
+
<br><br>
|
109 |
+
""")
|
110 |
+
with gr.Row():
|
111 |
+
input_image = gr.Image(type="pil", min_width=300, label="Input Image")
|
112 |
+
softmax = gr.Label(num_top_classes=4, min_width=200, label="Model Predictions")
|
113 |
+
grad_cam = gr.Image(type="numpy", min_width=300, label="Grad-CAM")
|
114 |
+
with gr.Row():
|
115 |
+
gr.Button("Predict").click(fn=predict, inputs=input_image, outputs=[softmax, grad_cam])
|
116 |
+
gr.ClearButton(input_image)
|
117 |
+
with gr.Row():
|
118 |
+
gr.Examples([f"../example-imgs/{idx}.png" for idx in indices], inputs=input_image, outputs=[softmax, grad_cam], fn=predict, run_on_click=True)
|
119 |
+
|
120 |
+
demo.launch(share=True, allowed_paths=["../example-imgs"])
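The evaluation preprocessing in `demo.py` resizes the shorter image side to `floor(img_size / crop_pct)` and then center-crops to `img_size`. The arithmetic can be sketched as follows; the helper name `eval_resize_size` is hypothetical and used only for illustration, it is not part of `timm` or of the script.

```python
import math


def eval_resize_size(img_size: int, crop_pct: float) -> int:
    """Shorter-side resize target applied before a center crop of img_size."""
    return math.floor(img_size / crop_pct)


# Values from the demo script: img_size = 224, crop_pct = 0.9
print(eval_resize_size(224, 0.9))   # 248
# A crop_pct of 1.0 means the resize target equals the crop size
print(eval_resize_size(320, 1.0))   # 320
```

With `crop_pct = 0.9` the image is resized to 248 pixels on its shorter side, so the 224x224 center crop keeps roughly the central 90% of it.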
|
distributed_train.sh
ADDED
@@ -0,0 +1,5 @@
|
1 |
+
#!/bin/bash
|
2 |
+
NUM_PROC=$1
|
3 |
+
shift
|
4 |
+
torchrun --nproc_per_node=$NUM_PROC train.py "$@"
|
5 |
+
|
docs/archived_changes.md
ADDED
@@ -0,0 +1,406 @@
|
1 |
+
# Archived Changes
|
2 |
+
|
3 |
+
### Nov 22, 2021
|
4 |
+
* A number of updated weights and new model defs
|
5 |
+
* `eca_halonext26ts` - 79.5 @ 256
|
6 |
+
* `resnet50_gn` (new) - 80.1 @ 224, 81.3 @ 288
|
7 |
+
* `resnet50` - 80.7 @ 224, 80.9 @ 288 (trained at 176, not replacing current a1 weights as default since these don't scale as well to higher res, [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1h2_176-001a1197.pth))
|
8 |
+
* `resnext50_32x4d` - 81.1 @ 224, 82.0 @ 288
|
9 |
+
* `sebotnet33ts_256` (new) - 81.2 @ 224
|
10 |
+
* `lamhalobotnet50ts_256` - 81.5 @ 256
|
11 |
+
* `halonet50ts` - 81.7 @ 256
|
12 |
+
* `halo2botnet50ts_256` - 82.0 @ 256
|
13 |
+
* `resnet101` - 82.0 @ 224, 82.8 @ 288
|
14 |
+
* `resnetv2_101` (new) - 82.1 @ 224, 83.0 @ 288
|
15 |
+
* `resnet152` - 82.8 @ 224, 83.5 @ 288
|
16 |
+
* `regnetz_d8` (new) - 83.5 @ 256, 84.0 @ 320
|
17 |
+
* `regnetz_e8` (new) - 84.5 @ 256, 85.0 @ 320
|
18 |
+
* `vit_base_patch8_224` (85.8 top-1) & `in21k` variant weights added thanks to [Martins Bruveris](https://github.com/martinsbruveris)
|
19 |
+
* Groundwork in for FX feature extraction thanks to [Alexander Soare](https://github.com/alexander-soare)
|
20 |
+
* models updated for tracing compatibility (almost full support with some distilled transformer exceptions)
|
21 |
+
|
22 |
+
### Oct 19, 2021
|
23 |
+
* ResNet strikes back (https://arxiv.org/abs/2110.00476) weights added, plus any extra training components used. Model weights and some more details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-rsb-weights)
|
24 |
+
* BCE loss and Repeated Augmentation support for RSB paper
|
25 |
+
* 4 series of ResNet based attention model experiments being added (implemented across byobnet.py/byoanet.py). These include all sorts of attention, from channel attn like SE, ECA to 2D QKV self-attention layers such as Halo, Bottleneck, Lambda. Details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
|
26 |
+
* Working implementations of the following 2D self-attention modules (likely to be differences from paper or eventual official impl):
|
27 |
+
* Halo (https://arxiv.org/abs/2103.12731)
|
28 |
+
* Bottleneck Transformer (https://arxiv.org/abs/2101.11605)
|
29 |
+
* LambdaNetworks (https://arxiv.org/abs/2102.08602)
|
30 |
+
* A RegNetZ series of models with some attention experiments (being added to). These do not follow the paper (https://arxiv.org/abs/2103.06877) in any way other than block architecture, details of official models are not available. See more here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
|
31 |
+
* ConvMixer (https://openreview.net/forum?id=TVHS5Y4dNvM), CrossVit (https://arxiv.org/abs/2103.14899), and BeiT (https://arxiv.org/abs/2106.08254) architectures + weights added
|
32 |
+
* freeze/unfreeze helpers by [Alexander Soare](https://github.com/alexander-soare)
|
33 |
+
|
34 |
+
### Aug 18, 2021
|
35 |
+
* Optimizer bonanza!
|
36 |
+
* Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ `timm bits` [branch](https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits))
|
37 |
+
* Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
|
38 |
+
* Some cleanup on all optimizers and factory. No more `.data`, a bit more consistency, unit tests for all!
|
39 |
+
* SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
|
40 |
+
* EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit diff and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
|
41 |
+
* Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.
|
42 |
+
|
43 |
+
### July 12, 2021
|
44 |
+
* Add XCiT models from [official facebook impl](https://github.com/facebookresearch/xcit). Contributed by [Alexander Soare](https://github.com/alexander-soare)
|
45 |
+
|
46 |
+
### July 5-9, 2021
|
47 |
+
* Add `efficientnetv2_rw_t` weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
|
48 |
+
* top-1 82.34 @ 288x288 and 82.54 @ 320x320
|
49 |
+
* Add [SAM pretrained](https://arxiv.org/abs/2106.01548) in1k weight for ViT B/16 (`vit_base_patch16_sam_224`) and B/32 (`vit_base_patch32_sam_224`) models.
|
50 |
+
* Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official [Flax impl](https://github.com/google-research/nested-transformer). Contributed by [Alexander Soare](https://github.com/alexander-soare).
|
51 |
+
* `jx_nest_base` - 83.534, `jx_nest_small` - 83.120, `jx_nest_tiny` - 81.426
|
52 |
+
|
53 |
+
### June 23, 2021
|
54 |
+
* Reproduce gMLP model training, `gmlp_s16_224` trained to 79.6 top-1, matching [paper](https://arxiv.org/abs/2105.08050). Hparams for this and other recent MLP training [here](https://gist.github.com/rwightman/d6c264a9001f9167e06c209f630b2cc6)
|
55 |
+
|
56 |
+
### June 20, 2021
|
57 |
+
* Release Vision Transformer 'AugReg' weights from [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers](https://arxiv.org/abs/2106.10270)
|
58 |
+
* .npz weight loading support added, can load any of the 50K+ weights from the [AugReg series](https://console.cloud.google.com/storage/browser/vit_models/augreg)
|
59 |
+
* See [example notebook](https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb) from [official impl](https://github.com/google-research/vision_transformer/) for navigating the augreg weights
|
60 |
+
* Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
|
61 |
+
* Highlights: `vit_large_patch16_384` (87.1 top-1), `vit_large_r50_s32_384` (86.2 top-1), `vit_base_patch16_384` (86.0 top-1)
|
62 |
+
* `vit_deit_*` renamed to just `deit_*`
|
63 |
+
* Remove my old small model, replace with DeiT compatible small w/ AugReg weights
|
64 |
+
* Add 1st training of my `gmixer_24_224` MLP w/ GLU, 78.1 top-1 w/ 25M params.
|
65 |
+
* Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
|
66 |
+
* Add `eca_nfnet_l2` weights from my 'lightweight' series. 84.7 top-1 at 384x384.
|
67 |
+
* Add distilled BiT 50x1 student and 152x2 Teacher weights from [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237)
|
68 |
+
* NFNets and ResNetV2-BiT models work w/ PyTorch XLA now
|
69 |
+
* weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
|
70 |
+
* eps values adjusted, will be slight differences but should be quite close
|
71 |
+
* Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models
|
72 |
+
* Cleanup a few classifier / flatten details for models w/ conv classifiers or early global pool
|
73 |
+
* Please report any regressions, this PR touched quite a few models.
|
74 |
+
|
75 |
+
### June 8, 2021
|
76 |
+
* Add first ResMLP weights, trained in PyTorch XLA on TPU-VM w/ my XLA branch. 24 block variant, 79.2 top-1.
|
77 |
+
* Add ResNet51-Q model w/ pretrained weights at 82.36 top-1.
|
78 |
+
* NFNet inspired block layout with quad layer stem and no maxpool
|
79 |
+
* Same param count (35.7M) and throughput as ResNetRS-50 but +1.5 top-1 @ 224x224 and +2.5 top-1 at 288x288
|
80 |
+
|
81 |
+
### May 25, 2021
|
82 |
+
* Add LeViT, Visformer, Convit (PR by Aman Arora), Twins (PR by paper authors) transformer models
|
83 |
+
* Cleanup input_size/img_size override handling and testing for all vision transformer models
|
84 |
+
* Add `efficientnetv2_rw_m` model and weights (started training before official code). 84.8 top-1, 53M params.
|
85 |
+
|
86 |
+
### May 14, 2021
|
87 |
+
* Add EfficientNet-V2 official model defs w/ ported weights from official [Tensorflow/Keras](https://github.com/google/automl/tree/master/efficientnetv2) impl.
|
88 |
+
* 1k trained variants: `tf_efficientnetv2_s/m/l`
|
89 |
+
* 21k trained variants: `tf_efficientnetv2_s/m/l_in21k`
|
90 |
+
* 21k pretrained -> 1k fine-tuned: `tf_efficientnetv2_s/m/l_in21ft1k`
|
91 |
+
* v2 models w/ v1 scaling: `tf_efficientnetv2_b0` through `b3`
|
92 |
+
* Rename my prev V2 guess `efficientnet_v2s` -> `efficientnetv2_rw_s`
|
93 |
+
* Some blank `efficientnetv2_*` models in-place for future native PyTorch training
|
94 |
+
|
95 |
+
### May 5, 2021
|
96 |
+
* Add MLP-Mixer models and port pretrained weights from [Google JAX impl](https://github.com/google-research/vision_transformer/tree/linen)
|
97 |
+
* Add CaiT models and pretrained weights from [FB](https://github.com/facebookresearch/deit)
|
98 |
+
* Add ResNet-RS models and weights from [TF](https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs). Thanks [Aman Arora](https://github.com/amaarora)
|
99 |
+
* Add CoaT models and weights. Thanks [Mohammed Rizin](https://github.com/morizin)
|
100 |
+
* Add new ImageNet-21k weights & finetuned weights for TResNet, MobileNet-V3, ViT models. Thanks [mrT](https://github.com/mrT23)
|
101 |
+
* Add GhostNet models and weights. Thanks [Kai Han](https://github.com/iamhankai)
|
102 |
+
* Update ByoaNet attention models
|
103 |
+
* Improve SA module inits
|
104 |
+
* Hack together experimental stand-alone Swin based attn module and `swinnet`
|
105 |
+
* Consistent '26t' model defs for experiments.
|
106 |
+
* Add improved Efficientnet-V2S (prelim model def) weights. 83.8 top-1.
|
107 |
+
* WandB logging support
|
108 |
+
|
109 |
+
### April 13, 2021
|
110 |
+
* Add Swin Transformer models and weights from https://github.com/microsoft/Swin-Transformer
|
111 |
+
|
112 |
+
### April 12, 2021
|
113 |
+
* Add ECA-NFNet-L1 (slimmed down F1 w/ SiLU, 41M params) trained with this code. 84% top-1 @ 320x320. Trained at 256x256.
|
114 |
+
* Add EfficientNet-V2S model (unverified model definition) weights. 83.3 top-1 @ 288x288. Only trained single res 224. Working on progressive training.
|
115 |
+
* Add ByoaNet model definition (Bring-your-own-attention) w/ SelfAttention block and corresponding SA/SA-like modules and model defs
|
116 |
+
* Lambda Networks - https://arxiv.org/abs/2102.08602
|
117 |
+
* Bottleneck Transformers - https://arxiv.org/abs/2101.11605
|
118 |
+
* Halo Nets - https://arxiv.org/abs/2103.12731
|
119 |
+
* Adabelief optimizer contributed by Juntang Zhuang
|
120 |
+
|
121 |
+
### April 1, 2021
|
122 |
+
* Add snazzy `benchmark.py` script for bulk `timm` model benchmarking of train and/or inference
|
123 |
+
* Add Pooling-based Vision Transformer (PiT) models (from https://github.com/naver-ai/pit)
|
124 |
+
* Merged distilled variant into main for torchscript compatibility
|
125 |
+
* Some `timm` cleanup/style tweaks and weights have hub download support
|
126 |
+
* Cleanup Vision Transformer (ViT) models
|
127 |
+
* Merge distilled (DeiT) model into main so that torchscript can work
|
128 |
+
* Support updated weight init (defaults to old still) that closer matches original JAX impl (possibly better training from scratch)
|
129 |
+
* Separate hybrid model defs into different file and add several new model defs to fiddle with, support patch_size != 1 for hybrids
|
130 |
+
* Fix fine-tuning num_class changes (PiT and ViT) and pos_embed resizing (ViT) with distilled variants
|
131 |
+
* nn.Sequential for block stack (does not break downstream compat)
|
132 |
+
* TnT (Transformer-in-Transformer) models contributed by author (from https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/TNT)
|
133 |
+
* Add RegNetY-160 weights from DeiT teacher model
|
134 |
+
* Add new NFNet-L0 w/ SE attn (rename `nfnet_l0b`->`nfnet_l0`) weights 82.75 top-1 @ 288x288
|
135 |
+
* Some fixes/improvements for TFDS dataset wrapper
|
136 |
+
|
137 |
+
### March 7, 2021
|
138 |
+
* First 0.4.x PyPi release w/ NFNets (& related), ByoB (GPU-Efficient, RepVGG, etc).
|
139 |
+
* Change feature extraction for pre-activation nets (NFNets, ResNetV2) to return features before activation.
|
140 |
+
|
141 |
+
### Feb 18, 2021
|
142 |
+
* Add pretrained weights and model variants for NFNet-F* models from [DeepMind Haiku impl](https://github.com/deepmind/deepmind-research/tree/master/nfnets).
|
143 |
+
* Models are prefixed with `dm_`. They require SAME padding conv, skipinit enabled, and activation gains applied in act fn.
|
144 |
+
* These models are big, expect to run out of GPU memory. With the GELU activation + other options, they are roughly 1/2 the inference speed of my SiLU PyTorch optimized `s` variants.
|
145 |
+
* Original model results are based on pre-processing that is not the same as all other models so you'll see different results in the results csv (once updated).
|
146 |
+
* Matching the original pre-processing as closely as possible I get these results:
|
147 |
+
* `dm_nfnet_f6` - 86.352
|
148 |
+
* `dm_nfnet_f5` - 86.100
|
149 |
+
* `dm_nfnet_f4` - 85.834
|
150 |
+
* `dm_nfnet_f3` - 85.676
|
151 |
+
* `dm_nfnet_f2` - 85.178
|
152 |
+
* `dm_nfnet_f1` - 84.696
|
153 |
+
* `dm_nfnet_f0` - 83.464
|
154 |
+
|
155 |
+
### Feb 16, 2021
|
156 |
+
* Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated w/ PyTorch gradient clipping via mode arg that defaults to prev 'norm' mode. For backward arg compat, clip-grad arg must be specified to enable when using train.py.
|
157 |
+
* AGC w/ default clipping factor `--clip-grad .01 --clip-mode agc`
|
158 |
+
* PyTorch global norm of 1.0 (old behaviour, always norm), `--clip-grad 1.0`
|
159 |
+
* PyTorch value clipping of 10, `--clip-grad 10. --clip-mode value`
|
160 |
+
* AGC performance is definitely sensitive to the clipping factor. More experimentation needed to determine good values for smaller batch sizes and optimizers besides those in paper. So far I've found .001-.005 is necessary for stable RMSProp training w/ NFNet/NF-ResNet.
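As a rough illustration of what the `agc` clip mode does, the sketch below applies the unit-wise rule from the AGC paper (https://arxiv.org/abs/2102.06171) to plain Python lists: a gradient row is rescaled whenever its norm exceeds `clip_factor` times the norm of the matching weight row. This is a minimal sketch, not the `timm` implementation; the function names and the `eps` default are assumptions made for the example.

```python
import math


def unitwise_norm(row):
    """L2 norm of one row (one output unit) of a parameter or gradient."""
    return math.sqrt(sum(x * x for x in row))


def adaptive_clip_grad(weights, grads, clip_factor=0.01, eps=1e-3):
    """Return clipped copies of grads.

    weights and grads are lists of rows, one row per output unit.
    eps keeps zero-initialised weights from forcing their gradients to zero.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        w_norm = max(unitwise_norm(w_row), eps)
        g_norm = unitwise_norm(g_row)
        max_norm = clip_factor * w_norm
        if g_norm > max_norm:
            # Rescale so the gradient norm equals clip_factor * weight norm
            scale = max_norm / max(g_norm, 1e-6)
            g_row = [g * scale for g in g_row]
        clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [0.3, 0.4]]    # row norms of about 5.0 and 0.5
grads = [[0.03, 0.04], [0.03, 0.04]]  # both row norms of about 0.05
clipped = adaptive_clip_grad(weights, grads, clip_factor=0.02)
print(clipped)
```

In this toy case the first row is left alone (0.05 is below 0.02 * 5.0), while the second row is scaled down to a norm of about 0.01. Because the threshold is relative to the weight norm, the clipping factor has a strong effect on training, which fits the sensitivity noted above.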
|
161 |
+
|
162 |
+
### Feb 12, 2021
|
163 |
+
* Update Normalization-Free nets to include new NFNet-F (https://arxiv.org/abs/2102.06171) model defs
|
164 |
+
|
165 |
+
### Feb 10, 2021
|
166 |
+
* More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
|
167 |
+
* GPU-Efficient-Networks (https://github.com/idstcv/GPU-Efficient-Networks), impl in `byobnet.py`
|
168 |
+
* RepVGG (https://github.com/DingXiaoH/RepVGG), impl in `byobnet.py`
|
169 |
+
* classic VGG (from torchvision, impl in `vgg`)
|
170 |
+
* Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
|
171 |
+
* Default AMP mode changed to native PyTorch AMP instead of APEX. Issues not being fixed with APEX. Native works with `--channels-last` and `--torchscript` model training, APEX does not.
|
172 |
+
* Fix a few bugs introduced since last pypi release
|
173 |
+
|
174 |
+
### Feb 8, 2021
|
175 |
+
* Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @320, test @ 352.
|
176 |
+
* `ecaresnet26t` - 79.88 top-1 @ 320x320, 79.08 @ 256x256
|
177 |
+
* `ecaresnet50t` - 82.35 top-1 @ 320x320, 81.52 @ 256x256
|
178 |
+
* `ecaresnet269d` - 84.93 top-1 @ 352x352, 84.87 @ 320x320
|
179 |
+
* Remove separate tiered (`t`) vs tiered_narrow (`tn`) ResNet model defs, all `tn` changed to `t` and `t` models removed (`seresnext26t_32x4d` only model w/ weights that was removed).
|
180 |
+
* Support model default_cfgs with separate train vs test resolution `test_input_size` and remove extra `_320` suffix ResNet model defs that were just for test.
|
181 |
+
|
182 |
+
### Jan 30, 2021
|
183 |
+
* Add initial "Normalization Free" NF-RegNet-B* and NF-ResNet model definitions based on [paper](https://arxiv.org/abs/2101.08692)
|
184 |
+
|
185 |
+
### Jan 25, 2021
|
186 |
+
* Add ResNetV2 Big Transfer (BiT) models w/ ImageNet-1k and 21k weights from https://github.com/google-research/big_transfer
|
187 |
+
* Add official R50+ViT-B/16 hybrid models + weights from https://github.com/google-research/vision_transformer
|
188 |
+
* ImageNet-21k ViT weights are added w/ model defs and representation layer (pre logits) support
|
189 |
+
* NOTE: ImageNet-21k classifier heads were zero'd in original weights, they are only useful for transfer learning
|
190 |
+
* Add model defs and weights for DeiT Vision Transformer models from https://github.com/facebookresearch/deit
|
191 |
+
* Refactor dataset classes into ImageDataset/IterableImageDataset + dataset specific parser classes
|
192 |
+
* Add Tensorflow-Datasets (TFDS) wrapper to allow use of TFDS image classification sets with train script
|
193 |
+
* Ex: `train.py /data/tfds --dataset tfds/oxford_iiit_pet --val-split test --model resnet50 -b 256 --amp --num-classes 37 --opt adamw --lr 3e-4 --weight-decay .001 --pretrained -j 2`
|
194 |
+
* Add improved .tar dataset parser that reads images from .tar, folder of .tar files, or .tar within .tar
|
195 |
+
* Run validation on full ImageNet-21k directly from tar w/ BiT model: `validate.py /data/fall11_whole.tar --model resnetv2_50x1_bitm_in21k --amp`
|
196 |
+
* Models in this update should be stable w/ possible exception of ViT/BiT, possibility of some regressions with train/val scripts and dataset handling
|
197 |
+
|
198 |
+
### Jan 3, 2021
|
199 |
+
* Add SE-ResNet-152D weights
|
200 |
+
* 256x256 val, 0.94 crop top-1 - 83.75
|
201 |
+
* 320x320 val, 1.0 crop - 84.36
|
202 |
+
* Update results files
|
203 |
+
|
204 |
+
### Dec 18, 2020
|
205 |
+
* Add ResNet-101D, ResNet-152D, and ResNet-200D weights trained @ 256x256
|
206 |
+
* 256x256 val, 0.94 crop (top-1) - 101D (82.33), 152D (83.08), 200D (83.25)
|
207 |
+
* 288x288 val, 1.0 crop - 101D (82.64), 152D (83.48), 200D (83.76)
|
208 |
+
* 320x320 val, 1.0 crop - 101D (83.00), 152D (83.66), 200D (84.01)
|
209 |
+
|
210 |
+
### Dec 7, 2020
|
211 |
+
* Simplify EMA module (ModelEmaV2), compatible with fully torchscripted models
|
212 |
+
* Misc fixes for SiLU ONNX export, default_cfg missing from Feature extraction models, Linear layer w/ AMP + torchscript
|
213 |
+
* PyPi release @ 0.3.2 (needed by EfficientDet)
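For readers unfamiliar with weight EMA, the update an EMA module of this kind maintains can be sketched in a few lines. This is the standard exponential-moving-average rule applied to plain floats, not the `ModelEmaV2` code itself; `ema_update` and its `decay` default are illustrative assumptions.

```python
def ema_update(ema_params, model_params, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * param, element-wise."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


# Toy run with a small decay so the effect is visible after a few steps
ema = [0.0, 1.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 1.0], decay=0.5)
print(ema)  # [0.875, 1.0]
```

A decay close to 1 makes the averaged weights change slowly, which is why EMA weights are usually evaluated separately from the raw training weights.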
|
214 |
+
|
215 |
+
|
216 |
+
### Oct 30, 2020
|
217 |
+
* Test with PyTorch 1.7 and fix a small top-n metric view vs reshape issue.
|
218 |
+
* Convert newly added 224x224 Vision Transformer weights from official JAX repo. 81.8 top-1 for B/16, 83.1 L/16.
|
219 |
+
* Support PyTorch 1.7 optimized, native SiLU (aka Swish) activation. Add mapping to 'silu' name, custom swish will eventually be deprecated.
|
220 |
+
* Fix regression for loading pretrained classifier via direct model entrypoint functions. Didn't impact create_model() factory usage.
|
221 |
+
* PyPi release @ 0.3.0 version!
|
222 |
+
|
223 |
+
### Oct 26, 2020
|
224 |
+
* Update Vision Transformer models to be compatible with official code release at https://github.com/google-research/vision_transformer
|
225 |
+
* Add Vision Transformer weights (ImageNet-21k pretrain) for 384x384 base and large models converted from official jax impl
|
226 |
+
* ViT-B/16 - 84.2
|
227 |
+
* ViT-B/32 - 81.7
|
228 |
+
* ViT-L/16 - 85.2
|
229 |
+
* ViT-L/32 - 81.5
|
230 |
+
|
231 |
+
### Oct 21, 2020
|
232 |
+
* Weights added for Vision Transformer (ViT) models. 77.86 top-1 for 'small' and 79.35 for 'base'. Thanks to [Christof](https://www.kaggle.com/christofhenkel) for training the base model w/ lots of GPUs.
|
233 |
+
|
234 |
+
### Oct 13, 2020
|
235 |
+
* Initial impl of Vision Transformer models. Both patch and hybrid (CNN backbone) variants. Currently trying to train...
|
236 |
+
* Adafactor and AdaHessian (FP32 only, no AMP) optimizers
|
237 |
+
* EdgeTPU-M (`efficientnet_em`) model trained in PyTorch, 79.3 top-1
|
238 |
+
* Pip release, doc updates pending a few more changes...
|
239 |
+
|
240 |
+
### Sept 18, 2020
|
241 |
+
* New ResNet 'D' weights. 72.7 (top-1) ResNet-18-D, 77.1 ResNet-34-D, 80.5 ResNet-50-D
|
242 |
+
* Added a few untrained defs for other ResNet models (66D, 101D, 152D, 200/200D)
|
243 |
+
|
244 |
+
### Sept 3, 2020
|
245 |
+
* New weights
|
246 |
+
* Wide-ResNet50 - 81.5 top-1 (vs 78.5 torchvision)
|
247 |
+
* SEResNeXt50-32x4d - 81.3 top-1 (vs 79.1 cadene)
|
248 |
+
* Support for native Torch AMP and channels_last memory format added to train/validate scripts (`--channels-last`, `--native-amp` vs `--apex-amp`)
|
249 |
+
* Models tested with channels_last on latest NGC 20.08 container. AdaptiveAvgPool in attn layers changed to mean((2,3)) to work around bug with NHWC kernel.
|
250 |
+
|
251 |
+
### Aug 12, 2020
|
252 |
+
* New/updated weights from training experiments
|
253 |
+
* EfficientNet-B3 - 82.1 top-1 (vs 81.6 for official with AA and 81.9 for AdvProp)
|
254 |
+
* RegNetY-3.2GF - 82.0 top-1 (78.9 from official ver)
|
255 |
+
* CSPResNet50 - 79.6 top-1 (76.6 from official ver)
|
256 |
+
* Add CutMix integrated w/ Mixup. See [pull request](https://github.com/rwightman/pytorch-image-models/pull/218) for some usage examples
|
257 |
+
* Some fixes for using pretrained weights with `in_chans` != 3 on several models.
|
258 |
+
|
259 |
+
### Aug 5, 2020
|
260 |
+
Universal feature extraction, new models, new weights, new test sets.
|
261 |
+
* All models support the `features_only=True` argument for `create_model` call to return a network that extracts feature maps from the deepest layer at each stride.
|
262 |
+
* New models
|
263 |
+
* CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
|
264 |
+
* ReXNet
|
265 |
+
* (Modified Aligned) Xception41/65/71 (a proper port of TF models)
|
266 |
+
* New trained weights
|
267 |
+
* SEResNet50 - 80.3 top-1
|
268 |
+
* CSPDarkNet53 - 80.1 top-1
|
269 |
+
* CSPResNeXt50 - 80.0 top-1
|
270 |
+
* DPN68b - 79.2 top-1
|
271 |
+
* EfficientNet-Lite0 (non-TF ver) - 75.5 (submitted by [@hal-314](https://github.com/hal-314))
|
272 |
+
* Add 'real' labels for ImageNet and ImageNet-Renditions test set, see [`results/README.md`](results/README.md)
|
273 |
+
* Test set ranking/top-n diff script by [@KushajveerSingh](https://github.com/KushajveerSingh)
|
274 |
+
* Train script and loader/transform tweaks to punch through more aug arguments
|
275 |
+
* README and documentation overhaul. See initial (WIP) documentation at https://rwightman.github.io/pytorch-image-models/
|
276 |
+
* adamp and sgdp optimizers added by [@hellbell](https://github.com/hellbell)
|
277 |
+
|
278 |
+
### June 11, 2020
|
279 |
+
Bunch of changes:
|
280 |
+
* DenseNet models updated with memory efficient addition from torchvision (fixed a bug), blur pooling and deep stem additions
|
281 |
+
* VoVNet V1 and V2 models added, 39 V2 variant (ese_vovnet_39b) trained to 79.3 top-1
|
282 |
+
* Activation factory added along with new activations:
|
283 |
+
* select act at model creation time for more flexibility in using activations compatible with scripting or tracing (ONNX export)
|
284 |
+
* hard_mish (experimental) added with memory-efficient grad, along with ME hard_swish
|
285 |
+
* context mgr for setting exportable/scriptable/no_jit states
|
286 |
+
* Norm + Activation combo layers added with initial trial support in DenseNet and VoVNet along with impl of EvoNorm and InplaceAbn wrapper that fit the interface
|
287 |
+
* Torchscript works for all but two of the model types as long as using PyTorch 1.5+, tests added for this
|
288 |
+
* Some import cleanup and classifier reset changes, all models will have classifier reset to nn.Identity on reset_classifier(0) call
|
289 |
+
* Prep for 0.1.28 pip release
|
290 |
+
|
291 |
+
### May 12, 2020
|
292 |
+
* Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955)
|
293 |
+
|
294 |
+
### May 3, 2020
|
295 |
+
* Pruned EfficientNet B1, B2, and B3 (https://arxiv.org/abs/2002.08258) contributed by [Yonathan Aflalo](https://github.com/yoniaflalo)
|
296 |
+
|
297 |
+
### May 1, 2020
|
298 |
+
* Merged a number of excellent contributions in the ResNet model family over the past month
|
299 |
+
* BlurPool2D and resnetblur models initiated by [Chris Ha](https://github.com/VRandme), I trained resnetblur50 to 79.3.
|
300 |
+
* TResNet models and SpaceToDepth, AntiAliasDownsampleLayer layers by [mrT23](https://github.com/mrT23)
|
301 |
+
* ecaresnet (50d, 101d, light) models and two pruned variants using pruning as per (https://arxiv.org/abs/2002.08258) by [Yonathan Aflalo](https://github.com/yoniaflalo)
|
302 |
+
* 200 pretrained models in total now with updated results csv in results folder
|
303 |
+
|
304 |
+
### April 5, 2020
|
305 |
+
* Add some newly trained MobileNet-V2 models trained with latest h-params, rand augment. They compare quite favourably to EfficientNet-Lite
|
306 |
+
* 3.5M param MobileNet-V2 100 @ 73%
|
307 |
+
* 4.5M param MobileNet-V2 110d @ 75%
|
308 |
+
* 6.1M param MobileNet-V2 140 @ 76.5%
|
309 |
+
* 5.8M param MobileNet-V2 120d @ 77.3%
|
310 |
+
|
311 |
+
### March 18, 2020
|
312 |
+
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
|
313 |
+
* Add RandAugment trained ResNeXt-50 32x4d weights with 79.8 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
|
314 |
+
|
326 |
+
### Feb 29, 2020
|
327 |
+
* New MobileNet-V3 Large weights trained from scratch with this code to 75.77% top-1
|
328 |
+
* IMPORTANT CHANGE - default weight init changed for all MobilenetV3 / EfficientNet / related models
|
329 |
+
* overall results are similar to, or a bit better than, before when training from scratch on the few smaller models tried
|
330 |
+
* performance early in training seems consistently improved but less difference by end
|
331 |
+
* set `fix_group_fanout=False` in `_init_weight_goog` fn if you need to reproduce past behaviour
|
332 |
+
* Experimental LR noise feature added; it applies a random perturbation to the LR each epoch within a specified range of training
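The idea of per-epoch LR noise can be sketched as below. This is only an illustration of the description above, not the scheduler code from the repository: the function name `noisy_lr`, the uniform noise distribution, the `noise_pct` default, and the per-epoch seeding are all assumptions.

```python
import random


def noisy_lr(base_lr, epoch, noise_range, noise_pct=0.67, seed=42):
    """Return base_lr, randomly perturbed if epoch lies in [start, end)."""
    start, end = noise_range
    if not (start <= epoch < end):
        return base_lr
    # Seed from the epoch so every worker draws the same noise for that epoch
    rng = random.Random(seed + epoch)
    noise = rng.uniform(-noise_pct, noise_pct)
    return base_lr * (1.0 + noise)


for epoch in range(4, 8):
    print(epoch, noisy_lr(0.1, epoch, noise_range=(5, 7)))
```

Epochs outside the range keep the scheduled LR unchanged, and seeding by epoch keeps the perturbation reproducible across restarts and across distributed workers.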
|
333 |
+
|
334 |
+
### Feb 18, 2020
|
335 |
+
* Big refactor of model layers and addition of several attention mechanisms. Several additions motivated by 'Compounding the Performance Improvements...' (https://arxiv.org/abs/2001.06268):
|
336 |
+
* Move layer/module impl into `layers` subfolder/module of `models` and organize in a more granular fashion
|
337 |
+
* ResNet downsample paths now properly support dilation (output stride != 32) for avg_pool ('D' variant) and 3x3 (SENets) networks
|
338 |
+
* Add Selective Kernel Nets on top of ResNet base, pretrained weights
|
339 |
+
* skresnet18 - 73% top-1
|
340 |
+
* skresnet34 - 76.9% top-1
|
341 |
+
* skresnext50_32x4d (equiv to SKNet50) - 80.2% top-1
|
342 |
+
* ECA and CECA (circular padding) attention layer contributed by [Chris Ha](https://github.com/VRandme)
|
343 |
+
* CBAM attention experiment (not the best results so far, may remove)
|
344 |
+
* Attention factory to allow dynamically selecting one of SE, ECA, CBAM in the `.se` position for all ResNets
|
345 |
+
* Add DropBlock and DropPath (formerly DropConnect for EfficientNet/MobileNetv3) support to all ResNet variants
|
346 |
+
* Full dataset results updated that incl NoisyStudent weights and 2 of the 3 SK weights
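
For readers unfamiliar with DropPath (stochastic depth) mentioned in this entry, here is a minimal pure-Python sketch of the technique for a single sample. It is not the layer shipped in this repository: the function name, the list-based "tensor", and the `rng` parameter are assumptions for illustration.

```python
import random


def drop_path(branch_output, drop_prob, training, rng=random):
    """Randomly zero a residual branch for one sample (stochastic depth).

    branch_output is a plain list of floats standing in for a tensor.
    Kept branches are rescaled by 1 / keep_prob so the expected value
    of the output matches the un-dropped branch.
    """
    if not training or drop_prob == 0.0:
        return branch_output
    keep_prob = 1.0 - drop_prob
    if rng.random() < keep_prob:
        return [v / keep_prob for v in branch_output]
    return [0.0 for _ in branch_output]
```

In a residual block the result would be added to the shortcut path, so a dropped branch reduces the block to an identity mapping for that sample.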
|
347 |
+
|
348 |
+
### Feb 12, 2020
|
349 |
+
* Add EfficientNet-L2 and B0-B7 NoisyStudent weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)
|
350 |
+
|
351 |
+
### Feb 6, 2020
|
352 |
+
* Add RandAugment trained EfficientNet-ES (EdgeTPU-Small) weights with 78.1 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
|
353 |
+
|
354 |
+
### Feb 1/2, 2020
|
355 |
+
* Port new EfficientNet-B8 (RandAugment) weights, these are different than the B8 AdvProp, different input normalization.
|
356 |
+
* Update results csv files on all models for ImageNet validation and three other test sets
|
357 |
+
* Push PyPi package update
|
358 |
+
|
359 |
+
### Jan 31, 2020
|
360 |
+
* Update ResNet50 weights with a new 79.038 result from further JSD / AugMix experiments. Full command line for reproduction in training section below.
|
361 |
+
|
362 |
+
### Jan 11/12, 2020
|
363 |
+
* Master may be a bit unstable wrt training, these changes have been tested but not all combos
|
364 |
+
* Implementations of AugMix added to existing RA and AA. Including numerous supporting pieces like JSD loss (Jensen-Shannon divergence + CE), and AugMixDataset
|
365 |
+
* SplitBatchNorm adaptation layer added for implementing Auxiliary BN as per AdvProp paper
|
366 |
+
* ResNet-50 AugMix trained model w/ 79% top-1 added
|
367 |
+
* `seresnext26tn_32x4d` - 77.99 top-1, 93.75 top-5 added to tiered experiment, higher img/s than 't' and 'd'
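
The JSD loss mentioned in this entry (Jensen-Shannon divergence + CE) can be illustrated with a small pure-Python sketch. It operates on already-normalized probability lists rather than logits, and the function names and the `jsd_weight` default are assumptions for the example, not the repository's implementation.

```python
import math


def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def jsd(distributions):
    """Jensen-Shannon divergence: mean KL of each distribution to their mixture."""
    n = len(distributions)
    mixture = [sum(col) / n for col in zip(*distributions)]
    return sum(kl_div(p, mixture) for p in distributions) / n


def jsd_cross_entropy(p_clean, p_aug1, p_aug2, target, jsd_weight=12.0):
    """Cross-entropy on the clean prediction plus a weighted JSD consistency term.

    jsd_weight is a hypothetical default; tune it for your own setup.
    """
    ce = -math.log(p_clean[target] + 1e-12)
    return ce + jsd_weight * jsd([p_clean, p_aug1, p_aug2])
```

When the clean and augmented predictions agree, the JSD term vanishes and the loss reduces to plain cross-entropy.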
|
368 |
+
|
369 |
+
### Jan 3, 2020
|
370 |
+
* Add RandAugment trained EfficientNet-B0 weight with 77.7 top-1. Trained by [Michael Klachko](https://github.com/michaelklachko) with this code and recent hparams (see Training section)
|
371 |
+
* Add `avg_checkpoints.py` script for post training weight averaging and update all scripts with header docstrings and shebangs.
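
Post-training weight averaging, the idea behind `avg_checkpoints.py`, amounts to an element-wise mean over several checkpoints. The sketch below shows that idea on plain Python dicts of float lists; it is an assumption-laden illustration (the function name and data layout are hypothetical) and does not reproduce the script, which works on PyTorch checkpoint files.

```python
def average_state_dicts(state_dicts):
    """Element-wise mean of several 'state dicts' (name -> list of floats)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    averaged = {}
    for key in state_dicts[0].keys():
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(vals) / len(state_dicts) for vals in columns]
    return averaged
```

For example, averaging `{"w": [1.0, 2.0]}` with `{"w": [3.0, 4.0]}` gives `{"w": [2.0, 3.0]}`.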
|
372 |
+
|
373 |
+
### Dec 30, 2019
|
374 |
+
* Merge [Dushyant Mehta's](https://github.com/mehtadushy) PR for SelecSLS (Selective Short and Long Range Skip Connections) networks. Good GPU memory consumption and throughput. Original: https://github.com/mehtadushy/SelecSLS-Pytorch
|
375 |
+
|
376 |
+
### Dec 28, 2019
|
377 |
+
* Add new model weights and training hparams (see Training Hparams section)
|
378 |
+
* `efficientnet_b3` - 81.5 top-1, 95.7 top-5 at default res/crop, 81.9, 95.8 at 320x320 1.0 crop-pct
|
379 |
+
* trained with RandAugment, ended up with an interesting but less than perfect result (see training section)
|
380 |
+
* `seresnext26d_32x4d`- 77.6 top-1, 93.6 top-5
|
381 |
+
* deep stem (32, 32, 64), avgpool downsample
|
382 |
+
* stem/downsample from bag-of-tricks paper
|
383 |
+
* `seresnext26t_32x4d`- 78.0 top-1, 93.7 top-5
|
384 |
+
* deep tiered stem (24, 48, 64), avgpool downsample (a modified 'D' variant)
|
385 |
+
* stem sizing mods from Jeremy Howard and fastai devs discussing ResNet architecture experiments
|
386 |
+
|
387 |
+
### Dec 23, 2019
|
388 |
+
* Add RandAugment trained MixNet-XL weights with 80.48 top-1.
|
389 |
+
* `--dist-bn` argument added to train.py, will distribute BN stats between nodes after each train epoch, before eval
|
390 |
+
|
391 |
+
### Dec 4, 2019
|
392 |
+
* Added weights from the first training from scratch of an EfficientNet (B2) with my new RandAugment implementation. Much better than my previous B2 and very close to the official AdvProp ones (80.4 top-1, 95.08 top-5).
|
393 |
+
|
394 |
+
### Nov 29, 2019
|
395 |
+
* Brought EfficientNet and MobileNetV3 up to date with my https://github.com/rwightman/gen-efficientnet-pytorch code. Torchscript and ONNX export compat excluded.
|
396 |
+
* AdvProp weights added
|
397 |
+
* Official TF MobileNetv3 weights added
|
398 |
+
* EfficientNet and MobileNetV3 hook based 'feature extraction' classes added. Will serve as basis for using models as backbones in obj detection/segmentation tasks. Lots more to be done here...
|
399 |
+
* HRNet classification models and weights added from https://github.com/HRNet/HRNet-Image-Classification
|
400 |
+
* Consistency in global pooling, `reset_classifier`, and `forward_features` across models
|
401 |
+
* `forward_features` always returns unpooled feature maps now
|
402 |
+
* Reasonable chance I broke something... let me know
|
403 |
+
|
404 |
+
### Nov 22, 2019
|
405 |
+
* Add ImageNet training RandAugment implementation alongside AutoAugment. PyTorch Transform compatible format, using PIL. Currently training two EfficientNet models from scratch with promising results... will update.
|
406 |
+
* `drop-connect` cmd line arg finally added to `train.py`, no need to hack model fns. Works for efficientnet/mobilenetv3 based models, ignored otherwise.
|
docs/changes.md
ADDED
@@ -0,0 +1,314 @@
|
1 |
+
# Recent Changes
|
2 |
+
### Jan 5, 2023
|
3 |
+
* ConvNeXt-V2 models and weights added to existing `convnext.py`
|
4 |
+
* Paper: [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](http://arxiv.org/abs/2301.00808)
|
5 |
+
* Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
|
6 |
+
|
7 |
+
### Dec 23, 2022 🎄☃
|
8 |
+
* Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
|
9 |
+
* NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
|
10 |
+
* Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
|
11 |
+
* More model pretrained tags and adjustments, some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
|
12 |
+
* More ImageNet-12k (subset of 22k) pretrain models popping up:
|
13 |
+
* `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
|
14 |
+
* `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
|
15 |
+
* `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
|
16 |
+
* `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
|
17 |
+
|
18 |
+
### Dec 8, 2022
|
19 |
+
* Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
|
20 |
+
* original source: https://github.com/baaivision/EVA
|
21 |
+
|
22 |
+
| model | top1 | param_count | gmac | macts | hub |
|
23 |
+
|:------------------------------------------|-----:|------------:|------:|------:|:----------------------------------------|
|
24 |
+
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
|
25 |
+
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
|
26 |
+
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
|
27 |
+
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
|
28 |
+
|
29 |
+
### Dec 6, 2022
|
30 |
+
* Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
|
31 |
+
* original source: https://github.com/baaivision/EVA
|
32 |
+
* paper: https://arxiv.org/abs/2211.07636
|
33 |
+
|
34 |
+
| model | top1 | param_count | gmac | macts | hub |
|
35 |
+
|:-----------------------------------------|-------:|--------------:|-------:|--------:|:----------------------------------------|
|
36 |
+
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
|
37 |
+
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
|
38 |
+
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
|
39 |
+
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | [link](https://huggingface.co/BAAI/EVA) |
|
40 |
+
|
41 |
+
### Dec 5, 2022
|
42 |
+
|
43 |
+
* Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
|
44 |
+
* vision_transformer, maxvit, convnext are the first three model impl w/ support
|
45 |
+
* model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
|
46 |
+
* bugs are likely, but I need feedback so please try it out
|
47 |
+
* if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
|
48 |
+
* Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
|
49 |
+
* Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
|
50 |
+
* Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
|
51 |
+
|
52 |
+
| model | top1 | param_count | gmac | macts | hub |
|
53 |
+
|:-------------------------------------------------|-------:|--------------:|-------:|--------:|:-------------------------------------------------------------------------------------|
|
54 |
+
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k) |
|
55 |
+
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k) |
|
56 |
+
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k) |
|
57 |
+
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k) |
|
58 |
+
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k) |
|
59 |
+
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k) |
|
60 |
+
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k) |
|
61 |
+
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k) |
|
62 |
+
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k) |
|
63 |
+
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k) |
|
64 |
+
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k) |
|
65 |
+
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k) |
|
66 |
+
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k) |
|
67 |
+
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k) |
|
68 |
+
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k) |
|
69 |
+
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k) |
|
70 |
+
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k) |
|
71 |
+
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k) |
|
72 |
+
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k) |
|
73 |
+
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k) |
|
74 |
+
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k) |
|
75 |
+
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k) |
|
76 |
+
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k) |
|
77 |
+
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k) |
|
78 |
+
|
79 |
+
* Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
|
80 |
+
* There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing changes
|
81 |
+
|
82 |
+
| model | top1 | param_count | gmac | macts | hub |
|
83 |
+
|:-----------------------------------|-------:|--------------:|-------:|--------:|:-----------------------------------------------------------------------|
|
84 |
+
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |
|
85 |
+
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |
|
86 |
+
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |
|
87 |
+
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |
|
88 |
+
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |
|
89 |
+
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |
|
90 |
+
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |
|
91 |
+
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |
|
92 |
+
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |
|
93 |
+
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |
|
94 |
+
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |
|
95 |
+
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |
|
96 |
+
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |
|
97 |
+
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |
|
98 |
+
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |
|
99 |
+
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |
|
100 |
+
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |
|
101 |
+
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |
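
The model names in the tables of this entry follow the `model_arch.pretrained_tag` pattern introduced with multi-weight support. As a small sketch of how such names decompose, the helpers below split on the first dot and group tags by architecture. They are hypothetical helpers written for illustration and are not part of the `timm` API.

```python
def split_model_name(name):
    """Split 'model_arch.pretrained_tag' into (arch, tag); tag is '' if absent."""
    arch, _, tag = name.partition(".")
    return arch, tag


def tags_by_arch(names):
    """Group pretrained tags under their architecture name."""
    grouped = {}
    for name in names:
        arch, tag = split_model_name(name)
        grouped.setdefault(arch, []).append(tag)
    return grouped
```

For instance, `vit_base_patch16_clip_224.openai_ft_in1k` splits into the architecture `vit_base_patch16_clip_224` and the tag `openai_ft_in1k`.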
|
102 |
+
|
103 |
+
### Oct 15, 2022
|
104 |
+
* Train and validation script enhancements
|
105 |
+
* Non-GPU (ie CPU) device support
|
106 |
+
* SLURM compatibility for train script
|
107 |
+
* HF datasets support (via ReaderHfds)
|
108 |
+
* TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
|
109 |
+
* in_chans !=3 support for scripts / loader
|
110 |
+
* Adan optimizer
|
111 |
+
* Can enable per-step LR scheduling via args
|
112 |
+
* Dataset 'parsers' renamed to 'readers', more descriptive of purpose
|
113 |
+
* AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
|
114 |
+
* main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
|
115 |
+
* master -> main branch rename
|
116 |
+
|
117 |
+
### Oct 10, 2022
|
118 |
+
* More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
|
119 |
+
* `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
|
120 |
+
* `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
|
121 |
+
* `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
|
122 |
+
* `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
|
123 |
+
* `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
|
124 |
+
* NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.
|
125 |
+
|
126 |
+
### Sept 23, 2022
|
127 |
+
* LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
|
128 |
+
* vit_base_patch32_224_clip_laion2b
|
129 |
+
* vit_large_patch14_224_clip_laion2b
|
130 |
+
* vit_huge_patch14_224_clip_laion2b
|
131 |
+
* vit_giant_patch14_224_clip_laion2b
|
132 |
+
|
133 |
+
### Sept 7, 2022
|
134 |
+
* Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
|
135 |
+
* Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
|
136 |
+
* Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
|
137 |
+
* `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
|
138 |
+
* `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
|
139 |
+
* `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)
|
140 |
+
|
141 |
+
### Aug 29, 2022
|
142 |
+
* MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
|
143 |
+
* `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)
|
144 |
+
|
145 |
+
### Aug 26, 2022
|
146 |
+
* CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
|
147 |
+
* both found in [`maxxvit.py`](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py) model def, contains numerous experiments outside scope of original papers
|
148 |
+
* an unfinished Tensorflow version from MaxVit authors can be found at https://github.com/google-research/maxvit
|
149 |
+
* Initial CoAtNet and MaxVit timm pretrained weights (working on more):
|
150 |
+
* `coatnet_nano_rw_224` - 81.7 @ 224 (T)
|
151 |
+
* `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
|
152 |
+
* `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
|
153 |
+
* `coatnet_bn_0_rw_224` - 82.4 (T)
|
154 |
+
* `maxvit_nano_rw_256` - 82.9 @ 256 (T)
|
155 |
+
* `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
|
156 |
+
* `coatnet_1_rw_224` - 83.6 @ 224 (G)
|
157 |
+
* (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
|
158 |
+
* GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
|
159 |
+
* MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
|
160 |
+
* EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
|
161 |
+
* PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
|
162 |
+
* 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)
|
163 |
+
|
164 |
+
|
165 |
+
### Aug 15, 2022
|
166 |
+
* ConvNeXt atto weights added
|
167 |
+
* `convnext_atto` - 75.7 @ 224, 77.0 @ 288
|
168 |
+
* `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288
|
169 |
+
|
170 |
+
### Aug 5, 2022
|
171 |
+
* More custom ConvNeXt smaller model defs with weights
|
172 |
+
* `convnext_femto` - 77.5 @ 224, 78.7 @ 288
|
173 |
+
* `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
|
174 |
+
* `convnext_pico` - 79.5 @ 224, 80.4 @ 288
|
175 |
+
* `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
|
176 |
+
* `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
|
177 |
+
* Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)
|
178 |
+
|
179 |
+
### July 28, 2022
|
180 |
+
* Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks [Hugo Touvron](https://github.com/TouvronHugo)!
|
181 |
+
|
182 |
+
### July 27, 2022
|
183 |
+
* All runtime benchmark and validation result csv files are up-to-date!
|
184 |
+
* A few more weights & model defs added:
|
185 |
+
* `darknetaa53` - 79.8 @ 256, 80.5 @ 288
|
186 |
+
* `convnext_nano` - 80.8 @ 224, 81.5 @ 288
|
187 |
+
* `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288
|
188 |
+
* `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288
|
189 |
+
* `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288
|
190 |
+
* `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288
|
191 |
+
* `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320
|
192 |
+
* `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC program!
|
193 |
+
* Add output_stride=8 and 16 support to ConvNeXt (dilation)
|
194 |
+
* deit3 models not being able to resize pos_emb fixed
|
195 |
+
* Version 0.6.7 PyPi release (w/ above bug fixes and new weights since 0.6.5)
|
196 |
+
|
197 |
+
### July 8, 2022
|
198 |
+
More models, more fixes
|
199 |
+
* Official research models (w/ weights) added:
|
200 |
+
* EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
|
201 |
+
* MobileViT-V2 from (https://github.com/apple/ml-cvnets)
|
202 |
+
* DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
|
203 |
+
* My own models:
|
204 |
+
* Small `ResNet` defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
|
205 |
+
* `CspNet` refactored with dataclass config, simplified CrossStage3 (`cs3`) option. These are closer to YOLO-v5+ backbone defs.
|
206 |
+
* More relative position vit fiddling. Two `srelpos` (shared relative position) models trained, and a medium w/ class token.
|
207 |
+
* Add an alternate downsample mode to EdgeNeXt and train a `small` model. Better than original small, but not their new USI trained weights.
|
208 |
+
* My own model weight results (all ImageNet-1k training)
|
209 |
+
* `resnet10t` - 66.5 @ 176, 68.3 @ 224
|
210 |
+
* `resnet14t` - 71.3 @ 176, 72.3 @ 224
|
211 |
+
* `resnetaa50` - 80.6 @ 224, 81.6 @ 288
|
212 |
+
* `darknet53` - 80.0 @ 256, 80.5 @ 288
|
213 |
+
* `cs3darknet_m` - 77.0 @ 256, 77.6 @ 288
|
214 |
+
* `cs3darknet_focus_m` - 76.7 @ 256, 77.3 @ 288
|
215 |
+
* `cs3darknet_l` - 80.4 @ 256, 80.9 @ 288
|
216 |
+
* `cs3darknet_focus_l` - 80.3 @ 256, 80.9 @ 288
|
217 |
+
* `vit_srelpos_small_patch16_224` - 81.1 @ 224, 82.1 @ 320
|
218 |
+
* `vit_srelpos_medium_patch16_224` - 82.3 @ 224, 83.1 @ 320
|
219 |
+
* `vit_relpos_small_patch16_cls_224` - 82.6 @ 224, 83.6 @ 320
|
220 |
+
* `edgenext_small_rw` - 79.6 @ 224, 80.4 @ 320
|
221 |
+
* `cs3`, `darknet`, and `vit_*relpos` weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
|
222 |
+
* Hugging Face Hub support fixes verified, demo notebook TBA
|
223 |
+
* Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
|
224 |
+
* Add support to change image extensions scanned by `timm` datasets/parsers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
|
225 |
+
* Default ConvNeXt LayerNorm impl to use `F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2)` via `LayerNorm2d` in all cases.
|
226 |
+
* a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
|
227 |
+
* previous impl exists as `LayerNormExp2d` in `models/layers/norm.py`
|
228 |
+
* Numerous bug fixes
|
229 |
+
* Currently testing for imminent PyPi 0.6.x release
|
230 |
+
* LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
|
231 |
+
* ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...
|
232 |
+
|
233 |
+
### May 13, 2022
|
234 |
+
* Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
|
235 |
+
* Some refactoring for existing `timm` Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
|
236 |
+
* More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
|
237 |
+
* `vit_relpos_small_patch16_224` - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
|
238 |
+
* `vit_relpos_medium_patch16_rpn_224` - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
|
239 |
+
* `vit_relpos_medium_patch16_224` - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
|
240 |
+
* `vit_relpos_base_patch16_gapcls_224` - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
|
241 |
+
* Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using in a pre DeiT 'small' model for first ViT impl back in 2020)
|
242 |
+
* Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
|
243 |
+
* Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)
|
244 |
+
|
245 |
+
### May 2, 2022
|
246 |
+
* Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (`vision_transformer_relpos.py`) and Residual Post-Norm branches (from Swin-V2) (`vision_transformer*.py`)
|
247 |
+
* `vit_relpos_base_patch32_plus_rpn_256` - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
|
248 |
+
* `vit_relpos_base_patch16_224` - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
|
249 |
+
* `vit_base_patch16_rpn_224` - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
|
250 |
+
* Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie `How to Train Your ViT`)
|
251 |
+
* `vit_*` models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).
|
252 |
+
|
253 |
+
### April 22, 2022
|
254 |
+
* `timm` models are now officially supported in [fast.ai](https://www.fast.ai/)! Just in time for the new Practical Deep Learning course. `timmdocs` documentation link updated to [timm.fast.ai](http://timm.fast.ai/).
|
255 |
+
* Two more model weights added in the TPU trained [series](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights). Some In22k pretrain still in progress.
|
256 |
+
* `seresnext101d_32x8d` - 83.69 @ 224, 84.35 @ 288
|
257 |
+
* `seresnextaa101d_32x8d` (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288
|
258 |
+
|
259 |
+
### March 23, 2022
|
260 |
+
* Add `ParallelBlock` and `LayerScale` option to base vit models to support model configs in [Three things everyone should know about ViT](https://arxiv.org/abs/2203.09795)
|
261 |
+
* `convnext_tiny_hnf` (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.
|
262 |
+
|
263 |
+
### March 21, 2022
|
264 |
+
* Merge `norm_norm_norm`. **IMPORTANT** this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch [`0.5.x`](https://github.com/rwightman/pytorch-image-models/tree/0.5.x) or a previous 0.5.x release can be used if stability is required.
|
265 |
+
* Significant weights update (all TPU trained) as described in this [release](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights)
|
266 |
+
* `regnety_040` - 82.3 @ 224, 82.96 @ 288
|
267 |
+
* `regnety_064` - 83.0 @ 224, 83.65 @ 288
|
268 |
+
* `regnety_080` - 83.17 @ 224, 83.86 @ 288
|
269 |
+
* `regnetv_040` - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
|
270 |
+
* `regnetv_064` - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
|
271 |
+
* `regnetz_040` - 83.67 @ 256, 84.25 @ 320
|
272 |
+
* `regnetz_040h` - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
|
273 |
+
* `resnetv2_50d_gn` - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
|
274 |
+
* `resnetv2_50d_evos` - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
|
275 |
+
* `regnetz_c16_evos` - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
|
276 |
+
* `regnetz_d8_evos` - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
|
277 |
+
* `xception41p` - 82 @ 299 (timm pre-act)
|
278 |
+
* `xception65` - 83.17 @ 299
|
279 |
+
* `xception65p` - 83.14 @ 299 (timm pre-act)
|
280 |
+
* `resnext101_64x4d` - 82.46 @ 224, 83.16 @ 288
|
281 |
+
* `seresnext101_32x8d` - 83.57 @ 224, 84.27 @ 288
|
282 |
+
* `resnetrs200` - 83.85 @ 256, 84.44 @ 320
|
283 |
+
* HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
|
284 |
+
* SwinTransformer-V2 implementation added. Submitted by [Christoph Reich](https://github.com/ChristophReich1996). Training experiments and model changes by myself are ongoing so expect compat breaks.
|
285 |
+
* Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
|
286 |
+
* MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
|
287 |
+
* PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
|
288 |
+
* VOLO models w/ weights adapted from https://github.com/sail-sg/volo
|
289 |
+
* Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
|
290 |
+
* Enhanced support for alternate norm + act ('NormAct') layers in a number of models, esp. EfficientNet/MobileNetV3, RegNet, and aligned Xception
|
291 |
+
* Grouped conv support added to EfficientNet family
|
292 |
+
* Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
|
293 |
+
* Gradient checkpointing support added to many models
|
294 |
+
* `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
|
295 |
+
* All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models; token selection or pooling is now applied in `forward_head`
|
296 |
+
|
297 |
+
### Feb 2, 2022
|
298 |
+
* [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
|
299 |
+
* I'm currently prepping to merge the `norm_norm_norm` branch back to master (ver 0.6.x) in the next week or so.
|
300 |
+
* The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware `pip install git+https://github.com/rwightman/pytorch-image-models` installs!
|
301 |
+
* `0.5.x` releases and a `0.5.x` branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.
|
302 |
+
|
303 |
+
### Jan 14, 2022
|
304 |
+
* Version 0.5.4 w/ release to be pushed to pypi. It's been a while since last pypi update and riskier changes will be merged to main branch soon....
|
305 |
+
* Add ConvNeXT models w/ weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
|
306 |
+
* Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...
|
307 |
+
* `mnasnet_small` - 65.6 top-1
|
308 |
+
* `mobilenetv2_050` - 65.9
|
309 |
+
* `lcnet_100/075/050` - 72.1 / 68.8 / 63.1
|
310 |
+
* `semnasnet_075` - 73
|
311 |
+
* `fbnetv3_b/d/g` - 79.1 / 79.7 / 82.0
|
312 |
+
* TinyNet models added by [rsomani95](https://github.com/rsomani95)
|
313 |
+
* LCNet added via MobileNetV3 architecture
|
314 |
+
|
docs/feature_extraction.md
ADDED
@@ -0,0 +1,174 @@
|
1 |
+
# Feature Extraction
|
2 |
+
|
3 |
+
All of the models in `timm` have consistent mechanisms for obtaining various types of features from the model for tasks besides classification.
|
4 |
+
|
5 |
+
## Penultimate Layer Features (Pre-Classifier Features)
|
6 |
+
|
7 |
+
The features from the penultimate model layer can be obtained in several ways without requiring model surgery (although feel free to do surgery). One must first decide if they want pooled or un-pooled features.
|
8 |
+
|
9 |
+
### Unpooled
|
10 |
+
|
11 |
+
There are three ways to obtain unpooled features.
|
12 |
+
|
13 |
+
Without modifying the network, one can call `model.forward_features(input)` on any model instead of the usual `model(input)`. This bypasses the network's classifier head and global pooling.
|
14 |
+
|
15 |
+
If one wants to explicitly modify the network to return unpooled features, they can either create the model without a classifier and pooling, or remove it later. Both paths remove the parameters associated with the classifier from the network.
|
16 |
+
|
17 |
+
#### forward_features()
|
18 |
+
```python hl_lines="3 6"
|
19 |
+
import torch
|
20 |
+
import timm
|
21 |
+
m = timm.create_model('xception41', pretrained=True)
|
22 |
+
o = m(torch.randn(2, 3, 299, 299))
|
23 |
+
print(f'Original shape: {o.shape}')
|
24 |
+
o = m.forward_features(torch.randn(2, 3, 299, 299))
|
25 |
+
print(f'Unpooled shape: {o.shape}')
|
26 |
+
```
|
27 |
+
Output:
|
28 |
+
```text
|
29 |
+
Original shape: torch.Size([2, 1000])
|
30 |
+
Unpooled shape: torch.Size([2, 2048, 10, 10])
|
31 |
+
```
|
32 |
+
|
33 |
+
#### Create with no classifier and pooling
|
34 |
+
```python hl_lines="3"
|
35 |
+
import torch
|
36 |
+
import timm
|
37 |
+
m = timm.create_model('resnet50', pretrained=True, num_classes=0, global_pool='')
|
38 |
+
o = m(torch.randn(2, 3, 224, 224))
|
39 |
+
print(f'Unpooled shape: {o.shape}')
|
40 |
+
```
|
41 |
+
Output:
|
42 |
+
```text
|
43 |
+
Unpooled shape: torch.Size([2, 2048, 7, 7])
|
44 |
+
```
|
45 |
+
|
46 |
+
#### Remove it later
|
47 |
+
```python hl_lines="3 6"
|
48 |
+
import torch
|
49 |
+
import timm
|
50 |
+
m = timm.create_model('densenet121', pretrained=True)
|
51 |
+
o = m(torch.randn(2, 3, 224, 224))
|
52 |
+
print(f'Original shape: {o.shape}')
|
53 |
+
m.reset_classifier(0, '')
|
54 |
+
o = m(torch.randn(2, 3, 224, 224))
|
55 |
+
print(f'Unpooled shape: {o.shape}')
|
56 |
+
```
|
57 |
+
Output:
|
58 |
+
```text
|
59 |
+
Original shape: torch.Size([2, 1000])
|
60 |
+
Unpooled shape: torch.Size([2, 1024, 7, 7])
|
61 |
+
```
|
62 |
+
|
63 |
+
### Pooled
|
64 |
+
|
65 |
+
To modify the network to return pooled features, one can use `forward_features()` and pool/flatten the result themselves, or modify the network like above but keep pooling intact.
|
66 |
+
|
67 |
+
#### Create with no classifier
|
68 |
+
```python hl_lines="3"
|
69 |
+
import torch
|
70 |
+
import timm
|
71 |
+
m = timm.create_model('resnet50', pretrained=True, num_classes=0)
|
72 |
+
o = m(torch.randn(2, 3, 224, 224))
|
73 |
+
print(f'Pooled shape: {o.shape}')
|
74 |
+
```
|
75 |
+
Output:
|
76 |
+
```text
|
77 |
+
Pooled shape: torch.Size([2, 2048])
|
78 |
+
```
|
79 |
+
|
80 |
+
#### Remove it later
|
81 |
+
```python hl_lines="3 6"
|
82 |
+
import torch
|
83 |
+
import timm
|
84 |
+
m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
|
85 |
+
o = m(torch.randn(2, 3, 224, 224))
|
86 |
+
print(f'Original shape: {o.shape}')
|
87 |
+
m.reset_classifier(0)
|
88 |
+
o = m(torch.randn(2, 3, 224, 224))
|
89 |
+
print(f'Pooled shape: {o.shape}')
|
90 |
+
```
|
91 |
+
Output:
|
92 |
+
```text
|
93 |
+
Original shape: torch.Size([2, 1000])
|
94 |
+
Pooled shape: torch.Size([2, 1024])
|
95 |
+
```
|
96 |
+
|
97 |
+
|
98 |
+
## Multi-scale Feature Maps (Feature Pyramid)
|
99 |
+
|
100 |
+
Object detection, segmentation, keypoint, and a variety of dense pixel tasks require access to feature maps from the backbone network at multiple scales. This is often done by modifying the original classification network. Since each network varies quite a bit in structure, it's not uncommon to see only a few backbones supported in any given obj detection or segmentation library.
|
101 |
+
|
102 |
+
`timm` allows a consistent interface for creating any of the included models as feature backbones that output feature maps for selected levels.
|
103 |
+
|
104 |
+
A feature backbone can be created by adding the argument `features_only=True` to any `create_model` call. By default 5 strides will be output from most models (not all have that many), with the first starting at 2 (some start at 1 or 4).
|
105 |
+
|
106 |
+
### Create a feature map extraction model
|
107 |
+
```python hl_lines="3"
|
108 |
+
import torch
|
109 |
+
import timm
|
110 |
+
m = timm.create_model('resnest26d', features_only=True, pretrained=True)
|
111 |
+
o = m(torch.randn(2, 3, 224, 224))
|
112 |
+
for x in o:
|
113 |
+
    print(x.shape)
|
114 |
+
```
|
115 |
+
Output:
|
116 |
+
```text
|
117 |
+
torch.Size([2, 64, 112, 112])
|
118 |
+
torch.Size([2, 256, 56, 56])
|
119 |
+
torch.Size([2, 512, 28, 28])
|
120 |
+
torch.Size([2, 1024, 14, 14])
|
121 |
+
torch.Size([2, 2048, 7, 7])
|
122 |
+
```
|
123 |
+
|
124 |
+
### Query the feature information
|
125 |
+
|
126 |
+
After a feature backbone has been created, it can be queried to provide channel or resolution reduction information to the downstream heads without requiring static config or hardcoded constants. The `.feature_info` attribute is a class encapsulating the information about the feature extraction points.
|
127 |
+
|
128 |
+
```python hl_lines="3 4"
|
129 |
+
import torch
|
130 |
+
import timm
|
131 |
+
m = timm.create_model('regnety_032', features_only=True, pretrained=True)
|
132 |
+
print(f'Feature channels: {m.feature_info.channels()}')
|
133 |
+
o = m(torch.randn(2, 3, 224, 224))
|
134 |
+
for x in o:
|
135 |
+
    print(x.shape)
|
136 |
+
```
|
137 |
+
Output:
|
138 |
+
```text
|
139 |
+
Feature channels: [32, 72, 216, 576, 1512]
|
140 |
+
torch.Size([2, 32, 112, 112])
|
141 |
+
torch.Size([2, 72, 56, 56])
|
142 |
+
torch.Size([2, 216, 28, 28])
|
143 |
+
torch.Size([2, 576, 14, 14])
|
144 |
+
torch.Size([2, 1512, 7, 7])
|
145 |
+
```
|
146 |
+
|
147 |
+
### Select specific feature levels or limit the stride
|
148 |
+
|
149 |
+
There are two additional creation arguments impacting the output features.
|
150 |
+
|
151 |
+
* `out_indices` selects which indices to output
|
152 |
+
* `output_stride` limits the feature output stride of the network (also works in classification mode BTW)
|
153 |
+
|
154 |
+
`out_indices` is supported by all models, but not all models have the same index-to-stride mapping. Look at the code or check `feature_info` to compare. The out indices generally correspond to the `C(i+1)th` feature level (a `2^(i+1)` reduction). For most models, index 0 is the stride 2 features, and index 4 is stride 32.
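
As a rough sketch of the index-to-stride mapping described above, the snippet below computes the nominal reduction and the resulting spatial size for each index, assuming the common case where index 0 is stride 2 and the input size is divisible by every stride. The helpers `nominal_reduction` and `feature_map_size` are hypothetical names used for illustration and are not part of `timm`; models that start at stride 1 or 4 will differ, so `feature_info` remains the place to check.

```python
def nominal_reduction(index: int) -> int:
    """Nominal stride of out index `index`, assuming index 0 is stride 2."""
    return 2 ** (index + 1)


def feature_map_size(input_size: int, index: int) -> int:
    """Spatial size of the feature map at `index` for a square input."""
    return input_size // nominal_reduction(index)


reductions = [nominal_reduction(i) for i in range(5)]
sizes = [feature_map_size(224, i) for i in range(5)]
print(reductions)  # [2, 4, 8, 16, 32]
print(sizes)       # [112, 56, 28, 14, 7]
```

The sizes agree with the spatial dimensions printed by the 224x224 examples earlier on this page.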
|
155 |
+
|
156 |
+
`output_stride` is achieved by converting layers to use dilated convolutions. Doing so is not always straightforward; some networks only support `output_stride=32`.
|
157 |
+
|
158 |
+
```python hl_lines="3 4 5"
|
159 |
+
import torch
|
160 |
+
import timm
|
161 |
+
m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
|
162 |
+
print(f'Feature channels: {m.feature_info.channels()}')
|
163 |
+
print(f'Feature reduction: {m.feature_info.reduction()}')
|
164 |
+
o = m(torch.randn(2, 3, 320, 320))
|
165 |
+
for x in o:
|
166 |
+
    print(x.shape)
|
167 |
+
```
|
168 |
+
Output:
|
169 |
+
```text
|
170 |
+
Feature channels: [512, 2048]
|
171 |
+
Feature reduction: [8, 8]
|
172 |
+
torch.Size([2, 512, 40, 40])
|
173 |
+
torch.Size([2, 2048, 40, 40])
|
174 |
+
```
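
The reductions printed above are consistent with capping each level's nominal stride at `output_stride`. A minimal sketch of that arithmetic follows, assuming the nominal stride of index `i` is `2^(i+1)`; `capped_reductions` is a hypothetical helper written for this illustration, not a `timm` function, and it does not model how individual networks apply dilation.

```python
def capped_reductions(out_indices, output_stride=32):
    """Nominal per-level reduction, limited to at most `output_stride`."""
    return [min(2 ** (i + 1), output_stride) for i in out_indices]


reductions = capped_reductions((2, 4), output_stride=8)
sizes = [320 // r for r in reductions]  # 320x320 input, as in the example above
print(reductions)  # [8, 8]
print(sizes)       # [40, 40]
```

With the default `output_stride=32`, the same indices would give reductions of 8 and 32 under this assumption.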
|
docs/index.md
ADDED
@@ -0,0 +1,80 @@
|
1 |
+
# Getting Started
|
2 |
+
|
3 |
+
## Welcome
|
4 |
+
|
5 |
+
Welcome to the `timm` documentation, a lean set of docs that covers the basics of `timm`.
|
6 |
+
|
7 |
+
For a more comprehensive set of docs (currently under development), please visit [timmdocs](http://timm.fast.ai) by [Aman Arora](https://github.com/amaarora).
|
8 |
+
|
9 |
+
## Install
|
10 |
+
|
11 |
+
The library can be installed with pip:
|
12 |
+
|
13 |
+
```
|
14 |
+
pip install timm
|
15 |
+
```
|
16 |
+
|
17 |
+
I update the PyPi (pip) packages when I'm confident there are no significant model regressions from previous releases. If you want to pip install the bleeding edge from GitHub, use:
|
18 |
+
```
|
19 |
+
pip install git+https://github.com/rwightman/pytorch-image-models.git
|
20 |
+
```
|
21 |
+
|
22 |
+
!!! info "Conda Environment"
|
23 |
+
    All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically 3.7, 3.8, 3.9, 3.10
|
24 |
+
|
25 |
+
Little to no care has been taken to be Python 2.x friendly, and Python 2.x will not be supported. If you run into any challenges running on Windows, or other OS, I'm definitely open to looking into those issues so long as it's in a reproducible (read Conda) environment.
|
26 |
+
|
27 |
+
PyTorch versions 1.9, 1.10, 1.11 have been tested with the latest versions of this code.
|
28 |
+
|
29 |
+
I've tried to keep the dependencies minimal; the setup follows the PyTorch default install instructions for Conda:
|
30 |
+
```
|
31 |
+
conda create -n torch-env
|
32 |
+
conda activate torch-env
|
33 |
+
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
|
34 |
+
conda install pyyaml
|
35 |
+
```
|
36 |
+
|
37 |
+
## Load a Pretrained Model
|
38 |
+
|
39 |
+
Pretrained models can be loaded using `timm.create_model`:
|
40 |
+
|
41 |
+
```python
|
42 |
+
import timm
|
43 |
+
|
44 |
+
m = timm.create_model('mobilenetv3_large_100', pretrained=True)
|
45 |
+
m.eval()
|
46 |
+
```
|
47 |
+
|
48 |
+
## List Models with Pretrained Weights
|
49 |
+
```python
|
50 |
+
import timm
|
51 |
+
from pprint import pprint
|
52 |
+
model_names = timm.list_models(pretrained=True)
|
53 |
+
pprint(model_names)
|
54 |
+
>>> ['adv_inception_v3',
|
55 |
+
'cspdarknet53',
|
56 |
+
'cspresnext50',
|
57 |
+
'densenet121',
|
58 |
+
'densenet161',
|
59 |
+
'densenet169',
|
60 |
+
'densenet201',
|
61 |
+
'densenetblur121d',
|
62 |
+
'dla34',
|
63 |
+
'dla46_c',
|
64 |
+
...
|
65 |
+
]
|
66 |
+
```
|
67 |
+
|
68 |
+
## List Model Architectures by Wildcard
|
69 |
+
```python
|
70 |
+
import timm
|
71 |
+
from pprint import pprint
|
72 |
+
model_names = timm.list_models('*resne*t*')
|
73 |
+
pprint(model_names)
|
74 |
+
>>> ['cspresnet50',
|
75 |
+
'cspresnet50d',
|
76 |
+
'cspresnet50w',
|
77 |
+
'cspresnext50',
|
78 |
+
...
|
79 |
+
]
|
80 |
+
```
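
The wildcard above is a shell-style pattern. The sketch below shows how such a pattern filters names, assuming the matching behaves like Python's `fnmatch` module (an assumption about `timm` internals, not something stated here). The short `names` list is a hypothetical stand-in for the real model registry.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the registry of model names.
names = ['cspresnet50', 'cspresnext50', 'densenet121', 'dla34']

# '*' matches any run of characters, so '*resne*t*' needs "resne" followed later by a "t".
matched = [name for name in names if fnmatchcase(name, '*resne*t*')]
print(matched)  # ['cspresnet50', 'cspresnext50']
```

`densenet121` is filtered out because it never contains the substring `resne`.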
|
docs/javascripts/tables.js
ADDED
@@ -0,0 +1,6 @@
|
1 |
+
app.location$.subscribe(function() {
|
2 |
+
var tables = document.querySelectorAll("article table")
|
3 |
+
tables.forEach(function(table) {
|
4 |
+
new Tablesort(table)
|
5 |
+
})
|
6 |
+
})
|
docs/models.md
ADDED
@@ -0,0 +1,171 @@
|
1 |
+
# Model Summaries
|
2 |
+
|
3 |
+
The model architectures included come from a wide variety of sources. Sources, including papers, original impl ("reference code") that I rewrote / adapted, and PyTorch impl that I leveraged directly ("code") are listed below.
|
4 |
+
|
5 |
+
Most included models have pretrained weights. The weights are either:
|
6 |
+
|
7 |
+
1. from their original sources
|
8 |
+
2. ported by myself from their original impl in a different framework (e.g. TensorFlow models)
|
9 |
+
3. trained from scratch using the included training script
|
10 |
+
|
11 |
+
The validation results for the pretrained weights are [here](results.md)
|
12 |
+
|
13 |
+
A more exciting view (with pretty pictures) of the models within `timm` can be found at [paperswithcode](https://paperswithcode.com/lib/timm).
|
14 |
+
|
15 |
+
## Big Transfer ResNetV2 (BiT) [[resnetv2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnetv2.py)]
|
16 |
+
* Paper: `Big Transfer (BiT): General Visual Representation Learning` - https://arxiv.org/abs/1912.11370
|
17 |
+
* Reference code: https://github.com/google-research/big_transfer
|
18 |
+
|
19 |
+
## Cross-Stage Partial Networks [[cspnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/cspnet.py)]
|
20 |
+
* Paper: `CSPNet: A New Backbone that can Enhance Learning Capability of CNN` - https://arxiv.org/abs/1911.11929
|
21 |
+
* Reference impl: https://github.com/WongKinYiu/CrossStagePartialNetworks
|
22 |
+
|
23 |
+
## DenseNet [[densenet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/densenet.py)]
|
24 |
+
* Paper: `Densely Connected Convolutional Networks` - https://arxiv.org/abs/1608.06993
|
25 |
+
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models
|
26 |
+
|
27 |
+
## DLA [[dla.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dla.py)]
|
28 |
+
* Paper: https://arxiv.org/abs/1707.06484
|
29 |
+
* Code: https://github.com/ucbdrive/dla
|
30 |
+
|
31 |
+
## Dual-Path Networks [[dpn.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dpn.py)]
|
32 |
+
* Paper: `Dual Path Networks` - https://arxiv.org/abs/1707.01629
|
33 |
+
* My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
|
34 |
+
* Reference code: https://github.com/cypw/DPNs
|
35 |
+
|
36 |
+
## GPU-Efficient Networks [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
|
37 |
+
* Paper: `Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
|
38 |
+
* Reference code: https://github.com/idstcv/GPU-Efficient-Networks
|
39 |
+
|
40 |
+
## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]
|
41 |
+
* Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
|
42 |
+
* Code: https://github.com/HRNet/HRNet-Image-Classification
|
43 |
+
|
44 |
+
## Inception-V3 [[inception_v3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v3.py)]
|
45 |
+
* Paper: `Rethinking the Inception Architecture for Computer Vision` - https://arxiv.org/abs/1512.00567
|
46 |
+
* Code: https://github.com/pytorch/vision/tree/master/torchvision/models
|
47 |
+
|
48 |
+
## Inception-V4 [[inception_v4.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v4.py)]
|
49 |
+
* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
|
50 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
51 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets
|
52 |
+
|
53 |
+
## Inception-ResNet-V2 [[inception_resnet_v2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_resnet_v2.py)]
|
54 |
+
* Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
|
55 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
56 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets
|
57 |
+
|
58 |
+
## NASNet-A [[nasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/nasnet.py)]
|
59 |
+
* Paper: `Learning Transferable Architectures for Scalable Image Recognition` - https://arxiv.org/abs/1707.07012
|
60 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
61 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
|
62 |
+
|
63 |
+
## PNasNet-5 [[pnasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/pnasnet.py)]
|
64 |
+
* Paper: `Progressive Neural Architecture Search` - https://arxiv.org/abs/1712.00559
|
65 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
66 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
|
67 |
+
|
68 |
+
## EfficientNet [[efficientnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py)]
|
69 |
+
|
70 |
+
* Papers:
|
71 |
+
    * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
|
72 |
+
    * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
|
73 |
+
    * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
|
74 |
+
    * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
|
75 |
+
    * MixNet - https://arxiv.org/abs/1907.09595
|
76 |
+
    * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
|
77 |
+
    * MobileNet-V2 - https://arxiv.org/abs/1801.04381
|
78 |
+
    * FBNet-C - https://arxiv.org/abs/1812.03443
|
79 |
+
    * Single-Path NAS - https://arxiv.org/abs/1904.02877
|
80 |
+
* My PyTorch code: https://github.com/rwightman/gen-efficientnet-pytorch
|
81 |
+
* Reference code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
|
82 |
+
|
83 |
+
## MobileNet-V3 [[mobilenetv3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)]
|
84 |
+
* Paper: `Searching for MobileNetV3` - https://arxiv.org/abs/1905.02244
|
85 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
|
86 |
+
|
87 |
+
## RegNet [[regnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/regnet.py)]
|
88 |
+
* Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
|
89 |
+
* Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py
|
90 |
+
|
91 |
+
## RepVGG [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
|
92 |
+
* Paper: `Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
|
93 |
+
* Reference code: https://github.com/DingXiaoH/RepVGG
|
94 |
+
|
95 |
+
## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]
|
96 |
+
|
97 |
+
* ResNet (V1B)
|
98 |
+
    * Paper: `Deep Residual Learning for Image Recognition` - https://arxiv.org/abs/1512.03385
|
99 |
+
    * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
|
100 |
+
* ResNeXt
|
101 |
+
    * Paper: `Aggregated Residual Transformations for Deep Neural Networks` - https://arxiv.org/abs/1611.05431
|
102 |
+
    * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
|
103 |
+
* 'Bag of Tricks' / Gluon C, D, E, S ResNet variants
|
104 |
+
    * Paper: `Bag of Tricks for Image Classification with CNNs` - https://arxiv.org/abs/1812.01187
|
105 |
+
    * Code: https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/resnetv1b.py
|
106 |
+
* Instagram pretrained / ImageNet tuned ResNeXt101
|
107 |
+
    * Paper: `Exploring the Limits of Weakly Supervised Pretraining` - https://arxiv.org/abs/1805.00932
|
108 |
+
    * Weights: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
|
109 |
+
* Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet and ResNeXts
|
110 |
+
    * Paper: `Billion-scale semi-supervised learning for image classification` - https://arxiv.org/abs/1905.00546
|
111 |
+
    * Weights: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
|
112 |
+
* Squeeze-and-Excitation Networks
|
113 |
+
    * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
|
114 |
+
    * Code: Added to the ResNet base; this is the current version going forward, and the old `senet.py` is being deprecated
|
115 |
+
* ECAResNet (ECA-Net)
|
116 |
+
    * Paper: `ECA-Net: Efficient Channel Attention for Deep CNN` - https://arxiv.org/abs/1910.03151v4
|
117 |
+
    * Code: Added to ResNet base, ECA module contributed by @VRandme, reference https://github.com/BangguWu/ECANet
|
118 |
+
|
119 |
+
## Res2Net [[res2net.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/res2net.py)]
|
120 |
+
* Paper: `Res2Net: A New Multi-scale Backbone Architecture` - https://arxiv.org/abs/1904.01169
|
121 |
+
* Code: https://github.com/gasvn/Res2Net
|
122 |
+
|
123 |
+
## ResNeSt [[resnest.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnest.py)]
|
124 |
+
* Paper: `ResNeSt: Split-Attention Networks` - https://arxiv.org/abs/2004.08955
|
125 |
+
* Code: https://github.com/zhanghang1989/ResNeSt
|
126 |
+
|
127 |
+
## ReXNet [[rexnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/rexnet.py)]
|
128 |
+
* Paper: `ReXNet: Diminishing Representational Bottleneck on CNN` - https://arxiv.org/abs/2007.00992
|
129 |
+
* Code: https://github.com/clovaai/rexnet
|
130 |
+
|
131 |
+
## Selective-Kernel Networks [[sknet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/sknet.py)]
|
132 |
+
* Paper: `Selective-Kernel Networks` - https://arxiv.org/abs/1903.06586
|
133 |
+
* Code: https://github.com/implus/SKNet, https://github.com/clovaai/assembled-cnn
|
134 |
+
|
135 |
+
## SelecSLS [[selecsls.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/selecsls.py)]
|
136 |
+
* Paper: `XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera` - https://arxiv.org/abs/1907.00837
|
137 |
+
* Code: https://github.com/mehtadushy/SelecSLS-Pytorch
|
138 |
+
|
139 |
+
## Squeeze-and-Excitation Networks [[senet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/senet.py)]
|
140 |
+
NOTE: I am deprecating this version of the networks, the new ones are part of `resnet.py`
|
141 |
+
|
142 |
+
* Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
|
143 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
144 |
+
|
145 |
+
## TResNet [[tresnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tresnet.py)]
|
146 |
+
* Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
|
147 |
+
* Code: https://github.com/mrT23/TResNet
|
148 |
+
|
149 |
+
## VGG [[vgg.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vgg.py)]
|
150 |
+
* Paper: `Very Deep Convolutional Networks For Large-Scale Image Recognition` - https://arxiv.org/pdf/1409.1556.pdf
|
151 |
+
* Reference code: https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
|
152 |
+
|
153 |
+
## Vision Transformer [[vision_transformer.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)]
|
154 |
+
* Paper: `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale` - https://arxiv.org/abs/2010.11929
|
155 |
+
* Reference code and pretrained weights: https://github.com/google-research/vision_transformer
|
156 |
+
|
157 |
+
## VovNet V2 and V1 [[vovnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vovnet.py)]
|
158 |
+
* Paper: `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
|
159 |
+
* Reference code: https://github.com/youngwanLEE/vovnet-detectron2
|
160 |
+
|
161 |
+
## Xception [[xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/xception.py)]
|
162 |
+
* Paper: `Xception: Deep Learning with Depthwise Separable Convolutions` - https://arxiv.org/abs/1610.02357
|
163 |
+
* Code: https://github.com/Cadene/pretrained-models.pytorch
|
164 |
+
|
165 |
+
## Xception (Modified Aligned, Gluon) [[gluon_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/gluon_xception.py)]
|
166 |
+
* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
|
167 |
+
* Reference code: https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo, https://github.com/jfzhang95/pytorch-deeplab-xception/
|
168 |
+
|
169 |
+
## Xception (Modified Aligned, TF) [[aligned_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/aligned_xception.py)]
|
170 |
+
* Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
|
171 |
+
* Reference code: https://github.com/tensorflow/models/tree/master/research/deeplab
|
docs/models/.pages
ADDED
@@ -0,0 +1 @@
|
1 |
+
title: Model Pages
|
docs/models/.templates/code_snippets.md
ADDED
@@ -0,0 +1,62 @@
|
1 |
+
## How do I use this model on an image?
|
2 |
+
To load a pretrained model:
|
3 |
+
|
4 |
+
```python
|
5 |
+
import timm
|
6 |
+
model = timm.create_model('{{ model_name }}', pretrained=True)
|
7 |
+
model.eval()
|
8 |
+
```
|
9 |
+
|
10 |
+
To load and preprocess the image:
|
11 |
+
```python
|
12 |
+
import urllib.request  # the `request` submodule must be imported explicitly
|
13 |
+
from PIL import Image
|
14 |
+
from timm.data import resolve_data_config
|
15 |
+
from timm.data.transforms_factory import create_transform
|
16 |
+
|
17 |
+
config = resolve_data_config({}, model=model)
|
18 |
+
transform = create_transform(**config)
|
19 |
+
|
20 |
+
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
|
21 |
+
urllib.request.urlretrieve(url, filename)
|
22 |
+
img = Image.open(filename).convert('RGB')
|
23 |
+
tensor = transform(img).unsqueeze(0) # transform and add batch dimension
|
24 |
+
```
|
25 |
+
|
26 |
+
To get the model predictions:
|
27 |
+
```python
|
28 |
+
import torch
|
29 |
+
with torch.no_grad():
|
30 |
+
    out = model(tensor)
|
31 |
+
probabilities = torch.nn.functional.softmax(out[0], dim=0)
|
32 |
+
print(probabilities.shape)
|
33 |
+
# prints: torch.Size([1000])
|
34 |
+
```
|
35 |
+
|
36 |
+
To get the top-5 predicted class names:
|
37 |
+
```python
|
38 |
+
# Get imagenet class mappings
|
39 |
+
url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
|
40 |
+
urllib.request.urlretrieve(url, filename)
|
41 |
+
with open("imagenet_classes.txt", "r") as f:
|
42 |
+
categories = [s.strip() for s in f.readlines()]
|
43 |
+
|
44 |
+
# Print top categories per image
|
45 |
+
top5_prob, top5_catid = torch.topk(probabilities, 5)
|
46 |
+
for i in range(top5_prob.size(0)):
|
47 |
+
print(categories[top5_catid[i]], top5_prob[i].item())
|
48 |
+
# prints class names and probabilities like:
|
49 |
+
# [('Samoyed', 0.6425196528434753), ('Pomeranian', 0.04062102362513542), ('keeshond', 0.03186424449086189), ('white wolf', 0.01739676296710968), ('Eskimo dog', 0.011717947199940681)]
|
50 |
+
```
|
51 |
+
|
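To make the softmax and top-k steps above concrete, here is a minimal plain-Python sketch of what those two calls compute. The logits and class names are made-up toy values, not the output of any real model, and the helpers `softmax` and `top_k` are hypothetical stand-ins for `torch.nn.functional.softmax` and `torch.topk`, written only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Toy logits and labels (illustrative only, not ImageNet classes)
toy_logits = [2.0, 0.5, -1.0, 3.0]
toy_categories = ["cat", "dog", "fish", "bird"]

probs = softmax(toy_logits)
top_prob, top_idx = top_k(probs, 2)
for p, i in zip(top_prob, top_idx):
    print(toy_categories[i], round(p, 4))
```

The indices returned by `top_k` play the role of `top5_catid` in the snippet above: they are positions into the list of class names, which is why the class file must list the classes in the same order the model was trained with.
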
Replace the model name with the variant you want to use, e.g. `{{ model_name }}`. You can find the IDs in the model summaries at the top of this page.

To extract image features with this model, follow the [timm feature extraction examples](https://rwightman.github.io/pytorch-image-models/feature_extraction/), changing only the name of the model you want to use.

## How do I finetune this model?
You can finetune any of the pre-trained models by replacing the classifier (the last layer), which `create_model` does when you pass `num_classes`.
```python
model = timm.create_model('{{ model_name }}', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)
```
To finetune on your own dataset, you have to write a training loop or adapt [timm's training script](https://github.com/rwightman/pytorch-image-models/blob/master/train.py) to use your dataset.
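
As a rough illustration of what "write a training loop" involves, the sketch below runs a few epochs of supervised training in PyTorch. It is an assumption-laden toy, not timm's training script: the tiny `nn.Sequential` model is a hypothetical stand-in for the model returned by `timm.create_model(..., num_classes=NUM_FINETUNE_CLASSES)`, the data are random tensors, and the optimizer settings are arbitrary.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a timm model with a fresh classifier head.
NUM_FINETUNE_CLASSES = 3
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, NUM_FINETUNE_CLASSES))

# Toy dataset: 16 random "images" of shape 3x8x8 with random labels.
torch.manual_seed(0)
images = torch.randn(16, 3, 8, 8)
labels = torch.randint(0, NUM_FINETUNE_CLASSES, (16,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

model.train()  # enable training-mode behaviour (dropout, batch norm updates)
epoch_losses = []
for epoch in range(5):
    running = 0.0
    for x, y in loader:
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = criterion(model(x), y)  # forward pass and loss
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
        running += loss.item() * x.size(0)
    epoch_losses.append(running / len(loader.dataset))
print(epoch_losses)
```

With a real dataset, the loader would yield images preprocessed with the same transform used for inference, and a held-out validation split would be needed to decide when to stop.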
docs/models/.templates/generate_readmes.py
ADDED
@@ -0,0 +1,64 @@
"""
Run this script to generate the model-index files in `models` from the templates in `.templates/models`.
"""

import argparse
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

import modelindex


def generate_readmes(templates_path: Path, dest_path: Path):
    """Add the code snippet template to the readmes"""
    readme_templates_path = templates_path / "models"
    code_template_path = templates_path / "code_snippets.md"

    env = Environment(
        loader=FileSystemLoader([readme_templates_path, readme_templates_path.parent]),
    )

    for readme in readme_templates_path.iterdir():
        if readme.suffix == ".md":
            template = env.get_template(readme.name)

            # get the first model_name for this model family
            mi = modelindex.load(str(readme))
            model_name = mi.models[0].name

            full_content = template.render(model_name=model_name)

            # generate full_readme
            with open(dest_path / readme.name, "w") as f:
                f.write(full_content)


def main():
    parser = argparse.ArgumentParser(description="Model index generation config")
    parser.add_argument(
        "-t",
        "--templates",
        default=Path(__file__).parent / ".templates",
        type=str,
        help="Location of the markdown templates",
    )
    parser.add_argument(
        "-d",
        "--dest",
        default=Path(__file__).parent / "models",
        type=str,
        help="Destination folder that contains the generated model-index files.",
    )
    args = parser.parse_args()
    templates_path = Path(args.templates)
    dest_readmes_path = Path(args.dest)

    generate_readmes(
        templates_path,
        dest_readmes_path,
    )


if __name__ == "__main__":
    main()
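
The script above relies on Jinja2 resolving `{% include 'code_snippets.md' %}` inside each model template and filling in `{{ model_name }}`. The following sketch shows that mechanism in isolation. It uses `DictLoader` with in-memory templates instead of `FileSystemLoader`, and the template names and contents (`example-family.md`, `example_model`) are hypothetical placeholders rather than files from this repository.

```python
from jinja2 import DictLoader, Environment

# Hypothetical in-memory templates standing in for the files on disk.
templates = {
    "code_snippets.md": "model = timm.create_model('{{ model_name }}', pretrained=True)",
    "example-family.md": "# Example Family\n\n{% include 'code_snippets.md' %}\n",
}

env = Environment(loader=DictLoader(templates))

# The included template sees the same render context as the parent,
# so model_name is substituted inside code_snippets.md as well.
rendered = env.get_template("example-family.md").render(model_name="example_model")
print(rendered)
```

Because the include is resolved by the loader, the script passes both the `models` directory and its parent to `FileSystemLoader`, so that the per-model templates and the shared `code_snippets.md` are both discoverable.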
docs/models/.templates/models/adversarial-inception-v3.md
ADDED
@@ -0,0 +1,98 @@
# Adversarial Inception v3

**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements, including [Label Smoothing](https://paperswithcode.com/method/label-smoothing), factorized 7 x 7 convolutions, and an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) that propagates label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).

This particular model was trained to study adversarial examples (adversarial training).

The weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).

{% include 'code_snippets.md' %}

## How do I train this model?

You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.

## Citation

```BibTeX
@article{DBLP:journals/corr/abs-1804-00097,
  author    = {Alexey Kurakin and
               Ian J. Goodfellow and
               Samy Bengio and
               Yinpeng Dong and
               Fangzhou Liao and
               Ming Liang and
               Tianyu Pang and
               Jun Zhu and
               Xiaolin Hu and
               Cihang Xie and
               Jianyu Wang and
               Zhishuai Zhang and
               Zhou Ren and
               Alan L. Yuille and
               Sangxia Huang and
               Yao Zhao and
               Yuzhe Zhao and
               Zhonglin Han and
               Junjiajia Long and
               Yerkebulan Berdibekov and
               Takuya Akiba and
               Seiya Tokui and
               Motoki Abe},
  title     = {Adversarial Attacks and Defences Competition},
  journal   = {CoRR},
  volume    = {abs/1804.00097},
  year      = {2018},
  url       = {http://arxiv.org/abs/1804.00097},
  archivePrefix = {arXiv},
  eprint    = {1804.00097},
  timestamp = {Thu, 31 Oct 2019 16:31:22 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1804-00097.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

<!--
Type: model-index
Collections:
- Name: Adversarial Inception v3
  Paper:
    Title: Adversarial Attacks and Defences Competition
    URL: https://paperswithcode.com/paper/adversarial-attacks-and-defences-competition
Models:
- Name: adv_inception_v3
  In Collection: Adversarial Inception v3
  Metadata:
    FLOPs: 7352418880
    Parameters: 23830000
    File Size: 95549439
    Architecture:
    - 1x1 Convolution
    - Auxiliary Classifier
    - Average Pooling
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inception-v3 Module
    - Max Pooling
    - ReLU
    - Softmax
    Tasks:
    - Image Classification
    Training Data:
    - ImageNet
    ID: adv_inception_v3
    Crop Pct: '0.875'
    Image Size: '299'
    Interpolation: bicubic
  Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L456
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/adv_inception_v3-9e27bd63.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 77.58%
      Top 5 Accuracy: 93.74%
-->
docs/models/.templates/models/advprop.md
ADDED
@@ -0,0 +1,457 @@
# AdvProp (EfficientNet)

**AdvProp** is an adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to the method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions to normal examples.

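The idea of a separate auxiliary batch norm can be illustrated with a small plain-Python toy. This is not the AdvProp implementation and not timm code: the class `DualBatchNorm1d`, its arguments, and the scalar "batches" are hypothetical, and the running-average update follows the PyTorch-style momentum convention as an assumption. It only shows that clean and adversarial batches update different sets of normalization statistics.

```python
import statistics


class DualBatchNorm1d:
    """Toy normalizer with separate running statistics for clean and adversarial batches."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One [running_mean, running_var] pair per input distribution.
        self.stats = {"clean": [0.0, 1.0], "adv": [0.0, 1.0]}

    def __call__(self, batch, adversarial=False):
        key = "adv" if adversarial else "clean"
        mean = statistics.fmean(batch)
        var = statistics.pvariance(batch, mu=mean)
        # Update only the statistics that belong to this kind of batch.
        run = self.stats[key]
        run[0] = (1 - self.momentum) * run[0] + self.momentum * mean
        run[1] = (1 - self.momentum) * run[1] + self.momentum * var
        # Normalize with the batch statistics, as in training mode.
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = DualBatchNorm1d()
clean_out = bn([1.0, 2.0, 3.0, 4.0])
adv_out = bn([11.0, 12.0, 13.0, 14.0], adversarial=True)
print(bn.stats)
```

In this toy, the shifted adversarial batch is normalized to the same values as the clean one, while the two running means stay apart, so neither distribution pollutes the statistics of the other.
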
The weights from this model were ported from [Tensorflow/TPU](https://github.com/tensorflow/tpu).

{% include 'code_snippets.md' %}

## How do I train this model?

You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.

## Citation

```BibTeX
@misc{xie2020adversarial,
      title={Adversarial Examples Improve Image Recognition},
      author={Cihang Xie and Mingxing Tan and Boqing Gong and Jiang Wang and Alan Yuille and Quoc V. Le},
      year={2020},
      eprint={1911.09665},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

<!--
Type: model-index
Collections:
- Name: AdvProp
  Paper:
    Title: Adversarial Examples Improve Image Recognition
    URL: https://paperswithcode.com/paper/adversarial-examples-improve-image
Models:
- Name: tf_efficientnet_b0_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 488688572
    Parameters: 5290000
    File Size: 21385973
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b0_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.875'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '224'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1334
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b0_ap-f262efe1.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 77.1%
      Top 5 Accuracy: 93.26%
- Name: tf_efficientnet_b1_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 883633200
    Parameters: 7790000
    File Size: 31515350
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b1_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.882'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '240'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1344
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b1_ap-44ef0a3d.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 79.28%
      Top 5 Accuracy: 94.3%
- Name: tf_efficientnet_b2_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 1234321170
    Parameters: 9110000
    File Size: 36800745
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b2_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.89'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '260'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1354
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b2_ap-2f8e7636.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 80.3%
      Top 5 Accuracy: 95.03%
- Name: tf_efficientnet_b3_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 2275247568
    Parameters: 12230000
    File Size: 49384538
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b3_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.904'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '300'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1364
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b3_ap-aad25bdd.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 81.82%
      Top 5 Accuracy: 95.62%
- Name: tf_efficientnet_b4_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 5749638672
    Parameters: 19340000
    File Size: 77993585
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b4_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.922'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '380'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1374
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b4_ap-dedb23e6.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 83.26%
      Top 5 Accuracy: 96.39%
- Name: tf_efficientnet_b5_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 13176501888
    Parameters: 30390000
    File Size: 122403150
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b5_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.934'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '456'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1384
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b5_ap-9e82fae8.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 84.25%
      Top 5 Accuracy: 96.97%
- Name: tf_efficientnet_b6_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 24180518488
    Parameters: 43040000
    File Size: 173237466
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b6_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.942'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '528'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1394
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b6_ap-4ffb161f.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 84.79%
      Top 5 Accuracy: 97.14%
- Name: tf_efficientnet_b7_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 48205304880
    Parameters: 66349999
    File Size: 266850607
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b7_ap
    LR: 0.256
    Epochs: 350
    Crop Pct: '0.949'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '600'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1405
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b7_ap-ddb28fec.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 85.12%
      Top 5 Accuracy: 97.25%
- Name: tf_efficientnet_b8_ap
  In Collection: AdvProp
  Metadata:
    FLOPs: 80962956270
    Parameters: 87410000
    File Size: 351412563
    Architecture:
    - 1x1 Convolution
    - Average Pooling
    - Batch Normalization
    - Convolution
    - Dense Connections
    - Dropout
    - Inverted Residual Block
    - Squeeze-and-Excitation Block
    - Swish
    Tasks:
    - Image Classification
    Training Techniques:
    - AdvProp
    - AutoAugment
    - Label Smoothing
    - RMSProp
    - Stochastic Depth
    - Weight Decay
    Training Data:
    - ImageNet
    ID: tf_efficientnet_b8_ap
    LR: 0.128
    Epochs: 350
    Crop Pct: '0.954'
    Momentum: 0.9
    Batch Size: 2048
    Image Size: '672'
    Weight Decay: 1.0e-05
    Interpolation: bicubic
    RMSProp Decay: 0.9
    Label Smoothing: 0.1
    BatchNorm Momentum: 0.99
  Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1416
  Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b8_ap-00e169fa.pth
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 85.37%
      Top 5 Accuracy: 97.3%
-->
docs/models/.templates/models/big-transfer.md
ADDED
@@ -0,0 +1,295 @@
# Big Transfer (BiT)

**Big Transfer (BiT)** is a pretraining recipe that pre-trains on a large supervised source dataset and fine-tunes the weights on the target task. Models are pre-trained on the JFT-300M dataset. The finetuned models contained in this collection are finetuned on ImageNet.

{% include 'code_snippets.md' %}

## How do I train this model?

You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.

## Citation

```BibTeX
@misc{kolesnikov2020big,
      title={Big Transfer (BiT): General Visual Representation Learning},
      author={Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Joan Puigcerver and Jessica Yung and Sylvain Gelly and Neil Houlsby},
      year={2020},
      eprint={1912.11370},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

<!--
Type: model-index
Collections:
- Name: Big Transfer
  Paper:
    Title: 'Big Transfer (BiT): General Visual Representation Learning'
    URL: https://paperswithcode.com/paper/large-scale-learning-of-general-visual
Models:
- Name: resnetv2_101x1_bitm
  In Collection: Big Transfer
  Metadata:
    FLOPs: 5330896
    Parameters: 44540000
    File Size: 178256468
    Architecture:
    - 1x1 Convolution
    - Bottleneck Residual Block
    - Convolution
    - Global Average Pooling
    - Group Normalization
    - Max Pooling
    - ReLU
    - Residual Block
    - Residual Connection
    - Softmax
    - Weight Standardization
    Tasks:
    - Image Classification
    Training Techniques:
    - Mixup
    - SGD with Momentum
    - Weight Decay
    Training Data:
    - ImageNet
    - JFT-300M
    Training Resources: Cloud TPUv3-512
    ID: resnetv2_101x1_bitm
    LR: 0.03
    Epochs: 90
    Layers: 101
    Crop Pct: '1.0'
    Momentum: 0.9
    Batch Size: 4096
    Image Size: '480'
    Weight Decay: 0.0001
    Interpolation: bilinear
  Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L444
  Weights: https://storage.googleapis.com/bit_models/BiT-M-R101x1-ILSVRC2012.npz
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 82.21%
      Top 5 Accuracy: 96.47%
- Name: resnetv2_101x3_bitm
  In Collection: Big Transfer
  Metadata:
    FLOPs: 15988688
    Parameters: 387930000
    File Size: 1551830100
    Architecture:
    - 1x1 Convolution
    - Bottleneck Residual Block
    - Convolution
    - Global Average Pooling
    - Group Normalization
    - Max Pooling
    - ReLU
    - Residual Block
    - Residual Connection
    - Softmax
    - Weight Standardization
    Tasks:
    - Image Classification
    Training Techniques:
    - Mixup
    - SGD with Momentum
    - Weight Decay
    Training Data:
    - ImageNet
    - JFT-300M
    Training Resources: Cloud TPUv3-512
    ID: resnetv2_101x3_bitm
    LR: 0.03
    Epochs: 90
    Layers: 101
    Crop Pct: '1.0'
    Momentum: 0.9
    Batch Size: 4096
    Image Size: '480'
    Weight Decay: 0.0001
    Interpolation: bilinear
  Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L451
  Weights: https://storage.googleapis.com/bit_models/BiT-M-R101x3-ILSVRC2012.npz
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 84.38%
      Top 5 Accuracy: 97.37%
- Name: resnetv2_152x2_bitm
  In Collection: Big Transfer
  Metadata:
    FLOPs: 10659792
    Parameters: 236340000
    File Size: 945476668
    Architecture:
    - 1x1 Convolution
    - Bottleneck Residual Block
    - Convolution
    - Global Average Pooling
    - Group Normalization
    - Max Pooling
    - ReLU
    - Residual Block
    - Residual Connection
    - Softmax
    - Weight Standardization
    Tasks:
    - Image Classification
    Training Techniques:
    - Mixup
    - SGD with Momentum
    - Weight Decay
    Training Data:
    - ImageNet
    - JFT-300M
    ID: resnetv2_152x2_bitm
    Crop Pct: '1.0'
    Image Size: '480'
    Interpolation: bilinear
  Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L458
  Weights: https://storage.googleapis.com/bit_models/BiT-M-R152x2-ILSVRC2012.npz
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 84.4%
      Top 5 Accuracy: 97.43%
- Name: resnetv2_152x4_bitm
  In Collection: Big Transfer
  Metadata:
    FLOPs: 21317584
    Parameters: 936530000
    File Size: 3746270104
    Architecture:
    - 1x1 Convolution
    - Bottleneck Residual Block
    - Convolution
    - Global Average Pooling
    - Group Normalization
    - Max Pooling
    - ReLU
    - Residual Block
    - Residual Connection
    - Softmax
    - Weight Standardization
    Tasks:
    - Image Classification
    Training Techniques:
    - Mixup
    - SGD with Momentum
    - Weight Decay
    Training Data:
    - ImageNet
    - JFT-300M
    Training Resources: Cloud TPUv3-512
    ID: resnetv2_152x4_bitm
    Crop Pct: '1.0'
    Image Size: '480'
    Interpolation: bilinear
  Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L465
  Weights: https://storage.googleapis.com/bit_models/BiT-M-R152x4-ILSVRC2012.npz
  Results:
  - Task: Image Classification
    Dataset: ImageNet
    Metrics:
      Top 1 Accuracy: 84.95%
      Top 5 Accuracy: 97.45%
- Name: resnetv2_50x1_bitm
  In Collection: Big Transfer
  Metadata:
    FLOPs: 5330896
    Parameters: 25550000
    File Size: 102242668
    Architecture:
    - 1x1 Convolution
    - Bottleneck Residual Block
    - Convolution
    - Global Average Pooling
    - Group Normalization
    - Max Pooling
    - ReLU
    - Residual Block
    - Residual Connection
    - Softmax
    - Weight Standardization
    Tasks:
    - Image Classification
    Training Techniques:
    - Mixup
    - SGD with Momentum
    - Weight Decay
    Training Data:
    - ImageNet
    - JFT-300M
    Training Resources: Cloud TPUv3-512
    ID: resnetv2_50x1_bitm
    LR: 0.03
    Epochs: 90
|
234 |
+
Layers: 50
|
235 |
+
Crop Pct: '1.0'
|
236 |
+
Momentum: 0.9
|
237 |
+
Batch Size: 4096
|
238 |
+
Image Size: '480'
|
239 |
+
Weight Decay: 0.0001
|
240 |
+
Interpolation: bilinear
|
241 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L430
|
242 |
+
Weights: https://storage.googleapis.com/bit_models/BiT-M-R50x1-ILSVRC2012.npz
|
243 |
+
Results:
|
244 |
+
- Task: Image Classification
|
245 |
+
Dataset: ImageNet
|
246 |
+
Metrics:
|
247 |
+
Top 1 Accuracy: 80.19%
|
248 |
+
Top 5 Accuracy: 95.63%
|
249 |
+
- Name: resnetv2_50x3_bitm
|
250 |
+
In Collection: Big Transfer
|
251 |
+
Metadata:
|
252 |
+
FLOPs: 15988688
|
253 |
+
Parameters: 217320000
|
254 |
+
File Size: 869321580
|
255 |
+
Architecture:
|
256 |
+
- 1x1 Convolution
|
257 |
+
- Bottleneck Residual Block
|
258 |
+
- Convolution
|
259 |
+
- Global Average Pooling
|
260 |
+
- Group Normalization
|
261 |
+
- Max Pooling
|
262 |
+
- ReLU
|
263 |
+
- Residual Block
|
264 |
+
- Residual Connection
|
265 |
+
- Softmax
|
266 |
+
- Weight Standardization
|
267 |
+
Tasks:
|
268 |
+
- Image Classification
|
269 |
+
Training Techniques:
|
270 |
+
- Mixup
|
271 |
+
- SGD with Momentum
|
272 |
+
- Weight Decay
|
273 |
+
Training Data:
|
274 |
+
- ImageNet
|
275 |
+
- JFT-300M
|
276 |
+
Training Resources: Cloud TPUv3-512
|
277 |
+
ID: resnetv2_50x3_bitm
|
278 |
+
LR: 0.03
|
279 |
+
Epochs: 90
|
280 |
+
Layers: 50
|
281 |
+
Crop Pct: '1.0'
|
282 |
+
Momentum: 0.9
|
283 |
+
Batch Size: 4096
|
284 |
+
Image Size: '480'
|
285 |
+
Weight Decay: 0.0001
|
286 |
+
Interpolation: bilinear
|
287 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L437
|
288 |
+
Weights: https://storage.googleapis.com/bit_models/BiT-M-R50x3-ILSVRC2012.npz
|
289 |
+
Results:
|
290 |
+
- Task: Image Classification
|
291 |
+
Dataset: ImageNet
|
292 |
+
Metrics:
|
293 |
+
Top 1 Accuracy: 83.75%
|
294 |
+
Top 5 Accuracy: 97.12%
|
295 |
+
-->
|
docs/models/.templates/models/csp-darknet.md
ADDED
@@ -0,0 +1,81 @@
|
1 |
+
# CSP-DarkNet
|
2 |
+
|
3 |
+
**CSPDarknet53** is a convolutional neural network and backbone for object detection that uses [DarkNet-53](https://paperswithcode.com/method/darknet-53). It employs a CSPNet strategy to partition the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
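The split-and-merge idea described above can be sketched in plain Python. This is a minimal illustration, not the timm implementation: `csp_stage`, `double_block`, the 50/50 `split_ratio`, and the use of nested lists as stand-ins for feature-map channels are all assumptions made for the example, and the transition layers of a real CSP stage are omitted.

```python
from typing import Callable, List, Sequence

Channel = List[float]  # one flattened feature-map channel (illustrative stand-in)


def csp_stage(
    channels: Sequence[Channel],
    block: Callable[[List[Channel]], List[Channel]],
    split_ratio: float = 0.5,
) -> List[Channel]:
    """Split the channels into two parts, transform only one part, then merge."""
    split = int(len(channels) * split_ratio)
    # Part 1 bypasses the block entirely (the cross-stage shortcut path).
    shortcut = [list(c) for c in channels[:split]]
    # Part 2 goes through the stage's block.
    processed = block([list(c) for c in channels[split:]])
    # Merge: channel-wise concatenation of both parts.
    return shortcut + processed


def double_block(part: List[Channel]) -> List[Channel]:
    """Hypothetical stand-in for a stack of residual blocks: doubles every value."""
    return [[2.0 * v for v in c] for c in part]


features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
merged = csp_stage(features, double_block)
```

Only half of the channels pass through `double_block`; the other half reach the merge unchanged, which is the extra gradient path the description refers to.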
|
4 |
+
|
5 |
+
This CNN is used as the backbone for [YOLOv4](https://paperswithcode.com/method/yolov4).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{bochkovskiy2020yolov4,
|
17 |
+
title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
|
18 |
+
author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
|
19 |
+
year={2020},
|
20 |
+
eprint={2004.10934},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: CSP DarkNet
|
30 |
+
Paper:
|
31 |
+
Title: 'YOLOv4: Optimal Speed and Accuracy of Object Detection'
|
32 |
+
URL: https://paperswithcode.com/paper/yolov4-optimal-speed-and-accuracy-of-object
|
33 |
+
Models:
|
34 |
+
- Name: cspdarknet53
|
35 |
+
In Collection: CSP DarkNet
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 8545018880
|
38 |
+
Parameters: 27640000
|
39 |
+
File Size: 110775135
|
40 |
+
Architecture:
|
41 |
+
- 1x1 Convolution
|
42 |
+
- Batch Normalization
|
43 |
+
- Convolution
|
44 |
+
- Global Average Pooling
|
45 |
+
- Mish
|
46 |
+
- Residual Connection
|
47 |
+
- Softmax
|
48 |
+
Tasks:
|
49 |
+
- Image Classification
|
50 |
+
Training Techniques:
|
51 |
+
- CutMix
|
52 |
+
- Label Smoothing
|
53 |
+
- Mosaic
|
54 |
+
- Polynomial Learning Rate Decay
|
55 |
+
- SGD with Momentum
|
56 |
+
- Self-Adversarial Training
|
57 |
+
- Weight Decay
|
58 |
+
Training Data:
|
59 |
+
- ImageNet
|
60 |
+
Training Resources: 1x NVIDIA RTX 2070 GPU
|
61 |
+
ID: cspdarknet53
|
62 |
+
LR: 0.1
|
63 |
+
Layers: 53
|
64 |
+
Crop Pct: '0.887'
|
65 |
+
Momentum: 0.9
|
66 |
+
Batch Size: 128
|
67 |
+
Image Size: '256'
|
68 |
+
Warmup Steps: 1000
|
69 |
+
Weight Decay: 0.0005
|
70 |
+
Interpolation: bilinear
|
71 |
+
Training Steps: 8000000
|
72 |
+
FPS (GPU RTX 2070): 66
|
73 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L441
|
74 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspdarknet53_ra_256-d05c7c21.pth
|
75 |
+
Results:
|
76 |
+
- Task: Image Classification
|
77 |
+
Dataset: ImageNet
|
78 |
+
Metrics:
|
79 |
+
Top 1 Accuracy: 80.05%
|
80 |
+
Top 5 Accuracy: 95.09%
|
81 |
+
-->
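The `Crop Pct` and `Image Size` fields in the metadata above are usually read together at evaluation time. The sketch below assumes the common convention of resizing the shorter side to `floor(image_size / crop_pct)` and then taking a center crop of `image_size`; the helper name `eval_resize_size` is hypothetical.

```python
import math


def eval_resize_size(image_size: int, crop_pct: float) -> int:
    """Shorter-side resize target used before a center crop of `image_size`.

    Assumes the resize-then-center-crop convention described in the lead-in.
    """
    return int(math.floor(image_size / crop_pct))


# Values taken from the cspdarknet53 entry: Image Size '256', Crop Pct '0.887'.
cspdarknet53_resize = eval_resize_size(256, 0.887)

# Fraction of the resized shorter side that survives the center crop.
fraction_kept = 256 / cspdarknet53_resize
```

Under this assumption an entry with `Crop Pct: '1.0'` needs no extra border: the image is resized straight to the crop size.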
|
docs/models/.templates/models/csp-resnet.md
ADDED
@@ -0,0 +1,76 @@
|
1 |
+
# CSP-ResNet
|
2 |
+
|
3 |
+
**CSPResNet** is a convolutional neural network where we apply the Cross Stage Partial Network (CSPNet) approach to [ResNet](https://paperswithcode.com/method/resnet). The CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@misc{wang2019cspnet,
|
15 |
+
title={CSPNet: A New Backbone that can Enhance Learning Capability of CNN},
|
16 |
+
author={Chien-Yao Wang and Hong-Yuan Mark Liao and I-Hau Yeh and Yueh-Hua Wu and Ping-Yang Chen and Jun-Wei Hsieh},
|
17 |
+
year={2019},
|
18 |
+
eprint={1911.11929},
|
19 |
+
archivePrefix={arXiv},
|
20 |
+
primaryClass={cs.CV}
|
21 |
+
}
|
22 |
+
```
|
23 |
+
|
24 |
+
<!--
|
25 |
+
Type: model-index
|
26 |
+
Collections:
|
27 |
+
- Name: CSP ResNet
|
28 |
+
Paper:
|
29 |
+
Title: 'CSPNet: A New Backbone that can Enhance Learning Capability of CNN'
|
30 |
+
URL: https://paperswithcode.com/paper/cspnet-a-new-backbone-that-can-enhance
|
31 |
+
Models:
|
32 |
+
- Name: cspresnet50
|
33 |
+
In Collection: CSP ResNet
|
34 |
+
Metadata:
|
35 |
+
FLOPs: 5924992000
|
36 |
+
Parameters: 21620000
|
37 |
+
File Size: 86679303
|
38 |
+
Architecture:
|
39 |
+
- 1x1 Convolution
|
40 |
+
- Batch Normalization
|
41 |
+
- Bottleneck Residual Block
|
42 |
+
- Convolution
|
43 |
+
- Global Average Pooling
|
44 |
+
- Max Pooling
|
45 |
+
- ReLU
|
46 |
+
- Residual Block
|
47 |
+
- Residual Connection
|
48 |
+
- Softmax
|
49 |
+
Tasks:
|
50 |
+
- Image Classification
|
51 |
+
Training Techniques:
|
52 |
+
- Label Smoothing
|
53 |
+
- Polynomial Learning Rate Decay
|
54 |
+
- SGD with Momentum
|
55 |
+
- Weight Decay
|
56 |
+
Training Data:
|
57 |
+
- ImageNet
|
58 |
+
ID: cspresnet50
|
59 |
+
LR: 0.1
|
60 |
+
Layers: 50
|
61 |
+
Crop Pct: '0.887'
|
62 |
+
Momentum: 0.9
|
63 |
+
Batch Size: 128
|
64 |
+
Image Size: '256'
|
65 |
+
Weight Decay: 0.005
|
66 |
+
Interpolation: bilinear
|
67 |
+
Training Steps: 8000000
|
68 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L415
|
69 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspresnet50_ra-d3e8d487.pth
|
70 |
+
Results:
|
71 |
+
- Task: Image Classification
|
72 |
+
Dataset: ImageNet
|
73 |
+
Metrics:
|
74 |
+
Top 1 Accuracy: 79.57%
|
75 |
+
Top 5 Accuracy: 94.71%
|
76 |
+
-->
|
docs/models/.templates/models/csp-resnext.md
ADDED
@@ -0,0 +1,77 @@
|
1 |
+
# CSP-ResNeXt
|
2 |
+
|
3 |
+
**CSPResNeXt** is a convolutional neural network where we apply the Cross Stage Partial Network (CSPNet) approach to [ResNeXt](https://paperswithcode.com/method/resnext). The CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@misc{wang2019cspnet,
|
15 |
+
title={CSPNet: A New Backbone that can Enhance Learning Capability of CNN},
|
16 |
+
author={Chien-Yao Wang and Hong-Yuan Mark Liao and I-Hau Yeh and Yueh-Hua Wu and Ping-Yang Chen and Jun-Wei Hsieh},
|
17 |
+
year={2019},
|
18 |
+
eprint={1911.11929},
|
19 |
+
archivePrefix={arXiv},
|
20 |
+
primaryClass={cs.CV}
|
21 |
+
}
|
22 |
+
```
|
23 |
+
|
24 |
+
<!--
|
25 |
+
Type: model-index
|
26 |
+
Collections:
|
27 |
+
- Name: CSP ResNeXt
|
28 |
+
Paper:
|
29 |
+
Title: 'CSPNet: A New Backbone that can Enhance Learning Capability of CNN'
|
30 |
+
URL: https://paperswithcode.com/paper/cspnet-a-new-backbone-that-can-enhance
|
31 |
+
Models:
|
32 |
+
- Name: cspresnext50
|
33 |
+
In Collection: CSP ResNeXt
|
34 |
+
Metadata:
|
35 |
+
FLOPs: 3962945536
|
36 |
+
Parameters: 20570000
|
37 |
+
File Size: 82562887
|
38 |
+
Architecture:
|
39 |
+
- 1x1 Convolution
|
40 |
+
- Batch Normalization
|
41 |
+
- Convolution
|
42 |
+
- Global Average Pooling
|
43 |
+
- Grouped Convolution
|
44 |
+
- Max Pooling
|
45 |
+
- ReLU
|
46 |
+
- ResNeXt Block
|
47 |
+
- Residual Connection
|
48 |
+
- Softmax
|
49 |
+
Tasks:
|
50 |
+
- Image Classification
|
51 |
+
Training Techniques:
|
52 |
+
- Label Smoothing
|
53 |
+
- Polynomial Learning Rate Decay
|
54 |
+
- SGD with Momentum
|
55 |
+
- Weight Decay
|
56 |
+
Training Data:
|
57 |
+
- ImageNet
|
58 |
+
Training Resources: 1x GPU
|
59 |
+
ID: cspresnext50
|
60 |
+
LR: 0.1
|
61 |
+
Layers: 50
|
62 |
+
Crop Pct: '0.875'
|
63 |
+
Momentum: 0.9
|
64 |
+
Batch Size: 128
|
65 |
+
Image Size: '224'
|
66 |
+
Weight Decay: 0.005
|
67 |
+
Interpolation: bilinear
|
68 |
+
Training Steps: 8000000
|
69 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L430
|
70 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspresnext50_ra_224-648b4713.pth
|
71 |
+
Results:
|
72 |
+
- Task: Image Classification
|
73 |
+
Dataset: ImageNet
|
74 |
+
Metrics:
|
75 |
+
Top 1 Accuracy: 80.05%
|
76 |
+
Top 5 Accuracy: 94.94%
|
77 |
+
-->
|
docs/models/.templates/models/densenet.md
ADDED
@@ -0,0 +1,305 @@
|
1 |
+
# DenseNet
|
2 |
+
|
3 |
+
**DenseNet** is a type of convolutional neural network that utilises dense connections between layers, through [Dense Blocks](http://www.paperswithcode.com/method/dense-block), where we connect *all layers* (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
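Because each layer receives the concatenated outputs of all preceding layers, the input width grows linearly inside a dense block. The following sketch only does that channel bookkeeping; the function names and the values 64 (block input channels), 32 (channels added per layer, often called the growth rate) and 6 (layers) are illustrative assumptions, not figures taken from this page.

```python
from typing import List


def dense_block_input_channels(in_channels: int, growth_rate: int, num_layers: int) -> List[int]:
    """Number of input channels seen by each layer of one dense block.

    Layer i sees the block input plus the outputs of the i layers before it.
    """
    return [in_channels + i * growth_rate for i in range(num_layers)]


def dense_block_output_channels(in_channels: int, growth_rate: int, num_layers: int) -> int:
    """Channels leaving the block: the input concatenated with every layer's output."""
    return in_channels + num_layers * growth_rate


widths = dense_block_input_channels(64, 32, 6)
out_channels = dense_block_output_channels(64, 32, 6)
```

The growing `widths` list shows why dense blocks are typically followed by a layer that reduces the channel count before the next block.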
|
4 |
+
|
5 |
+
The **DenseNet Blur** variant in this collection, by Ross Wightman, employs [Blur Pooling](http://www.paperswithcode.com/method/blur-pooling).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@article{DBLP:journals/corr/HuangLW16a,
|
17 |
+
author = {Gao Huang and
|
18 |
+
Zhuang Liu and
|
19 |
+
Kilian Q. Weinberger},
|
20 |
+
title = {Densely Connected Convolutional Networks},
|
21 |
+
journal = {CoRR},
|
22 |
+
volume = {abs/1608.06993},
|
23 |
+
year = {2016},
|
24 |
+
url = {http://arxiv.org/abs/1608.06993},
|
25 |
+
archivePrefix = {arXiv},
|
26 |
+
eprint = {1608.06993},
|
27 |
+
timestamp = {Mon, 10 Sep 2018 15:49:32 +0200},
|
28 |
+
biburl = {https://dblp.org/rec/journals/corr/HuangLW16a.bib},
|
29 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
30 |
+
}
|
31 |
+
```
|
32 |
+
|
33 |
+
```BibTeX
|
34 |
+
@misc{rw2019timm,
|
35 |
+
author = {Ross Wightman},
|
36 |
+
title = {PyTorch Image Models},
|
37 |
+
year = {2019},
|
38 |
+
publisher = {GitHub},
|
39 |
+
journal = {GitHub repository},
|
40 |
+
doi = {10.5281/zenodo.4414861},
|
41 |
+
howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
|
42 |
+
}
|
43 |
+
```
|
44 |
+
|
45 |
+
<!--
|
46 |
+
Type: model-index
|
47 |
+
Collections:
|
48 |
+
- Name: DenseNet
|
49 |
+
Paper:
|
50 |
+
Title: Densely Connected Convolutional Networks
|
51 |
+
URL: https://paperswithcode.com/paper/densely-connected-convolutional-networks
|
52 |
+
Models:
|
53 |
+
- Name: densenet121
|
54 |
+
In Collection: DenseNet
|
55 |
+
Metadata:
|
56 |
+
FLOPs: 3641843200
|
57 |
+
Parameters: 7980000
|
58 |
+
File Size: 32376726
|
59 |
+
Architecture:
|
60 |
+
- 1x1 Convolution
|
61 |
+
- Average Pooling
|
62 |
+
- Batch Normalization
|
63 |
+
- Convolution
|
64 |
+
- Dense Block
|
65 |
+
- Dense Connections
|
66 |
+
- Dropout
|
67 |
+
- Max Pooling
|
68 |
+
- ReLU
|
69 |
+
- Softmax
|
70 |
+
Tasks:
|
71 |
+
- Image Classification
|
72 |
+
Training Techniques:
|
73 |
+
- Kaiming Initialization
|
74 |
+
- Nesterov Accelerated Gradient
|
75 |
+
- Weight Decay
|
76 |
+
Training Data:
|
77 |
+
- ImageNet
|
78 |
+
ID: densenet121
|
79 |
+
LR: 0.1
|
80 |
+
Epochs: 90
|
81 |
+
Layers: 121
|
82 |
+
Dropout: 0.2
|
83 |
+
Crop Pct: '0.875'
|
84 |
+
Momentum: 0.9
|
85 |
+
Batch Size: 256
|
86 |
+
Image Size: '224'
|
87 |
+
Weight Decay: 0.0001
|
88 |
+
Interpolation: bicubic
|
89 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L295
|
90 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/densenet121_ra-50efcf5c.pth
|
91 |
+
Results:
|
92 |
+
- Task: Image Classification
|
93 |
+
Dataset: ImageNet
|
94 |
+
Metrics:
|
95 |
+
Top 1 Accuracy: 75.56%
|
96 |
+
Top 5 Accuracy: 92.65%
|
97 |
+
- Name: densenet161
|
98 |
+
In Collection: DenseNet
|
99 |
+
Metadata:
|
100 |
+
FLOPs: 9931959264
|
101 |
+
Parameters: 28680000
|
102 |
+
File Size: 115730790
|
103 |
+
Architecture:
|
104 |
+
- 1x1 Convolution
|
105 |
+
- Average Pooling
|
106 |
+
- Batch Normalization
|
107 |
+
- Convolution
|
108 |
+
- Dense Block
|
109 |
+
- Dense Connections
|
110 |
+
- Dropout
|
111 |
+
- Max Pooling
|
112 |
+
- ReLU
|
113 |
+
- Softmax
|
114 |
+
Tasks:
|
115 |
+
- Image Classification
|
116 |
+
Training Techniques:
|
117 |
+
- Kaiming Initialization
|
118 |
+
- Nesterov Accelerated Gradient
|
119 |
+
- Weight Decay
|
120 |
+
Training Data:
|
121 |
+
- ImageNet
|
122 |
+
ID: densenet161
|
123 |
+
LR: 0.1
|
124 |
+
Epochs: 90
|
125 |
+
Layers: 161
|
126 |
+
Dropout: 0.2
|
127 |
+
Crop Pct: '0.875'
|
128 |
+
Momentum: 0.9
|
129 |
+
Batch Size: 256
|
130 |
+
Image Size: '224'
|
131 |
+
Weight Decay: 0.0001
|
132 |
+
Interpolation: bicubic
|
133 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L347
|
134 |
+
Weights: https://download.pytorch.org/models/densenet161-8d451a50.pth
|
135 |
+
Results:
|
136 |
+
- Task: Image Classification
|
137 |
+
Dataset: ImageNet
|
138 |
+
Metrics:
|
139 |
+
Top 1 Accuracy: 77.36%
|
140 |
+
Top 5 Accuracy: 93.63%
|
141 |
+
- Name: densenet169
|
142 |
+
In Collection: DenseNet
|
143 |
+
Metadata:
|
144 |
+
FLOPs: 4316945792
|
145 |
+
Parameters: 14150000
|
146 |
+
File Size: 57365526
|
147 |
+
Architecture:
|
148 |
+
- 1x1 Convolution
|
149 |
+
- Average Pooling
|
150 |
+
- Batch Normalization
|
151 |
+
- Convolution
|
152 |
+
- Dense Block
|
153 |
+
- Dense Connections
|
154 |
+
- Dropout
|
155 |
+
- Max Pooling
|
156 |
+
- ReLU
|
157 |
+
- Softmax
|
158 |
+
Tasks:
|
159 |
+
- Image Classification
|
160 |
+
Training Techniques:
|
161 |
+
- Kaiming Initialization
|
162 |
+
- Nesterov Accelerated Gradient
|
163 |
+
- Weight Decay
|
164 |
+
Training Data:
|
165 |
+
- ImageNet
|
166 |
+
ID: densenet169
|
167 |
+
LR: 0.1
|
168 |
+
Epochs: 90
|
169 |
+
Layers: 169
|
170 |
+
Dropout: 0.2
|
171 |
+
Crop Pct: '0.875'
|
172 |
+
Momentum: 0.9
|
173 |
+
Batch Size: 256
|
174 |
+
Image Size: '224'
|
175 |
+
Weight Decay: 0.0001
|
176 |
+
Interpolation: bicubic
|
177 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L327
|
178 |
+
Weights: https://download.pytorch.org/models/densenet169-b2777c0a.pth
|
179 |
+
Results:
|
180 |
+
- Task: Image Classification
|
181 |
+
Dataset: ImageNet
|
182 |
+
Metrics:
|
183 |
+
Top 1 Accuracy: 75.9%
|
184 |
+
Top 5 Accuracy: 93.02%
|
185 |
+
- Name: densenet201
|
186 |
+
In Collection: DenseNet
|
187 |
+
Metadata:
|
188 |
+
FLOPs: 5514321024
|
189 |
+
Parameters: 20010000
|
190 |
+
File Size: 81131730
|
191 |
+
Architecture:
|
192 |
+
- 1x1 Convolution
|
193 |
+
- Average Pooling
|
194 |
+
- Batch Normalization
|
195 |
+
- Convolution
|
196 |
+
- Dense Block
|
197 |
+
- Dense Connections
|
198 |
+
- Dropout
|
199 |
+
- Max Pooling
|
200 |
+
- ReLU
|
201 |
+
- Softmax
|
202 |
+
Tasks:
|
203 |
+
- Image Classification
|
204 |
+
Training Techniques:
|
205 |
+
- Kaiming Initialization
|
206 |
+
- Nesterov Accelerated Gradient
|
207 |
+
- Weight Decay
|
208 |
+
Training Data:
|
209 |
+
- ImageNet
|
210 |
+
ID: densenet201
|
211 |
+
LR: 0.1
|
212 |
+
Epochs: 90
|
213 |
+
Layers: 201
|
214 |
+
Dropout: 0.2
|
215 |
+
Crop Pct: '0.875'
|
216 |
+
Momentum: 0.9
|
217 |
+
Batch Size: 256
|
218 |
+
Image Size: '224'
|
219 |
+
Weight Decay: 0.0001
|
220 |
+
Interpolation: bicubic
|
221 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L337
|
222 |
+
Weights: https://download.pytorch.org/models/densenet201-c1103571.pth
|
223 |
+
Results:
|
224 |
+
- Task: Image Classification
|
225 |
+
Dataset: ImageNet
|
226 |
+
Metrics:
|
227 |
+
Top 1 Accuracy: 77.29%
|
228 |
+
Top 5 Accuracy: 93.48%
|
229 |
+
- Name: densenetblur121d
|
230 |
+
In Collection: DenseNet
|
231 |
+
Metadata:
|
232 |
+
FLOPs: 3947812864
|
233 |
+
Parameters: 8000000
|
234 |
+
File Size: 32456500
|
235 |
+
Architecture:
|
236 |
+
- 1x1 Convolution
|
237 |
+
- Batch Normalization
|
238 |
+
- Blur Pooling
|
239 |
+
- Convolution
|
240 |
+
- Dense Block
|
241 |
+
- Dense Connections
|
242 |
+
- Dropout
|
243 |
+
- Max Pooling
|
244 |
+
- ReLU
|
245 |
+
- Softmax
|
246 |
+
Tasks:
|
247 |
+
- Image Classification
|
248 |
+
Training Data:
|
249 |
+
- ImageNet
|
250 |
+
ID: densenetblur121d
|
251 |
+
Crop Pct: '0.875'
|
252 |
+
Image Size: '224'
|
253 |
+
Interpolation: bicubic
|
254 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L305
|
255 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/densenetblur121d_ra-100dcfbc.pth
|
256 |
+
Results:
|
257 |
+
- Task: Image Classification
|
258 |
+
Dataset: ImageNet
|
259 |
+
Metrics:
|
260 |
+
Top 1 Accuracy: 76.59%
|
261 |
+
Top 5 Accuracy: 93.2%
|
262 |
+
- Name: tv_densenet121
|
263 |
+
In Collection: DenseNet
|
264 |
+
Metadata:
|
265 |
+
FLOPs: 3641843200
|
266 |
+
Parameters: 7980000
|
267 |
+
File Size: 32342954
|
268 |
+
Architecture:
|
269 |
+
- 1x1 Convolution
|
270 |
+
- Average Pooling
|
271 |
+
- Batch Normalization
|
272 |
+
- Convolution
|
273 |
+
- Dense Block
|
274 |
+
- Dense Connections
|
275 |
+
- Dropout
|
276 |
+
- Max Pooling
|
277 |
+
- ReLU
|
278 |
+
- Softmax
|
279 |
+
Tasks:
|
280 |
+
- Image Classification
|
281 |
+
Training Techniques:
|
282 |
+
- SGD with Momentum
|
283 |
+
- Weight Decay
|
284 |
+
Training Data:
|
285 |
+
- ImageNet
|
286 |
+
ID: tv_densenet121
|
287 |
+
LR: 0.1
|
288 |
+
Epochs: 90
|
289 |
+
Crop Pct: '0.875'
|
290 |
+
LR Gamma: 0.1
|
291 |
+
Momentum: 0.9
|
292 |
+
Batch Size: 32
|
293 |
+
Image Size: '224'
|
294 |
+
LR Step Size: 30
|
295 |
+
Weight Decay: 0.0001
|
296 |
+
Interpolation: bicubic
|
297 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L379
|
298 |
+
Weights: https://download.pytorch.org/models/densenet121-a639ec97.pth
|
299 |
+
Results:
|
300 |
+
- Task: Image Classification
|
301 |
+
Dataset: ImageNet
|
302 |
+
Metrics:
|
303 |
+
Top 1 Accuracy: 74.74%
|
304 |
+
Top 5 Accuracy: 92.15%
|
305 |
+
-->
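The `tv_densenet121` entry above lists `LR: 0.1`, `LR Gamma: 0.1`, `LR Step Size: 30` and `Epochs: 90`. One plausible reading, assumed here rather than confirmed by the page, is a step schedule in which the learning rate is multiplied by gamma every `LR Step Size` epochs. The helper `step_lr` is hypothetical.

```python
def step_lr(epoch: int, base_lr: float = 0.1, gamma: float = 0.1, step_size: int = 30) -> float:
    """Learning rate at a given epoch under step decay.

    Assumes the rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# One value per epoch for the 90-epoch run described in the metadata.
schedule = [step_lr(epoch) for epoch in range(90)]
```

With these values the run would spend 30 epochs at each of three learning rates, each ten times smaller than the last.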
|
docs/models/.templates/models/dla.md
ADDED
@@ -0,0 +1,545 @@
|
1 |
+
# Deep Layer Aggregation
|
2 |
+
|
3 |
+
Extending “shallow” skip connections, **Deep Layer Aggregation (DLA)** incorporates more depth and sharing. The authors introduce two structures for DLA: iterative deep aggregation (IDA) and hierarchical deep aggregation (HDA). These structures are expressed through an architectural framework that is independent of the choice of backbone, for compatibility with current and future networks.
|
4 |
+
|
5 |
+
IDA focuses on fusing resolutions and scales, while HDA focuses on merging features from all modules and channels. IDA follows the base hierarchy to refine resolution and aggregate scale stage-by-stage. HDA assembles its own hierarchy of tree-structured connections that cross and merge stages to aggregate different levels of representation.
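The difference between the two aggregation orders can be illustrated with a toy example. This is a heavily simplified sketch under stated assumptions: features are plain floats, the aggregation node `mean_node` is a hypothetical stand-in for a learned node, and `tree_aggregation` is only a balanced pairwise merge that conveys the tree-structured idea, not the exact HDA wiring from the paper.

```python
from functools import reduce
from typing import Callable, Sequence

Node = Callable[[float, float], float]


def mean_node(a: float, b: float) -> float:
    """Hypothetical aggregation node: averages its two inputs."""
    return (a + b) / 2.0


def iterative_aggregation(stages: Sequence[float], node: Node) -> float:
    """IDA-style: start at the shallowest stage and merge in deeper stages one by one."""
    return reduce(node, stages)


def tree_aggregation(stages: Sequence[float], node: Node) -> float:
    """Simplified tree-style merge: aggregate each half, then merge the two results."""
    if len(stages) == 1:
        return stages[0]
    mid = len(stages) // 2
    return node(tree_aggregation(stages[:mid], node), tree_aggregation(stages[mid:], node))


stage_features = [1.0, 3.0, 5.0, 7.0]  # shallowest to deepest, illustrative values
ida_result = iterative_aggregation(stage_features, mean_node)
tree_result = tree_aggregation(stage_features, mean_node)
```

In the iterative order the shallowest feature passes through the most aggregation nodes, while the tree order combines stages at several levels of a hierarchy; the two results differ even with the same node.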
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{yu2019deep,
|
17 |
+
title={Deep Layer Aggregation},
|
18 |
+
author={Fisher Yu and Dequan Wang and Evan Shelhamer and Trevor Darrell},
|
19 |
+
year={2019},
|
20 |
+
eprint={1707.06484},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: DLA
|
30 |
+
Paper:
|
31 |
+
Title: Deep Layer Aggregation
|
32 |
+
URL: https://paperswithcode.com/paper/deep-layer-aggregation
|
33 |
+
Models:
|
34 |
+
- Name: dla102
|
35 |
+
In Collection: DLA
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 7192952808
|
38 |
+
Parameters: 33270000
|
39 |
+
File Size: 135290579
|
40 |
+
Architecture:
|
41 |
+
- 1x1 Convolution
|
42 |
+
- Batch Normalization
|
43 |
+
- Convolution
|
44 |
+
- DLA Bottleneck Residual Block
|
45 |
+
- DLA Residual Block
|
46 |
+
- Global Average Pooling
|
47 |
+
- Max Pooling
|
48 |
+
- ReLU
|
49 |
+
- Residual Block
|
50 |
+
- Residual Connection
|
51 |
+
- Softmax
|
52 |
+
Tasks:
|
53 |
+
- Image Classification
|
54 |
+
Training Techniques:
|
55 |
+
- SGD with Momentum
|
56 |
+
- Weight Decay
|
57 |
+
Training Data:
|
58 |
+
- ImageNet
|
59 |
+
Training Resources: 8x GPUs
|
60 |
+
ID: dla102
|
61 |
+
LR: 0.1
|
62 |
+
Epochs: 120
|
63 |
+
Layers: 102
|
64 |
+
Crop Pct: '0.875'
|
65 |
+
Momentum: 0.9
|
66 |
+
Batch Size: 256
|
67 |
+
Image Size: '224'
|
68 |
+
Weight Decay: 0.0001
|
69 |
+
Interpolation: bilinear
|
70 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L410
|
71 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla102-d94d9790.pth
|
72 |
+
Results:
|
73 |
+
- Task: Image Classification
|
74 |
+
Dataset: ImageNet
|
75 |
+
Metrics:
|
76 |
+
Top 1 Accuracy: 78.03%
|
77 |
+
Top 5 Accuracy: 93.95%
|
78 |
+
- Name: dla102x
|
79 |
+
In Collection: DLA
|
80 |
+
Metadata:
|
81 |
+
FLOPs: 5886821352
|
82 |
+
Parameters: 26310000
|
83 |
+
File Size: 107552695
|
84 |
+
Architecture:
|
85 |
+
- 1x1 Convolution
|
86 |
+
- Batch Normalization
|
87 |
+
- Convolution
|
88 |
+
- DLA Bottleneck Residual Block
|
89 |
+
- DLA Residual Block
|
90 |
+
- Global Average Pooling
|
91 |
+
- Max Pooling
|
92 |
+
- ReLU
|
93 |
+
- Residual Block
|
94 |
+
- Residual Connection
|
95 |
+
- Softmax
|
96 |
+
Tasks:
|
97 |
+
- Image Classification
|
98 |
+
Training Techniques:
|
99 |
+
- SGD with Momentum
|
100 |
+
- Weight Decay
|
101 |
+
Training Data:
|
102 |
+
- ImageNet
|
103 |
+
Training Resources: 8x GPUs
|
104 |
+
ID: dla102x
|
105 |
+
LR: 0.1
|
106 |
+
Epochs: 120
|
107 |
+
Layers: 102
|
108 |
+
Crop Pct: '0.875'
|
109 |
+
Momentum: 0.9
|
110 |
+
Batch Size: 256
|
111 |
+
Image Size: '224'
|
112 |
+
Weight Decay: 0.0001
|
113 |
+
Interpolation: bilinear
|
114 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L418
|
115 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla102x-ad62be81.pth
|
116 |
+
Results:
|
117 |
+
- Task: Image Classification
|
118 |
+
Dataset: ImageNet
|
119 |
+
Metrics:
|
120 |
+
Top 1 Accuracy: 78.51%
|
121 |
+
Top 5 Accuracy: 94.23%
|
122 |
+
- Name: dla102x2
|
123 |
+
In Collection: DLA
|
124 |
+
Metadata:
|
125 |
+
FLOPs: 9343847400
|
126 |
+
Parameters: 41280000
|
127 |
+
File Size: 167645295
|
128 |
+
Architecture:
|
129 |
+
- 1x1 Convolution
|
130 |
+
- Batch Normalization
|
131 |
+
- Convolution
|
132 |
+
- DLA Bottleneck Residual Block
|
133 |
+
- DLA Residual Block
|
134 |
+
- Global Average Pooling
|
135 |
+
- Max Pooling
|
136 |
+
- ReLU
|
137 |
+
- Residual Block
|
138 |
+
- Residual Connection
|
139 |
+
- Softmax
|
140 |
+
Tasks:
|
141 |
+
- Image Classification
|
142 |
+
Training Techniques:
|
143 |
+
- SGD with Momentum
|
144 |
+
- Weight Decay
|
145 |
+
Training Data:
|
146 |
+
- ImageNet
|
147 |
+
Training Resources: 8x GPUs
|
148 |
+
ID: dla102x2
|
149 |
+
LR: 0.1
|
150 |
+
Epochs: 120
|
151 |
+
Layers: 102
|
152 |
+
Crop Pct: '0.875'
|
153 |
+
Momentum: 0.9
|
154 |
+
Batch Size: 256
|
155 |
+
Image Size: '224'
|
156 |
+
Weight Decay: 0.0001
|
157 |
+
Interpolation: bilinear
|
158 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L426
|
159 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla102x2-262837b6.pth
|
160 |
+
Results:
|
161 |
+
- Task: Image Classification
|
162 |
+
Dataset: ImageNet
|
163 |
+
Metrics:
|
164 |
+
Top 1 Accuracy: 79.44%
|
165 |
+
Top 5 Accuracy: 94.65%
|
166 |
+
- Name: dla169
|
167 |
+
In Collection: DLA
|
168 |
+
Metadata:
|
169 |
+
FLOPs: 11598004200
|
170 |
+
Parameters: 53390000
|
171 |
+
File Size: 216547113
|
172 |
+
Architecture:
|
173 |
+
- 1x1 Convolution
|
174 |
+
- Batch Normalization
|
175 |
+
- Convolution
|
176 |
+
- DLA Bottleneck Residual Block
|
177 |
+
- DLA Residual Block
|
178 |
+
- Global Average Pooling
|
179 |
+
- Max Pooling
|
180 |
+
- ReLU
|
181 |
+
- Residual Block
|
182 |
+
- Residual Connection
|
183 |
+
- Softmax
|
184 |
+
Tasks:
|
185 |
+
- Image Classification
|
186 |
+
Training Techniques:
|
187 |
+
- SGD with Momentum
|
188 |
+
- Weight Decay
|
189 |
+
Training Data:
|
190 |
+
- ImageNet
|
191 |
+
Training Resources: 8x GPUs
|
192 |
+
ID: dla169
|
193 |
+
LR: 0.1
|
194 |
+
Epochs: 120
|
195 |
+
Layers: 169
|
196 |
+
Crop Pct: '0.875'
|
197 |
+
Momentum: 0.9
|
198 |
+
Batch Size: 256
|
199 |
+
Image Size: '224'
|
200 |
+
Weight Decay: 0.0001
|
201 |
+
Interpolation: bilinear
|
202 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L434
|
203 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla169-0914e092.pth
|
204 |
+
Results:
|
205 |
+
- Task: Image Classification
|
206 |
+
Dataset: ImageNet
|
207 |
+
Metrics:
|
208 |
+
Top 1 Accuracy: 78.69%
|
209 |
+
Top 5 Accuracy: 94.33%
|
210 |
+
- Name: dla34
|
211 |
+
In Collection: DLA
|
212 |
+
Metadata:
|
213 |
+
FLOPs: 3070105576
|
214 |
+
Parameters: 15740000
|
215 |
+
File Size: 63228658
|
216 |
+
Architecture:
|
217 |
+
- 1x1 Convolution
|
218 |
+
- Batch Normalization
|
219 |
+
- Convolution
|
220 |
+
- DLA Bottleneck Residual Block
|
221 |
+
- DLA Residual Block
|
222 |
+
- Global Average Pooling
|
223 |
+
- Max Pooling
|
224 |
+
- ReLU
|
225 |
+
- Residual Block
|
226 |
+
- Residual Connection
|
227 |
+
- Softmax
|
228 |
+
Tasks:
|
229 |
+
- Image Classification
|
230 |
+
Training Techniques:
|
231 |
+
- SGD with Momentum
|
232 |
+
- Weight Decay
|
233 |
+
Training Data:
|
234 |
+
- ImageNet
|
235 |
+
ID: dla34
|
236 |
+
LR: 0.1
|
237 |
+
Epochs: 120
|
238 |
+
Layers: 32
|
239 |
+
Crop Pct: '0.875'
|
240 |
+
Momentum: 0.9
|
241 |
+
Batch Size: 256
|
242 |
+
Image Size: '224'
|
243 |
+
Weight Decay: 0.0001
|
244 |
+
Interpolation: bilinear
|
245 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L362
|
246 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla34-ba72cf86.pth
|
247 |
+
Results:
|
248 |
+
- Task: Image Classification
|
249 |
+
Dataset: ImageNet
|
250 |
+
Metrics:
|
251 |
+
Top 1 Accuracy: 74.62%
|
252 |
+
Top 5 Accuracy: 92.06%
|
253 |
+
- Name: dla46_c
|
254 |
+
In Collection: DLA
|
255 |
+
Metadata:
|
256 |
+
FLOPs: 583277288
|
257 |
+
Parameters: 1300000
|
258 |
+
File Size: 5307963
|
259 |
+
Architecture:
|
260 |
+
- 1x1 Convolution
|
261 |
+
- Batch Normalization
|
262 |
+
- Convolution
|
263 |
+
- DLA Bottleneck Residual Block
|
264 |
+
- DLA Residual Block
|
265 |
+
- Global Average Pooling
|
266 |
+
- Max Pooling
|
267 |
+
- ReLU
|
268 |
+
- Residual Block
|
269 |
+
- Residual Connection
|
270 |
+
- Softmax
|
271 |
+
Tasks:
|
272 |
+
- Image Classification
|
273 |
+
Training Techniques:
|
274 |
+
- SGD with Momentum
|
275 |
+
- Weight Decay
|
276 |
+
Training Data:
|
277 |
+
- ImageNet
|
278 |
+
ID: dla46_c
|
279 |
+
LR: 0.1
|
280 |
+
Epochs: 120
|
281 |
+
Layers: 46
|
282 |
+
Crop Pct: '0.875'
|
283 |
+
Momentum: 0.9
|
284 |
+
Batch Size: 256
|
285 |
+
Image Size: '224'
|
286 |
+
Weight Decay: 0.0001
|
287 |
+
Interpolation: bilinear
|
288 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L369
|
289 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla46_c-2bfd52c3.pth
|
290 |
+
Results:
|
291 |
+
- Task: Image Classification
|
292 |
+
Dataset: ImageNet
|
293 |
+
Metrics:
|
294 |
+
Top 1 Accuracy: 64.87%
|
295 |
+
Top 5 Accuracy: 86.29%
|
296 |
+
- Name: dla46x_c
|
297 |
+
In Collection: DLA
|
298 |
+
Metadata:
|
299 |
+
FLOPs: 544052200
|
300 |
+
Parameters: 1070000
|
301 |
+
File Size: 4387641
|
302 |
+
Architecture:
|
303 |
+
- 1x1 Convolution
|
304 |
+
- Batch Normalization
|
305 |
+
- Convolution
|
306 |
+
- DLA Bottleneck Residual Block
|
307 |
+
- DLA Residual Block
|
308 |
+
- Global Average Pooling
|
309 |
+
- Max Pooling
|
310 |
+
- ReLU
|
311 |
+
- Residual Block
|
312 |
+
- Residual Connection
|
313 |
+
- Softmax
|
314 |
+
Tasks:
|
315 |
+
- Image Classification
|
316 |
+
Training Techniques:
|
317 |
+
- SGD with Momentum
|
318 |
+
- Weight Decay
|
319 |
+
Training Data:
|
320 |
+
- ImageNet
|
321 |
+
ID: dla46x_c
|
322 |
+
LR: 0.1
|
323 |
+
Epochs: 120
|
324 |
+
Layers: 46
|
325 |
+
Crop Pct: '0.875'
|
326 |
+
Momentum: 0.9
|
327 |
+
Batch Size: 256
|
328 |
+
Image Size: '224'
|
329 |
+
Weight Decay: 0.0001
|
330 |
+
Interpolation: bilinear
|
331 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L378
|
332 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla46x_c-d761bae7.pth
|
333 |
+
Results:
|
334 |
+
- Task: Image Classification
|
335 |
+
Dataset: ImageNet
|
336 |
+
Metrics:
|
337 |
+
Top 1 Accuracy: 65.98%
|
338 |
+
Top 5 Accuracy: 86.99%
|
339 |
+
- Name: dla60
|
340 |
+
In Collection: DLA
|
341 |
+
Metadata:
|
342 |
+
FLOPs: 4256251880
|
343 |
+
Parameters: 22040000
|
344 |
+
File Size: 89560235
|
345 |
+
Architecture:
|
346 |
+
- 1x1 Convolution
|
347 |
+
- Batch Normalization
|
348 |
+
- Convolution
|
349 |
+
- DLA Bottleneck Residual Block
|
350 |
+
- DLA Residual Block
|
351 |
+
- Global Average Pooling
|
352 |
+
- Max Pooling
|
353 |
+
- ReLU
|
354 |
+
- Residual Block
|
355 |
+
- Residual Connection
|
356 |
+
- Softmax
|
357 |
+
Tasks:
|
358 |
+
- Image Classification
|
359 |
+
Training Techniques:
|
360 |
+
- SGD with Momentum
|
361 |
+
- Weight Decay
|
362 |
+
Training Data:
|
363 |
+
- ImageNet
|
364 |
+
ID: dla60
|
365 |
+
LR: 0.1
|
366 |
+
Epochs: 120
|
367 |
+
Layers: 60
|
368 |
+
Dropout: 0.2
|
369 |
+
Crop Pct: '0.875'
|
370 |
+
Momentum: 0.9
|
371 |
+
Batch Size: 256
|
372 |
+
Image Size: '224'
|
373 |
+
Weight Decay: 0.0001
|
374 |
+
Interpolation: bilinear
|
375 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L394
|
376 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla60-24839fc4.pth
|
377 |
+
Results:
|
378 |
+
- Task: Image Classification
|
379 |
+
Dataset: ImageNet
|
380 |
+
Metrics:
|
381 |
+
Top 1 Accuracy: 77.04%
|
382 |
+
Top 5 Accuracy: 93.32%
|
383 |
+
- Name: dla60_res2net
|
384 |
+
In Collection: DLA
|
385 |
+
Metadata:
|
386 |
+
FLOPs: 4147578504
|
387 |
+
Parameters: 20850000
|
388 |
+
File Size: 84886593
|
389 |
+
Architecture:
|
390 |
+
- 1x1 Convolution
|
391 |
+
- Batch Normalization
|
392 |
+
- Convolution
|
393 |
+
- DLA Bottleneck Residual Block
|
394 |
+
- DLA Residual Block
|
395 |
+
- Global Average Pooling
|
396 |
+
- Max Pooling
|
397 |
+
- ReLU
|
398 |
+
- Residual Block
|
399 |
+
- Residual Connection
|
400 |
+
- Softmax
|
401 |
+
Tasks:
|
402 |
+
- Image Classification
|
403 |
+
Training Techniques:
|
404 |
+
- SGD with Momentum
|
405 |
+
- Weight Decay
|
406 |
+
Training Data:
|
407 |
+
- ImageNet
|
408 |
+
ID: dla60_res2net
|
409 |
+
Layers: 60
|
410 |
+
Crop Pct: '0.875'
|
411 |
+
Image Size: '224'
|
412 |
+
Interpolation: bilinear
|
413 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L346
|
414 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-res2net/res2net_dla60_4s-d88db7f9.pth
|
415 |
+
Results:
|
416 |
+
- Task: Image Classification
|
417 |
+
Dataset: ImageNet
|
418 |
+
Metrics:
|
419 |
+
Top 1 Accuracy: 78.46%
|
420 |
+
Top 5 Accuracy: 94.21%
|
421 |
+
- Name: dla60_res2next
|
422 |
+
In Collection: DLA
|
423 |
+
Metadata:
|
424 |
+
FLOPs: 3485335272
|
425 |
+
Parameters: 17030000
|
426 |
+
File Size: 69639245
|
427 |
+
Architecture:
|
428 |
+
- 1x1 Convolution
|
429 |
+
- Batch Normalization
|
430 |
+
- Convolution
|
431 |
+
- DLA Bottleneck Residual Block
|
432 |
+
- DLA Residual Block
|
433 |
+
- Global Average Pooling
|
434 |
+
- Max Pooling
|
435 |
+
- ReLU
|
436 |
+
- Residual Block
|
437 |
+
- Residual Connection
|
438 |
+
- Softmax
|
439 |
+
Tasks:
|
440 |
+
- Image Classification
|
441 |
+
Training Techniques:
|
442 |
+
- SGD with Momentum
|
443 |
+
- Weight Decay
|
444 |
+
Training Data:
|
445 |
+
- ImageNet
|
446 |
+
ID: dla60_res2next
|
447 |
+
Layers: 60
|
448 |
+
Crop Pct: '0.875'
|
449 |
+
Image Size: '224'
|
450 |
+
Interpolation: bilinear
|
451 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L354
|
452 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-res2net/res2next_dla60_4s-d327927b.pth
|
453 |
+
Results:
|
454 |
+
- Task: Image Classification
|
455 |
+
Dataset: ImageNet
|
456 |
+
Metrics:
|
457 |
+
Top 1 Accuracy: 78.44%
|
458 |
+
Top 5 Accuracy: 94.16%
|
459 |
+
- Name: dla60x
|
460 |
+
In Collection: DLA
|
461 |
+
Metadata:
|
462 |
+
FLOPs: 3544204264
|
463 |
+
Parameters: 17350000
|
464 |
+
File Size: 70883139
|
465 |
+
Architecture:
|
466 |
+
- 1x1 Convolution
|
467 |
+
- Batch Normalization
|
468 |
+
- Convolution
|
469 |
+
- DLA Bottleneck Residual Block
|
470 |
+
- DLA Residual Block
|
471 |
+
- Global Average Pooling
|
472 |
+
- Max Pooling
|
473 |
+
- ReLU
|
474 |
+
- Residual Block
|
475 |
+
- Residual Connection
|
476 |
+
- Softmax
|
477 |
+
Tasks:
|
478 |
+
- Image Classification
|
479 |
+
Training Techniques:
|
480 |
+
- SGD with Momentum
|
481 |
+
- Weight Decay
|
482 |
+
Training Data:
|
483 |
+
- ImageNet
|
484 |
+
ID: dla60x
|
485 |
+
LR: 0.1
|
486 |
+
Epochs: 120
|
487 |
+
Layers: 60
|
488 |
+
Crop Pct: '0.875'
|
489 |
+
Momentum: 0.9
|
490 |
+
Batch Size: 256
|
491 |
+
Image Size: '224'
|
492 |
+
Weight Decay: 0.0001
|
493 |
+
Interpolation: bilinear
|
494 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L402
|
495 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla60x-d15cacda.pth
|
496 |
+
Results:
|
497 |
+
- Task: Image Classification
|
498 |
+
Dataset: ImageNet
|
499 |
+
Metrics:
|
500 |
+
Top 1 Accuracy: 78.25%
|
501 |
+
Top 5 Accuracy: 94.02%
|
502 |
+
- Name: dla60x_c
|
503 |
+
In Collection: DLA
|
504 |
+
Metadata:
|
505 |
+
FLOPs: 593325032
|
506 |
+
Parameters: 1320000
|
507 |
+
File Size: 5454396
|
508 |
+
Architecture:
|
509 |
+
- 1x1 Convolution
|
510 |
+
- Batch Normalization
|
511 |
+
- Convolution
|
512 |
+
- DLA Bottleneck Residual Block
|
513 |
+
- DLA Residual Block
|
514 |
+
- Global Average Pooling
|
515 |
+
- Max Pooling
|
516 |
+
- ReLU
|
517 |
+
- Residual Block
|
518 |
+
- Residual Connection
|
519 |
+
- Softmax
|
520 |
+
Tasks:
|
521 |
+
- Image Classification
|
522 |
+
Training Techniques:
|
523 |
+
- SGD with Momentum
|
524 |
+
- Weight Decay
|
525 |
+
Training Data:
|
526 |
+
- ImageNet
|
527 |
+
ID: dla60x_c
|
528 |
+
LR: 0.1
|
529 |
+
Epochs: 120
|
530 |
+
Layers: 60
|
531 |
+
Crop Pct: '0.875'
|
532 |
+
Momentum: 0.9
|
533 |
+
Batch Size: 256
|
534 |
+
Image Size: '224'
|
535 |
+
Weight Decay: 0.0001
|
536 |
+
Interpolation: bilinear
|
537 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L386
|
538 |
+
Weights: http://dl.yf.io/dla/models/imagenet/dla60x_c-b870c45c.pth
|
539 |
+
Results:
|
540 |
+
- Task: Image Classification
|
541 |
+
Dataset: ImageNet
|
542 |
+
Metrics:
|
543 |
+
Top 1 Accuracy: 67.91%
|
544 |
+
Top 5 Accuracy: 88.42%
|
545 |
+
-->
|
docs/models/.templates/models/dpn.md
ADDED
@@ -0,0 +1,256 @@
|
1 |
+
# Dual Path Network (DPN)
|
2 |
+
|
3 |
+
A **Dual Path Network (DPN)** is a convolutional neural network which presents a new topology of connection paths internally. The intuition is that [ResNets](https://paperswithcode.com/method/resnet) enable feature re-use while DenseNets enable new feature exploration, and both are important for learning good representations. To enjoy the benefits of both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures.
|
4 |
+
|
5 |
+
The principal building block is a [DPN Block](https://paperswithcode.com/method/dpn-block).
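To make the dual-path idea concrete, here is a minimal sketch of how a block's output could be merged into the two paths. It is an illustration only, not the `timm` implementation: the function name `dual_path_merge` is hypothetical, and each "channel" is reduced to a single scalar so the example runs without a tensor library. The assumption is that the first channels of the block output are added to the residual path (feature re-use) and the remaining channels are concatenated onto the dense path (new feature exploration).

```python
from typing import List, Tuple


def dual_path_merge(
    residual: List[float],
    dense: List[float],
    block_out: List[float],
) -> Tuple[List[float], List[float]]:
    """Merge one block's output into the residual and dense paths.

    Hypothetical helper for illustration: channels are plain scalars.
    """
    r = len(residual)
    if len(block_out) < r:
        raise ValueError("block output must cover every residual channel")
    # Residual path: element-wise addition keeps the channel count fixed.
    new_residual = [a + b for a, b in zip(residual, block_out[:r])]
    # Dense path: concatenation grows the channel count with each block.
    new_dense = dense + block_out[r:]
    return new_residual, new_dense


# Toy example: two residual channels, one dense channel, three output channels.
residual, dense = dual_path_merge([1.0, 2.0], [0.5], [10.0, 20.0, 7.0])
```

After the call, the residual path still has two channels while the dense path has grown by one, which is the trade-off the description above refers to.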
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{chen2017dual,
|
17 |
+
title={Dual Path Networks},
|
18 |
+
author={Yunpeng Chen and Jianan Li and Huaxin Xiao and Xiaojie Jin and Shuicheng Yan and Jiashi Feng},
|
19 |
+
year={2017},
|
20 |
+
eprint={1707.01629},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: DPN
|
30 |
+
Paper:
|
31 |
+
Title: Dual Path Networks
|
32 |
+
URL: https://paperswithcode.com/paper/dual-path-networks
|
33 |
+
Models:
|
34 |
+
- Name: dpn107
|
35 |
+
In Collection: DPN
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 23524280296
|
38 |
+
Parameters: 86920000
|
39 |
+
File Size: 348612331
|
40 |
+
Architecture:
|
41 |
+
- Batch Normalization
|
42 |
+
- Convolution
|
43 |
+
- DPN Block
|
44 |
+
- Dense Connections
|
45 |
+
- Global Average Pooling
|
46 |
+
- Max Pooling
|
47 |
+
- Softmax
|
48 |
+
Tasks:
|
49 |
+
- Image Classification
|
50 |
+
Training Techniques:
|
51 |
+
- SGD with Momentum
|
52 |
+
- Weight Decay
|
53 |
+
Training Data:
|
54 |
+
- ImageNet
|
55 |
+
Training Resources: 40x K80 GPUs
|
56 |
+
ID: dpn107
|
57 |
+
LR: 0.316
|
58 |
+
Layers: 107
|
59 |
+
Crop Pct: '0.875'
|
60 |
+
Batch Size: 1280
|
61 |
+
Image Size: '224'
|
62 |
+
Interpolation: bicubic
|
63 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L310
|
64 |
+
Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn107_extra-1ac7121e2.pth
|
65 |
+
Results:
|
66 |
+
- Task: Image Classification
|
67 |
+
Dataset: ImageNet
|
68 |
+
Metrics:
|
69 |
+
Top 1 Accuracy: 80.16%
|
70 |
+
Top 5 Accuracy: 94.91%
|
71 |
+
- Name: dpn131
|
72 |
+
In Collection: DPN
|
73 |
+
Metadata:
|
74 |
+
FLOPs: 20586274792
|
75 |
+
Parameters: 79250000
|
76 |
+
File Size: 318016207
|
77 |
+
Architecture:
|
78 |
+
- Batch Normalization
|
79 |
+
- Convolution
|
80 |
+
- DPN Block
|
81 |
+
- Dense Connections
|
82 |
+
- Global Average Pooling
|
83 |
+
- Max Pooling
|
84 |
+
- Softmax
|
85 |
+
Tasks:
|
86 |
+
- Image Classification
|
87 |
+
Training Techniques:
|
88 |
+
- SGD with Momentum
|
89 |
+
- Weight Decay
|
90 |
+
Training Data:
|
91 |
+
- ImageNet
|
92 |
+
Training Resources: 40x K80 GPUs
|
93 |
+
ID: dpn131
|
94 |
+
LR: 0.316
|
95 |
+
Layers: 131
|
96 |
+
Crop Pct: '0.875'
|
97 |
+
Batch Size: 960
|
98 |
+
Image Size: '224'
|
99 |
+
Interpolation: bicubic
|
100 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L302
|
101 |
+
Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn131-71dfe43e0.pth
|
102 |
+
Results:
|
103 |
+
- Task: Image Classification
|
104 |
+
Dataset: ImageNet
|
105 |
+
Metrics:
|
106 |
+
Top 1 Accuracy: 79.83%
|
107 |
+
Top 5 Accuracy: 94.71%
|
108 |
+
- Name: dpn68
|
109 |
+
In Collection: DPN
|
110 |
+
Metadata:
|
111 |
+
FLOPs: 2990567880
|
112 |
+
Parameters: 12610000
|
113 |
+
File Size: 50761994
|
114 |
+
Architecture:
|
115 |
+
- Batch Normalization
|
116 |
+
- Convolution
|
117 |
+
- DPN Block
|
118 |
+
- Dense Connections
|
119 |
+
- Global Average Pooling
|
120 |
+
- Max Pooling
|
121 |
+
- Softmax
|
122 |
+
Tasks:
|
123 |
+
- Image Classification
|
124 |
+
Training Techniques:
|
125 |
+
- SGD with Momentum
|
126 |
+
- Weight Decay
|
127 |
+
Training Data:
|
128 |
+
- ImageNet
|
129 |
+
Training Resources: 40x K80 GPUs
|
130 |
+
ID: dpn68
|
131 |
+
LR: 0.316
|
132 |
+
Layers: 68
|
133 |
+
Crop Pct: '0.875'
|
134 |
+
Batch Size: 1280
|
135 |
+
Image Size: '224'
|
136 |
+
Interpolation: bicubic
|
137 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L270
|
138 |
+
Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn68-66bebafa7.pth
|
139 |
+
Results:
|
140 |
+
- Task: Image Classification
|
141 |
+
Dataset: ImageNet
|
142 |
+
Metrics:
|
143 |
+
Top 1 Accuracy: 76.31%
|
144 |
+
Top 5 Accuracy: 92.97%
|
145 |
+
- Name: dpn68b
|
146 |
+
In Collection: DPN
|
147 |
+
Metadata:
|
148 |
+
FLOPs: 2990567880
|
149 |
+
Parameters: 12610000
|
150 |
+
File Size: 50781025
|
151 |
+
Architecture:
|
152 |
+
- Batch Normalization
|
153 |
+
- Convolution
|
154 |
+
- DPN Block
|
155 |
+
- Dense Connections
|
156 |
+
- Global Average Pooling
|
157 |
+
- Max Pooling
|
158 |
+
- Softmax
|
159 |
+
Tasks:
|
160 |
+
- Image Classification
|
161 |
+
Training Techniques:
|
162 |
+
- SGD with Momentum
|
163 |
+
- Weight Decay
|
164 |
+
Training Data:
|
165 |
+
- ImageNet
|
166 |
+
Training Resources: 40x K80 GPUs
|
167 |
+
ID: dpn68b
|
168 |
+
LR: 0.316
|
169 |
+
Layers: 68
|
170 |
+
Crop Pct: '0.875'
|
171 |
+
Batch Size: 1280
|
172 |
+
Image Size: '224'
|
173 |
+
Interpolation: bicubic
|
174 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L278
|
175 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/dpn68b_ra-a31ca160.pth
|
176 |
+
Results:
|
177 |
+
- Task: Image Classification
|
178 |
+
Dataset: ImageNet
|
179 |
+
Metrics:
|
180 |
+
Top 1 Accuracy: 79.21%
|
181 |
+
Top 5 Accuracy: 94.42%
|
182 |
+
- Name: dpn92
|
183 |
+
In Collection: DPN
|
184 |
+
Metadata:
|
185 |
+
FLOPs: 8357659624
|
186 |
+
Parameters: 37670000
|
187 |
+
File Size: 151248422
|
188 |
+
Architecture:
|
189 |
+
- Batch Normalization
|
190 |
+
- Convolution
|
191 |
+
- DPN Block
|
192 |
+
- Dense Connections
|
193 |
+
- Global Average Pooling
|
194 |
+
- Max Pooling
|
195 |
+
- Softmax
|
196 |
+
Tasks:
|
197 |
+
- Image Classification
|
198 |
+
Training Techniques:
|
199 |
+
- SGD with Momentum
|
200 |
+
- Weight Decay
|
201 |
+
Training Data:
|
202 |
+
- ImageNet
|
203 |
+
Training Resources: 40x K80 GPUs
|
204 |
+
ID: dpn92
|
205 |
+
LR: 0.316
|
206 |
+
Layers: 92
|
207 |
+
Crop Pct: '0.875'
|
208 |
+
Batch Size: 1280
|
209 |
+
Image Size: '224'
|
210 |
+
Interpolation: bicubic
|
211 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L286
|
212 |
+
Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn92_extra-b040e4a9b.pth
|
213 |
+
Results:
|
214 |
+
- Task: Image Classification
|
215 |
+
Dataset: ImageNet
|
216 |
+
Metrics:
|
217 |
+
Top 1 Accuracy: 79.99%
|
218 |
+
Top 5 Accuracy: 94.84%
|
219 |
+
- Name: dpn98
|
220 |
+
In Collection: DPN
|
221 |
+
Metadata:
|
222 |
+
FLOPs: 15003675112
|
223 |
+
Parameters: 61570000
|
224 |
+
File Size: 247021307
|
225 |
+
Architecture:
|
226 |
+
- Batch Normalization
|
227 |
+
- Convolution
|
228 |
+
- DPN Block
|
229 |
+
- Dense Connections
|
230 |
+
- Global Average Pooling
|
231 |
+
- Max Pooling
|
232 |
+
- Softmax
|
233 |
+
Tasks:
|
234 |
+
- Image Classification
|
235 |
+
Training Techniques:
|
236 |
+
- SGD with Momentum
|
237 |
+
- Weight Decay
|
238 |
+
Training Data:
|
239 |
+
- ImageNet
|
240 |
+
Training Resources: 40x K80 GPUs
|
241 |
+
ID: dpn98
|
242 |
+
LR: 0.4
|
243 |
+
Layers: 98
|
244 |
+
Crop Pct: '0.875'
|
245 |
+
Batch Size: 1280
|
246 |
+
Image Size: '224'
|
247 |
+
Interpolation: bicubic
|
248 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L294
|
249 |
+
Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn98-5b90dec4d.pth
|
250 |
+
Results:
|
251 |
+
- Task: Image Classification
|
252 |
+
Dataset: ImageNet
|
253 |
+
Metrics:
|
254 |
+
Top 1 Accuracy: 79.65%
|
255 |
+
Top 5 Accuracy: 94.61%
|
256 |
+
-->
|
docs/models/.templates/models/ecaresnet.md
ADDED
@@ -0,0 +1,236 @@
|
1 |
+
# ECA-ResNet
|
2 |
+
|
3 |
+
An **ECA ResNet** is a variant on a [ResNet](https://paperswithcode.com/method/resnet) that utilises an [Efficient Channel Attention module](https://paperswithcode.com/method/efficient-channel-attention). Efficient Channel Attention is an architectural unit based on [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) that reduces model complexity without dimensionality reduction.
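As a rough sketch of what "without dimensionality reduction" means in practice, the snippet below follows the ECA-Net paper's idea of replacing the squeeze-and-excitation bottleneck with a small 1D convolution across the per-channel pooled descriptors. The function names, the zero padding, and the default constants `gamma=2` and `b=1` are assumptions taken from the paper's description rather than from this page, and the code uses plain Python lists instead of tensors.

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Odd 1D kernel size chosen from the channel count (assumed heuristic)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_gate(pooled: List[float], weights: List[float]) -> List[float]:
    """Per-channel gates from a zero-padded 1D convolution plus a sigmoid.

    `pooled` holds one globally average-pooled value per channel;
    `weights` is the shared 1D kernel (odd length).
    """
    pad = len(weights) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for i in range(len(pooled)):
        # Each channel only interacts with its nearest neighbours.
        s = sum(w * padded[i + j] for j, w in enumerate(weights))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    return gates


k = eca_kernel_size(256)
gates = eca_gate([0.2, -0.1, 0.4, 0.0], [0.0, 1.0, 0.0])
```

Each channel's feature map would then be multiplied by its gate. The module's learnable parameters are just the `k` kernel weights, independent of the channel count, which is where the reduction in complexity comes from.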
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@misc{wang2020ecanet,
|
15 |
+
title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks},
|
16 |
+
author={Qilong Wang and Banggu Wu and Pengfei Zhu and Peihua Li and Wangmeng Zuo and Qinghua Hu},
|
17 |
+
year={2020},
|
18 |
+
eprint={1910.03151},
|
19 |
+
archivePrefix={arXiv},
|
20 |
+
primaryClass={cs.CV}
|
21 |
+
}
|
22 |
+
```
|
23 |
+
|
24 |
+
<!--
|
25 |
+
Type: model-index
|
26 |
+
Collections:
|
27 |
+
- Name: ECAResNet
|
28 |
+
Paper:
|
29 |
+
Title: 'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'
|
30 |
+
URL: https://paperswithcode.com/paper/eca-net-efficient-channel-attention-for-deep
|
31 |
+
Models:
|
32 |
+
- Name: ecaresnet101d
|
33 |
+
In Collection: ECAResNet
|
34 |
+
Metadata:
|
35 |
+
FLOPs: 10377193728
|
36 |
+
Parameters: 44570000
|
37 |
+
File Size: 178815067
|
38 |
+
Architecture:
|
39 |
+
- 1x1 Convolution
|
40 |
+
- Batch Normalization
|
41 |
+
- Bottleneck Residual Block
|
42 |
+
- Convolution
|
43 |
+
- Efficient Channel Attention
|
44 |
+
- Global Average Pooling
|
45 |
+
- Max Pooling
|
46 |
+
- ReLU
|
47 |
+
- Residual Block
|
48 |
+
- Residual Connection
|
49 |
+
- Softmax
|
50 |
+
- Squeeze-and-Excitation Block
|
51 |
+
Tasks:
|
52 |
+
- Image Classification
|
53 |
+
Training Techniques:
|
54 |
+
- SGD with Momentum
|
55 |
+
- Weight Decay
|
56 |
+
Training Data:
|
57 |
+
- ImageNet
|
58 |
+
Training Resources: 4x RTX 2080Ti GPUs
|
59 |
+
ID: ecaresnet101d
|
60 |
+
LR: 0.1
|
61 |
+
Epochs: 100
|
62 |
+
Layers: 101
|
63 |
+
Crop Pct: '0.875'
|
64 |
+
Batch Size: 256
|
65 |
+
Image Size: '224'
|
66 |
+
Weight Decay: 0.0001
|
67 |
+
Interpolation: bicubic
|
68 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1087
|
69 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNet101D_281c5844.pth
|
70 |
+
Results:
|
71 |
+
- Task: Image Classification
|
72 |
+
Dataset: ImageNet
|
73 |
+
Metrics:
|
74 |
+
Top 1 Accuracy: 82.18%
|
75 |
+
Top 5 Accuracy: 96.06%
|
76 |
+
- Name: ecaresnet101d_pruned
|
77 |
+
In Collection: ECAResNet
|
78 |
+
Metadata:
|
79 |
+
FLOPs: 4463972081
|
80 |
+
Parameters: 24880000
|
81 |
+
File Size: 99852736
|
82 |
+
Architecture:
|
83 |
+
- 1x1 Convolution
|
84 |
+
- Batch Normalization
|
85 |
+
- Bottleneck Residual Block
|
86 |
+
- Convolution
|
87 |
+
- Efficient Channel Attention
|
88 |
+
- Global Average Pooling
|
89 |
+
- Max Pooling
|
90 |
+
- ReLU
|
91 |
+
- Residual Block
|
92 |
+
- Residual Connection
|
93 |
+
- Softmax
|
94 |
+
- Squeeze-and-Excitation Block
|
95 |
+
Tasks:
|
96 |
+
- Image Classification
|
97 |
+
Training Techniques:
|
98 |
+
- SGD with Momentum
|
99 |
+
- Weight Decay
|
100 |
+
Training Data:
|
101 |
+
- ImageNet
|
102 |
+
ID: ecaresnet101d_pruned
|
103 |
+
Layers: 101
|
104 |
+
Crop Pct: '0.875'
|
105 |
+
Image Size: '224'
|
106 |
+
Interpolation: bicubic
|
107 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1097
|
108 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45610/outputs/ECAResNet101D_P_75a3370e.pth
|
109 |
+
Results:
|
110 |
+
- Task: Image Classification
|
111 |
+
Dataset: ImageNet
|
112 |
+
Metrics:
|
113 |
+
Top 1 Accuracy: 80.82%
|
114 |
+
Top 5 Accuracy: 95.64%
|
115 |
+
- Name: ecaresnet50d
|
116 |
+
In Collection: ECAResNet
|
117 |
+
Metadata:
|
118 |
+
FLOPs: 5591090432
|
119 |
+
Parameters: 25580000
|
120 |
+
File Size: 102579290
|
121 |
+
Architecture:
|
122 |
+
- 1x1 Convolution
|
123 |
+
- Batch Normalization
|
124 |
+
- Bottleneck Residual Block
|
125 |
+
- Convolution
|
126 |
+
- Efficient Channel Attention
|
127 |
+
- Global Average Pooling
|
128 |
+
- Max Pooling
|
129 |
+
- ReLU
|
130 |
+
- Residual Block
|
131 |
+
- Residual Connection
|
132 |
+
- Softmax
|
133 |
+
- Squeeze-and-Excitation Block
|
134 |
+
Tasks:
|
135 |
+
- Image Classification
|
136 |
+
Training Techniques:
|
137 |
+
- SGD with Momentum
|
138 |
+
- Weight Decay
|
139 |
+
Training Data:
|
140 |
+
- ImageNet
|
141 |
+
Training Resources: 4x RTX 2080Ti GPUs
|
142 |
+
ID: ecaresnet50d
|
143 |
+
LR: 0.1
|
144 |
+
Epochs: 100
|
145 |
+
Layers: 50
|
146 |
+
Crop Pct: '0.875'
|
147 |
+
Batch Size: 256
|
148 |
+
Image Size: '224'
|
149 |
+
Weight Decay: 0.0001
|
150 |
+
Interpolation: bicubic
|
151 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1045
|
152 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNet50D_833caf58.pth
|
153 |
+
Results:
|
154 |
+
- Task: Image Classification
|
155 |
+
Dataset: ImageNet
|
156 |
+
Metrics:
|
157 |
+
Top 1 Accuracy: 80.61%
|
158 |
+
Top 5 Accuracy: 95.31%
|
159 |
+
- Name: ecaresnet50d_pruned
|
160 |
+
In Collection: ECAResNet
|
161 |
+
Metadata:
|
162 |
+
FLOPs: 3250730657
|
163 |
+
Parameters: 19940000
|
164 |
+
File Size: 79990436
|
165 |
+
Architecture:
|
166 |
+
- 1x1 Convolution
|
167 |
+
- Batch Normalization
|
168 |
+
- Bottleneck Residual Block
|
169 |
+
- Convolution
|
170 |
+
- Efficient Channel Attention
|
171 |
+
- Global Average Pooling
|
172 |
+
- Max Pooling
|
173 |
+
- ReLU
|
174 |
+
- Residual Block
|
175 |
+
- Residual Connection
|
176 |
+
- Softmax
|
177 |
+
- Squeeze-and-Excitation Block
|
178 |
+
Tasks:
|
179 |
+
- Image Classification
|
180 |
+
Training Techniques:
|
181 |
+
- SGD with Momentum
|
182 |
+
- Weight Decay
|
183 |
+
Training Data:
|
184 |
+
- ImageNet
|
185 |
+
ID: ecaresnet50d_pruned
|
186 |
+
Layers: 50
|
187 |
+
Crop Pct: '0.875'
|
188 |
+
Image Size: '224'
|
189 |
+
Interpolation: bicubic
|
190 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1055
|
191 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45899/outputs/ECAResNet50D_P_9c67f710.pth
|
192 |
+
Results:
|
193 |
+
- Task: Image Classification
|
194 |
+
Dataset: ImageNet
|
195 |
+
Metrics:
|
196 |
+
Top 1 Accuracy: 79.71%
|
197 |
+
Top 5 Accuracy: 94.88%
|
198 |
+
- Name: ecaresnetlight
|
199 |
+
In Collection: ECAResNet
|
200 |
+
Metadata:
|
201 |
+
FLOPs: 5276118784
|
202 |
+
Parameters: 30160000
|
203 |
+
File Size: 120956612
|
204 |
+
Architecture:
|
205 |
+
- 1x1 Convolution
|
206 |
+
- Batch Normalization
|
207 |
+
- Bottleneck Residual Block
|
208 |
+
- Convolution
|
209 |
+
- Efficient Channel Attention
|
210 |
+
- Global Average Pooling
|
211 |
+
- Max Pooling
|
212 |
+
- ReLU
|
213 |
+
- Residual Block
|
214 |
+
- Residual Connection
|
215 |
+
- Softmax
|
216 |
+
- Squeeze-and-Excitation Block
|
217 |
+
Tasks:
|
218 |
+
- Image Classification
|
219 |
+
Training Techniques:
|
220 |
+
- SGD with Momentum
|
221 |
+
- Weight Decay
|
222 |
+
Training Data:
|
223 |
+
- ImageNet
|
224 |
+
ID: ecaresnetlight
|
225 |
+
Crop Pct: '0.875'
|
226 |
+
Image Size: '224'
|
227 |
+
Interpolation: bicubic
|
228 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1077
|
229 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNetLight_4f34b35b.pth
|
230 |
+
Results:
|
231 |
+
- Task: Image Classification
|
232 |
+
Dataset: ImageNet
|
233 |
+
Metrics:
|
234 |
+
Top 1 Accuracy: 80.46%
|
235 |
+
Top 5 Accuracy: 95.25%
|
236 |
+
-->
|
docs/models/.templates/models/efficientnet-pruned.md
ADDED
@@ -0,0 +1,145 @@
|
1 |
+
# EfficientNet (Knapsack Pruned)
|
2 |
+
|
3 |
+
**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scale network width, depth, and resolution in a principled way.
|
4 |
+
|
5 |
+
The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
|
6 |
+
|
7 |
+
The base EfficientNet-B0 network is based on the inverted bottleneck residual blocks of [MobileNetV2](https://paperswithcode.com/method/mobilenetv2), in addition to [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block).
|
8 |
+
|
9 |
+
This collection consists of pruned EfficientNet models.
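
To get a rough sense of what pruning saves, the FLOPs and parameter counts of the pruned variants can be compared with their unpruned counterparts. The snippet below is a minimal sketch: the pruned figures are copied from the model-index metadata of this page, the baseline figures come from the matching entries in the unpruned EfficientNet page, and the dictionary layout and the `relative_cost` helper are illustrative names, not part of timm.

```python
# (FLOPs, parameters) per model, copied from the model-index metadata.
PRUNED = {
    "efficientnet_b1_pruned": (489_653_114, 6_330_000),
    "efficientnet_b2_pruned": (878_133_915, 8_310_000),
    "efficientnet_b3_pruned": (1_239_590_641, 9_860_000),
}
BASELINE = {
    "efficientnet_b1": (909_691_920, 7_790_000),
    "efficientnet_b2": (1_265_324_514, 9_110_000),
    "efficientnet_b3": (2_327_905_920, 12_230_000),
}


def relative_cost(pruned_name):
    """Return (FLOPs ratio, parameter ratio) of a pruned model vs. its baseline."""
    base_name = pruned_name.replace("_pruned", "")
    pruned_flops, pruned_params = PRUNED[pruned_name]
    base_flops, base_params = BASELINE[base_name]
    return pruned_flops / base_flops, pruned_params / base_params


if __name__ == "__main__":
    for name in PRUNED:
        flops_ratio, params_ratio = relative_cost(name)
        print(f"{name}: {flops_ratio:.2f}x FLOPs, {params_ratio:.2f}x parameters")
```

Note that the pruned and unpruned entries may list different crop settings, so the ratios are only an approximate comparison.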
|
10 |
+
|
11 |
+
{% include 'code_snippets.md' %}
|
12 |
+
|
13 |
+
## How do I train this model?
|
14 |
+
|
15 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
16 |
+
|
17 |
+
## Citation
|
18 |
+
|
19 |
+
```BibTeX
|
20 |
+
@misc{tan2020efficientnet,
|
21 |
+
title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
|
22 |
+
author={Mingxing Tan and Quoc V. Le},
|
23 |
+
year={2020},
|
24 |
+
eprint={1905.11946},
|
25 |
+
archivePrefix={arXiv},
|
26 |
+
primaryClass={cs.LG}
|
27 |
+
}
|
28 |
+
```
|
29 |
+
|
30 |
+
```BibTeX
|
31 |
+
@misc{aflalo2020knapsack,
|
32 |
+
title={Knapsack Pruning with Inner Distillation},
|
33 |
+
author={Yonathan Aflalo and Asaf Noy and Ming Lin and Itamar Friedman and Lihi Zelnik},
|
34 |
+
year={2020},
|
35 |
+
eprint={2002.08258},
|
36 |
+
archivePrefix={arXiv},
|
37 |
+
primaryClass={cs.LG}
|
38 |
+
}
|
39 |
+
```
|
40 |
+
|
41 |
+
<!--
|
42 |
+
Type: model-index
|
43 |
+
Collections:
|
44 |
+
- Name: EfficientNet Pruned
|
45 |
+
Paper:
|
46 |
+
Title: Knapsack Pruning with Inner Distillation
|
47 |
+
URL: https://paperswithcode.com/paper/knapsack-pruning-with-inner-distillation
|
48 |
+
Models:
|
49 |
+
- Name: efficientnet_b1_pruned
|
50 |
+
In Collection: EfficientNet Pruned
|
51 |
+
Metadata:
|
52 |
+
FLOPs: 489653114
|
53 |
+
Parameters: 6330000
|
54 |
+
File Size: 25595162
|
55 |
+
Architecture:
|
56 |
+
- 1x1 Convolution
|
57 |
+
- Average Pooling
|
58 |
+
- Batch Normalization
|
59 |
+
- Convolution
|
60 |
+
- Dense Connections
|
61 |
+
- Dropout
|
62 |
+
- Inverted Residual Block
|
63 |
+
- Squeeze-and-Excitation Block
|
64 |
+
- Swish
|
65 |
+
Tasks:
|
66 |
+
- Image Classification
|
67 |
+
Training Data:
|
68 |
+
- ImageNet
|
69 |
+
ID: efficientnet_b1_pruned
|
70 |
+
Crop Pct: '0.882'
|
71 |
+
Image Size: '240'
|
72 |
+
Interpolation: bicubic
|
73 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1208
|
74 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb1_pruned_9ebb3fe6.pth
|
75 |
+
Results:
|
76 |
+
- Task: Image Classification
|
77 |
+
Dataset: ImageNet
|
78 |
+
Metrics:
|
79 |
+
Top 1 Accuracy: 78.25%
|
80 |
+
Top 5 Accuracy: 93.84%
|
81 |
+
- Name: efficientnet_b2_pruned
|
82 |
+
In Collection: EfficientNet Pruned
|
83 |
+
Metadata:
|
84 |
+
FLOPs: 878133915
|
85 |
+
Parameters: 8310000
|
86 |
+
File Size: 33555005
|
87 |
+
Architecture:
|
88 |
+
- 1x1 Convolution
|
89 |
+
- Average Pooling
|
90 |
+
- Batch Normalization
|
91 |
+
- Convolution
|
92 |
+
- Dense Connections
|
93 |
+
- Dropout
|
94 |
+
- Inverted Residual Block
|
95 |
+
- Squeeze-and-Excitation Block
|
96 |
+
- Swish
|
97 |
+
Tasks:
|
98 |
+
- Image Classification
|
99 |
+
Training Data:
|
100 |
+
- ImageNet
|
101 |
+
ID: efficientnet_b2_pruned
|
102 |
+
Crop Pct: '0.89'
|
103 |
+
Image Size: '260'
|
104 |
+
Interpolation: bicubic
|
105 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1219
|
106 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb2_pruned_203f55bc.pth
|
107 |
+
Results:
|
108 |
+
- Task: Image Classification
|
109 |
+
Dataset: ImageNet
|
110 |
+
Metrics:
|
111 |
+
Top 1 Accuracy: 79.91%
|
112 |
+
Top 5 Accuracy: 94.86%
|
113 |
+
- Name: efficientnet_b3_pruned
|
114 |
+
In Collection: EfficientNet Pruned
|
115 |
+
Metadata:
|
116 |
+
FLOPs: 1239590641
|
117 |
+
Parameters: 9860000
|
118 |
+
File Size: 39770812
|
119 |
+
Architecture:
|
120 |
+
- 1x1 Convolution
|
121 |
+
- Average Pooling
|
122 |
+
- Batch Normalization
|
123 |
+
- Convolution
|
124 |
+
- Dense Connections
|
125 |
+
- Dropout
|
126 |
+
- Inverted Residual Block
|
127 |
+
- Squeeze-and-Excitation Block
|
128 |
+
- Swish
|
129 |
+
Tasks:
|
130 |
+
- Image Classification
|
131 |
+
Training Data:
|
132 |
+
- ImageNet
|
133 |
+
ID: efficientnet_b3_pruned
|
134 |
+
Crop Pct: '0.904'
|
135 |
+
Image Size: '300'
|
136 |
+
Interpolation: bicubic
|
137 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1230
|
138 |
+
Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb3_pruned_5abcc29f.pth
|
139 |
+
Results:
|
140 |
+
- Task: Image Classification
|
141 |
+
Dataset: ImageNet
|
142 |
+
Metrics:
|
143 |
+
Top 1 Accuracy: 80.86%
|
144 |
+
Top 5 Accuracy: 95.24%
|
145 |
+
-->
|
docs/models/.templates/models/efficientnet.md
ADDED
@@ -0,0 +1,325 @@
|
1 |
+
# EfficientNet
|
2 |
+
|
3 |
+
**EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scale network width, depth, and resolution in a principled way.
|
4 |
+
|
5 |
+
The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
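
The compound scaling rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the timm implementation: the default coefficients (1.2, 1.1, 1.15) are the values reported in the EfficientNet paper for the B0 baseline and are not given in this page, and the function names are hypothetical. The paper chooses the coefficients so that $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$, which is why each unit increase of $\phi$ roughly doubles the FLOPs.

```python
# Coefficients reported in the EfficientNet paper (assumed here; they are
# not stated in this page).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Approximate FLOPs growth for compound coefficient phi.

    FLOPs scale roughly linearly with depth and quadratically with
    width and input resolution.
    """
    return (alpha * beta ** 2 * gamma ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        depth, width, resolution = compound_scale(phi)
        print(
            f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, "
            f"resolution x{resolution:.2f}, FLOPs x{flops_multiplier(phi):.2f}"
        )
```

In practice the published models round the scaled depth, width, and image size to convenient values, so the image sizes listed in the metadata will not match these multipliers exactly.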
|
6 |
+
|
7 |
+
The base EfficientNet-B0 network is based on the inverted bottleneck residual blocks of [MobileNetV2](https://paperswithcode.com/method/mobilenetv2), in addition to [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block).
|
8 |
+
|
9 |
+
{% include 'code_snippets.md' %}
|
10 |
+
|
11 |
+
## How do I train this model?
|
12 |
+
|
13 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
14 |
+
|
15 |
+
## Citation
|
16 |
+
|
17 |
+
```BibTeX
|
18 |
+
@misc{tan2020efficientnet,
|
19 |
+
title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
|
20 |
+
author={Mingxing Tan and Quoc V. Le},
|
21 |
+
year={2020},
|
22 |
+
eprint={1905.11946},
|
23 |
+
archivePrefix={arXiv},
|
24 |
+
primaryClass={cs.LG}
|
25 |
+
}
|
26 |
+
```
|
27 |
+
|
28 |
+
<!--
|
29 |
+
Type: model-index
|
30 |
+
Collections:
|
31 |
+
- Name: EfficientNet
|
32 |
+
Paper:
|
33 |
+
Title: 'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'
|
34 |
+
URL: https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for
|
35 |
+
Models:
|
36 |
+
- Name: efficientnet_b0
|
37 |
+
In Collection: EfficientNet
|
38 |
+
Metadata:
|
39 |
+
FLOPs: 511241564
|
40 |
+
Parameters: 5290000
|
41 |
+
File Size: 21376743
|
42 |
+
Architecture:
|
43 |
+
- 1x1 Convolution
|
44 |
+
- Average Pooling
|
45 |
+
- Batch Normalization
|
46 |
+
- Convolution
|
47 |
+
- Dense Connections
|
48 |
+
- Dropout
|
49 |
+
- Inverted Residual Block
|
50 |
+
- Squeeze-and-Excitation Block
|
51 |
+
- Swish
|
52 |
+
Tasks:
|
53 |
+
- Image Classification
|
54 |
+
Training Data:
|
55 |
+
- ImageNet
|
56 |
+
ID: efficientnet_b0
|
57 |
+
Layers: 18
|
58 |
+
Crop Pct: '0.875'
|
59 |
+
Image Size: '224'
|
60 |
+
Interpolation: bicubic
|
61 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1002
|
62 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b0_ra-3dd342df.pth
|
63 |
+
Results:
|
64 |
+
- Task: Image Classification
|
65 |
+
Dataset: ImageNet
|
66 |
+
Metrics:
|
67 |
+
Top 1 Accuracy: 77.71%
|
68 |
+
Top 5 Accuracy: 93.52%
|
69 |
+
- Name: efficientnet_b1
|
70 |
+
In Collection: EfficientNet
|
71 |
+
Metadata:
|
72 |
+
FLOPs: 909691920
|
73 |
+
Parameters: 7790000
|
74 |
+
File Size: 31502706
|
75 |
+
Architecture:
|
76 |
+
- 1x1 Convolution
|
77 |
+
- Average Pooling
|
78 |
+
- Batch Normalization
|
79 |
+
- Convolution
|
80 |
+
- Dense Connections
|
81 |
+
- Dropout
|
82 |
+
- Inverted Residual Block
|
83 |
+
- Squeeze-and-Excitation Block
|
84 |
+
- Swish
|
85 |
+
Tasks:
|
86 |
+
- Image Classification
|
87 |
+
Training Data:
|
88 |
+
- ImageNet
|
89 |
+
ID: efficientnet_b1
|
90 |
+
Crop Pct: '0.875'
|
91 |
+
Image Size: '240'
|
92 |
+
Interpolation: bicubic
|
93 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1011
|
94 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b1-533bc792.pth
|
95 |
+
Results:
|
96 |
+
- Task: Image Classification
|
97 |
+
Dataset: ImageNet
|
98 |
+
Metrics:
|
99 |
+
Top 1 Accuracy: 78.71%
|
100 |
+
Top 5 Accuracy: 94.15%
|
101 |
+
- Name: efficientnet_b2
|
102 |
+
In Collection: EfficientNet
|
103 |
+
Metadata:
|
104 |
+
FLOPs: 1265324514
|
105 |
+
Parameters: 9110000
|
106 |
+
File Size: 36788104
|
107 |
+
Architecture:
|
108 |
+
- 1x1 Convolution
|
109 |
+
- Average Pooling
|
110 |
+
- Batch Normalization
|
111 |
+
- Convolution
|
112 |
+
- Dense Connections
|
113 |
+
- Dropout
|
114 |
+
- Inverted Residual Block
|
115 |
+
- Squeeze-and-Excitation Block
|
116 |
+
- Swish
|
117 |
+
Tasks:
|
118 |
+
- Image Classification
|
119 |
+
Training Data:
|
120 |
+
- ImageNet
|
121 |
+
ID: efficientnet_b2
|
122 |
+
Crop Pct: '0.875'
|
123 |
+
Image Size: '260'
|
124 |
+
Interpolation: bicubic
|
125 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1020
|
126 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b2_ra-bcdf34b7.pth
|
127 |
+
Results:
|
128 |
+
- Task: Image Classification
|
129 |
+
Dataset: ImageNet
|
130 |
+
Metrics:
|
131 |
+
Top 1 Accuracy: 80.38%
|
132 |
+
Top 5 Accuracy: 95.08%
|
133 |
+
- Name: efficientnet_b2a
|
134 |
+
In Collection: EfficientNet
|
135 |
+
Metadata:
|
136 |
+
FLOPs: 1452041554
|
137 |
+
Parameters: 9110000
|
138 |
+
File Size: 49369973
|
139 |
+
Architecture:
|
140 |
+
- 1x1 Convolution
|
141 |
+
- Average Pooling
|
142 |
+
- Batch Normalization
|
143 |
+
- Convolution
|
144 |
+
- Dense Connections
|
145 |
+
- Dropout
|
146 |
+
- Inverted Residual Block
|
147 |
+
- Squeeze-and-Excitation Block
|
148 |
+
- Swish
|
149 |
+
Tasks:
|
150 |
+
- Image Classification
|
151 |
+
Training Data:
|
152 |
+
- ImageNet
|
153 |
+
ID: efficientnet_b2a
|
154 |
+
Crop Pct: '1.0'
|
155 |
+
Image Size: '288'
|
156 |
+
Interpolation: bicubic
|
157 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1029
|
158 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth
|
159 |
+
Results:
|
160 |
+
- Task: Image Classification
|
161 |
+
Dataset: ImageNet
|
162 |
+
Metrics:
|
163 |
+
Top 1 Accuracy: 80.61%
|
164 |
+
Top 5 Accuracy: 95.32%
|
165 |
+
- Name: efficientnet_b3
|
166 |
+
In Collection: EfficientNet
|
167 |
+
Metadata:
|
168 |
+
FLOPs: 2327905920
|
169 |
+
Parameters: 12230000
|
170 |
+
File Size: 49369973
|
171 |
+
Architecture:
|
172 |
+
- 1x1 Convolution
|
173 |
+
- Average Pooling
|
174 |
+
- Batch Normalization
|
175 |
+
- Convolution
|
176 |
+
- Dense Connections
|
177 |
+
- Dropout
|
178 |
+
- Inverted Residual Block
|
179 |
+
- Squeeze-and-Excitation Block
|
180 |
+
- Swish
|
181 |
+
Tasks:
|
182 |
+
- Image Classification
|
183 |
+
Training Data:
|
184 |
+
- ImageNet
|
185 |
+
ID: efficientnet_b3
|
186 |
+
Crop Pct: '0.904'
|
187 |
+
Image Size: '300'
|
188 |
+
Interpolation: bicubic
|
189 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1038
|
190 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth
|
191 |
+
Results:
|
192 |
+
- Task: Image Classification
|
193 |
+
Dataset: ImageNet
|
194 |
+
Metrics:
|
195 |
+
Top 1 Accuracy: 82.08%
|
196 |
+
Top 5 Accuracy: 96.03%
|
197 |
+
- Name: efficientnet_b3a
|
198 |
+
In Collection: EfficientNet
|
199 |
+
Metadata:
|
200 |
+
FLOPs: 2600628304
|
201 |
+
Parameters: 12230000
|
202 |
+
File Size: 49369973
|
203 |
+
Architecture:
|
204 |
+
- 1x1 Convolution
|
205 |
+
- Average Pooling
|
206 |
+
- Batch Normalization
|
207 |
+
- Convolution
|
208 |
+
- Dense Connections
|
209 |
+
- Dropout
|
210 |
+
- Inverted Residual Block
|
211 |
+
- Squeeze-and-Excitation Block
|
212 |
+
- Swish
|
213 |
+
Tasks:
|
214 |
+
- Image Classification
|
215 |
+
Training Data:
|
216 |
+
- ImageNet
|
217 |
+
ID: efficientnet_b3a
|
218 |
+
Crop Pct: '1.0'
|
219 |
+
Image Size: '320'
|
220 |
+
Interpolation: bicubic
|
221 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1047
|
222 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth
|
223 |
+
Results:
|
224 |
+
- Task: Image Classification
|
225 |
+
Dataset: ImageNet
|
226 |
+
Metrics:
|
227 |
+
Top 1 Accuracy: 82.25%
|
228 |
+
Top 5 Accuracy: 96.11%
|
229 |
+
- Name: efficientnet_em
|
230 |
+
In Collection: EfficientNet
|
231 |
+
Metadata:
|
232 |
+
FLOPs: 3935516480
|
233 |
+
Parameters: 6900000
|
234 |
+
File Size: 27927309
|
235 |
+
Architecture:
|
236 |
+
- 1x1 Convolution
|
237 |
+
- Average Pooling
|
238 |
+
- Batch Normalization
|
239 |
+
- Convolution
|
240 |
+
- Dense Connections
|
241 |
+
- Dropout
|
242 |
+
- Inverted Residual Block
|
243 |
+
- Squeeze-and-Excitation Block
|
244 |
+
- Swish
|
245 |
+
Tasks:
|
246 |
+
- Image Classification
|
247 |
+
Training Data:
|
248 |
+
- ImageNet
|
249 |
+
ID: efficientnet_em
|
250 |
+
Crop Pct: '0.882'
|
251 |
+
Image Size: '240'
|
252 |
+
Interpolation: bicubic
|
253 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1118
|
254 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_em_ra2-66250f76.pth
|
255 |
+
Results:
|
256 |
+
- Task: Image Classification
|
257 |
+
Dataset: ImageNet
|
258 |
+
Metrics:
|
259 |
+
Top 1 Accuracy: 79.26%
|
260 |
+
Top 5 Accuracy: 94.79%
|
261 |
+
- Name: efficientnet_es
|
262 |
+
In Collection: EfficientNet
|
263 |
+
Metadata:
|
264 |
+
FLOPs: 2317181824
|
265 |
+
Parameters: 5440000
|
266 |
+
File Size: 22003339
|
267 |
+
Architecture:
|
268 |
+
- 1x1 Convolution
|
269 |
+
- Average Pooling
|
270 |
+
- Batch Normalization
|
271 |
+
- Convolution
|
272 |
+
- Dense Connections
|
273 |
+
- Dropout
|
274 |
+
- Inverted Residual Block
|
275 |
+
- Squeeze-and-Excitation Block
|
276 |
+
- Swish
|
277 |
+
Tasks:
|
278 |
+
- Image Classification
|
279 |
+
Training Data:
|
280 |
+
- ImageNet
|
281 |
+
ID: efficientnet_es
|
282 |
+
Crop Pct: '0.875'
|
283 |
+
Image Size: '224'
|
284 |
+
Interpolation: bicubic
|
285 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1110
|
286 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_es_ra-f111e99c.pth
|
287 |
+
Results:
|
288 |
+
- Task: Image Classification
|
289 |
+
Dataset: ImageNet
|
290 |
+
Metrics:
|
291 |
+
Top 1 Accuracy: 78.09%
|
292 |
+
Top 5 Accuracy: 93.93%
|
293 |
+
- Name: efficientnet_lite0
|
294 |
+
In Collection: EfficientNet
|
295 |
+
Metadata:
|
296 |
+
FLOPs: 510605024
|
297 |
+
Parameters: 4650000
|
298 |
+
File Size: 18820005
|
299 |
+
Architecture:
|
300 |
+
- 1x1 Convolution
|
301 |
+
- Average Pooling
|
302 |
+
- Batch Normalization
|
303 |
+
- Convolution
|
304 |
+
- Dense Connections
|
305 |
+
- Dropout
|
306 |
+
- Inverted Residual Block
|
307 |
+
- Squeeze-and-Excitation Block
|
308 |
+
- Swish
|
309 |
+
Tasks:
|
310 |
+
- Image Classification
|
311 |
+
Training Data:
|
312 |
+
- ImageNet
|
313 |
+
ID: efficientnet_lite0
|
314 |
+
Crop Pct: '0.875'
|
315 |
+
Image Size: '224'
|
316 |
+
Interpolation: bicubic
|
317 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1163
|
318 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_lite0_ra-37913777.pth
|
319 |
+
Results:
|
320 |
+
- Task: Image Classification
|
321 |
+
Dataset: ImageNet
|
322 |
+
Metrics:
|
323 |
+
Top 1 Accuracy: 75.5%
|
324 |
+
Top 5 Accuracy: 92.51%
|
325 |
+
-->
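
The `Crop Pct` and `Image Size` fields in the metadata above are related: a common ImageNet evaluation convention, assumed here to match timm's default evaluation transform, resizes the image so that a center crop of `Image Size` covers the `Crop Pct` fraction of it. The helper name below is hypothetical and the snippet is only a sketch of that arithmetic.

```python
import math


def eval_resize_size(image_size, crop_pct):
    """Resize target applied before taking a center crop of `image_size` pixels."""
    return int(math.floor(image_size / crop_pct))


if __name__ == "__main__":
    # (Image Size, Crop Pct) pairs taken from the metadata entries above.
    for image_size, crop_pct in [(224, 0.875), (240, 0.882), (288, 1.0)]:
        print(image_size, crop_pct, eval_resize_size(image_size, crop_pct))
```

A `Crop Pct` of `1.0` means the image is resized directly to the target size with no border cropped away.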
|
docs/models/.templates/models/ensemble-adversarial.md
ADDED
@@ -0,0 +1,98 @@
|
1 |
+
# Ensemble Adversarial Inception ResNet v2
|
2 |
+
|
3 |
+
**Inception-ResNet-v2** is a convolutional neural architecture that builds on the Inception family of architectures but incorporates [residual connections](https://paperswithcode.com/method/residual-connection) (replacing the filter concatenation stage of the Inception architecture).
|
4 |
+
|
5 |
+
This particular model was trained for the study of adversarial examples (adversarial training).
|
6 |
+
|
7 |
+
The weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).
|
8 |
+
|
9 |
+
{% include 'code_snippets.md' %}
|
10 |
+
|
11 |
+
## How do I train this model?
|
12 |
+
|
13 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
14 |
+
|
15 |
+
## Citation
|
16 |
+
|
17 |
+
```BibTeX
|
18 |
+
@article{DBLP:journals/corr/abs-1804-00097,
|
19 |
+
author = {Alexey Kurakin and
|
20 |
+
Ian J. Goodfellow and
|
21 |
+
Samy Bengio and
|
22 |
+
Yinpeng Dong and
|
23 |
+
Fangzhou Liao and
|
24 |
+
Ming Liang and
|
25 |
+
Tianyu Pang and
|
26 |
+
Jun Zhu and
|
27 |
+
Xiaolin Hu and
|
28 |
+
Cihang Xie and
|
29 |
+
Jianyu Wang and
|
30 |
+
Zhishuai Zhang and
|
31 |
+
Zhou Ren and
|
32 |
+
Alan L. Yuille and
|
33 |
+
Sangxia Huang and
|
34 |
+
Yao Zhao and
|
35 |
+
Yuzhe Zhao and
|
36 |
+
Zhonglin Han and
|
37 |
+
Junjiajia Long and
|
38 |
+
Yerkebulan Berdibekov and
|
39 |
+
Takuya Akiba and
|
40 |
+
Seiya Tokui and
|
41 |
+
Motoki Abe},
|
42 |
+
title = {Adversarial Attacks and Defences Competition},
|
43 |
+
journal = {CoRR},
|
44 |
+
volume = {abs/1804.00097},
|
45 |
+
year = {2018},
|
46 |
+
url = {http://arxiv.org/abs/1804.00097},
|
47 |
+
archivePrefix = {arXiv},
|
48 |
+
eprint = {1804.00097},
|
49 |
+
timestamp = {Thu, 31 Oct 2019 16:31:22 +0100},
|
50 |
+
biburl = {https://dblp.org/rec/journals/corr/abs-1804-00097.bib},
|
51 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
52 |
+
}
|
53 |
+
```
|
54 |
+
|
55 |
+
<!--
|
56 |
+
Type: model-index
|
57 |
+
Collections:
|
58 |
+
- Name: Ensemble Adversarial
|
59 |
+
Paper:
|
60 |
+
Title: Adversarial Attacks and Defences Competition
|
61 |
+
URL: https://paperswithcode.com/paper/adversarial-attacks-and-defences-competition
|
62 |
+
Models:
|
63 |
+
- Name: ens_adv_inception_resnet_v2
|
64 |
+
In Collection: Ensemble Adversarial
|
65 |
+
Metadata:
|
66 |
+
FLOPs: 16959133120
|
67 |
+
Parameters: 55850000
|
68 |
+
File Size: 223774238
|
69 |
+
Architecture:
|
70 |
+
- 1x1 Convolution
|
71 |
+
- Auxiliary Classifier
|
72 |
+
- Average Pooling
|
73 |
+
- Average Pooling
|
74 |
+
- Batch Normalization
|
75 |
+
- Convolution
|
76 |
+
- Dense Connections
|
77 |
+
- Dropout
|
78 |
+
- Inception-v3 Module
|
79 |
+
- Max Pooling
|
80 |
+
- ReLU
|
81 |
+
- Softmax
|
82 |
+
Tasks:
|
83 |
+
- Image Classification
|
84 |
+
Training Data:
|
85 |
+
- ImageNet
|
86 |
+
ID: ens_adv_inception_resnet_v2
|
87 |
+
Crop Pct: '0.897'
|
88 |
+
Image Size: '299'
|
89 |
+
Interpolation: bicubic
|
90 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_resnet_v2.py#L351
|
91 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ens_adv_inception_resnet_v2-2592a550.pth
|
92 |
+
Results:
|
93 |
+
- Task: Image Classification
|
94 |
+
Dataset: ImageNet
|
95 |
+
Metrics:
|
96 |
+
Top 1 Accuracy: 1.0%
|
97 |
+
Top 5 Accuracy: 17.32%
|
98 |
+
-->
|
docs/models/.templates/models/ese-vovnet.md
ADDED
@@ -0,0 +1,92 @@
|
1 |
+
# ESE-VoVNet
|
2 |
+
|
3 |
+
**VoVNet** is a convolutional neural network that seeks to make [DenseNet](https://paperswithcode.com/method/densenet) more efficient by concatenating all features only once, in the last feature map, which keeps the input size of each layer constant and allows the number of output channels to be enlarged.
|
4 |
+
|
5 |
+
Read about [one-shot aggregation here](https://paperswithcode.com/method/one-shot-aggregation).
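
The contrast with DenseNet can be seen by counting the input channels each layer receives. The following is a minimal sketch, assuming a block of `n` convolution layers that each produce `k` channels from a `c0`-channel block input; the function names and the example numbers are illustrative and are not taken from timm or the paper.

```python
def dense_block_inputs(c0, k, n):
    """Input channels per layer when each layer receives the concatenation
    of the block input and all earlier layer outputs (DenseNet-style)."""
    return [c0 + i * k for i in range(n)]


def osa_block_inputs(c0, k, n):
    """Input channels per layer when each layer only receives the previous
    layer's output (one-shot aggregation)."""
    return [c0] + [k] * (n - 1)


def osa_concat_channels(c0, k, n):
    """Channels after the single concatenation at the end of the block:
    the block input plus the output of every layer."""
    return c0 + n * k


if __name__ == "__main__":
    c0, k, n = 64, 32, 4
    print("DenseNet-style inputs:", dense_block_inputs(c0, k, n))
    print("OSA inputs:           ", osa_block_inputs(c0, k, n))
    print("OSA concat channels:  ", osa_concat_channels(c0, k, n))
```

With dense connections the input width grows with every layer, whereas with one-shot aggregation it stays fixed after the first layer and all features are gathered once at the end.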
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{lee2019energy,
|
17 |
+
title={An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection},
|
18 |
+
author={Youngwan Lee and Joong-won Hwang and Sangrok Lee and Yuseok Bae and Jongyoul Park},
|
19 |
+
year={2019},
|
20 |
+
eprint={1904.09730},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: ESE VovNet
|
30 |
+
Paper:
|
31 |
+
Title: 'CenterMask : Real-Time Anchor-Free Instance Segmentation'
|
32 |
+
URL: https://paperswithcode.com/paper/centermask-real-time-anchor-free-instance-1
|
33 |
+
Models:
|
34 |
+
- Name: ese_vovnet19b_dw
|
35 |
+
In Collection: ESE VovNet
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 1711959904
|
38 |
+
Parameters: 6540000
|
39 |
+
File Size: 26243175
|
40 |
+
Architecture:
|
41 |
+
- Batch Normalization
|
42 |
+
- Convolution
|
43 |
+
- Max Pooling
|
44 |
+
- One-Shot Aggregation
|
45 |
+
- ReLU
|
46 |
+
Tasks:
|
47 |
+
- Image Classification
|
48 |
+
Training Data:
|
49 |
+
- ImageNet
|
50 |
+
ID: ese_vovnet19b_dw
|
51 |
+
Layers: 19
|
52 |
+
Crop Pct: '0.875'
|
53 |
+
Image Size: '224'
|
54 |
+
Interpolation: bicubic
|
55 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/vovnet.py#L361
|
56 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ese_vovnet19b_dw-a8741004.pth
|
57 |
+
Results:
|
58 |
+
- Task: Image Classification
|
59 |
+
Dataset: ImageNet
|
60 |
+
Metrics:
|
61 |
+
Top 1 Accuracy: 76.82%
|
62 |
+
Top 5 Accuracy: 93.28%
|
63 |
+
- Name: ese_vovnet39b
|
64 |
+
In Collection: ESE VovNet
|
65 |
+
Metadata:
|
66 |
+
FLOPs: 9089259008
|
67 |
+
Parameters: 24570000
|
68 |
+
File Size: 98397138
|
69 |
+
Architecture:
|
70 |
+
- Batch Normalization
|
71 |
+
- Convolution
|
72 |
+
- Max Pooling
|
73 |
+
- One-Shot Aggregation
|
74 |
+
- ReLU
|
75 |
+
Tasks:
|
76 |
+
- Image Classification
|
77 |
+
Training Data:
|
78 |
+
- ImageNet
|
79 |
+
ID: ese_vovnet39b
|
80 |
+
Layers: 39
|
81 |
+
Crop Pct: '0.875'
|
82 |
+
Image Size: '224'
|
83 |
+
Interpolation: bicubic
|
84 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/vovnet.py#L371
|
85 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ese_vovnet39b-f912fe73.pth
|
86 |
+
Results:
|
87 |
+
- Task: Image Classification
|
88 |
+
Dataset: ImageNet
|
89 |
+
Metrics:
|
90 |
+
Top 1 Accuracy: 79.31%
|
91 |
+
Top 5 Accuracy: 94.72%
|
92 |
+
-->
|
docs/models/.templates/models/fbnet.md
ADDED
@@ -0,0 +1,76 @@
|
1 |
+
# FBNet
|
2 |
+
|
3 |
+
**FBNet** is a family of convolutional neural architectures discovered through [DNAS](https://paperswithcode.com/method/dnas) neural architecture search. It uses a basic image model block inspired by [MobileNetv2](https://paperswithcode.com/method/mobilenetv2) that combines depthwise convolutions with an inverted residual structure (see components).
|
4 |
+
|
5 |
+
The principal building block is the [FBNet Block](https://paperswithcode.com/method/fbnet-block).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{wu2019fbnet,
|
17 |
+
title={FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search},
|
18 |
+
author={Bichen Wu and Xiaoliang Dai and Peizhao Zhang and Yanghan Wang and Fei Sun and Yiming Wu and Yuandong Tian and Peter Vajda and Yangqing Jia and Kurt Keutzer},
|
19 |
+
year={2019},
|
20 |
+
eprint={1812.03443},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: FBNet
|
30 |
+
Paper:
|
31 |
+
Title: 'FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural
|
32 |
+
Architecture Search'
|
33 |
+
URL: https://paperswithcode.com/paper/fbnet-hardware-aware-efficient-convnet-design
|
34 |
+
Models:
|
35 |
+
- Name: fbnetc_100
|
36 |
+
In Collection: FBNet
|
37 |
+
Metadata:
|
38 |
+
FLOPs: 508940064
|
39 |
+
Parameters: 5570000
|
40 |
+
File Size: 22525094
|
41 |
+
Architecture:
|
42 |
+
- 1x1 Convolution
|
43 |
+
- Convolution
|
44 |
+
- Dense Connections
|
45 |
+
- Dropout
|
46 |
+
- FBNet Block
|
47 |
+
- Global Average Pooling
|
48 |
+
- Softmax
|
49 |
+
Tasks:
|
50 |
+
- Image Classification
|
51 |
+
Training Techniques:
|
52 |
+
- SGD with Momentum
|
53 |
+
- Weight Decay
|
54 |
+
Training Data:
|
55 |
+
- ImageNet
|
56 |
+
Training Resources: 8x GPUs
|
57 |
+
ID: fbnetc_100
|
58 |
+
LR: 0.1
|
59 |
+
Epochs: 360
|
60 |
+
Layers: 22
|
61 |
+
Dropout: 0.2
|
62 |
+
Crop Pct: '0.875'
|
63 |
+
Momentum: 0.9
|
64 |
+
Batch Size: 256
|
65 |
+
Image Size: '224'
|
66 |
+
Weight Decay: 0.0005
|
67 |
+
Interpolation: bilinear
|
68 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L985
|
69 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/fbnetc_100-c345b898.pth
|
70 |
+
Results:
|
71 |
+
- Task: Image Classification
|
72 |
+
Dataset: ImageNet
|
73 |
+
Metrics:
|
74 |
+
Top 1 Accuracy: 75.12%
|
75 |
+
Top 5 Accuracy: 92.37%
|
76 |
+
-->
|
docs/models/.templates/models/gloun-inception-v3.md
ADDED
@@ -0,0 +1,78 @@
|
1 |
+
# (Gluon) Inception v3
|
2 |
+
|
3 |
+
**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements, including [label smoothing](https://paperswithcode.com/method/label-smoothing), factorized 7 x 7 convolutions, and an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) that propagates label information lower down the network (with batch normalization applied to the layers in the side head). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
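
As a rough illustration of the label smoothing mentioned above, the sketch below mixes a one-hot target with a uniform distribution over the classes. The function name `smooth_labels` and the four-class toy setup are assumptions made for this example; the value `epsilon=0.1` follows the setting reported in the Inception v3 paper, and nothing here is taken from the timm training code.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for a single example.

    Each class receives epsilon / num_classes, and the true class
    additionally receives the remaining 1 - epsilon of the mass.
    """
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


# Toy example (hypothetical): 4 classes, the true class is index 2.
smoothed = smooth_labels(true_class=2, num_classes=4, epsilon=0.1)
print(smoothed)
```

The smoothed target still sums to one, but the true class no longer carries all of the probability mass, which discourages the network from becoming over-confident.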
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@article{DBLP:journals/corr/SzegedyVISW15,
|
17 |
+
author = {Christian Szegedy and
|
18 |
+
Vincent Vanhoucke and
|
19 |
+
Sergey Ioffe and
|
20 |
+
Jonathon Shlens and
|
21 |
+
Zbigniew Wojna},
|
22 |
+
title = {Rethinking the Inception Architecture for Computer Vision},
|
23 |
+
journal = {CoRR},
|
24 |
+
volume = {abs/1512.00567},
|
25 |
+
year = {2015},
|
26 |
+
url = {http://arxiv.org/abs/1512.00567},
|
27 |
+
archivePrefix = {arXiv},
|
28 |
+
eprint = {1512.00567},
|
29 |
+
timestamp = {Mon, 13 Aug 2018 16:49:07 +0200},
|
30 |
+
biburl = {https://dblp.org/rec/journals/corr/SzegedyVISW15.bib},
|
31 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
32 |
+
}
|
33 |
+
```
|
34 |
+
|
35 |
+
<!--
|
36 |
+
Type: model-index
|
37 |
+
Collections:
|
38 |
+
- Name: Gloun Inception v3
|
39 |
+
Paper:
|
40 |
+
Title: Rethinking the Inception Architecture for Computer Vision
|
41 |
+
URL: https://paperswithcode.com/paper/rethinking-the-inception-architecture-for
|
42 |
+
Models:
|
43 |
+
- Name: gluon_inception_v3
|
44 |
+
In Collection: Gloun Inception v3
|
45 |
+
Metadata:
|
46 |
+
FLOPs: 7352418880
|
47 |
+
Parameters: 23830000
|
48 |
+
File Size: 95567055
|
49 |
+
Architecture:
|
50 |
+
- 1x1 Convolution
|
51 |
+
- Auxiliary Classifier
|
52 |
+
- Average Pooling
|
54 |
+
- Batch Normalization
|
55 |
+
- Convolution
|
56 |
+
- Dense Connections
|
57 |
+
- Dropout
|
58 |
+
- Inception-v3 Module
|
59 |
+
- Max Pooling
|
60 |
+
- ReLU
|
61 |
+
- Softmax
|
62 |
+
Tasks:
|
63 |
+
- Image Classification
|
64 |
+
Training Data:
|
65 |
+
- ImageNet
|
66 |
+
ID: gluon_inception_v3
|
67 |
+
Crop Pct: '0.875'
|
68 |
+
Image Size: '299'
|
69 |
+
Interpolation: bicubic
|
70 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L464
|
71 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/gluon_inception_v3-9f746940.pth
|
72 |
+
Results:
|
73 |
+
- Task: Image Classification
|
74 |
+
Dataset: ImageNet
|
75 |
+
Metrics:
|
76 |
+
Top 1 Accuracy: 78.8%
|
77 |
+
Top 5 Accuracy: 94.38%
|
78 |
+
-->
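
The `Crop Pct` and `Image Size` fields in the metadata above are typically used together at evaluation time. A minimal sketch, assuming the common convention that the image is first resized to `Image Size / Crop Pct` (rounded down) and then center-cropped to `Image Size`; the helper name `eval_resize_size` is hypothetical and this is not presented as the library's exact preprocessing code:

```python
import math


def eval_resize_size(image_size, crop_pct):
    """Size to resize to before taking a center crop of `image_size`.

    Assumes the convention resize = floor(image_size / crop_pct).
    """
    return int(math.floor(image_size / crop_pct))


# Values taken from the metadata: 299 for gluon_inception_v3,
# 224 for the other Gluon models in this section, crop pct 0.875.
inception_resize = eval_resize_size(299, 0.875)
resnet_resize = eval_resize_size(224, 0.875)
print(inception_resize, resnet_resize)
```

Under this assumption a 224-pixel model sees the central 87.5% of a 256-pixel image, which is the familiar "resize to 256, crop to 224" evaluation setup.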
|
docs/models/.templates/models/gloun-resnet.md
ADDED
@@ -0,0 +1,504 @@
|
1 |
+
# (Gluon) ResNet
|
2 |
+
|
3 |
+
**Residual Networks**, or **ResNets**, learn residual functions with reference to the layer inputs instead of learning unreferenced functions. Rather than hoping that every few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. They stack [residual blocks](https://paperswithcode.com/method/residual-block) on top of each other to form a network: for example, a ResNet-50 has fifty layers built from these blocks.
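
The residual mapping described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration: small linear layers stand in for the convolutions, batch normalization is omitted, and the names `linear`, `relu` and `residual_block` are hypothetical helpers rather than anything from timm.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Matrix-vector product: out[i] = sum_j weights[i][j] * values[j]."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute y = relu(F(x) + x), where F is two linear layers with a ReLU between."""
    residual = linear(relu(linear(x, w1)), w2)
    return relu([r + xi for r, xi in zip(residual, x)])


# With all-zero weights F(x) is zero, so the block reduces to the identity
# on non-negative inputs: the layers only need to learn a correction to x.
x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
y = residual_block(x, zero_weights, zero_weights)
print(y)
```

The design point is that the shortcut carries `x` through unchanged, so a block that has nothing useful to add can fall back to the identity simply by driving its weights toward zero.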
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@article{DBLP:journals/corr/HeZRS15,
|
17 |
+
author = {Kaiming He and
|
18 |
+
Xiangyu Zhang and
|
19 |
+
Shaoqing Ren and
|
20 |
+
Jian Sun},
|
21 |
+
title = {Deep Residual Learning for Image Recognition},
|
22 |
+
journal = {CoRR},
|
23 |
+
volume = {abs/1512.03385},
|
24 |
+
year = {2015},
|
25 |
+
url = {http://arxiv.org/abs/1512.03385},
|
26 |
+
archivePrefix = {arXiv},
|
27 |
+
eprint = {1512.03385},
|
28 |
+
timestamp = {Wed, 17 Apr 2019 17:23:45 +0200},
|
29 |
+
biburl = {https://dblp.org/rec/journals/corr/HeZRS15.bib},
|
30 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
31 |
+
}
|
32 |
+
```
|
33 |
+
|
34 |
+
<!--
|
35 |
+
Type: model-index
|
36 |
+
Collections:
|
37 |
+
- Name: Gloun ResNet
|
38 |
+
Paper:
|
39 |
+
Title: Deep Residual Learning for Image Recognition
|
40 |
+
URL: https://paperswithcode.com/paper/deep-residual-learning-for-image-recognition
|
41 |
+
Models:
|
42 |
+
- Name: gluon_resnet101_v1b
|
43 |
+
In Collection: Gloun ResNet
|
44 |
+
Metadata:
|
45 |
+
FLOPs: 10068547584
|
46 |
+
Parameters: 44550000
|
47 |
+
File Size: 178723172
|
48 |
+
Architecture:
|
49 |
+
- 1x1 Convolution
|
50 |
+
- Batch Normalization
|
51 |
+
- Bottleneck Residual Block
|
52 |
+
- Convolution
|
53 |
+
- Global Average Pooling
|
54 |
+
- Max Pooling
|
55 |
+
- ReLU
|
56 |
+
- Residual Block
|
57 |
+
- Residual Connection
|
58 |
+
- Softmax
|
59 |
+
Tasks:
|
60 |
+
- Image Classification
|
61 |
+
Training Data:
|
62 |
+
- ImageNet
|
63 |
+
ID: gluon_resnet101_v1b
|
64 |
+
Crop Pct: '0.875'
|
65 |
+
Image Size: '224'
|
66 |
+
Interpolation: bicubic
|
67 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L89
|
68 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1b-3b017079.pth
|
69 |
+
Results:
|
70 |
+
- Task: Image Classification
|
71 |
+
Dataset: ImageNet
|
72 |
+
Metrics:
|
73 |
+
Top 1 Accuracy: 79.3%
|
74 |
+
Top 5 Accuracy: 94.53%
|
75 |
+
- Name: gluon_resnet101_v1c
|
76 |
+
In Collection: Gloun ResNet
|
77 |
+
Metadata:
|
78 |
+
FLOPs: 10376567296
|
79 |
+
Parameters: 44570000
|
80 |
+
File Size: 178802575
|
81 |
+
Architecture:
|
82 |
+
- 1x1 Convolution
|
83 |
+
- Batch Normalization
|
84 |
+
- Bottleneck Residual Block
|
85 |
+
- Convolution
|
86 |
+
- Global Average Pooling
|
87 |
+
- Max Pooling
|
88 |
+
- ReLU
|
89 |
+
- Residual Block
|
90 |
+
- Residual Connection
|
91 |
+
- Softmax
|
92 |
+
Tasks:
|
93 |
+
- Image Classification
|
94 |
+
Training Data:
|
95 |
+
- ImageNet
|
96 |
+
ID: gluon_resnet101_v1c
|
97 |
+
Crop Pct: '0.875'
|
98 |
+
Image Size: '224'
|
99 |
+
Interpolation: bicubic
|
100 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L113
|
101 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1c-1f26822a.pth
|
102 |
+
Results:
|
103 |
+
- Task: Image Classification
|
104 |
+
Dataset: ImageNet
|
105 |
+
Metrics:
|
106 |
+
Top 1 Accuracy: 79.53%
|
107 |
+
Top 5 Accuracy: 94.59%
|
108 |
+
- Name: gluon_resnet101_v1d
|
109 |
+
In Collection: Gloun ResNet
|
110 |
+
Metadata:
|
111 |
+
FLOPs: 10377018880
|
112 |
+
Parameters: 44570000
|
113 |
+
File Size: 178802755
|
114 |
+
Architecture:
|
115 |
+
- 1x1 Convolution
|
116 |
+
- Batch Normalization
|
117 |
+
- Bottleneck Residual Block
|
118 |
+
- Convolution
|
119 |
+
- Global Average Pooling
|
120 |
+
- Max Pooling
|
121 |
+
- ReLU
|
122 |
+
- Residual Block
|
123 |
+
- Residual Connection
|
124 |
+
- Softmax
|
125 |
+
Tasks:
|
126 |
+
- Image Classification
|
127 |
+
Training Data:
|
128 |
+
- ImageNet
|
129 |
+
ID: gluon_resnet101_v1d
|
130 |
+
Crop Pct: '0.875'
|
131 |
+
Image Size: '224'
|
132 |
+
Interpolation: bicubic
|
133 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L138
|
134 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1d-0f9c8644.pth
|
135 |
+
Results:
|
136 |
+
- Task: Image Classification
|
137 |
+
Dataset: ImageNet
|
138 |
+
Metrics:
|
139 |
+
Top 1 Accuracy: 80.4%
|
140 |
+
Top 5 Accuracy: 95.02%
|
141 |
+
- Name: gluon_resnet101_v1s
|
142 |
+
In Collection: Gloun ResNet
|
143 |
+
Metadata:
|
144 |
+
FLOPs: 11805511680
|
145 |
+
Parameters: 44670000
|
146 |
+
File Size: 179221777
|
147 |
+
Architecture:
|
148 |
+
- 1x1 Convolution
|
149 |
+
- Batch Normalization
|
150 |
+
- Bottleneck Residual Block
|
151 |
+
- Convolution
|
152 |
+
- Global Average Pooling
|
153 |
+
- Max Pooling
|
154 |
+
- ReLU
|
155 |
+
- Residual Block
|
156 |
+
- Residual Connection
|
157 |
+
- Softmax
|
158 |
+
Tasks:
|
159 |
+
- Image Classification
|
160 |
+
Training Data:
|
161 |
+
- ImageNet
|
162 |
+
ID: gluon_resnet101_v1s
|
163 |
+
Crop Pct: '0.875'
|
164 |
+
Image Size: '224'
|
165 |
+
Interpolation: bicubic
|
166 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L166
|
167 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1s-60fe0cc1.pth
|
168 |
+
Results:
|
169 |
+
- Task: Image Classification
|
170 |
+
Dataset: ImageNet
|
171 |
+
Metrics:
|
172 |
+
Top 1 Accuracy: 80.29%
|
173 |
+
Top 5 Accuracy: 95.16%
|
174 |
+
- Name: gluon_resnet152_v1b
|
175 |
+
In Collection: Gloun ResNet
|
176 |
+
Metadata:
|
177 |
+
FLOPs: 14857660416
|
178 |
+
Parameters: 60190000
|
179 |
+
File Size: 241534001
|
180 |
+
Architecture:
|
181 |
+
- 1x1 Convolution
|
182 |
+
- Batch Normalization
|
183 |
+
- Bottleneck Residual Block
|
184 |
+
- Convolution
|
185 |
+
- Global Average Pooling
|
186 |
+
- Max Pooling
|
187 |
+
- ReLU
|
188 |
+
- Residual Block
|
189 |
+
- Residual Connection
|
190 |
+
- Softmax
|
191 |
+
Tasks:
|
192 |
+
- Image Classification
|
193 |
+
Training Data:
|
194 |
+
- ImageNet
|
195 |
+
ID: gluon_resnet152_v1b
|
196 |
+
Crop Pct: '0.875'
|
197 |
+
Image Size: '224'
|
198 |
+
Interpolation: bicubic
|
199 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L97
|
200 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1b-c1edb0dd.pth
|
201 |
+
Results:
|
202 |
+
- Task: Image Classification
|
203 |
+
Dataset: ImageNet
|
204 |
+
Metrics:
|
205 |
+
Top 1 Accuracy: 79.69%
|
206 |
+
Top 5 Accuracy: 94.73%
|
207 |
+
- Name: gluon_resnet152_v1c
|
208 |
+
In Collection: Gloun ResNet
|
209 |
+
Metadata:
|
210 |
+
FLOPs: 15165680128
|
211 |
+
Parameters: 60210000
|
212 |
+
File Size: 241613404
|
213 |
+
Architecture:
|
214 |
+
- 1x1 Convolution
|
215 |
+
- Batch Normalization
|
216 |
+
- Bottleneck Residual Block
|
217 |
+
- Convolution
|
218 |
+
- Global Average Pooling
|
219 |
+
- Max Pooling
|
220 |
+
- ReLU
|
221 |
+
- Residual Block
|
222 |
+
- Residual Connection
|
223 |
+
- Softmax
|
224 |
+
Tasks:
|
225 |
+
- Image Classification
|
226 |
+
Training Data:
|
227 |
+
- ImageNet
|
228 |
+
ID: gluon_resnet152_v1c
|
229 |
+
Crop Pct: '0.875'
|
230 |
+
Image Size: '224'
|
231 |
+
Interpolation: bicubic
|
232 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L121
|
233 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1c-a3bb0b98.pth
|
234 |
+
Results:
|
235 |
+
- Task: Image Classification
|
236 |
+
Dataset: ImageNet
|
237 |
+
Metrics:
|
238 |
+
Top 1 Accuracy: 79.91%
|
239 |
+
Top 5 Accuracy: 94.85%
|
240 |
+
- Name: gluon_resnet152_v1d
|
241 |
+
In Collection: Gloun ResNet
|
242 |
+
Metadata:
|
243 |
+
FLOPs: 15166131712
|
244 |
+
Parameters: 60210000
|
245 |
+
File Size: 241613584
|
246 |
+
Architecture:
|
247 |
+
- 1x1 Convolution
|
248 |
+
- Batch Normalization
|
249 |
+
- Bottleneck Residual Block
|
250 |
+
- Convolution
|
251 |
+
- Global Average Pooling
|
252 |
+
- Max Pooling
|
253 |
+
- ReLU
|
254 |
+
- Residual Block
|
255 |
+
- Residual Connection
|
256 |
+
- Softmax
|
257 |
+
Tasks:
|
258 |
+
- Image Classification
|
259 |
+
Training Data:
|
260 |
+
- ImageNet
|
261 |
+
ID: gluon_resnet152_v1d
|
262 |
+
Crop Pct: '0.875'
|
263 |
+
Image Size: '224'
|
264 |
+
Interpolation: bicubic
|
265 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L147
|
266 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1d-bd354e12.pth
|
267 |
+
Results:
|
268 |
+
- Task: Image Classification
|
269 |
+
Dataset: ImageNet
|
270 |
+
Metrics:
|
271 |
+
Top 1 Accuracy: 80.48%
|
272 |
+
Top 5 Accuracy: 95.2%
|
273 |
+
- Name: gluon_resnet152_v1s
|
274 |
+
In Collection: Gloun ResNet
|
275 |
+
Metadata:
|
276 |
+
FLOPs: 16594624512
|
277 |
+
Parameters: 60320000
|
278 |
+
File Size: 242032606
|
279 |
+
Architecture:
|
280 |
+
- 1x1 Convolution
|
281 |
+
- Batch Normalization
|
282 |
+
- Bottleneck Residual Block
|
283 |
+
- Convolution
|
284 |
+
- Global Average Pooling
|
285 |
+
- Max Pooling
|
286 |
+
- ReLU
|
287 |
+
- Residual Block
|
288 |
+
- Residual Connection
|
289 |
+
- Softmax
|
290 |
+
Tasks:
|
291 |
+
- Image Classification
|
292 |
+
Training Data:
|
293 |
+
- ImageNet
|
294 |
+
ID: gluon_resnet152_v1s
|
295 |
+
Crop Pct: '0.875'
|
296 |
+
Image Size: '224'
|
297 |
+
Interpolation: bicubic
|
298 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L175
|
299 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1s-dcc41b81.pth
|
300 |
+
Results:
|
301 |
+
- Task: Image Classification
|
302 |
+
Dataset: ImageNet
|
303 |
+
Metrics:
|
304 |
+
Top 1 Accuracy: 81.02%
|
305 |
+
Top 5 Accuracy: 95.42%
|
306 |
+
- Name: gluon_resnet18_v1b
|
307 |
+
In Collection: Gloun ResNet
|
308 |
+
Metadata:
|
309 |
+
FLOPs: 2337073152
|
310 |
+
Parameters: 11690000
|
311 |
+
File Size: 46816736
|
312 |
+
Architecture:
|
313 |
+
- 1x1 Convolution
|
314 |
+
- Batch Normalization
|
315 |
+
- Bottleneck Residual Block
|
316 |
+
- Convolution
|
317 |
+
- Global Average Pooling
|
318 |
+
- Max Pooling
|
319 |
+
- ReLU
|
320 |
+
- Residual Block
|
321 |
+
- Residual Connection
|
322 |
+
- Softmax
|
323 |
+
Tasks:
|
324 |
+
- Image Classification
|
325 |
+
Training Data:
|
326 |
+
- ImageNet
|
327 |
+
ID: gluon_resnet18_v1b
|
328 |
+
Crop Pct: '0.875'
|
329 |
+
Image Size: '224'
|
330 |
+
Interpolation: bicubic
|
331 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L65
|
332 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet18_v1b-0757602b.pth
|
333 |
+
Results:
|
334 |
+
- Task: Image Classification
|
335 |
+
Dataset: ImageNet
|
336 |
+
Metrics:
|
337 |
+
Top 1 Accuracy: 70.84%
|
338 |
+
Top 5 Accuracy: 89.76%
|
339 |
+
- Name: gluon_resnet34_v1b
|
340 |
+
In Collection: Gloun ResNet
|
341 |
+
Metadata:
|
342 |
+
FLOPs: 4718469120
|
343 |
+
Parameters: 21800000
|
344 |
+
File Size: 87295112
|
345 |
+
Architecture:
|
346 |
+
- 1x1 Convolution
|
347 |
+
- Batch Normalization
|
348 |
+
- Bottleneck Residual Block
|
349 |
+
- Convolution
|
350 |
+
- Global Average Pooling
|
351 |
+
- Max Pooling
|
352 |
+
- ReLU
|
353 |
+
- Residual Block
|
354 |
+
- Residual Connection
|
355 |
+
- Softmax
|
356 |
+
Tasks:
|
357 |
+
- Image Classification
|
358 |
+
Training Data:
|
359 |
+
- ImageNet
|
360 |
+
ID: gluon_resnet34_v1b
|
361 |
+
Crop Pct: '0.875'
|
362 |
+
Image Size: '224'
|
363 |
+
Interpolation: bicubic
|
364 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L73
|
365 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet34_v1b-c6d82d59.pth
|
366 |
+
Results:
|
367 |
+
- Task: Image Classification
|
368 |
+
Dataset: ImageNet
|
369 |
+
Metrics:
|
370 |
+
Top 1 Accuracy: 74.59%
|
371 |
+
Top 5 Accuracy: 92.0%
|
372 |
+
- Name: gluon_resnet50_v1b
|
373 |
+
In Collection: Gloun ResNet
|
374 |
+
Metadata:
|
375 |
+
FLOPs: 5282531328
|
376 |
+
Parameters: 25560000
|
377 |
+
File Size: 102493763
|
378 |
+
Architecture:
|
379 |
+
- 1x1 Convolution
|
380 |
+
- Batch Normalization
|
381 |
+
- Bottleneck Residual Block
|
382 |
+
- Convolution
|
383 |
+
- Global Average Pooling
|
384 |
+
- Max Pooling
|
385 |
+
- ReLU
|
386 |
+
- Residual Block
|
387 |
+
- Residual Connection
|
388 |
+
- Softmax
|
389 |
+
Tasks:
|
390 |
+
- Image Classification
|
391 |
+
Training Data:
|
392 |
+
- ImageNet
|
393 |
+
ID: gluon_resnet50_v1b
|
394 |
+
Crop Pct: '0.875'
|
395 |
+
Image Size: '224'
|
396 |
+
Interpolation: bicubic
|
397 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L81
|
398 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1b-0ebe02e2.pth
|
399 |
+
Results:
|
400 |
+
- Task: Image Classification
|
401 |
+
Dataset: ImageNet
|
402 |
+
Metrics:
|
403 |
+
Top 1 Accuracy: 77.58%
|
404 |
+
Top 5 Accuracy: 93.72%
|
405 |
+
- Name: gluon_resnet50_v1c
|
406 |
+
In Collection: Gloun ResNet
|
407 |
+
Metadata:
|
408 |
+
FLOPs: 5590551040
|
409 |
+
Parameters: 25580000
|
410 |
+
File Size: 102573166
|
411 |
+
Architecture:
|
412 |
+
- 1x1 Convolution
|
413 |
+
- Batch Normalization
|
414 |
+
- Bottleneck Residual Block
|
415 |
+
- Convolution
|
416 |
+
- Global Average Pooling
|
417 |
+
- Max Pooling
|
418 |
+
- ReLU
|
419 |
+
- Residual Block
|
420 |
+
- Residual Connection
|
421 |
+
- Softmax
|
422 |
+
Tasks:
|
423 |
+
- Image Classification
|
424 |
+
Training Data:
|
425 |
+
- ImageNet
|
426 |
+
ID: gluon_resnet50_v1c
|
427 |
+
Crop Pct: '0.875'
|
428 |
+
Image Size: '224'
|
429 |
+
Interpolation: bicubic
|
430 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L105
|
431 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1c-48092f55.pth
|
432 |
+
Results:
|
433 |
+
- Task: Image Classification
|
434 |
+
Dataset: ImageNet
|
435 |
+
Metrics:
|
436 |
+
Top 1 Accuracy: 78.01%
|
437 |
+
Top 5 Accuracy: 93.99%
|
438 |
+
- Name: gluon_resnet50_v1d
|
439 |
+
In Collection: Gloun ResNet
|
440 |
+
Metadata:
|
441 |
+
FLOPs: 5591002624
|
442 |
+
Parameters: 25580000
|
443 |
+
File Size: 102573346
|
444 |
+
Architecture:
|
445 |
+
- 1x1 Convolution
|
446 |
+
- Batch Normalization
|
447 |
+
- Bottleneck Residual Block
|
448 |
+
- Convolution
|
449 |
+
- Global Average Pooling
|
450 |
+
- Max Pooling
|
451 |
+
- ReLU
|
452 |
+
- Residual Block
|
453 |
+
- Residual Connection
|
454 |
+
- Softmax
|
455 |
+
Tasks:
|
456 |
+
- Image Classification
|
457 |
+
Training Data:
|
458 |
+
- ImageNet
|
459 |
+
ID: gluon_resnet50_v1d
|
460 |
+
Crop Pct: '0.875'
|
461 |
+
Image Size: '224'
|
462 |
+
Interpolation: bicubic
|
463 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L129
|
464 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1d-818a1b1b.pth
|
465 |
+
Results:
|
466 |
+
- Task: Image Classification
|
467 |
+
Dataset: ImageNet
|
468 |
+
Metrics:
|
469 |
+
Top 1 Accuracy: 79.06%
|
470 |
+
Top 5 Accuracy: 94.46%
|
471 |
+
- Name: gluon_resnet50_v1s
|
472 |
+
In Collection: Gloun ResNet
|
473 |
+
Metadata:
|
474 |
+
FLOPs: 7019495424
|
475 |
+
Parameters: 25680000
|
476 |
+
File Size: 102992368
|
477 |
+
Architecture:
|
478 |
+
- 1x1 Convolution
|
479 |
+
- Batch Normalization
|
480 |
+
- Bottleneck Residual Block
|
481 |
+
- Convolution
|
482 |
+
- Global Average Pooling
|
483 |
+
- Max Pooling
|
484 |
+
- ReLU
|
485 |
+
- Residual Block
|
486 |
+
- Residual Connection
|
487 |
+
- Softmax
|
488 |
+
Tasks:
|
489 |
+
- Image Classification
|
490 |
+
Training Data:
|
491 |
+
- ImageNet
|
492 |
+
ID: gluon_resnet50_v1s
|
493 |
+
Crop Pct: '0.875'
|
494 |
+
Image Size: '224'
|
495 |
+
Interpolation: bicubic
|
496 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L156
|
497 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1s-1762acc0.pth
|
498 |
+
Results:
|
499 |
+
- Task: Image Classification
|
500 |
+
Dataset: ImageNet
|
501 |
+
Metrics:
|
502 |
+
Top 1 Accuracy: 78.7%
|
503 |
+
Top 5 Accuracy: 94.25%
|
504 |
+
-->
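
The `Parameters` and `File Size` fields in the metadata above can be cross-checked against each other. A minimal sketch, assuming the weights are stored as 32-bit floats (4 bytes per parameter); the small surplus in the real files is assumed here to come from non-parameter buffers and serialization overhead, and the `Parameters` values are rounded in the metadata:

```python
def float32_bytes(num_parameters):
    """Bytes needed to store the parameters as 32-bit floats."""
    return num_parameters * 4


# (Parameters, File Size) pairs copied from the model-index entries above.
entries = {
    "gluon_resnet50_v1b": (25560000, 102493763),
    "gluon_resnet101_v1b": (44550000, 178723172),
}

# Ratio of the actual file size to the float32 estimate.
ratios = {
    name: file_size / float32_bytes(params)
    for name, (params, file_size) in entries.items()
}
for name, ratio in ratios.items():
    print(name, round(ratio, 4))
```

A ratio slightly above one is what the float32 assumption predicts; a ratio near 0.5 or 2 would instead suggest a different storage precision.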
|
docs/models/.templates/models/gloun-resnext.md
ADDED
@@ -0,0 +1,142 @@
|
1 |
+
# (Gluon) ResNeXt
|
2 |
+
|
3 |
+
A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
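
To make the role of cardinality more concrete, the sketch below counts the weights of a 3x3 convolution with and without grouping. It assumes the usual reading of the `32x4d` suffix (32 groups of width 4, so 128 channels in the first stage) and ignores biases; the helper `conv_params` is a hypothetical name for this example.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Weight count of a 2D convolution without bias.

    Each output channel only sees in_channels / groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


# 128 channels in and out, 3x3 kernel.
dense = conv_params(128, 128, 3)               # one group: ordinary convolution
grouped = conv_params(128, 128, 3, groups=32)  # cardinality C = 32
print(dense, grouped, dense // grouped)
```

Splitting the convolution into `C` groups divides its weight count by `C`, which is what lets a ResNeXt block use a wider grouped layer at a cost comparable to a narrower dense one.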
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@article{DBLP:journals/corr/XieGDTH16,
|
17 |
+
author = {Saining Xie and
|
18 |
+
Ross B. Girshick and
|
19 |
+
Piotr Doll{\'{a}}r and
|
20 |
+
Zhuowen Tu and
|
21 |
+
Kaiming He},
|
22 |
+
title = {Aggregated Residual Transformations for Deep Neural Networks},
|
23 |
+
journal = {CoRR},
|
24 |
+
volume = {abs/1611.05431},
|
25 |
+
year = {2016},
|
26 |
+
url = {http://arxiv.org/abs/1611.05431},
|
27 |
+
archivePrefix = {arXiv},
|
28 |
+
eprint = {1611.05431},
|
29 |
+
timestamp = {Mon, 13 Aug 2018 16:45:58 +0200},
|
30 |
+
biburl = {https://dblp.org/rec/journals/corr/XieGDTH16.bib},
|
31 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
32 |
+
}
|
33 |
+
```
|
34 |
+
|
35 |
+
<!--
|
36 |
+
Type: model-index
|
37 |
+
Collections:
|
38 |
+
- Name: Gloun ResNeXt
|
39 |
+
Paper:
|
40 |
+
Title: Aggregated Residual Transformations for Deep Neural Networks
|
41 |
+
URL: https://paperswithcode.com/paper/aggregated-residual-transformations-for-deep
|
42 |
+
Models:
|
43 |
+
- Name: gluon_resnext101_32x4d
|
44 |
+
In Collection: Gloun ResNeXt
|
45 |
+
Metadata:
|
46 |
+
FLOPs: 10298145792
|
47 |
+
Parameters: 44180000
|
48 |
+
File Size: 177367414
|
49 |
+
Architecture:
|
50 |
+
- 1x1 Convolution
|
51 |
+
- Batch Normalization
|
52 |
+
- Convolution
|
53 |
+
- Global Average Pooling
|
54 |
+
- Grouped Convolution
|
55 |
+
- Max Pooling
|
56 |
+
- ReLU
|
57 |
+
- ResNeXt Block
|
58 |
+
- Residual Connection
|
59 |
+
- Softmax
|
60 |
+
Tasks:
|
61 |
+
- Image Classification
|
62 |
+
Training Data:
|
63 |
+
- ImageNet
|
64 |
+
ID: gluon_resnext101_32x4d
|
65 |
+
Crop Pct: '0.875'
|
66 |
+
Image Size: '224'
|
67 |
+
Interpolation: bicubic
|
68 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L193
|
69 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext101_32x4d-b253c8c4.pth
|
70 |
+
Results:
|
71 |
+
- Task: Image Classification
|
72 |
+
Dataset: ImageNet
|
73 |
+
Metrics:
|
74 |
+
Top 1 Accuracy: 80.33%
|
75 |
+
Top 5 Accuracy: 94.91%
|
76 |
+
- Name: gluon_resnext101_64x4d
|
77 |
+
In Collection: Gloun ResNeXt
|
78 |
+
Metadata:
|
79 |
+
FLOPs: 19954172928
|
80 |
+
Parameters: 83460000
|
81 |
+
File Size: 334737852
|
82 |
+
Architecture:
|
83 |
+
- 1x1 Convolution
|
84 |
+
- Batch Normalization
|
85 |
+
- Convolution
|
86 |
+
- Global Average Pooling
|
87 |
+
- Grouped Convolution
|
88 |
+
- Max Pooling
|
89 |
+
- ReLU
|
90 |
+
- ResNeXt Block
|
91 |
+
- Residual Connection
|
92 |
+
- Softmax
|
93 |
+
Tasks:
|
94 |
+
- Image Classification
|
95 |
+
Training Data:
|
96 |
+
- ImageNet
|
97 |
+
ID: gluon_resnext101_64x4d
|
98 |
+
Crop Pct: '0.875'
|
99 |
+
Image Size: '224'
|
100 |
+
Interpolation: bicubic
|
101 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L201
|
102 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext101_64x4d-f9a8e184.pth
|
103 |
+
Results:
|
104 |
+
- Task: Image Classification
|
105 |
+
Dataset: ImageNet
|
106 |
+
Metrics:
|
107 |
+
Top 1 Accuracy: 80.63%
|
108 |
+
Top 5 Accuracy: 95.0%
|
109 |
+
- Name: gluon_resnext50_32x4d
|
110 |
+
In Collection: Gloun ResNeXt
|
111 |
+
Metadata:
|
112 |
+
FLOPs: 5472648192
|
113 |
+
Parameters: 25030000
|
114 |
+
File Size: 100441719
|
115 |
+
Architecture:
|
116 |
+
- 1x1 Convolution
|
117 |
+
- Batch Normalization
|
118 |
+
- Convolution
|
119 |
+
- Global Average Pooling
|
120 |
+
- Grouped Convolution
|
121 |
+
- Max Pooling
|
122 |
+
- ReLU
|
123 |
+
- ResNeXt Block
|
124 |
+
- Residual Connection
|
125 |
+
- Softmax
|
126 |
+
Tasks:
|
127 |
+
- Image Classification
|
128 |
+
Training Data:
|
129 |
+
- ImageNet
|
130 |
+
ID: gluon_resnext50_32x4d
|
131 |
+
Crop Pct: '0.875'
|
132 |
+
Image Size: '224'
|
133 |
+
Interpolation: bicubic
|
134 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L185
|
135 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext50_32x4d-e6a097c1.pth
|
136 |
+
Results:
|
137 |
+
- Task: Image Classification
|
138 |
+
Dataset: ImageNet
|
139 |
+
Metrics:
|
140 |
+
Top 1 Accuracy: 79.35%
|
141 |
+
Top 5 Accuracy: 94.42%
|
142 |
+
-->
|
docs/models/.templates/models/gloun-senet.md
ADDED
@@ -0,0 +1,63 @@
|
1 |
+
# (Gluon) SENet
|
2 |
+
|
3 |
+
A **SENet** is a convolutional neural network architecture that employs [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) to enable the network to perform dynamic channel-wise feature recalibration.
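
The channel-wise recalibration described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: each channel is a flat list of spatial values, the two fully connected layers have no biases, and the weights `w_reduce` and `w_expand` are made-up numbers rather than trained values.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Rescale each channel by a gate computed from global channel statistics."""
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_maps]
    # Excitation, step 1: reduce dimensionality, then apply ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expand back to one gate per channel, then sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[g * v for v in channel] for g, channel in zip(gates, feature_maps)]


# Two channels with two spatial positions each; one hidden unit (reduction of 2).
features = [[1.0, 3.0], [2.0, 2.0]]
w_reduce = [[0.5, 0.5]]      # hypothetical weights, shape: hidden x channels
w_expand = [[1.0], [-1.0]]   # hypothetical weights, shape: channels x hidden
recalibrated = squeeze_excite(features, w_reduce, w_expand)
print(recalibrated)
```

Because the gates lie strictly between zero and one, the block can only attenuate channels relative to each other; here the first channel keeps most of its magnitude while the second is suppressed.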
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{hu2019squeezeandexcitation,
|
17 |
+
title={Squeeze-and-Excitation Networks},
|
18 |
+
author={Jie Hu and Li Shen and Samuel Albanie and Gang Sun and Enhua Wu},
|
19 |
+
year={2019},
|
20 |
+
eprint={1709.01507},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: Gloun SENet
|
30 |
+
Paper:
|
31 |
+
Title: Squeeze-and-Excitation Networks
|
32 |
+
URL: https://paperswithcode.com/paper/squeeze-and-excitation-networks
|
33 |
+
Models:
|
34 |
+
- Name: gluon_senet154
|
35 |
+
In Collection: Gloun SENet
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 26681705136
|
38 |
+
Parameters: 115090000
|
39 |
+
File Size: 461546622
|
40 |
+
Architecture:
|
41 |
+
- Convolution
|
42 |
+
- Dense Connections
|
43 |
+
- Global Average Pooling
|
44 |
+
- Max Pooling
|
45 |
+
- Softmax
|
46 |
+
- Squeeze-and-Excitation Block
|
47 |
+
Tasks:
|
48 |
+
- Image Classification
|
49 |
+
Training Data:
|
50 |
+
- ImageNet
|
51 |
+
ID: gluon_senet154
|
52 |
+
Crop Pct: '0.875'
|
53 |
+
Image Size: '224'
|
54 |
+
Interpolation: bicubic
|
55 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L239
|
56 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_senet154-70a1a3c0.pth
|
57 |
+
Results:
|
58 |
+
- Task: Image Classification
|
59 |
+
Dataset: ImageNet
|
60 |
+
Metrics:
|
61 |
+
Top 1 Accuracy: 81.23%
|
62 |
+
Top 5 Accuracy: 95.35%
|
63 |
+
-->
|
docs/models/.templates/models/gloun-seresnext.md
ADDED
@@ -0,0 +1,136 @@
|
1 |
+
# (Gluon) SE-ResNeXt
|
2 |
+
|
3 |
+
**SE ResNeXt** is a variant of [ResNeXt](https://www.paperswithcode.com/method/resnext) that employs [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) to perform dynamic channel-wise feature recalibration.
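To make the channel-wise recalibration above more concrete, here is a minimal pure-Python sketch of one squeeze-and-excitation step. The function name `squeeze_excite`, the toy weights, and the single hidden unit are assumptions made for this illustration. It is not the timm or Gluon implementation, and biases are left out for brevity.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Recalibrate a feature map given as nested lists [channel][row][col].

    w_reduce: [hidden][channels] bottleneck weights (hypothetical values).
    w_expand: [channels][hidden] expansion weights (hypothetical values).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, part 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2x2, reduced to a single hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
recalibrated, channel_gates = squeeze_excite(
    features, w_reduce=[[0.2, -0.1]], w_expand=[[1.0], [-1.0]]
)
print(channel_gates)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate a channel relative to the others; the learned weights decide which channels are kept close to their original magnitude.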
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{hu2019squeezeandexcitation,
|
17 |
+
title={Squeeze-and-Excitation Networks},
|
18 |
+
author={Jie Hu and Li Shen and Samuel Albanie and Gang Sun and Enhua Wu},
|
19 |
+
year={2019},
|
20 |
+
eprint={1709.01507},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: Gloun SEResNeXt
|
30 |
+
Paper:
|
31 |
+
Title: Squeeze-and-Excitation Networks
|
32 |
+
URL: https://paperswithcode.com/paper/squeeze-and-excitation-networks
|
33 |
+
Models:
|
34 |
+
- Name: gluon_seresnext101_32x4d
|
35 |
+
In Collection: Gloun SEResNeXt
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 10302923504
|
38 |
+
Parameters: 48960000
|
39 |
+
File Size: 196505510
|
40 |
+
Architecture:
|
41 |
+
- 1x1 Convolution
|
42 |
+
- Batch Normalization
|
43 |
+
- Convolution
|
44 |
+
- Global Average Pooling
|
45 |
+
- Grouped Convolution
|
46 |
+
- Max Pooling
|
47 |
+
- ReLU
|
48 |
+
- ResNeXt Block
|
49 |
+
- Residual Connection
|
50 |
+
- Softmax
|
51 |
+
- Squeeze-and-Excitation Block
|
52 |
+
Tasks:
|
53 |
+
- Image Classification
|
54 |
+
Training Data:
|
55 |
+
- ImageNet
|
56 |
+
ID: gluon_seresnext101_32x4d
|
57 |
+
Crop Pct: '0.875'
|
58 |
+
Image Size: '224'
|
59 |
+
Interpolation: bicubic
|
60 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L219
|
61 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext101_32x4d-cf52900d.pth
|
62 |
+
Results:
|
63 |
+
- Task: Image Classification
|
64 |
+
Dataset: ImageNet
|
65 |
+
Metrics:
|
66 |
+
Top 1 Accuracy: 80.87%
|
67 |
+
Top 5 Accuracy: 95.29%
|
68 |
+
- Name: gluon_seresnext101_64x4d
|
69 |
+
In Collection: Gloun SEResNeXt
|
70 |
+
Metadata:
|
71 |
+
FLOPs: 19958950640
|
72 |
+
Parameters: 88230000
|
73 |
+
File Size: 353875948
|
74 |
+
Architecture:
|
75 |
+
- 1x1 Convolution
|
76 |
+
- Batch Normalization
|
77 |
+
- Convolution
|
78 |
+
- Global Average Pooling
|
79 |
+
- Grouped Convolution
|
80 |
+
- Max Pooling
|
81 |
+
- ReLU
|
82 |
+
- ResNeXt Block
|
83 |
+
- Residual Connection
|
84 |
+
- Softmax
|
85 |
+
- Squeeze-and-Excitation Block
|
86 |
+
Tasks:
|
87 |
+
- Image Classification
|
88 |
+
Training Data:
|
89 |
+
- ImageNet
|
90 |
+
ID: gluon_seresnext101_64x4d
|
91 |
+
Crop Pct: '0.875'
|
92 |
+
Image Size: '224'
|
93 |
+
Interpolation: bicubic
|
94 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L229
|
95 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext101_64x4d-f9926f93.pth
|
96 |
+
Results:
|
97 |
+
- Task: Image Classification
|
98 |
+
Dataset: ImageNet
|
99 |
+
Metrics:
|
100 |
+
Top 1 Accuracy: 80.88%
|
101 |
+
Top 5 Accuracy: 95.31%
|
102 |
+
- Name: gluon_seresnext50_32x4d
|
103 |
+
In Collection: Gloun SEResNeXt
|
104 |
+
Metadata:
|
105 |
+
FLOPs: 5475179184
|
106 |
+
Parameters: 27560000
|
107 |
+
File Size: 110578827
|
108 |
+
Architecture:
|
109 |
+
- 1x1 Convolution
|
110 |
+
- Batch Normalization
|
111 |
+
- Convolution
|
112 |
+
- Global Average Pooling
|
113 |
+
- Grouped Convolution
|
114 |
+
- Max Pooling
|
115 |
+
- ReLU
|
116 |
+
- ResNeXt Block
|
117 |
+
- Residual Connection
|
118 |
+
- Softmax
|
119 |
+
- Squeeze-and-Excitation Block
|
120 |
+
Tasks:
|
121 |
+
- Image Classification
|
122 |
+
Training Data:
|
123 |
+
- ImageNet
|
124 |
+
ID: gluon_seresnext50_32x4d
|
125 |
+
Crop Pct: '0.875'
|
126 |
+
Image Size: '224'
|
127 |
+
Interpolation: bicubic
|
128 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L209
|
129 |
+
Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext50_32x4d-90cf2d6e.pth
|
130 |
+
Results:
|
131 |
+
- Task: Image Classification
|
132 |
+
Dataset: ImageNet
|
133 |
+
Metrics:
|
134 |
+
Top 1 Accuracy: 79.92%
|
135 |
+
Top 5 Accuracy: 94.82%
|
136 |
+
-->
|
docs/models/.templates/models/gloun-xception.md
ADDED
@@ -0,0 +1,66 @@
|
1 |
+
# (Gluon) Xception
|
2 |
+
|
3 |
+
**Xception** is a convolutional neural network architecture that relies solely on [depthwise separable convolution](https://paperswithcode.com/method/depthwise-separable-convolution) layers.
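As a rough illustration of why depthwise separable convolutions are attractive, the sketch below compares weight counts for a regular convolution and for a depthwise convolution followed by a 1x1 pointwise convolution. The helper names and the layer sizes are hypothetical and chosen only for this example; biases are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a regular convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable, round(standard / separable, 1))
```

For this hypothetical layer the separable form needs roughly an order of magnitude fewer weights, which is the saving an architecture built from such layers relies on.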
|
4 |
+
|
5 |
+
The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
|
6 |
+
|
7 |
+
{% include 'code_snippets.md' %}
|
8 |
+
|
9 |
+
## How do I train this model?
|
10 |
+
|
11 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
12 |
+
|
13 |
+
## Citation
|
14 |
+
|
15 |
+
```BibTeX
|
16 |
+
@misc{chollet2017xception,
|
17 |
+
title={Xception: Deep Learning with Depthwise Separable Convolutions},
|
18 |
+
author={François Chollet},
|
19 |
+
year={2017},
|
20 |
+
eprint={1610.02357},
|
21 |
+
archivePrefix={arXiv},
|
22 |
+
primaryClass={cs.CV}
|
23 |
+
}
|
24 |
+
```
|
25 |
+
|
26 |
+
<!--
|
27 |
+
Type: model-index
|
28 |
+
Collections:
|
29 |
+
- Name: Gloun Xception
|
30 |
+
Paper:
|
31 |
+
Title: 'Xception: Deep Learning with Depthwise Separable Convolutions'
|
32 |
+
URL: https://paperswithcode.com/paper/xception-deep-learning-with-depthwise
|
33 |
+
Models:
|
34 |
+
- Name: gluon_xception65
|
35 |
+
In Collection: Gloun Xception
|
36 |
+
Metadata:
|
37 |
+
FLOPs: 17594889728
|
38 |
+
Parameters: 39920000
|
39 |
+
File Size: 160551306
|
40 |
+
Architecture:
|
41 |
+
- 1x1 Convolution
|
42 |
+
- Convolution
|
43 |
+
- Dense Connections
|
44 |
+
- Depthwise Separable Convolution
|
45 |
+
- Global Average Pooling
|
46 |
+
- Max Pooling
|
47 |
+
- ReLU
|
48 |
+
- Residual Connection
|
49 |
+
- Softmax
|
50 |
+
Tasks:
|
51 |
+
- Image Classification
|
52 |
+
Training Data:
|
53 |
+
- ImageNet
|
54 |
+
ID: gluon_xception65
|
55 |
+
Crop Pct: '0.903'
|
56 |
+
Image Size: '299'
|
57 |
+
Interpolation: bicubic
|
58 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_xception.py#L241
|
59 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/gluon_xception-7015a15c.pth
|
60 |
+
Results:
|
61 |
+
- Task: Image Classification
|
62 |
+
Dataset: ImageNet
|
63 |
+
Metrics:
|
64 |
+
Top 1 Accuracy: 79.7%
|
65 |
+
Top 5 Accuracy: 94.87%
|
66 |
+
-->
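The `Crop Pct` and `Image Size` fields in the metadata above can be read together. The sketch below assumes the common resize-then-center-crop evaluation convention, in which the image is first resized so that the final crop covers `Crop Pct` of the shorter side. The helper name `eval_resize_size` is hypothetical, and the exact preprocessing used for any given checkpoint should be checked against the library.

```python
import math


def eval_resize_size(image_size, crop_pct):
    """Shorter-side resize target before a center crop of `image_size` pixels."""
    return int(math.floor(image_size / crop_pct))


# Values taken from the metadata fields in these model cards.
print(eval_resize_size(299, 0.903))  # Xception-style setting
print(eval_resize_size(224, 0.875))  # setting used by most 224-pixel models here
```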
|
docs/models/.templates/models/hrnet.md
ADDED
@@ -0,0 +1,358 @@
|
1 |
+
# HRNet
|
2 |
+
|
3 |
+
**HRNet**, or **High-Resolution Net**, is a general-purpose convolutional neural network for tasks such as semantic segmentation, object detection and image classification. It maintains high-resolution representations throughout the network. The design starts from a high-resolution convolution stream, gradually adds high-to-low resolution convolution streams one by one, and connects the multi-resolution streams in parallel. The resulting network consists of several stages ($4$ in the paper), and the $n$th stage contains $n$ streams corresponding to $n$ resolutions. Multi-resolution fusion is performed repeatedly by exchanging information across the parallel streams.
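The stage and stream structure described above can be sketched as a small layout calculation. The function name `hrnet_stage_layout` is hypothetical. The stem stride of 4, the halving of resolution, and the doubling of width on each additional stream are assumptions for this illustration; real implementations may differ, for example by using a different block type and width in the first stage.

```python
def hrnet_stage_layout(base_width, input_size=224, num_stages=4, stem_stride=4):
    """Return, per stage, a list of (channels, spatial_size) pairs, one per stream."""
    layout = []
    for stage in range(1, num_stages + 1):
        streams = []
        for branch in range(stage):  # stage n holds n parallel streams
            channels = base_width * 2 ** branch  # assumed: width doubles per extra stream
            size = input_size // (stem_stride * 2 ** branch)  # assumed: resolution halves
            streams.append((channels, size))
        layout.append(streams)
    return layout


# base_width=18 mirrors the "w18" in a name such as hrnet_w18.
layout = hrnet_stage_layout(base_width=18)
for stage_index, streams in enumerate(layout, start=1):
    print(f"stage {stage_index}: {streams}")
```

The highest-resolution stream is present in every stage, which is how the network keeps a high-resolution representation from start to finish.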
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@misc{sun2019highresolution,
|
15 |
+
title={High-Resolution Representations for Labeling Pixels and Regions},
|
16 |
+
author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang},
|
17 |
+
year={2019},
|
18 |
+
eprint={1904.04514},
|
19 |
+
archivePrefix={arXiv},
|
20 |
+
primaryClass={cs.CV}
|
21 |
+
}
|
22 |
+
```
|
23 |
+
|
24 |
+
<!--
|
25 |
+
Type: model-index
|
26 |
+
Collections:
|
27 |
+
- Name: HRNet
|
28 |
+
Paper:
|
29 |
+
Title: Deep High-Resolution Representation Learning for Visual Recognition
|
30 |
+
URL: https://paperswithcode.com/paper/190807919
|
31 |
+
Models:
|
32 |
+
- Name: hrnet_w18
|
33 |
+
In Collection: HRNet
|
34 |
+
Metadata:
|
35 |
+
FLOPs: 5547205500
|
36 |
+
Parameters: 21300000
|
37 |
+
File Size: 85718883
|
38 |
+
Architecture:
|
39 |
+
- Batch Normalization
|
40 |
+
- Convolution
|
41 |
+
- ReLU
|
42 |
+
- Residual Connection
|
43 |
+
Tasks:
|
44 |
+
- Image Classification
|
45 |
+
Training Techniques:
|
46 |
+
- Nesterov Accelerated Gradient
|
47 |
+
- Weight Decay
|
48 |
+
Training Data:
|
49 |
+
- ImageNet
|
50 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
51 |
+
ID: hrnet_w18
|
52 |
+
Epochs: 100
|
53 |
+
Layers: 18
|
54 |
+
Crop Pct: '0.875'
|
55 |
+
Momentum: 0.9
|
56 |
+
Batch Size: 256
|
57 |
+
Image Size: '224'
|
58 |
+
Weight Decay: 0.001
|
59 |
+
Interpolation: bilinear
|
60 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L800
|
61 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w18-8cb57bb9.pth
|
62 |
+
Results:
|
63 |
+
- Task: Image Classification
|
64 |
+
Dataset: ImageNet
|
65 |
+
Metrics:
|
66 |
+
Top 1 Accuracy: 76.76%
|
67 |
+
Top 5 Accuracy: 93.44%
|
68 |
+
- Name: hrnet_w18_small
|
69 |
+
In Collection: HRNet
|
70 |
+
Metadata:
|
71 |
+
FLOPs: 2071651488
|
72 |
+
Parameters: 13190000
|
73 |
+
File Size: 52934302
|
74 |
+
Architecture:
|
75 |
+
- Batch Normalization
|
76 |
+
- Convolution
|
77 |
+
- ReLU
|
78 |
+
- Residual Connection
|
79 |
+
Tasks:
|
80 |
+
- Image Classification
|
81 |
+
Training Techniques:
|
82 |
+
- Nesterov Accelerated Gradient
|
83 |
+
- Weight Decay
|
84 |
+
Training Data:
|
85 |
+
- ImageNet
|
86 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
87 |
+
ID: hrnet_w18_small
|
88 |
+
Epochs: 100
|
89 |
+
Layers: 18
|
90 |
+
Crop Pct: '0.875'
|
91 |
+
Momentum: 0.9
|
92 |
+
Batch Size: 256
|
93 |
+
Image Size: '224'
|
94 |
+
Weight Decay: 0.001
|
95 |
+
Interpolation: bilinear
|
96 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L790
|
97 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnet_w18_small_v1-f460c6bc.pth
|
98 |
+
Results:
|
99 |
+
- Task: Image Classification
|
100 |
+
Dataset: ImageNet
|
101 |
+
Metrics:
|
102 |
+
Top 1 Accuracy: 72.34%
|
103 |
+
Top 5 Accuracy: 90.68%
|
104 |
+
- Name: hrnet_w18_small_v2
|
105 |
+
In Collection: HRNet
|
106 |
+
Metadata:
|
107 |
+
FLOPs: 3360023160
|
108 |
+
Parameters: 15600000
|
109 |
+
File Size: 62682879
|
110 |
+
Architecture:
|
111 |
+
- Batch Normalization
|
112 |
+
- Convolution
|
113 |
+
- ReLU
|
114 |
+
- Residual Connection
|
115 |
+
Tasks:
|
116 |
+
- Image Classification
|
117 |
+
Training Techniques:
|
118 |
+
- Nesterov Accelerated Gradient
|
119 |
+
- Weight Decay
|
120 |
+
Training Data:
|
121 |
+
- ImageNet
|
122 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
123 |
+
ID: hrnet_w18_small_v2
|
124 |
+
Epochs: 100
|
125 |
+
Layers: 18
|
126 |
+
Crop Pct: '0.875'
|
127 |
+
Momentum: 0.9
|
128 |
+
Batch Size: 256
|
129 |
+
Image Size: '224'
|
130 |
+
Weight Decay: 0.001
|
131 |
+
Interpolation: bilinear
|
132 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L795
|
133 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnet_w18_small_v2-4c50a8cb.pth
|
134 |
+
Results:
|
135 |
+
- Task: Image Classification
|
136 |
+
Dataset: ImageNet
|
137 |
+
Metrics:
|
138 |
+
Top 1 Accuracy: 75.11%
|
139 |
+
Top 5 Accuracy: 92.41%
|
140 |
+
- Name: hrnet_w30
|
141 |
+
In Collection: HRNet
|
142 |
+
Metadata:
|
143 |
+
FLOPs: 10474119492
|
144 |
+
Parameters: 37710000
|
145 |
+
File Size: 151452218
|
146 |
+
Architecture:
|
147 |
+
- Batch Normalization
|
148 |
+
- Convolution
|
149 |
+
- ReLU
|
150 |
+
- Residual Connection
|
151 |
+
Tasks:
|
152 |
+
- Image Classification
|
153 |
+
Training Techniques:
|
154 |
+
- Nesterov Accelerated Gradient
|
155 |
+
- Weight Decay
|
156 |
+
Training Data:
|
157 |
+
- ImageNet
|
158 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
159 |
+
ID: hrnet_w30
|
160 |
+
Epochs: 100
|
161 |
+
Layers: 30
|
162 |
+
Crop Pct: '0.875'
|
163 |
+
Momentum: 0.9
|
164 |
+
Batch Size: 256
|
165 |
+
Image Size: '224'
|
166 |
+
Weight Decay: 0.001
|
167 |
+
Interpolation: bilinear
|
168 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L805
|
169 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w30-8d7f8dab.pth
|
170 |
+
Results:
|
171 |
+
- Task: Image Classification
|
172 |
+
Dataset: ImageNet
|
173 |
+
Metrics:
|
174 |
+
Top 1 Accuracy: 78.21%
|
175 |
+
Top 5 Accuracy: 94.22%
|
176 |
+
- Name: hrnet_w32
|
177 |
+
In Collection: HRNet
|
178 |
+
Metadata:
|
179 |
+
FLOPs: 11524528320
|
180 |
+
Parameters: 41230000
|
181 |
+
File Size: 165547812
|
182 |
+
Architecture:
|
183 |
+
- Batch Normalization
|
184 |
+
- Convolution
|
185 |
+
- ReLU
|
186 |
+
- Residual Connection
|
187 |
+
Tasks:
|
188 |
+
- Image Classification
|
189 |
+
Training Techniques:
|
190 |
+
- Nesterov Accelerated Gradient
|
191 |
+
- Weight Decay
|
192 |
+
Training Data:
|
193 |
+
- ImageNet
|
194 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
195 |
+
Training Time: 60 hours
|
196 |
+
ID: hrnet_w32
|
197 |
+
Epochs: 100
|
198 |
+
Layers: 32
|
199 |
+
Crop Pct: '0.875'
|
200 |
+
Momentum: 0.9
|
201 |
+
Batch Size: 256
|
202 |
+
Image Size: '224'
|
203 |
+
Weight Decay: 0.001
|
204 |
+
Interpolation: bilinear
|
205 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L810
|
206 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w32-90d8c5fb.pth
|
207 |
+
Results:
|
208 |
+
- Task: Image Classification
|
209 |
+
Dataset: ImageNet
|
210 |
+
Metrics:
|
211 |
+
Top 1 Accuracy: 78.45%
|
212 |
+
Top 5 Accuracy: 94.19%
|
213 |
+
- Name: hrnet_w40
|
214 |
+
In Collection: HRNet
|
215 |
+
Metadata:
|
216 |
+
FLOPs: 16381182192
|
217 |
+
Parameters: 57560000
|
218 |
+
File Size: 230899236
|
219 |
+
Architecture:
|
220 |
+
- Batch Normalization
|
221 |
+
- Convolution
|
222 |
+
- ReLU
|
223 |
+
- Residual Connection
|
224 |
+
Tasks:
|
225 |
+
- Image Classification
|
226 |
+
Training Techniques:
|
227 |
+
- Nesterov Accelerated Gradient
|
228 |
+
- Weight Decay
|
229 |
+
Training Data:
|
230 |
+
- ImageNet
|
231 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
232 |
+
ID: hrnet_w40
|
233 |
+
Epochs: 100
|
234 |
+
Layers: 40
|
235 |
+
Crop Pct: '0.875'
|
236 |
+
Momentum: 0.9
|
237 |
+
Batch Size: 256
|
238 |
+
Image Size: '224'
|
239 |
+
Weight Decay: 0.001
|
240 |
+
Interpolation: bilinear
|
241 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L815
|
242 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w40-7cd397a4.pth
|
243 |
+
Results:
|
244 |
+
- Task: Image Classification
|
245 |
+
Dataset: ImageNet
|
246 |
+
Metrics:
|
247 |
+
Top 1 Accuracy: 78.93%
|
248 |
+
Top 5 Accuracy: 94.48%
|
249 |
+
- Name: hrnet_w44
|
250 |
+
In Collection: HRNet
|
251 |
+
Metadata:
|
252 |
+
FLOPs: 19202520264
|
253 |
+
Parameters: 67060000
|
254 |
+
File Size: 268957432
|
255 |
+
Architecture:
|
256 |
+
- Batch Normalization
|
257 |
+
- Convolution
|
258 |
+
- ReLU
|
259 |
+
- Residual Connection
|
260 |
+
Tasks:
|
261 |
+
- Image Classification
|
262 |
+
Training Techniques:
|
263 |
+
- Nesterov Accelerated Gradient
|
264 |
+
- Weight Decay
|
265 |
+
Training Data:
|
266 |
+
- ImageNet
|
267 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
268 |
+
ID: hrnet_w44
|
269 |
+
Epochs: 100
|
270 |
+
Layers: 44
|
271 |
+
Crop Pct: '0.875'
|
272 |
+
Momentum: 0.9
|
273 |
+
Batch Size: 256
|
274 |
+
Image Size: '224'
|
275 |
+
Weight Decay: 0.001
|
276 |
+
Interpolation: bilinear
|
277 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L820
|
278 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w44-c9ac8c18.pth
|
279 |
+
Results:
|
280 |
+
- Task: Image Classification
|
281 |
+
Dataset: ImageNet
|
282 |
+
Metrics:
|
283 |
+
Top 1 Accuracy: 78.89%
|
284 |
+
Top 5 Accuracy: 94.37%
|
285 |
+
- Name: hrnet_w48
|
286 |
+
In Collection: HRNet
|
287 |
+
Metadata:
|
288 |
+
FLOPs: 22285865760
|
289 |
+
Parameters: 77470000
|
290 |
+
File Size: 310603710
|
291 |
+
Architecture:
|
292 |
+
- Batch Normalization
|
293 |
+
- Convolution
|
294 |
+
- ReLU
|
295 |
+
- Residual Connection
|
296 |
+
Tasks:
|
297 |
+
- Image Classification
|
298 |
+
Training Techniques:
|
299 |
+
- Nesterov Accelerated Gradient
|
300 |
+
- Weight Decay
|
301 |
+
Training Data:
|
302 |
+
- ImageNet
|
303 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
304 |
+
Training Time: 80 hours
|
305 |
+
ID: hrnet_w48
|
306 |
+
Epochs: 100
|
307 |
+
Layers: 48
|
308 |
+
Crop Pct: '0.875'
|
309 |
+
Momentum: 0.9
|
310 |
+
Batch Size: 256
|
311 |
+
Image Size: '224'
|
312 |
+
Weight Decay: 0.001
|
313 |
+
Interpolation: bilinear
|
314 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L825
|
315 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w48-abd2e6ab.pth
|
316 |
+
Results:
|
317 |
+
- Task: Image Classification
|
318 |
+
Dataset: ImageNet
|
319 |
+
Metrics:
|
320 |
+
Top 1 Accuracy: 79.32%
|
321 |
+
Top 5 Accuracy: 94.51%
|
322 |
+
- Name: hrnet_w64
|
323 |
+
In Collection: HRNet
|
324 |
+
Metadata:
|
325 |
+
FLOPs: 37239321984
|
326 |
+
Parameters: 128060000
|
327 |
+
File Size: 513071818
|
328 |
+
Architecture:
|
329 |
+
- Batch Normalization
|
330 |
+
- Convolution
|
331 |
+
- ReLU
|
332 |
+
- Residual Connection
|
333 |
+
Tasks:
|
334 |
+
- Image Classification
|
335 |
+
Training Techniques:
|
336 |
+
- Nesterov Accelerated Gradient
|
337 |
+
- Weight Decay
|
338 |
+
Training Data:
|
339 |
+
- ImageNet
|
340 |
+
Training Resources: 4x NVIDIA V100 GPUs
|
341 |
+
ID: hrnet_w64
|
342 |
+
Epochs: 100
|
343 |
+
Layers: 64
|
344 |
+
Crop Pct: '0.875'
|
345 |
+
Momentum: 0.9
|
346 |
+
Batch Size: 256
|
347 |
+
Image Size: '224'
|
348 |
+
Weight Decay: 0.001
|
349 |
+
Interpolation: bilinear
|
350 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L830
|
351 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w64-b47cc881.pth
|
352 |
+
Results:
|
353 |
+
- Task: Image Classification
|
354 |
+
Dataset: ImageNet
|
355 |
+
Metrics:
|
356 |
+
Top 1 Accuracy: 79.46%
|
357 |
+
Top 5 Accuracy: 94.65%
|
358 |
+
-->
|
docs/models/.templates/models/ig-resnext.md
ADDED
@@ -0,0 +1,209 @@
|
1 |
+
# Instagram ResNeXt WSL
|
2 |
+
|
3 |
+
A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
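One way to see the effect of cardinality is to count the weights of a grouped 3x3 convolution. The sketch below assumes the usual reading of a name such as `ig_resnext101_32x8d` as 32 groups with 8 channels each (256 channels in total for that layer); the helper name `grouped_conv_params` is hypothetical and biases are ignored.

```python
def grouped_conv_params(kernel_size, in_channels, out_channels, groups):
    """Weights in a grouped convolution (bias ignored)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    # Each group maps in_channels/groups inputs to out_channels/groups outputs.
    per_group = kernel_size * kernel_size * (in_channels // groups) * (out_channels // groups)
    return per_group * groups


dense = grouped_conv_params(3, 256, 256, groups=1)      # ordinary convolution
grouped = grouped_conv_params(3, 256, 256, groups=32)   # cardinality C = 32
print(dense, grouped, dense // grouped)
```

At a fixed channel count, splitting into $C$ groups divides the weight count by $C$, which leaves room to widen the layer while staying within a similar budget.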
|
4 |
+
|
5 |
+
This model was trained on billions of Instagram images, using thousands of distinct hashtags as labels, and exhibits excellent transfer learning performance.
|
6 |
+
|
7 |
+
Please note the CC-BY-NC 4.0 license on these weights: non-commercial use only.
|
8 |
+
|
9 |
+
{% include 'code_snippets.md' %}
|
10 |
+
|
11 |
+
## How do I train this model?
|
12 |
+
|
13 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
14 |
+
|
15 |
+
## Citation
|
16 |
+
|
17 |
+
```BibTeX
|
18 |
+
@misc{mahajan2018exploring,
|
19 |
+
title={Exploring the Limits of Weakly Supervised Pretraining},
|
20 |
+
author={Dhruv Mahajan and Ross Girshick and Vignesh Ramanathan and Kaiming He and Manohar Paluri and Yixuan Li and Ashwin Bharambe and Laurens van der Maaten},
|
21 |
+
year={2018},
|
22 |
+
eprint={1805.00932},
|
23 |
+
archivePrefix={arXiv},
|
24 |
+
primaryClass={cs.CV}
|
25 |
+
}
|
26 |
+
```
|
27 |
+
|
28 |
+
<!--
|
29 |
+
Type: model-index
|
30 |
+
Collections:
|
31 |
+
- Name: IG ResNeXt
|
32 |
+
Paper:
|
33 |
+
Title: Exploring the Limits of Weakly Supervised Pretraining
|
34 |
+
URL: https://paperswithcode.com/paper/exploring-the-limits-of-weakly-supervised
|
35 |
+
Models:
|
36 |
+
- Name: ig_resnext101_32x16d
|
37 |
+
In Collection: IG ResNeXt
|
38 |
+
Metadata:
|
39 |
+
FLOPs: 46623691776
|
40 |
+
Parameters: 194030000
|
41 |
+
File Size: 777518664
|
42 |
+
Architecture:
|
43 |
+
- 1x1 Convolution
|
44 |
+
- Batch Normalization
|
45 |
+
- Convolution
|
46 |
+
- Global Average Pooling
|
47 |
+
- Grouped Convolution
|
48 |
+
- Max Pooling
|
49 |
+
- ReLU
|
50 |
+
- ResNeXt Block
|
51 |
+
- Residual Connection
|
52 |
+
- Softmax
|
53 |
+
Tasks:
|
54 |
+
- Image Classification
|
55 |
+
Training Techniques:
|
56 |
+
- Nesterov Accelerated Gradient
|
57 |
+
- Weight Decay
|
58 |
+
Training Data:
|
59 |
+
- IG-3.5B-17k
|
60 |
+
- ImageNet
|
61 |
+
Training Resources: 336x GPUs
|
62 |
+
ID: ig_resnext101_32x16d
|
63 |
+
Epochs: 100
|
64 |
+
Layers: 101
|
65 |
+
Crop Pct: '0.875'
|
66 |
+
Momentum: 0.9
|
67 |
+
Batch Size: 8064
|
68 |
+
Image Size: '224'
|
69 |
+
Weight Decay: 0.001
|
70 |
+
Interpolation: bilinear
|
71 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L874
|
72 |
+
Weights: https://download.pytorch.org/models/ig_resnext101_32x16-c6f796b0.pth
|
73 |
+
Results:
|
74 |
+
- Task: Image Classification
|
75 |
+
Dataset: ImageNet
|
76 |
+
Metrics:
|
77 |
+
Top 1 Accuracy: 84.16%
|
78 |
+
Top 5 Accuracy: 97.19%
|
79 |
+
- Name: ig_resnext101_32x32d
|
80 |
+
In Collection: IG ResNeXt
|
81 |
+
Metadata:
|
82 |
+
FLOPs: 112225170432
|
83 |
+
Parameters: 468530000
|
84 |
+
File Size: 1876573776
|
85 |
+
Architecture:
|
86 |
+
- 1x1 Convolution
|
87 |
+
- Batch Normalization
|
88 |
+
- Convolution
|
89 |
+
- Global Average Pooling
|
90 |
+
- Grouped Convolution
|
91 |
+
- Max Pooling
|
92 |
+
- ReLU
|
93 |
+
- ResNeXt Block
|
94 |
+
- Residual Connection
|
95 |
+
- Softmax
|
96 |
+
Tasks:
|
97 |
+
- Image Classification
|
98 |
+
Training Techniques:
|
99 |
+
- Nesterov Accelerated Gradient
|
100 |
+
- Weight Decay
|
101 |
+
Training Data:
|
102 |
+
- IG-3.5B-17k
|
103 |
+
- ImageNet
|
104 |
+
Training Resources: 336x GPUs
|
105 |
+
ID: ig_resnext101_32x32d
|
106 |
+
Epochs: 100
|
107 |
+
Layers: 101
|
108 |
+
Crop Pct: '0.875'
|
109 |
+
Momentum: 0.9
|
110 |
+
Batch Size: 8064
|
111 |
+
Image Size: '224'
|
112 |
+
Weight Decay: 0.001
|
113 |
+
Interpolation: bilinear
|
114 |
+
Minibatch Size: 8064
|
115 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L885
|
116 |
+
Weights: https://download.pytorch.org/models/ig_resnext101_32x32-e4b90b00.pth
|
117 |
+
Results:
|
118 |
+
- Task: Image Classification
|
119 |
+
Dataset: ImageNet
|
120 |
+
Metrics:
|
121 |
+
Top 1 Accuracy: 85.09%
|
122 |
+
Top 5 Accuracy: 97.44%
|
123 |
+
- Name: ig_resnext101_32x48d
|
124 |
+
In Collection: IG ResNeXt
|
125 |
+
Metadata:
|
126 |
+
FLOPs: 197446554624
|
127 |
+
Parameters: 828410000
|
128 |
+
File Size: 3317136976
|
129 |
+
Architecture:
|
130 |
+
- 1x1 Convolution
|
131 |
+
- Batch Normalization
|
132 |
+
- Convolution
|
133 |
+
- Global Average Pooling
|
134 |
+
- Grouped Convolution
|
135 |
+
- Max Pooling
|
136 |
+
- ReLU
|
137 |
+
- ResNeXt Block
|
138 |
+
- Residual Connection
|
139 |
+
- Softmax
|
140 |
+
Tasks:
|
141 |
+
- Image Classification
|
142 |
+
Training Techniques:
|
143 |
+
- Nesterov Accelerated Gradient
|
144 |
+
- Weight Decay
|
145 |
+
Training Data:
|
146 |
+
- IG-3.5B-17k
|
147 |
+
- ImageNet
|
148 |
+
Training Resources: 336x GPUs
|
149 |
+
ID: ig_resnext101_32x48d
|
150 |
+
Epochs: 100
|
151 |
+
Layers: 101
|
152 |
+
Crop Pct: '0.875'
|
153 |
+
Momentum: 0.9
|
154 |
+
Batch Size: 8064
|
155 |
+
Image Size: '224'
|
156 |
+
Weight Decay: 0.001
|
157 |
+
Interpolation: bilinear
|
158 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L896
|
159 |
+
Weights: https://download.pytorch.org/models/ig_resnext101_32x48-3e41cc8a.pth
|
160 |
+
Results:
|
161 |
+
- Task: Image Classification
|
162 |
+
Dataset: ImageNet
|
163 |
+
Metrics:
|
164 |
+
Top 1 Accuracy: 85.42%
|
165 |
+
Top 5 Accuracy: 97.58%
|
166 |
+
- Name: ig_resnext101_32x8d
|
167 |
+
In Collection: IG ResNeXt
|
168 |
+
Metadata:
|
169 |
+
FLOPs: 21180417024
|
170 |
+
Parameters: 88790000
|
171 |
+
File Size: 356056638
|
172 |
+
Architecture:
|
173 |
+
- 1x1 Convolution
|
174 |
+
- Batch Normalization
|
175 |
+
- Convolution
|
176 |
+
- Global Average Pooling
|
177 |
+
- Grouped Convolution
|
178 |
+
- Max Pooling
|
179 |
+
- ReLU
|
180 |
+
- ResNeXt Block
|
181 |
+
- Residual Connection
|
182 |
+
- Softmax
|
183 |
+
Tasks:
|
184 |
+
- Image Classification
|
185 |
+
Training Techniques:
|
186 |
+
- Nesterov Accelerated Gradient
|
187 |
+
- Weight Decay
|
188 |
+
Training Data:
|
189 |
+
- IG-3.5B-17k
|
190 |
+
- ImageNet
|
191 |
+
Training Resources: 336x GPUs
|
192 |
+
ID: ig_resnext101_32x8d
|
193 |
+
Epochs: 100
|
194 |
+
Layers: 101
|
195 |
+
Crop Pct: '0.875'
|
196 |
+
Momentum: 0.9
|
197 |
+
Batch Size: 8064
|
198 |
+
Image Size: '224'
|
199 |
+
Weight Decay: 0.001
|
200 |
+
Interpolation: bilinear
|
201 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L863
|
202 |
+
Weights: https://download.pytorch.org/models/ig_resnext101_32x8-c38310e5.pth
|
203 |
+
Results:
|
204 |
+
- Task: Image Classification
|
205 |
+
Dataset: ImageNet
|
206 |
+
Metrics:
|
207 |
+
Top 1 Accuracy: 82.7%
|
208 |
+
Top 5 Accuracy: 96.64%
|
209 |
+
-->
|
docs/models/.templates/models/inception-resnet-v2.md
ADDED
@@ -0,0 +1,72 @@
|
1 |
+
# Inception ResNet v2
|
2 |
+
|
3 |
+
**Inception-ResNet-v2** is a convolutional neural network architecture that builds on the Inception family of architectures but incorporates [residual connections](https://paperswithcode.com/method/residual-connection) (replacing the filter concatenation stage of the Inception architecture).
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@misc{szegedy2016inceptionv4,
|
15 |
+
title={Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning},
|
16 |
+
author={Christian Szegedy and Sergey Ioffe and Vincent Vanhoucke and Alex Alemi},
|
17 |
+
year={2016},
|
18 |
+
eprint={1602.07261},
|
19 |
+
archivePrefix={arXiv},
|
20 |
+
primaryClass={cs.CV}
|
21 |
+
}
|
22 |
+
```
|
23 |
+
|
24 |
+
<!--
|
25 |
+
Type: model-index
|
26 |
+
Collections:
|
27 |
+
- Name: Inception ResNet v2
|
28 |
+
Paper:
|
29 |
+
Title: Inception-v4, Inception-ResNet and the Impact of Residual Connections on
|
30 |
+
Learning
|
31 |
+
URL: https://paperswithcode.com/paper/inception-v4-inception-resnet-and-the-impact
|
32 |
+
Models:
|
33 |
+
- Name: inception_resnet_v2
|
34 |
+
In Collection: Inception ResNet v2
|
35 |
+
Metadata:
|
36 |
+
FLOPs: 16959133120
|
37 |
+
Parameters: 55850000
|
38 |
+
File Size: 223774238
|
39 |
+
Architecture:
|
40 |
+
- Average Pooling
|
41 |
+
- Dropout
|
42 |
+
- Inception-ResNet-v2 Reduction-B
|
43 |
+
- Inception-ResNet-v2-A
|
44 |
+
- Inception-ResNet-v2-B
|
45 |
+
- Inception-ResNet-v2-C
|
46 |
+
- Reduction-A
|
47 |
+
- Softmax
|
48 |
+
Tasks:
|
49 |
+
- Image Classification
|
50 |
+
Training Techniques:
|
51 |
+
- Label Smoothing
|
52 |
+
- RMSProp
|
53 |
+
- Weight Decay
|
54 |
+
Training Data:
|
55 |
+
- ImageNet
|
56 |
+
Training Resources: 20x NVIDIA Kepler GPUs
|
57 |
+
ID: inception_resnet_v2
|
58 |
+
LR: 0.045
|
59 |
+
Dropout: 0.2
|
60 |
+
Crop Pct: '0.897'
|
61 |
+
Momentum: 0.9
|
62 |
+
Image Size: '299'
|
63 |
+
Interpolation: bicubic
|
64 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_resnet_v2.py#L343
|
65 |
+
Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/inception_resnet_v2-940b1cd6.pth
|
66 |
+
Results:
|
67 |
+
- Task: Image Classification
|
68 |
+
Dataset: ImageNet
|
69 |
+
Metrics:
|
70 |
+
Top 1 Accuracy: 0.95%
|
71 |
+
Top 5 Accuracy: 17.29%
|
72 |
+
-->
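The `Crop Pct` and `Image Size` fields in the metadata above describe evaluation preprocessing. As a rough sketch, assuming the common resize-then-center-crop convention (resize the shorter side to `floor(image_size / crop_pct)`, then center-crop to `image_size`), the intermediate resize target can be derived as follows. The helper name `eval_resize_size` is hypothetical and is not part of timm's API.

```python
import math


def eval_resize_size(image_size: int, crop_pct: float) -> int:
    """Return the shorter-side resize target used before a center crop.

    Assumes the resize-then-center-crop convention; this helper is
    illustrative only and does not belong to any library.
    """
    if not 0.0 < crop_pct <= 1.0:
        raise ValueError("crop_pct must be in the interval (0, 1]")
    return int(math.floor(image_size / crop_pct))


# Values taken from the inception_resnet_v2 metadata above.
resize_to = eval_resize_size(image_size=299, crop_pct=0.897)
```

Under that assumption, the `inception_resnet_v2` entry corresponds to a 333-pixel resize followed by a 299-pixel center crop.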
|
docs/models/.templates/models/inception-v3.md
ADDED
@@ -0,0 +1,85 @@
|
1 |
+
# Inception v3
|
2 |
+
|
3 |
+
**Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements, including [Label Smoothing](https://paperswithcode.com/method/label-smoothing), factorized 7 x 7 convolutions, and an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) that propagates label information lower down the network (along with batch normalization for layers in the side head). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
|
4 |
+
|
5 |
+
{% include 'code_snippets.md' %}
|
6 |
+
|
7 |
+
## How do I train this model?
|
8 |
+
|
9 |
+
You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
|
10 |
+
|
11 |
+
## Citation
|
12 |
+
|
13 |
+
```BibTeX
|
14 |
+
@article{DBLP:journals/corr/SzegedyVISW15,
|
15 |
+
author = {Christian Szegedy and
|
16 |
+
Vincent Vanhoucke and
|
17 |
+
Sergey Ioffe and
|
18 |
+
Jonathon Shlens and
|
19 |
+
Zbigniew Wojna},
|
20 |
+
title = {Rethinking the Inception Architecture for Computer Vision},
|
21 |
+
journal = {CoRR},
|
22 |
+
volume = {abs/1512.00567},
|
23 |
+
year = {2015},
|
24 |
+
url = {http://arxiv.org/abs/1512.00567},
|
25 |
+
archivePrefix = {arXiv},
|
26 |
+
eprint = {1512.00567},
|
27 |
+
timestamp = {Mon, 13 Aug 2018 16:49:07 +0200},
|
28 |
+
biburl = {https://dblp.org/rec/journals/corr/SzegedyVISW15.bib},
|
29 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
30 |
+
}
|
31 |
+
```
|
32 |
+
|
33 |
+
<!--
|
34 |
+
Type: model-index
|
35 |
+
Collections:
|
36 |
+
- Name: Inception v3
|
37 |
+
Paper:
|
38 |
+
Title: Rethinking the Inception Architecture for Computer Vision
|
39 |
+
URL: https://paperswithcode.com/paper/rethinking-the-inception-architecture-for
|
40 |
+
Models:
|
41 |
+
- Name: inception_v3
|
42 |
+
In Collection: Inception v3
|
43 |
+
Metadata:
|
44 |
+
FLOPs: 7352418880
|
45 |
+
Parameters: 23830000
|
46 |
+
File Size: 108857766
|
47 |
+
Architecture:
|
48 |
+
- 1x1 Convolution
|
49 |
+
- Auxiliary Classifier
|
50 |
+
- Average Pooling
|
51 |
+
- Average Pooling
|
52 |
+
- Batch Normalization
|
53 |
+
- Convolution
|
54 |
+
- Dense Connections
|
55 |
+
- Dropout
|
56 |
+
- Inception-v3 Module
|
57 |
+
- Max Pooling
|
58 |
+
- ReLU
|
59 |
+
- Softmax
|
60 |
+
Tasks:
|
61 |
+
- Image Classification
|
62 |
+
Training Techniques:
|
63 |
+
- Gradient Clipping
|
64 |
+
- Label Smoothing
|
65 |
+
- RMSProp
|
66 |
+
- Weight Decay
|
67 |
+
Training Data:
|
68 |
+
- ImageNet
|
69 |
+
Training Resources: 50x NVIDIA Kepler GPUs
|
70 |
+
ID: inception_v3
|
71 |
+
LR: 0.045
|
72 |
+
Dropout: 0.2
|
73 |
+
Crop Pct: '0.875'
|
74 |
+
Momentum: 0.9
|
75 |
+
Image Size: '299'
|
76 |
+
Interpolation: bicubic
|
77 |
+
Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L442
|
78 |
+
Weights: https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
|
79 |
+
Results:
|
80 |
+
- Task: Image Classification
|
81 |
+
Dataset: ImageNet
|
82 |
+
Metrics:
|
83 |
+
Top 1 Accuracy: 77.46%
|
84 |
+
Top 5 Accuracy: 93.48%
|
85 |
+
-->
|