rail-berkeley/octo-base-1.5


rail-berkeley committed on May 21

Commit 9dcddab • 1 Parent(s): 5dcdc3e

Update README.md


README.md +73 -3


@@ -1,3 +1,73 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: robotics
+---
+# Octo Base
+See https://github.com/octo-models/octo for instructions on using this model.
+Octo Base is trained with a history window of 2 timesteps and predicts 7-dimensional actions 4 steps into the future using a diffusion policy. The model is a Transformer with 93M parameters (equivalent to a ViT-B). Images are tokenized by passing them through a lightweight convolutional encoder and grouping the result into 16x16 patches. Language is tokenized with the T5 tokenizer and then encoded with the T5-Base language encoder.
+Observations and tasks conform to the following spec:
+Observations:
+```
+{
+    image_primary: ('batch', 'history_window', 256, 256, 3),
+    image_wrist: ('batch', 'history_window', 128, 128, 3),
+}
+```
+Tasks:
+```
+{
+    image_primary: ('batch', 256, 256, 3),
+    image_wrist: ('batch', 128, 128, 3),
+    language_instruction: {
+        attention_mask: ('batch', 16),
+        input_ids: ('batch', 16),
+    },
+}
+```
+At inference, you may pass in any subset of these observation and task keys, with a history window up to 2 timesteps.
+This model was trained on a mixture of datasets drawn from the Open X-Embodiment dataset. Each training batch was sampled in the following proportions:
+| Dataset | Proportion of batch |
+|---|---|
+| Fractal (Brohan et al., 2022) | 17.0% |
+| Kuka (Kalashnikov et al., 2018) | 17.0% |
+| Bridge (Walke et al., 2023) | 17.0% |
+| BC-Z (Jang et al., 2022) | 9.1% |
+| Stanford Hydra Dataset (Belkhale et al., 2023) | 6.0% |
+| Language Table (Lynch et al., 2023) | 5.9% |
+| Taco Play (Rosete-Beas et al., 2022; Mees et al., 2023) | 3.6% |
+| Furniture Bench Dataset (Heo et al., 2023) | 3.3% |
+| UTAustin Mutex (Shah et al., 2023) | 3.0% |
+| Austin Sailor Dataset (Nasiriany et al., 2022) | 2.9% |
+| Roboturk (Mandlekar et al., 2018) | 2.8% |
+| Toto (Zhou et al., 2023) | 2.4% |
+| Austin Sirius Dataset (Liu et al., 2023) | 2.3% |
+| Berkeley Autolab UR5 (Chen et al.) | 1.5% |
+| IAMLab CMU Pickup Insert (Saxena et al., 2023) | 1.2% |
+| Viola (Zhu et al., 2023) | 1.2% |
+| Berkeley Fanuc Manipulation (Zhu et al., 2023) | 1.0% |
+| NYU Franka Play Dataset (Cui et al., 2022) | 0.9% |
+| UCSD Kitchen Dataset (Ge Yan and Wang, 2023) | <0.1% |
+| Jaco Play (Dass et al., 2023) | 0.6% |
+| Berkeley Cable Routing (Luo et al., 2023) | 0.3% |
+| Austin Buds Dataset (Zhu et al., 2022) | 0.3% |
+| CMU Stretch (Mendonca et al., 2023) | 0.2% |
+| NYU Door Opening (Pari et al., 2021) | 0.1% |
+| DLR EDAN Shared Control (Quere et al., 2020) | 0.1% |
+# Updates for Version 1.5
+- Language task tokens are now repeated at every timestep in the context window.
+- Augmented the language instructions in the data with rephrasings from GPT-3.5.
+- Bug fixes:
+  - Turned off dropout in the diffusion head due to incompatibility with layer norm.
+  - Fixed an off-by-one error with the attention mask.
+  - Fixed an issue where different image augmentations did not get fresh random seeds.
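
The observation and task spec in the README can be checked against placeholder data before wiring up a real robot or dataset. The sketch below is not part of the Octo codebase. `make_dummy_inputs` is a hypothetical helper, the dtypes (`uint8` images, `int32` token arrays) are assumptions, and the `(batch, 4, 7)` action layout is only inferred from the README's "7-dimensional actions 4 steps into the future". It uses NumPy alone and does not call any Octo API.

```python
import numpy as np

HISTORY_WINDOW = 2  # maximum history window stated in the README
ACTION_HORIZON = 4  # actions are predicted 4 steps into the future
ACTION_DIM = 7      # 7-dimensional actions
MAX_TOKENS = 16     # token length of language_instruction in the spec


def make_dummy_inputs(batch=1, window=HISTORY_WINDOW):
    """Hypothetical helper: zero-filled arrays shaped like the README spec.

    dtypes are assumptions; the README only specifies shapes.
    """
    if not 1 <= window <= HISTORY_WINDOW:
        raise ValueError(f"history window must be between 1 and {HISTORY_WINDOW}")

    observation = {
        "image_primary": np.zeros((batch, window, 256, 256, 3), dtype=np.uint8),
        "image_wrist": np.zeros((batch, window, 128, 128, 3), dtype=np.uint8),
    }
    task = {
        # Goal images have no history dimension.
        "image_primary": np.zeros((batch, 256, 256, 3), dtype=np.uint8),
        "image_wrist": np.zeros((batch, 128, 128, 3), dtype=np.uint8),
        "language_instruction": {
            "attention_mask": np.zeros((batch, MAX_TOKENS), dtype=np.int32),
            "input_ids": np.zeros((batch, MAX_TOKENS), dtype=np.int32),
        },
    }
    return observation, task


observation, task = make_dummy_inputs(batch=2)

# The README allows any subset of keys, e.g. a language-conditioned setup
# that uses only the primary camera.
observation_subset = {"image_primary": observation["image_primary"]}
task_subset = {"language_instruction": task["language_instruction"]}

# Assumed layout of the predicted action chunk: (batch, horizon, action_dim).
expected_action_shape = (2, ACTION_HORIZON, ACTION_DIM)
```

Passing a shorter history (for example `window=1` at the first step of an episode) stays within the "up to 2 timesteps" limit, and the helper rejects anything longer.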