Metadata-Version: 2.1
Name: depth_anything
Version: 2024.1.22.0
Project-URL: Documentation, https://github.com/LiheYoung/Depth-Anything
Project-URL: Issues, https://github.com/LiheYoung/Depth-Anything/issues
Project-URL: Source, https://github.com/LiheYoung/Depth-Anything
License-File: LICENSE
Requires-Dist: opencv-python
Requires-Dist: torch
Requires-Dist: torchvision
Description-Content-Type: text/markdown
# Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://scholar.google.com/citations?user=NmHgX-wAAAAJ)<sup>2+</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup> · [**Xiaogang Xu**](https://xiaogang00.github.io/)<sup>3,4</sup> · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1+</sup>

<sup>1</sup>The University of Hong Kong · <sup>2</sup>TikTok · <sup>3</sup>Zhejiang Lab · <sup>4</sup>Zhejiang University

<sup>+</sup>corresponding authors
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and **62M+ unlabeled images**.
![teaser](assets/teaser.png)
## News
* **2024-01-22:** Paper, project page, code, models, and demo are released.
## Features of Depth Anything
- **Relative depth estimation**:
Our foundation models listed [here](https://huggingface.co/spaces/LiheYoung/Depth-Anything/tree/main/checkpoints) robustly provide relative depth estimation for any given image. Please refer [here](#running) for details; a minimal inference sketch also follows this list.
- **Metric depth estimation**
We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It offers strong in-domain and zero-shot metric depth estimation. Please refer [here](./metric_depth) for details.
- **Better depth-conditioned ControlNet**
We re-train **a better depth-conditioned ControlNet** based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer [here](./controlnet/) for details.
- **Downstream high-level scene understanding**
The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks, *e.g.*, semantic segmentation, achieving 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. Please refer [here](./semseg/) for details.
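The sketch below illustrates the rough inference flow for relative depth estimation. It is a minimal example rather than the official runner: the `DepthAnything.from_pretrained` checkpoint name, the `depth_anything.util.transform` module path, and the 518-pixel input size are assumptions based on the released checkpoints and the repository layout; see the running instructions linked above for the authoritative version.

```python
import cv2
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose

# Assumed module paths inside this package; check the repo's run script for the exact layout.
from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Assumed checkpoint id; the available encoders (vits/vitb/vitl) are listed on the
# Hugging Face space linked under "Relative depth estimation" above.
model = DepthAnything.from_pretrained('LiheYoung/depth_anything_vitl14').to(DEVICE).eval()

# MiDaS-style preprocessing: resize so the long side is a multiple of 14, then normalize.
transform = Compose([
    Resize(width=518, height=518, resize_target=False, keep_aspect_ratio=True,
           ensure_multiple_of=14, resize_method='lower_bound',
           image_interpolation_method=cv2.INTER_CUBIC),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

raw_image = cv2.imread('your_image.jpg')  # BGR uint8 (placeholder path)
h, w = raw_image.shape[:2]
image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0
image = torch.from_numpy(transform({'image': image})['image']).unsqueeze(0).to(DEVICE)

with torch.no_grad():
    depth = model(image)  # relative (affine-invariant) depth, not metric

# Resize back to the input resolution and normalize to [0, 255] for visualization.
depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]
depth = ((depth - depth.min()) / (depth.max() - depth.min()) * 255.0)
cv2.imwrite('depth.png', depth.cpu().numpy().astype('uint8'))
```

The output is a relative depth map, so values are only meaningful up to scale and shift; use the metric-depth models mentioned above when absolute distances are required.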
## Performance
Here we compare our Depth Anything with the previously best MiDaS v3.1 BEiT