# Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis Implementation

[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/flax-community/DietNerf-Demo) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1etYeMTntw5mh3FvJv4Ubb7XUoTtt5J9G?usp=sharing)

<p align="center"><img width="450" alt="스크린샷 2021-07-04 오후 4 11 51" src="https://user-images.githubusercontent.com/77657524/126361638-4aad58e8-4efb-4fc5-bf78-f53d03799e1e.png"></p>

This project is an attempt to implement the paper **[Putting NeRF on a Diet](https://arxiv.org/abs/2104.00677)** (DietNeRF) in JAX/Flax.
DietNeRF is designed to render high-quality novel views in a few-shot learning setting, a task with which vanilla NeRF (Neural Radiance Field) struggles.
To achieve this, the authors introduce a **Semantic Consistency Loss** that supervises DietNeRF with prior knowledge from a CLIP Vision Transformer. This supervision enables DietNeRF to learn 3D scene reconstruction using CLIP's prior knowledge of 2D views.

Besides this repo, you can check our write-up and demo here:
- ✍️ **[Write-up in Notion](https://steep-cycle-f6b.notion.site/DietNeRF-Putting-NeRF-on-a-Diet-4aeddae95d054f1d91686f02bdb74745)**: more details of DietNeRF and our experiments
- 🤗 **[Demo in Hugging Face Space](https://huggingface.co/spaces/flax-community/DietNerf-Demo)**: showcases our trained DietNeRF models via Streamlit

## 🤩 Demo
1. You can check out [our demo in Hugging Face Space](https://huggingface.co/spaces/flax-community/DietNerf-Demo).
2. Or you can set up our Streamlit demo locally (model checkpoints are fetched automatically on startup):
```shell
pip install -r requirements_demo.txt
streamlit run app.py
```

<p align="center"><img width="600" height="400" alt="Streamlit Demo" src="assets/space_demo.png"></p>

## ✨ Implementation

Our code is written in JAX/Flax and is based mainly on [jaxnerf](https://github.com/google-research/google-research/tree/master/jaxnerf) from Google Research. The base code is highly optimized for both GPU and TPU. For the semantic consistency loss, we use the pretrained CLIP Vision Transformer from the [transformers](https://github.com/huggingface/transformers) library.
To learn more about DietNeRF and about our experiments and implementation, we highly recommend our detailed **[Notion write-up](https://www.notion.so/DietNeRF-Putting-NeRF-on-a-Diet-4aeddae95d054f1d91686f02bdb74745)**!

<p align="center"><img width="500" height="600" alt="스크린샷 2021-07-04 오후 4 11 51" src="assets/report_thumbnail.png"></p> 

## 🤗 Hugging Face Model Hub Repo
You can also find our project on the [Hugging Face Model Hub Repository](https://huggingface.co/flax-community/putting-nerf-on-a-diet/).

Our JAX/Flax implementation currently supports:

<table class="tg">
<thead>
  <tr>
    <th class="tg-0lax"><span style="font-weight:bold">Platform</span></th>
    <th class="tg-0lax" colspan="2"><span style="font-weight:bold">Single-Host GPU</span></th>
    <th class="tg-0lax" colspan="2"><span style="font-weight:bold">Multi-Device TPU</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0lax"><span style="font-weight:bold">Type</span></td>
    <td class="tg-0lax">Single-Device</td>
    <td class="tg-0lax">Multi-Device</td>
    <td class="tg-0lax">Single-Host</td>
    <td class="tg-0lax">Multi-Host</td>
  </tr>
  <tr>
    <td class="tg-0lax"><span style="font-weight:bold">Training</span></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
  </tr>
  <tr>
    <td class="tg-0lax"><span style="font-weight:bold">Evaluation</span></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
    <td class="tg-0lax"><img src="http://storage.googleapis.com/gresearch/jaxnerf/check.png" alt="Supported" width=18px height=18px></td>
  </tr>
</tbody>
</table>

## 💻 Installation

```bash
# Clone the repo
git clone https://github.com/codestella/putting-nerf-on-a-diet
# Create a conda environment. Note: use Python 3.6-3.8, since one of the
# dependencies (TensorFlow) does not yet support Python 3.9.
conda create --name jaxnerf python=3.6.12; conda activate jaxnerf
# Prepare pip
conda install pip; pip install --upgrade pip
# Install requirements
pip install -r requirements.txt
# [Optional] Install GPU/TPU support for JAX.
# Replace cuda110 with the tag for your CUDA version, e.g. cuda101 for CUDA 10.1.
pip install --upgrade "jax[cuda110]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
# Install Flax and the Flax flavour of transformers
pip install flax transformers[flax]
```
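
If you installed the GPU or TPU build, you can quickly confirm that JAX actually sees your accelerator with the snippet below (just a convenience check, not part of the repo):

```python
# Sanity check: list the devices JAX will run on.
# GPU/TPU entries mean the accelerated jaxlib build is active;
# seeing only CPU devices means it was not picked up.
import jax

print(jax.devices())
print("Default backend:", jax.default_backend())
```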

## ⚽ Dataset 
Download the datasets from the [NeRF official Google Drive](https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1).
Please download `nerf_synthetic.zip` and unzip it wherever you like. In the examples below we assume the data is placed under `/tmp/jaxnerf/data/`.


## 💖 Methods

* 👉👉 You can find a very detailed explanation of our project in the [Notion Report](https://www.notion.so/DietNeRF-Putting-NeRF-on-a-Diet-4aeddae95d054f1d91686f02bdb74745)

<p align="center"><img width="400" alt="스크린샷 2021-07-04 오후 4 11 51" src="https://user-images.githubusercontent.com/77657524/124376591-b312b780-dce2-11eb-80ad-9129d6f5eedb.png"></p> 

Based on the principle that “a bulldozer is a bulldozer from any perspective”, DietNeRF supervises the radiance field from arbitrary poses (DietNeRF cameras). This is possible because the semantic consistency loss is computed in a feature space that captures high-level scene attributes, not in pixel space. We extract semantic representations of rendered views with the CLIP Vision Transformer and maximize their similarity with the representations of ground-truth views. In effect, prior knowledge about scene semantics learned by single-view 2D image encoders is used to constrain the 3D representation.

You can find the details in the authors' paper. The structure of the CLIP-based semantic loss is illustrated in the image below.
<p align="center"><img width="600" alt="Screenshot 2021-07-04 4:11:51 PM" src="https://user-images.githubusercontent.com/77657524/126386709-a4ce7ff8-2a68-442f-b4ed-26971fb90e51.png"></p>

Our implementation is written with the JAX/Flax framework, which makes it substantially faster than many other NeRF codebases, and it relies on Hugging Face's transformers library for the CLIP model.
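
For intuition, here is a minimal sketch of how a CLIP-based semantic consistency loss can be computed in JAX with the transformers library. It is a simplified stand-in for the actual training code in this repo; the function names and the use of the pooled CLIP output are illustrative assumptions:

```python
import jax
import jax.numpy as jnp
from transformers import FlaxCLIPVisionModel

# Pretrained CLIP vision tower (ViT-B/32) from Hugging Face transformers.
clip_model = FlaxCLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# CLIP's standard image normalization constants.
CLIP_MEAN = jnp.array([0.48145466, 0.4578275, 0.40821073])
CLIP_STD = jnp.array([0.26862954, 0.26130258, 0.27577711])


def clip_embedding(images):
    """Embed a batch of [B, H, W, 3] images in [0, 1] with the CLIP vision tower."""
    # Resize to CLIP's 224x224 input resolution and normalize per channel.
    images = jax.image.resize(images, (images.shape[0], 224, 224, 3), method="bilinear")
    images = (images - CLIP_MEAN) / CLIP_STD
    # transformers' Flax CLIP expects channels-first pixel_values.
    pixel_values = jnp.transpose(images, (0, 3, 1, 2))
    features = clip_model(pixel_values=pixel_values).pooler_output
    return features / jnp.linalg.norm(features, axis=-1, keepdims=True)


def semantic_consistency_loss(rendered_views, ground_truth_views):
    """Negative cosine similarity between rendered and ground-truth CLIP embeddings."""
    rendered_emb = clip_embedding(rendered_views)
    target_emb = clip_embedding(ground_truth_views)
    return -jnp.mean(jnp.sum(rendered_emb * target_emb, axis=-1))
```

During training, this loss is evaluated on renderings from randomly sampled poses and added to the usual photometric loss, which is what injects CLIP's 2D prior into the 3D representation.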

## 🤟 How to use
```shell
# e.g., --data_dir=nerf_synthetic/lego
python -m train \
  --data_dir=/PATH/TO/YOUR/SCENE/DATA \
  --train_dir=/PATH/TO/THE/PLACE/YOU/WANT/TO/SAVE/CHECKPOINTS \
  --config=configs/CONFIG_YOU_LIKE
```
You can toggle the semantic loss with `use_semantic_loss` in the configuration files.
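
Conceptually, the flag just decides whether the semantic term is added to the usual photometric (MSE) loss. The sketch below shows that combination, reusing the `semantic_consistency_loss` sketch from the Methods section; the weight and argument names are illustrative, not the exact config keys used by this repo:

```python
import jax.numpy as jnp


def total_loss(rendered_pixels, target_pixels, rendered_view, target_view,
               use_semantic_loss=True, semantic_weight=0.1):
    """Photometric MSE loss, optionally augmented with the semantic consistency term."""
    mse = jnp.mean((rendered_pixels - target_pixels) ** 2)
    if not use_semantic_loss:
        return mse
    # semantic_consistency_loss as sketched in the Methods section above;
    # rendered_view / target_view are single [H, W, 3] images here.
    sc = semantic_consistency_loss(rendered_view[None], target_view[None])
    return mse + semantic_weight * sc
```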

## 💎 Experimental Results

### ❗ Images rendered by an 8-shot-trained DietNeRF

DietNeRF generalizes remarkably well to novel and challenging views with EXTREMELY FEW TRAINING SAMPLES!

### HOTDOG / DRUM / SHIP / CHAIR / LEGO / MIC

<img alt="" src="https://user-images.githubusercontent.com/77657524/126976706-caec6d6c-6126-45d0-8680-4c883f71f5bb.png" width="250"/></td><td><img alt="" src="https://user-images.githubusercontent.com/77657524/126976868-183af09a-47b3-4c76-ba20-90e9fef17bcc.png" width="250"/><td><img alt="" src="https://user-images.githubusercontent.com/77657524/126977843-18b4b077-1db0-4287-8e5c-baa10c46e647.png" width="250"/>
<img alt="" src="https://user-images.githubusercontent.com/77657524/126977066-9c99a882-7a46-4a1d-921f-cdb0eee60f39.gif" width="250"/><img alt="" src="https://user-images.githubusercontent.com/77657524/126913553-19ebd2f2-c5f1-4332-a253-950e41cb5229.gif" width="300"/><img alt="" src="https://user-images.githubusercontent.com/77657524/126913559-dfce4b88-84a8-4a0a-91eb-ed12716ab328.gif" width="300"/>

### ❗ GIFs rendered by NeRF and DietNeRF trained on occluded 14-shot data

We created an artificial occlusion on the right side of the images by selecting only left-side training poses.
This experiment compares reconstruction quality under occlusion:
DietNeRF reconstructs the occluded region noticeably better than the original NeRF.

#### Training poses
<img width="1400" src="https://user-images.githubusercontent.com/26036843/126111980-4f332c87-a7f0-42e0-a355-8e77621bbca4.png">


#### LEGO
[DietNeRF]
<img alt="" src="https://user-images.githubusercontent.com/77657524/126913404-800777f8-8f88-451a-92de-3dda25075206.gif" width="300"/>
[NeRF]
<img alt="" src="https://user-images.githubusercontent.com/77657524/126913412-f10dfb3e-e918-4ff4-aa2c-63529fec91d8.gif" width="300"/>


#### SHIP
[DietNeRF]
<img alt="" src="https://user-images.githubusercontent.com/77657524/126913430-0014a904-6ca1-4a7b-9cd6-6f73b36552fb.gif" width="300"/>
[NeRF]
<img alt="" src="https://user-images.githubusercontent.com/77657524/126913439-2e3128ef-c7ef-4c21-8261-6e3b8fe51f86.gif" width="300"/>


## 👨‍👧‍👦 Our Teams


| Teams            | Members                                                                                                                                                        |
|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Project Managing | [Stella Yang](https://github.com/codestella) To Watch Our Project Progress, Please Check [Our Project Notion](https://www.notion.so/Putting-NeRF-on-a-Diet-e0caecea0c2b40c3996c83205baf870d) |
| NeRF Team        | [Stella Yang](https://github.com/codestella), [Alex Lau](https://github.com/riven314), [Seunghyun Lee](https://github.com/sseung0703), [Hyunkyu Kim](https://github.com/minus31),  [Haswanth Aekula](https://github.com/hassiahk), [JaeYoung Chung](https://github.com/robot0321)          |
| CLIP Team        | [Seunghyun Lee](https://github.com/sseung0703), [Sasikanth Kotti](https://github.com/ksasi), [Khali Sifullah](https://github.com/khalidsaifullaah) , [Sunghyun Kim](https://github.com/MrBananaHuman)                                |
| Cloud TPU Team   | [Alex Lau](https://github.com/riven314), [Aswin Pyakurel](https://github.com/masapasa), [JaeYoung Chung](https://github.com/robot0321),  [Sunghyun Kim](https://github.com/MrBananaHuman)                                                    |

* Contributors who barely slept 🤣:  [Seunghyun Lee](https://github.com/sseung0703), [Alex Lau](https://github.com/riven314), [Stella Yang](https://github.com/codestella), [Haswanth Aekula](https://github.com/hassiahk)

## 😎 What we improved over the original JAX-NeRF
 - Neural rendering from few-shot images
 - Hugging Face CLIP-based semantic consistency loss loop
 - Choice of coarse-only or coarse + fine MLP training
   (coarse + fine is on the `main` branch / coarse-only is on the `coarse_only` branch)
    * coarse + fine: better geometric reconstruction
    * coarse only: better PSNR/SSIM results
 - Video/GIF rendering of results; the `--generate_gif_only` flag runs a fast GIF-rendering pass
 - Code cleaning and refactoring
 - Multiple trained models, a Colab notebook, and a Hugging Face Space for a nice demo

## 💞 Social Impact

 - Game Industry
 - Augmented Reality Industry
 - Virtual Reality Industry
 - Graphics Industry
 - Online shopping
 - Metaverse
 - Digital Twin
 - Mapping / SLAM

## 🌱 References
This project is based on “JAX-NeRF”.
```
@software{jaxnerf2020github,
  author = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
  title = {{JaxNeRF}: an efficient {JAX} implementation of {NeRF}},
  url = {https://github.com/google-research/google-research/tree/master/jaxnerf},
  version = {0.0},
  year = {2020},
}
```

This project is based on “Putting NeRF on a Diet”.
```
@misc{jain2021putting,
      title={Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis}, 
      author={Ajay Jain and Matthew Tancik and Pieter Abbeel},
      year={2021},
      eprint={2104.00677},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## 🔑 License
[Apache License 2.0](https://github.com/codestella/putting-nerf-on-a-diet/blob/main/LICENSE)

## ❤️ Special Thanks 

Our project started during the [HuggingFace X GoogleAI (JAX) Community Week Event](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104).

Thank you to our mentor Suraj and the organizers of the JAX/Flax Community Week!
Our team grew a lot through this community learning experience. It was a wonderful time!

<img width="250" alt="스크린샷 2021-07-04 오후 4 11 51" src="https://user-images.githubusercontent.com/77657524/126369170-5664076c-ac99-4157-bc53-b91dfb7ed7e1.jpeg">

[Common Computer AI](https://comcom.ai/en/) sponsored multiple V100 GPUs for our project!
Thank you so much for your support!
<img width="250" alt="스크린샷" src="https://user-images.githubusercontent.com/77657524/126914984-d959be06-19f4-4228-8d3a-a855396b2c3f.jpeg">