Abstract
We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes -- either zero-shot or with a few finetuning steps. Thanks to the learned general map, finetuning HyperFields converges quickly, synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
Community
My summary of the day: Generating 3D objects based solely on text descriptions has proven extremely challenging for AI. Current state-of-the-art methods require optimizing a full 3D model from scratch for each new prompt, which is computationally demanding.
A new technique called HyperFields shows promising progress toward generating detailed 3D models directly from text prompts.
Instead of optimizing per prompt, HyperFields aims to learn a generalized mapping from language to 3D geometry representations, so a tailored 3D model for a new text prompt can be produced efficiently in a single feedforward pass, without per-prompt optimization.
HyperFields combines two key techniques:
- A dynamic hypernetwork that takes in text and progressively predicts weights for a separate 3D generation network. The weight predictions are conditioned on the previous layer's activations, enabling per-scene specialization (see the first sketch after this list).
- Distilling individually optimized 3D networks into the hypernetwork, which provides dense supervision for learning the complex text-to-3D mapping (see the second sketch below).
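
To make the first bullet concrete, here is a minimal PyTorch sketch of the dynamic-hypernetwork idea, not the authors' exact architecture: each NeRF layer's weights are predicted from the text embedding concatenated with a summary of the previous layer's activations. All module names, layer sizes, and the mean-pooled activation summary are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DynamicHyperNeRF(nn.Module):
    """Illustrative sketch: a hypernetwork that generates NeRF MLP weights
    conditioned on text and on the previous generated layer's activations."""

    def __init__(self, text_dim=768, hidden=64, n_layers=4):
        super().__init__()
        self.hidden = hidden
        dims = [3] + [hidden] * n_layers  # 3D position in, hidden layers after
        # One small weight-prediction head per NeRF layer; its input is the
        # text embedding plus a summary of the previous layer's activations.
        self.weight_gens = nn.ModuleList(
            nn.Sequential(
                nn.Linear(text_dim + hidden, 256), nn.ReLU(),
                nn.Linear(256, dims[i + 1] * (dims[i] + 1)),  # weights + bias
            )
            for i in range(n_layers)
        )
        self.to_rgb_sigma = nn.Linear(hidden, 4)  # color (3) + density (1)

    def forward(self, xyz, text_emb):
        # xyz: (N, 3) sampled points; text_emb: (1, text_dim) pooled tokens.
        h = xyz
        act_summary = torch.zeros(1, self.hidden, device=xyz.device)
        for gen in self.weight_gens:
            cond = torch.cat([text_emb, act_summary], dim=-1)
            flat = gen(cond).squeeze(0)
            in_dim, out_dim = h.shape[-1], self.hidden
            W = flat[: out_dim * in_dim].view(out_dim, in_dim)
            b = flat[out_dim * in_dim:]
            h = torch.relu(h @ W.T + b)
            # Condition the next layer's weight prediction on these activations.
            act_summary = h.mean(dim=0, keepdim=True)
        return self.to_rgb_sigma(h)  # (N, 4): RGB + density per point
```

Calling `DynamicHyperNeRF()(points, text_emb)` with `points` of shape `(N, 3)` and a pooled text embedding of shape `(1, 768)` returns per-point color and density, with the NeRF weights generated on the fly from the prompt.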
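The second bullet, NeRF distillation, could then be implemented roughly as below. This is a hedged sketch, not the paper's exact loss: the hypernetwork-generated "student" NeRF is trained to match renders from the individually pre-trained per-scene "teacher" NeRFs. The helpers `text_encoder`, `sample_rays`, and `render` are hypothetical placeholders.

```python
import torch

def distillation_step(hyper_nerf, teacher_nerfs, prompts, text_encoder,
                      sample_rays, render, optimizer):
    """One step distilling pre-trained per-scene NeRFs into the hypernetwork."""
    optimizer.zero_grad()
    loss = 0.0
    for prompt, teacher in zip(prompts, teacher_nerfs):
        text_emb = text_encoder(prompt)   # hypothetical: (1, text_dim) embedding
        rays = sample_rays()              # hypothetical: random camera rays
        # Student: NeRF whose weights are generated from the text embedding.
        pred = render(lambda x: hyper_nerf(x, text_emb), rays)
        with torch.no_grad():
            target = render(teacher, rays)  # dense per-pixel teacher supervision
        loss = loss + ((pred - target) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The appeal of this setup is that the teachers supply dense, stable pixel-level targets for every training prompt, which is much stronger supervision than re-running a 2D-guidance loss for each scene.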
In experiments, HyperFields exceeded previous state-of-the-art methods in sample efficiency and converged 5-10x faster in wall-clock time. It demonstrated the ability to:
- Encode over 100 distinct objects like "yellow vase" in a single model
- Generalize to new text combinations without having seen the exact prompt during training
- Rapidly adapt to generate completely novel objects with minimal fine-tuning
However, limitations remain around flexibility, fine-grained details, and reliance on existing 2D guidance systems.
TL;DR: HyperFields uses a dynamic hypernetwork to predict weights for a 3D generation network. The method is 5-10x faster than existing techniques and can quickly adapt to new text prompts, but has limitations in fine details.
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF (2023)
- TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields (2023)
- Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else (2023)
- Noise-Free Score Distillation (2023)
- ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context (2023)