VGGHeads

A Large-Scale Synthetic Dataset for 3D Human Heads

Orest Kupyn
Eugene Khvedchenya
Christian Rupprecht

Visual Geometry Group, University of Oxford
Ukrainian Catholic University
PiñataFarms AI
arXiv Preprint Paper


We introduce a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step.

Dataset

The dataset generation process consists of the following stages:

  1. We generate images with a latent diffusion model conditioned on a large real-world dataset.

  2. A small subset of the data is manually labeled with head bounding boxes to train a binary detector on synthetic data.

  3. For each detected head in the generated images, we predict the 3D head model parameters.

  4. The final dataset is automatically filtered to remove noisy and privacy-sensitive samples.


The data covers a wide variety of scenes and numbers of people, provides rich annotations for every human head, and is scalable to an arbitrary number of samples. The final dataset consists of 1,022,944 images with 2,219,146 heads annotated with bounding boxes and 3D model parameters.
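
In pseudocode, the four stages above amount to the following loop. This is a minimal sketch of the pipeline logic only; the helpers generate_image, detect_heads, fit_3dmm, and passes_filters are hypothetical placeholders, not functions from the released code.

def build_dataset(prompts):
    # Hypothetical sketch: each helper below stands in for one pipeline stage.
    dataset = []
    for prompt in prompts:
        image = generate_image(prompt)  # 1. latent diffusion model conditioned on real-world data
        boxes = detect_heads(image)  # 2. detector trained on the manually labeled subset
        heads = [fit_3dmm(image, box) for box in boxes]  # 3. 3D head model parameters per head
        if passes_filters(image, heads):  # 4. drop noisy and privacy-sensitive samples
            dataset.append({"image": image, "heads": heads})
    return dataset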

Model

VGGHeads extends the YOLO-NAS architecture to predict 3D Morphable Model (3DMM) parameters along with head bounding boxes from multi-scale feature maps. Rich ground-truth annotations allow us to train a model that estimates a compact 3D head representation of multiple people from an RGB image in a single forward pass. For each head, we predict a vector of 3DMM parameters disentangled into shape, expression, and pose. Since every vertex can be mapped to a face part, bounding boxes for the head, the face, and other head parts can all be recovered from the reprojected head mesh. Compared to other methods, this setup encodes a more general representation that serves as a base for other downstream head modeling tasks. The final predictions include:

  1. Head and Face Bounding Box

  2. 3DMM Parameters

  3. Head and Face 3D Vertices and 2D Landmarks

  4. 3D Head Pose
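
As a rough illustration, each detected head can be thought of as the following record. The field names and shapes here are assumptions made for exposition and do not necessarily match the exact types returned by the package.

from dataclasses import dataclass
import numpy as np

@dataclass
class HeadPrediction:
    # Illustrative fields only; actual attribute names in the package may differ.
    head_bbox: np.ndarray  # (4,) head bounding box
    face_bbox: np.ndarray  # (4,) face box recovered from the reprojected mesh
    params_3dmm: np.ndarray  # shape, expression, and pose coefficients
    vertices_3d: np.ndarray  # (N, 3) head mesh vertices
    landmarks_2d: np.ndarray  # (K, 2) reprojected 2D landmarks
    head_pose: np.ndarray  # (3,) yaw, pitch, roll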

Controllable Image Generation

The full 3D head mesh provides a strong conditioning signal for the image generation process. We demonstrate the ability to generate images with controlled 3D head shape and pose by training a ControlNet and a T2I-Adapter conditioned on the meshes.
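
As an example of how such conditioning can be wired up with the diffusers library, the sketch below loads a mesh-conditioned ControlNet; the checkpoint path is a hypothetical placeholder, and the conditioning image is assumed to be the predicted head mesh rendered to an image.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# "path/to/mesh-controlnet" is a hypothetical placeholder, not a released checkpoint name.
controlnet = ControlNetModel.from_pretrained("path/to/mesh-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

condition = load_image("rendered_head_mesh.png")  # head mesh rendered as the control image
result = pipe("a studio portrait of a person", image=condition).images[0]
result.save("generated.png")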

Python Package

The code is structured as a Python package with a simple API that allows predicting head meshes and boxes with just a few lines of code.

from head_detector import HeadDetector

detector = HeadDetector()
image_path = "your_image.jpg"
predictions = detector(image_path)
# predictions.heads contains a list of heads with .bbox, .vertices_3d, .head_pose params
predictions.draw()  # draw heads on the image
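
The per-head attributes named in the comment above can then be inspected directly:

for head in predictions.heads:
    print(head.bbox, head.head_pose)  # bounding box and estimated pose for each detected head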

BibTeX

If you use the VGGHeads Dataset or code, implicitly or explicitly, for your research projects, please cite the following paper:

@article{vggheads,
  title={VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads},
  author={Orest Kupyn and Eugene Khvedchenya and Christian Rupprecht},
  year={2024},
  eprint={2407.18245},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.18245},
}

Acknowledgements

We would like to thank Tetiana Martyniuk and Iro Laina for paper proofreading and valuable feedback. We also thank the Armed Forces of Ukraine for providing security to complete this work.

