Dataset Enhancement with Instance-Level Augmentations

Orest Kupyn

Christian Rupprecht

Visual Geometry Group, University of Oxford
Piñata Farms AI
ECCV 2024

We propose a novel method for dataset enhancement with instance-level augmentations. Given an image and a ground-truth (or predicted) segmentation mask, we estimate depth and edge maps at the image level. The annotation is decomposed into per-object binary masks and class labels, which together form the conditioning of the inpainting model. We redraw every instance and recombine them into a final image using alpha-blending sorted by depth. Used as a data augmentation method, it improves the performance and generalization of state-of-the-art salient object detection, semantic segmentation, and object detection models. By redrawing all privacy-sensitive instances (people, license plates, etc.), the method is also applicable to data anonymization.
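The recombination step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `composite_by_depth`, the use of the mean depth inside each mask as the sorting key, and the convention that larger depth values are farther away are all assumptions made for illustration.

```python
import numpy as np


def composite_by_depth(background, layers, alphas, depth):
    """Alpha-blend redrawn instances onto a background, farthest first.

    background: (H, W, 3) float array in [0, 1]
    layers:     list of (H, W, 3) float arrays, one redrawn image per instance
    alphas:     list of (H, W) float arrays in [0, 1], one soft mask per instance
    depth:      (H, W) float array; larger values are ASSUMED to be farther away
    """
    # Assumption: each instance is ranked by the mean depth inside its mask.
    def mean_depth(alpha):
        weight = alpha.sum()
        return float((depth * alpha).sum() / weight) if weight > 0 else float("inf")

    order = sorted(range(len(layers)), key=lambda i: mean_depth(alphas[i]), reverse=True)

    result = background.astype(np.float64).copy()
    for i in order:  # paint far objects first so that near ones end up on top
        a = alphas[i][..., None]  # broadcast the mask over the colour channels
        result = a * layers[i] + (1.0 - a) * result
    return result
```

Because the layers are sorted before blending, the result does not depend on the order in which the instances were generated; where two masks overlap, the nearer instance wins.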

Visual Examples:

Repainting Multiple Objects In A Scene

We augment images by redrawing individual objects in the scene while retaining their original shape. This allows training with the unchanged labels (e.g. class, segmentation, detection). The generations are highly diverse and match the scene composition. Guess the original in each row!

Applications

The method can generate images for an arbitrary dataset labelled with bounding boxes or segmentation masks, making it applicable for a wide range of tasks, including Object Detection, Semantic and Instance Segmentation, Panoptic Segmentation, etc.
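For a dataset labelled with segmentation masks, the first step is to split the annotation into per-object binary masks paired with a class. The sketch below shows one way this could look; the function `decompose_annotation`, the instance-id map layout, and the `class_of_instance` dictionary are hypothetical and are not taken from the released package.

```python
import numpy as np


def decompose_annotation(instance_map, class_of_instance, background_id=0):
    """Split an instance-id map into (binary mask, class) pairs.

    instance_map:      (H, W) integer array, one id per object, background_id elsewhere
    class_of_instance: dict mapping instance id -> class name (hypothetical format)
    """
    objects = []
    for instance_id in np.unique(instance_map):
        if instance_id == background_id:
            continue  # the background is not redrawn as an instance
        mask = instance_map == instance_id  # boolean (H, W) mask for this object
        objects.append((mask, class_of_instance[int(instance_id)]))
    return objects
```

Each returned pair corresponds to one object that could be redrawn independently while its mask, and therefore its label, stays unchanged.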

Salient Object Detection

We create an augmented version of DUTS, the largest dataset for salient object detection. To construct the text prompt for the pipeline, we crop the image by the bounding rectangle of the binarized saliency map and use the BLIP-VQA model to predict the object name in an open-vocabulary setting. The dataset is of relatively low resolution, which we also improve with our method. Since salient object segmentation depends heavily on predicting accurate object boundaries, we add an optional mask refinement stage to preserve the sharpness and high quality of the masks. To this end, we crop every generated object from the images in the train set using the corresponding bounding rectangle of the saliency map. Below you can see an example of an original image and three augmented variants.
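The cropping step described above can be sketched in a few lines of NumPy. The function name `crop_to_salient_region` and the 0.5 binarization threshold are assumptions for illustration; the subsequent BLIP-VQA query that names the object is not shown.

```python
import numpy as np


def crop_to_salient_region(image, saliency, threshold=0.5):
    """Crop an image to the bounding rectangle of its binarized saliency map.

    image:     (H, W, C) array
    saliency:  (H, W) float array in [0, 1]
    threshold: binarization cutoff (an assumed value, not from the paper)
    """
    binary = saliency >= threshold
    if not binary.any():
        return image  # nothing salient: fall back to the full image
    rows = np.flatnonzero(binary.any(axis=1))  # rows containing salient pixels
    cols = np.flatnonzero(binary.any(axis=0))  # columns containing salient pixels
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    return image[top:bottom, left:right]
```

The resulting crop contains only the salient object and its immediate surroundings, which is the region that would be passed to a captioning or VQA model to obtain the object name.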

Visual Object Detection

We evaluate the method on the COCO dataset, which is the largest dataset for object detection. The dataset includes complex scenes with multiple objects, occlusions, and varied backgrounds. Still, the method generalizes well to these complex scenes.

Generalization To Other Datasets

The method can also be applied to datasets that do not have instance mask labels. In such cases, the masks can simply be predicted by an off-the-shelf model in an open-vocabulary setting. This allows augmentations to be generated for any object class. To validate this, we regenerate the Pascal VOC dataset for semantic segmentation without using the instance masks.

Data Anonymization

We validate that fully replacing people in the training data with synthetic samples does not affect the final performance, which enables retrospectively fixing privacy shortcomings of scraped internet datasets. For the COCO dataset, we generate two additional versions: one anonymizing people and one anonymizing cars (which can carry personal information such as license plates). Visual examples of repainting people and vehicles are shown below.

Python Package

The code is structured as a Python package with a simple API that allows the augmentations to be applied to any vision dataset online, with just a few lines of code.

import glob
import os

import cv2
from instance_augmentation.augment import Augmenter

# Create the augmenter; results are written to the given folder.
augmenter = Augmenter("path_to_save_results", p=1.0)

for image_path in glob.glob("path_to_image_folder/*"):
    image_name = os.path.split(image_path)[1]
    # OpenCV loads images as BGR, so convert to RGB before augmenting.
    original_image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    augmented_image = augmenter.augment_image(original_image, image_name)

BibTeX

@article{kupyn2024dataset,
title = {Dataset Enhancement with Instance-Level Augmentations},
author = {Kupyn, Orest and Rupprecht, Christian},
journal = {arXiv preprint arXiv:2406.08249},
year = {2024}
}

Acknowledgements

We would like to thank Tetiana Martyniuk for paper proofreading and valuable feedback.
