CALTECH 101

CALTECH 256

The CALTECH 256 dataset by Li Fei-Fei contains 30607 images for 256 categories.

centered, classification, detection, image, object, scene

Vision

CIFAR-10 / 100

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test image…

color, image classification, object, patch, scene, tiny

Vision
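The figures quoted for CIFAR-10 can be cross-checked with a few lines of arithmetic. This is a minimal sketch. The constant names and `cifar10_summary` are illustrative, and the three-channel, one-byte-per-channel layout is an assumption read into "colour images", not something the entry states.

```python
# Figures taken from the CIFAR-10 description above.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
HEIGHT, WIDTH = 32, 32
CHANNELS = 3  # assumption: "colour" means three 8-bit channels (RGB)


def cifar10_summary():
    """Return basic size figures derived from the numbers in the description."""
    total = NUM_CLASSES * IMAGES_PER_CLASS
    bytes_per_image = HEIGHT * WIDTH * CHANNELS
    return {
        "total_images": total,
        # The train/test split should account for every image exactly once.
        "split_matches_total": TRAIN_IMAGES + TEST_IMAGES == total,
        "bytes_per_image": bytes_per_image,
        # Raw pixel payload only; labels and file headers are not counted.
        "raw_pixel_bytes": total * bytes_per_image,
    }


if __name__ == "__main__":
    print(cifar10_summary())
```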

ScanNet

ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and insta…

3d, cad, indoor, layout, object, realism, recognition, rendering, room, scene, segmentation, synthetic

Vision

UK Bench

The UK Bench dataset from Henrik Stewenius and David Nister contains 10200 images in N=2550 groups of four images each, at size 640x480. The images are r…

centered, image retrieval, object, rotation

Vision
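A grouped retrieval set like UK Bench is often scored by counting how many of the top four results belong to the query's own group. The sketch below illustrates that idea. It assumes images are indexed 0 to 10199 with each group occupying four consecutive indices. That numbering, and the names `group_of` and `top4_score`, are assumptions made for illustration.

```python
GROUP_SIZE = 4      # four images per group (from the description)
NUM_GROUPS = 2550   # N = 2550 groups (from the description)


def group_of(image_index: int) -> int:
    """Map an image index to its group, assuming consecutive numbering."""
    if not 0 <= image_index < GROUP_SIZE * NUM_GROUPS:
        raise IndexError(f"image index out of range: {image_index}")
    return image_index // GROUP_SIZE


def top4_score(query_index: int, ranked_indices: list) -> int:
    """Count how many of the first four ranked results share the query's group.

    The score ranges from 0 to 4; the query itself counts if it is returned.
    """
    query_group = group_of(query_index)
    return sum(
        1 for idx in ranked_indices[:GROUP_SIZE] if group_of(idx) == query_group
    )


if __name__ == "__main__":
    # Query image 8 is in group 2 (indices 8..11); index 100 is in group 25.
    print(top4_score(8, [8, 9, 100, 11]))  # prints 3
```

Averaging this score over all 10200 queries gives a single number between 0 and 4 for a retrieval system.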

SUNCG: Indoor Scenes

The SUNCG dataset is a Large 3D Model Repository for Indoor Scenes. SUNCG is an ongoing effort to establish a richly-annotated, large-scale dataset of…

3d, indoor, layout, object, realism, recognition, rendering, room, scene, segmentation, synthetic

Vision

Labeling in 3D Scenes

This dataset package contains the software and data used for Detection-based Object Labeling on the RGB-D Scenes Dataset as implemented in the paper: De…

3d, depth, indoor, kinect, object, recognition, reconstruction

Vision

TRANCOS Overlapping Car Crowds

The TRaffic ANd COngestionS (TRANCOS) dataset is a novel benchmark for (extremely overlapping) vehicle counting in traffic congestion situations. It consist…

car, detection, highway, object, spain, traffic, transportation, urban, vehicle

Vision

ICG Multi-Camera and Virtual …

The ICG Multi-Camera and Virtual PTZ dataset contains the video streams and calibrations of several static Axis P1347 cameras and one panoramic video from…

calibration, camera, crowd, detection, graz, multitarget, multiview, network, object, outdoor, panorama, pedestrian, tracking, video

Vision

ICG Multi-Camera Datasets

The ICG Multi-Camera datasets consist of: Easy Data Set (just one person); Medium Data Set (3-5 persons, used for the experiments); Hard Data Set (crowd…

calibration, camera, detection, graz, indoor, multitarget, multiview, object, pedestrian, tracking, video

Vision

DAVIS: Densely Annotated VIde…

We present the 2017 DAVIS Challenge, a public competition specifically designed for the task of video object segmentation. Following in the footsteps of othe…

benchmark, code, hd, object, quality, resolution, segmentation, tracking, video segmentation

Vision

Labelme

A large dataset of annotated images.

natural-image

Vision

UMD Dynamic Scene Recognition

The UMD Dynamic Scene Recognition dataset consists of 13 classes and 10 videos per class and is used to classify dynamic scenes. The dataset has been de…

classification, dynamic, motion, recognition, scene, video

Vision

ImageNET

The ImageNET dataset is the latest dataset by Li Fei-Fei, containing various datasets ranging from 1000 to 10000 categories.

image classification, object segmentation, retrieval

Vision

STL-10 dataset

STL-10 is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. Like CIFAR-10 with some modi…

natural-image

Vision

Caltech 256

Pictures of objects belonging to 256 categories.

classification, natural-image

Vision

INRIA Lafarge Benchmarks

Some datasets and evaluation tools are provided on this page for four different computer vision and computer graphics problems. Population counting Lin…

3d, counting, crowd, detection, groundtruth, line, network, object, pedestrian, pointcloud, reconstruction, road, surface, urban

Vision

COIL-100

The COIL-100 (Columbia University Image Library) consists of 100 objects. For formal documentation look at the corresponding compressed technical report, …

image classification, image retrieval

Vision

Daimler Mono Pedestrian Detec…

The Daimler Mono Pedestrian Detection Benchmark dataset contains a large training and test set. The training set contains 15,560 pedestrian samples (image…

detection, mono, object, outdoor, pedestrian, scale, urban

Vision

YouTube-Objects dataset

The YouTube-Objects dataset is composed of videos collected from YouTube by querying for the names of 10 object classes. It contains between 9 and 24 vide…

detection, flow, object, optical, segmentation, video

Vision

Microsoft COCO

Microsoft COCO (mscoco) is an image recognition and segmentation dataset which contains more than 300k images for more than 70 categories. Other features…

benchmark, context, detection, object, recognition, segmentation, semantic

Vision

VIDEO datasets overview

Many different labeled video datasets have been collected over the past few years, but it is hard to compare them at a glance. So we have created a handy …

action, benchmark, classification, detection, object, recognition, video

Vision

MPI VehicleScenes

Abstract: Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In…

3d, car, classification, pedestrian, scene, segmentation, semantic, understanding

Vision

KU Leuven Facade

The KU Leuven Facade dataset is used for architectural styles classification. M. Mathias, A. Martinovic, J. Weissenberg, S. Haegler, L. Van Gool: Automa…

architecture, image classification, procedural reconstruction, urban

Vision

An RGB-D Dataset for 6D Pose …

A dataset acquired with 3 synchronized sensors (Primesense Carmine 1.09, Microsoft Kinect v2, Canon IXUS 950 IS), featuring: * 30 industry-relevant obje…

3d, estimation, object, pose, rgbd, texture-less

Vision

HUJI Multi-illuminant Image S…

The Multi-illuminant Image Sequences dataset contains 16 video sequences (13 with single light source and 3 with two global light sources), recorded with…

balance, chromaticity, color, constancy, dichromatic, illumination, light, nature, object, physics, white

Vision

ADE20k

Scene Parsing Benchmark: scene parsing data and part segmentation data derived from the ADE20K dataset can be downloaded from the MIT Scene Parsing Benchmark. m…

annotation, benchmark, recognition, scene, segmentation, semantic

Vision

OpenStreetMap

Vector data for the entire planet under a free license. It contains (an older version of) the US Census Bureau's data.

geospatial, natural-image

Vision

WildLife Documentary (WLD) Da…

The dataset contains 15 documentary films that are downloaded from YouTube, whose durations vary from 9 minutes to as long as 50 minutes, and the total nu…

detection, object, video

Vision

UCF Person and Car VideoSeg

The UCF Person and Car VideoSeg dataset consists of six videos with groundtruth for video object segmentation. Surfing, jumping, skiing, sliding, big ca…

camera, groundtruth, model, motion, object, segmentation, video

Vision

Google's Open Images

A collection of 9 million URLs to images that have been annotated with labels spanning over 6,000 categories under Creative Commons.

natural-image

Vision

LSUN

Scene understanding with many ancillary tasks (room layout estimation, saliency prediction, etc.) and an associated competition.

natural-image

Vision

B3DO: Berkeley 3D Object Data…

For the first few decades of the field's existence, computer vision has been focused on algorithmic, logical approaches to perception. But it was only with…

3d, depth, indoor, kinect, object, recognition, reconstruction

Vision

ETH/Yahoo Video2Gif dataset

The Video2GIF dataset contains over 100,000 pairs of GIFs and their source videos. The GIFs were collected from two popular GIF websites (makeagif.com, gi…

gif, scene, summarization, summary, understanding, video highlight detection

Vision

Farman Institute 3D Point Sets

The Farman Institute 3D Point Sets dataset contains 11 objects captured by a 3D laser scanner. This dataset was peer-reviewed by Image Processing On Line: Farman I…

3d, laser, model, object, point, reconstruction, scanner

Vision

MIT Places205

The Places205 database contains 2.5 million images from 205 scene categories for the academic public. The image dataset contains 2,448,873 images from 205 sc…

feature, learning, place, recognition, scene, urban

Vision

The Street View House Numbers…

House numbers from Google Street View. Think of this as recurrent MNIST in the wild.

natural-image

Vision

Tiny Images

The Tiny Images dataset consists of 79,302,017 images, each being a 32x32 color image. This data is stored in the form of large binary files which can be …

color, image classification, image retrieval, tiny

Vision
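With fixed-size 32x32 color images packed into large binary files, locating one image reduces to an offset calculation. The sketch below assumes uncompressed records stored back to back with no file header and one byte per color channel. The description above is truncated before it gives the actual layout, so treat these as assumptions, and the function names as hypothetical.

```python
NUM_IMAGES = 79_302_017          # from the description
BYTES_PER_IMAGE = 32 * 32 * 3    # assumption: 3 channels, 1 byte per channel


def image_byte_range(index: int) -> tuple:
    """Return (start, end) byte offsets of image `index`.

    Assumes headerless, contiguous, fixed-size records.
    """
    if not 0 <= index < NUM_IMAGES:
        raise IndexError(f"image index out of range: {index}")
    start = index * BYTES_PER_IMAGE
    return start, start + BYTES_PER_IMAGE


def read_image_bytes(path: str, index: int) -> bytes:
    """Read the raw bytes of one image without loading the whole file."""
    start, end = image_byte_range(index)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    if len(data) != BYTES_PER_IMAGE:
        raise ValueError("file is shorter than the requested record")
    return data
```

Seeking directly to the record keeps memory use constant regardless of file size, which matters for a collection of this scale.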

Aspect Layout dataset

The Aspect Layout dataset is designed to allow evaluation of object detection for aspect ratios in perspective images. Author text: In this project we…

aspect, detection, layout, object, perspective, ratio

Vision

COIL100

COIL100: Different objects imaged at every angle in a 360-degree rotation.

natural-image

Vision

TUD Shapes 1+2

This material is supplementary to Michael Stark, Bernt Schiele. How Good are Local Features for Classes of Geometric Objects. Eleventh IEEE Internatio…

binary, classification, object, shape, tool

Vision

Daimler Mono Pedestrian Class…

The Daimler Mono Pedestrian Classification Benchmark dataset consists of two parts: a base data set. The base data set contains a total of 4000 pedestri…

classification, illumination, object, outdoor, pedestrian, scale, urban

Vision

Pascal VOC

Generic image segmentation / classification: not terribly useful for building real-world image annotation, but great for baselines.

natural-image

Vision

Visual Attributes dataset

The Visual Attributes dataset contains visual attribute annotations for over 500 object classes (animate and inanimate) which are all represented in Image…

attribute, classification, imagenet, object, recognition

Vision

NEXRAD

Doppler radar scans of atmospheric conditions in the US.

geospatial, natural-image

Vision

University of León - Edge prof…

This data set comprises 144 images of an edge profile cutting head of a milling machine. The head tool contains a total of 30 cutting inserts. The cutting…

cutting, edge, head, inserts, localization, milling, monitoring, object, profile, tool, tools, wear

Vision

Pankrac Marseille

Our repetitive pattern dataset with 106 images of approx. 30 buildings from Pankrac, Prague and Marseille appearing in more than one image, number of appeara…

image classification, image retrieval, repetition, symmetry, urban

Vision

Video Segmentation Benchmark

The Video Segmentation Benchmark (VSB100) provides ground truth annotations for the Berkeley Video Dataset, which consists of 100 HD quality videos divide…

benchmark, groundtruth, motion, object, pedestrian, segmentation, tracking, video

Vision

GaTech VideoSeg

The GaTech VideoSeg dataset consists of two (waterski and yunakim?) video sequences for object segmentation. There exists no groundtruth segmentation an…

camera, model, motion, object, segmentation, video

Vision

MS COCO

Generic image understanding / captioning, with an associated competition.

natural-image

Vision

Bristol Egocentric Object Int…

The BEOID dataset includes object interactions ranging from preparing a coffee to operating a weight lifting machine and opening a door. The dataset is re…

3d, egocentric, interaction, object, pose, tracking, video

Vision

NORB

Binocular images of toy figurines under various illumination and pose.

natural-image

Vision

ImageNet

The de facto image dataset for new algorithms. Many image API companies have labels from their REST interfaces that are suspiciously close to the 1000 cat…

natural-image

Vision

SceneNet RGB-D Synthetic Indo…

SceneNet RGB-D is a dataset comprising 5 million photorealistic images of synthetic indoor trajectories with ground truth. It expands the previous work of…

3d, indoor, lighting, navigation, reconstruction, rendering, robot, scene, segmentation, slam, synthetic, trajectory

Vision

GaTech SegTrack

The SegTrack dataset consists of six videos (five are used) with ground truth pixelwise segmentation (6th penguin is not usable). The dataset is used for …

camera, flow, groundtruth, model, motion, object, optical, proposal, segmentation, stationary, video

Vision

MNIST handwritten digits

MNIST: handwritten digits. The most commonly used sanity check. Dataset of 28x28, centered, B&W handwritten digits. It is an easy task; just because somethi…

natural-image

Vision

VOT2016 segmentation

The VOT2016 pixel-wise annotations dataset contains pixel-wise per-frame annotations for sequences from the VOT2016 dataset. The annotation is in the form of BW…

annotation, mask, object, segmentation, tracking, visual

Vision

All I Have Seen (AIHS)

The All I Have Seen (AIHS) dataset was created to study the properties of total visual input in humans. For around two weeks, Nebojsa Jojic wore a camera ca…

3d, clustering, indoor, outdoor, scene, similarity, study, summary, user, video

Vision

Comprehensive Cars (CompCars)

The Comprehensive Cars (CompCars) dataset contains data from two scenarios, including images from web-nature and surveillance-nature. The web-nature data …

attribute, car, classification, fine-grained, object, recognition, urban, vehicle

Vision

COIL 20

Different objects imaged at every angle in a 360-degree rotation.

natural-image

Vision

PASCAL VOC Parts

The PASCAL VOC is augmented with segmentation annotation for semantic parts of objects. For example, for the person category, we provide segmentation mask…

detection, human, object, part, pascal, pedestrian, recognition, segmentation, semantic

Vision

ICG Lab 6 (Multi-Camera Multi…

The ICG Lab 6 (Multi-Camera Multi-Object Tracking) dataset contains 6 indoor people tracking scenarios recorded at our laboratory using 4 static Axis P134…

calibration, camera, detection, evaluation, graz, laboratory, multiview, object, pedestrian, segmentation, tracking

Vision

Crowd Dataset

The crowd datasets are collected from a variety of sources, such as UCF and data-driven crowd datasets. The sequences are diverse, representing dense crow…

anomaly, crowd, detection, human, pedestrian, scene, understanding, video

Vision

Landsat8

Satellite shots of the entire Earth surface, updated every several weeks.

geospatial, natural-image

Vision

CIFAR10 / CIFAR100

32x32 color images with 10 / 100 categories. Not commonly used anymore, though once again, can be an interesting sanity check.

natural-image

Vision

MPI-I VISPR (Visual Privacy)

We present a dataset to address the problem of visual privacy - where users unintentionally leak private information when sharing personal images online, …

classification, flickr, multilabel, privacy, regression, scene

Vision

Sheffield Building

The Sheffield Building Image Dataset consists of over 3,000 low-resolution images of forty different buildings, typically between 70 and 120 images per buildi…

image classification, image retrieval, sheffield, urban

Vision

Daimler Pedestrian Classifica…

Daimler Multi-Cue, Occluded Pedestrian Classification Benchmark Training and test samples have a resolution of 48 x 96 pixels with a 12-pixel border aro…

image classification, object detection, pedestrian, urban

Vision

Freiburg-Berkeley Motion Segm…

The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is an extension of the BMS dataset with 33 additional video sequences. A total of 720 frames i…

benchmark, groundtruth, motion, object, pedestrian, segmentation, tracking, video

Vision

KTH Multiview Football

The KTH Multiview Football dataset contains 771 images of football players, including images taken from 3 views at 257 time instances and 14 annotated body join…

camera, detection, game, multitarget, multiview, object, outdoor, pedestrian, pose, recognition, soccer, tracking

Vision

LASIESTA (Labeled and Annotat…

LASIESTA is composed of many real indoor and outdoor sequences organized in different categories, each one covering a specific challenge in moving obje…

background, camera, challenge, dataset, detection, foreground, groundtruth, motion, object, stationary, subtraction

Vision

Related datasets