2015-11-05

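Computer Vision Dataset Resource List

As the saying goes, "algorithms are king, data is queen." Even the cleverest cook cannot make a meal without rice, and even the best algorithm needs data behind it. This post records the datasets I have used, for future reference.

Dataset collections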

http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://visionandlanguage.net/
http://riemenschneider.hayko.at/vision/dataset/

Objects

INSTRE: for INSTance-level object REtrieval and REcognition

http://vipl.ict.ac.cn/isia/instre/
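(Institute of Computing Technology, Chinese Academy of Sciences) A new image dataset (28,543 images in total, 100 classes) for evaluating instance-level object retrieval and recognition algorithms, as well as other computer vision algorithms such as detection, invariant features, and feature matching.

Logos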

Dataset: FlickrLogos-32

http://www.multimedia-computing.de/flickrlogos/
A dataset released in 2011, containing logos of 32 well-known brands.

Plant and animal images

Fruit: FIDS30, Fruit Image Data Set

http://www.vicos.si/Downloads/FIDS30
A fruit image set released in 2014, containing 971 images that cover 30 different kinds of fruit.

Flowers: 102 Category Flower Dataset

http://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
Flower images put together by the Oxford VGG group in 2009: 8,189 images across 102 flower categories, with the corresponding labels in imagelabels.mat.
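Reading those labels can be sketched roughly as follows. This is a minimal sketch, not the dataset's official loader: it assumes that imagelabels.mat stores one 1-based class id per image under the key `labels` (an assumption about the file layout) and that SciPy is installed. The helper names `to_zero_based`, `class_histogram`, and `load_flower_labels` are hypothetical.

```python
from collections import Counter


def to_zero_based(labels):
    """Shift 1-based class ids (1..102) to 0-based ids (0..101)."""
    return [int(v) - 1 for v in labels]


def class_histogram(labels):
    """Count how many images fall into each class id."""
    return Counter(labels)


def load_flower_labels(path="imagelabels.mat", key="labels"):
    """Load the label vector from the .mat file.

    The key name "labels" is an assumption; inspect the dict returned by
    loadmat if it differs in your copy of the file.
    """
    from scipy.io import loadmat  # imported lazily; requires SciPy

    raw = loadmat(path)[key]
    # loadmat returns a 2-D array, so flatten it before converting.
    return to_zero_based(raw.ravel().tolist())


# Synthetic labels standing in for the real file.
demo = to_zero_based([1, 1, 2, 102])
print(class_histogram(demo).most_common(1))
```

Shifting to 0-based ids is only a convenience for frameworks that expect class indices starting at 0; keep the original ids if you need to match the dataset's own category numbering.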

Plant image library

http://www.plantphoto.cn/
Holds 2.08 million images covering 18,600 species.

Faces

CelebA: Large-scale CelebFaces Attributes (CelebA) Dataset

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
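A face dataset released in 2015 by the Chinese University of Hong Kong group, the newest and largest at the time of writing: 10,177 identities and 202,599 face images. Each image comes with 5 landmark annotations and 40 binary attributes, such as whether the person wears eyeglasses, is smiling, wears a hat, has curly hair, is young, gender, and so on. Very valuable face data.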
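Filtering images by one of the binary attributes can be sketched as below. This is a sketch under stated assumptions, not the dataset's official tooling: it assumes the attribute annotations are a text file whose first line is the image count, whose second line lists the attribute names, and whose remaining lines hold a filename followed by 1 or -1 per attribute. The function name `parse_attributes` and the sample rows are hypothetical.

```python
def parse_attributes(lines):
    """Parse attribute text lines into (names, {filename: {name: bool}})."""
    rows = [ln.split() for ln in lines if ln.strip()]
    count = int(rows[0][0])  # line 1: number of images (assumed layout)
    names = rows[1]          # line 2: attribute names (assumed layout)
    table = {}
    for fields in rows[2:]:
        filename, values = fields[0], fields[1:]
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values")
        # Assumed encoding: 1 means the attribute is present, -1 absent.
        table[filename] = {n: int(v) == 1 for n, v in zip(names, values)}
    if len(table) != count:
        raise ValueError("row count does not match the header line")
    return names, table


# Tiny made-up sample with 3 attributes instead of the real 40.
sample = [
    "2",
    "Eyeglasses Smiling Young",
    "000001.jpg -1 1 1",
    "000002.jpg 1 -1 1",
]
names, table = parse_attributes(sample)
smiling = [f for f, attrs in table.items() if attrs["Smiling"]]
print(smiling)  # ['000001.jpg']
```

Converting the values to booleans up front keeps later filters readable; if you train a classifier on the attributes, mapping them to 0/1 instead is an equally reasonable choice.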

WIDER FACE: A Face Detection Benchmark

http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
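CUHK strikes again: in November 2015 it released an annotated face detection benchmark containing 32,203 images and 393,703 faces. The 50% of the data held out as the test set has no publicly released annotations.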

IMDB-WIKI

https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
Annotated with face location, gender, and age; about 520,000 annotated images in total.

MegaFace Dataset

http://megaface.cs.washington.edu
1 Million Faces for Recognition at Scale; 690,572 unique people.

FaceScrub

http://vintage.winklerbros.net/facescrub.html
A Dataset With Over 100,000 Face Images of 530 People.

FDDB

http://vis-www.cs.umass.edu/fddb
Face Detection Data Set and Benchmark. 5k images.

AFLW

https://lrs.icg.tugraz.at/research/aflw/
Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization. 25k images.

AFW

http://www.ics.uci.edu/~xzhu/face/
Annotated Faces in the Wild. ~1k images.

3D Mask Attack Dataset

https://www.idiap.ch/dataset/3dmad
76500 frames of 17 persons using Kinect RGBD with eye positions (Sebastien Marcel)

Audio-visual database for face and speaker recognition

https://www.idiap.ch/dataset/mobio
Mobile Biometry MOBIO http://www.mobioproject.org/

BANCA face and voice database

http://www.ee.surrey.ac.uk/CVSSP/banca/
Univ of Surrey

Binghamton Univ 3D static and dynamic facial expression database

http://www.cs.binghamton.edu/~lijun/Research/3DFE/3DFE_Analysis.html
(Lijun Yin, Peter Gerhardstein and teammates)

The BioID Face Database

https://www.bioid.com/About/BioID-Face-Database
BioID group

Biwi 3D Audiovisual Corpus of Affective Communication

http://www.vision.ee.ethz.ch/datasets/b3dac2.en.html
1000 high quality, dynamic 3D scans of faces, recorded while pronouncing a set of English sentences.

Cohn-Kanade AU-Coded Expression Database

http://www.pitt.edu/~emotion/ck-spread.htm
500+ expression sequences of 100+ subjects, coded by activated Action Units (Affect Analysis Group, Univ. of Pittsburgh).

CMU/MIT Frontal Faces

http://cbcl.mit.edu/software-datasets/FaceData2.html
Training set: 2,429 faces, 4,548 non-faces; Test set: 472 faces, 23,573 non-faces.

Kaggle facial expression data

https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
A facial expression dataset with 7 expressions (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral); 28,709 training images and 3,589 test images, each 48x48 pixels.
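Decoding one record into a label name and a 48x48 pixel grid can be sketched as below. This is a minimal sketch under assumptions: it assumes the download is a CSV with columns `emotion`, `pixels`, and `Usage`, where `pixels` holds 48*48 space-separated grayscale values in row-major order. The function names `decode_row` and `read_rows` are hypothetical, and the one-row file is synthetic.

```python
import csv
import io

# Label order taken from the 0..6 mapping listed above.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
SIDE = 48  # images are 48x48 pixels


def decode_row(row):
    """Turn one CSV row into (emotion_name, list of 48 pixel rows)."""
    values = [int(p) for p in row["pixels"].split()]
    if len(values) != SIDE * SIDE:
        raise ValueError(f"expected {SIDE * SIDE} pixels, got {len(values)}")
    # Slice the flat pixel list into SIDE rows of SIDE values each.
    image = [values[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return EMOTIONS[int(row["emotion"])], image


def read_rows(handle):
    """Yield decoded (emotion_name, image) pairs from an open CSV file."""
    for row in csv.DictReader(handle):
        yield decode_row(row)


# Synthetic one-row file standing in for the real download.
fake_pixels = " ".join(str(i % 256) for i in range(SIDE * SIDE))
fake_csv = io.StringIO(f"emotion,pixels,Usage\n3,{fake_pixels},Training\n")
label, image = next(read_rows(fake_csv))
print(label, len(image), len(image[0]))  # Happy 48 48
```

With the real file, pass an open file handle to `read_rows` in place of the `io.StringIO` object; the length check catches rows whose pixel string was truncated.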

Face sketch dataset

http://mmlab.ie.cuhk.edu.hk/archive/facesketch.html
606 faces, each with a sketch paired one-to-one with an ID photo.

Vehicles

KITTI Vision Benchmark

http://www.cvlibs.net/datasets/kitti/index.php
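This one is impressive: it provides annotations captured from an in-vehicle setting, covering motor vehicles, non-motorized vehicles, pedestrians, lanes, and more. It is used for professional benchmarking of detection algorithms for driver assistance.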

CompCars: The comprehensive cars dataset

http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html

Image knowledge graphs

Visual Genome

https://visualgenome.org/
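100K+ images, 4 million region descriptions, 1.7 million visual question-answer pairs, 2.1 million objects, and 1.8 million attributes and relationships, all mapped to WordNet synsets.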

OCR

COCO-TEXT

http://vision.cornell.edu/se3/coco-text/
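The dataset contains 63,686 images and 123,589 text regions with annotations (location, attributes such as handwritten/printed, language, legibility, and the text itself).

Video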

ActivityNet

http://activity-net.org/
Modeling for human activity understanding: 200 classes and 20,000 training/validation/test videos.

WWW Crowd

http://www.ee.cuhk.edu.hk/~jshao/WWWCrowdDataset.html
10,000 videos, 8,257 different scenes, and more than 8 million frames.