Others

85 Datasets

Connectionist Bench (Vowel Re…

The problem is specified by the accompanying data file, "vowel.data". This consists of a three-dimensional array: voweldata [speaker, vowel, input]. The …

classification

Dodgers Loop Sensor

This loop sensor data was collected for the Glendale on-ramp for the 101 North freeway in Los Angeles. It is close enough to the stadium to see unusual t…

multivariate, time-series

Bag of Words

For each text collection, D is the number of documents, W is the number of words in the vocabulary, and N is the total number of words in the collection (…

clustering, text
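
The three quantities named in the Bag of Words description (D, W, N) can be illustrated with a minimal sketch. The toy documents and the helper `collection_stats` are hypothetical, and whitespace tokenization is assumed for illustration only; none of this reflects the dataset's actual file format or preprocessing.

```python
from collections import Counter

# Hypothetical toy collection; not taken from the actual dataset files.
docs = [
    "the cat sat on the mat",
    "the dog sat",
    "a cat and a dog",
]

def collection_stats(documents):
    """Return (D, W, N) for a list of whitespace-tokenized documents.

    D: number of documents
    W: number of distinct words in the vocabulary
    N: total number of word occurrences in the collection
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())  # assumed tokenization: split on whitespace
    D = len(documents)
    W = len(counts)
    N = sum(counts.values())
    return D, W, N

D, W, N = collection_stats(docs)
print(D, W, N)
```

Here N counts every occurrence, so repeated words raise N but not W.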

Hill-Valley

Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create eith…

classification, sequential

Dexter

The original data were formatted by Thorsten Joachims in the bag-of-words representation. There were 9947 features (of which 2562 are always zeros for all…

classification, multivariate

Madelon

MADELON is an artificial dataset containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled…

classification, multivariate

USPTO Algorithm Challenge, ru…

USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder. Problem: Patent Labeling

classification, domain-theory

Libras Movement

The dataset (movement_libras) contains 15 classes of 24 instances each, where each class refers to a hand movement type in LIBRAS. In the video pre-p…

classification, clustering, multivariate, sequential

Spoken Arabic Digit

The dataset consists of 8800 (10 digits x 10 repetitions x 88 speakers) time series of 13 Mel Frequency Cepstral Coefficients (MFCCs) taken from 44 males and 44 femal…

classification, multivariate, time-series
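
The counts in the Spoken Arabic Digit description can be checked with a short sketch. The variable names and the `empty_series` helper are hypothetical, and the number of frames per utterance is an assumption left as a parameter, since the listing does not state it.

```python
# Figures taken from the description above.
digits, repetitions, speakers = 10, 10, 88
n_coeffs = 13  # MFCCs per frame

# Total number of time series in the collection.
n_series = digits * repetitions * speakers

def empty_series(n_frames):
    """Build a placeholder series: n_frames rows of 13 coefficients each.

    n_frames is an assumption; utterance length is not given in the listing.
    """
    return [[0.0] * n_coeffs for _ in range(n_frames)]

example = empty_series(5)
print(n_series, len(example), len(example[0]))
```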

AutoUniv

The user first creates a classification model and then generates classified examples from it. To create a model, the following are specified: the number o…

classification, multivariate