Data | STABILO DigiVision

The Dataset

Dataset 1+2 of the open competition contains recordings of 47 right-handed volunteers who provided 9477 equations overall. The task is to recognize single equations written by unknown writers. Five labelled equations of each test person can be used as an adaptation set. Ground truths are complete equations (like 123+123=246); no character segmentation is provided.

This is what you will be able to download after registering:

A zipped folder containing 36 subfolders representing persons. Each person-folder contains one folder per recording of that person. Each recording-folder contains two files:
- sensor_data.csv: Contains the raw time series data of a complete recording of two 3D accelerometers, a 3D gyroscope, a 3D magnetometer and a 1D force sensor sampled at 100 Hz. Furthermore, the last column of the file is a simple sample counter and the first column contains a timestamp indicating when the sample was transmitted to the recording device via Bluetooth. More info on the sensors.
- labels.csv: Records which equation was written at which point in time. The start and stop columns of the labels.csv file point to the Millis column in the sensor_data.csv file.
A Python 3 script read_dataset.py that splits the sensor_data.csv file according to the timestamps given in the corresponding labels.csv file. It also saves the obtained letters and their labels in a pickle file for easier future use. Furthermore, the code demonstrates how to split 5 equations off as an adaptation set. Read this file's source code for an in-depth understanding of what is going on.
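
The splitting step described above can be sketched as follows. This is not the code of read_dataset.py, only a minimal illustration using the standard library. The column names Millis, start and stop come from the description above; the name of the label column ("label"), the comma delimiter, and the inclusive treatment of the start and stop boundaries are assumptions that should be checked against the downloaded files.

```python
import csv


def split_recording(sensor_file, labels_file, label_column="label", delimiter=","):
    """Cut one recording into per-equation segments.

    sensor_file and labels_file are open text files (or file-like objects)
    for sensor_data.csv and labels.csv. label_column and delimiter are
    assumptions, not confirmed properties of the dataset.
    Returns a list of (label, rows) pairs, one per labelled equation.
    """
    sensor_rows = list(csv.DictReader(sensor_file, delimiter=delimiter))
    segments = []
    for entry in csv.DictReader(labels_file, delimiter=delimiter):
        start = float(entry["start"])
        stop = float(entry["stop"])
        # Keep every sample whose Millis value lies in [start, stop].
        # Treating both boundaries as inclusive is an assumption.
        rows = [
            row for row in sensor_rows
            if start <= float(row["Millis"]) <= stop
        ]
        segments.append((entry[label_column], rows))
    return segments
```

With real data, the function would be called with the two files of one recording folder opened via open(..., newline=""). A linear scan per label is simple rather than fast; for long recordings a sorted index on the Millis column would avoid rescanning all samples.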

Additional info on the dataset:

There will be a second data release in ~May 2021 (with additional equations written by new volunteers)
The equations are not segmented into the single characters they’re made up of
The alphabet of the equations consists of the following characters:
Digits: 0,1,2,3,4,5,6,7,8,9
Operators: =,+,-,·,: (the last two are used for multiplication and division in Germany)
The equations are deliberately not mathematically correct, so that the result cannot be calculated from the operands
Using publicly available DigiPen data in addition to the challenge dataset is not forbidden, but we do not encourage it.
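
Because the ground truth is a whole equation over a fixed 15-character alphabet, a sequence model typically needs a mapping between characters and class indices. The following is a minimal sketch of such a mapping. The names ALPHABET, encode and decode are illustrative, and the assumption that the multiplication sign is stored as the middle dot U+00B7 should be verified against the label files.

```python
DIGITS = "0123456789"
# Assumption: multiplication is stored as the middle dot (U+00B7).
OPERATORS = "=+-\u00b7:"
ALPHABET = DIGITS + OPERATORS
CHAR_TO_INDEX = {char: index for index, char in enumerate(ALPHABET)}


def encode(equation):
    """Map an equation string to a list of class indices."""
    unknown = sorted(set(equation) - set(ALPHABET))
    if unknown:
        raise ValueError(f"characters outside the alphabet: {unknown}")
    return [CHAR_TO_INDEX[char] for char in equation]


def decode(indices):
    """Map a list of class indices back to an equation string."""
    return "".join(ALPHABET[index] for index in indices)
```

For example, encode("123+123=246") yields [1, 2, 3, 11, 1, 2, 3, 10, 2, 4, 6], and decode reverses it. Rejecting unknown characters early helps catch encoding differences in the label files before training starts.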

How to evaluate?

Your algorithm will be tested with data of secret writers.
Your algorithm will receive 5 labelled equations of each secret writer right before the evaluation to automatically adapt your pipeline to this person’s gripping position and writing style.
More info on what and how to submit can be found here.
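
To mimic this protocol locally, five labelled equations per writer can be held out for adaptation and the remainder used for testing. The sketch below shows one way to do this. How the organisers pick the five equations is not stated here, so the random selection, the seed parameter and the function name split_adaptation are assumptions; read_dataset.py may do it differently.

```python
import random


def split_adaptation(samples, n_adapt=5, seed=0):
    """Split one writer's samples into (adaptation, evaluation) lists.

    samples is any list of per-equation items, e.g. (label, rows) pairs.
    Random selection with a fixed seed is an assumption made for
    reproducibility, not the organisers' documented procedure.
    """
    if len(samples) < n_adapt:
        raise ValueError("writer has fewer samples than the adaptation size")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    adaptation_indices = set(indices[:n_adapt])
    # Preserve the original order inside both subsets.
    adaptation = [s for i, s in enumerate(samples) if i in adaptation_indices]
    evaluation = [s for i, s in enumerate(samples) if i not in adaptation_indices]
    return adaptation, evaluation
```

Applying this per writer, with whole writers held out from training, keeps a local validation setup close to the described evaluation on unknown writers.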

Data Acquisition

To obtain the sensor data we provide for this challenge, we implemented a recording app that connects to a DigiPen and tells the volunteers which equation to write. These are some of the constraints that were met during the recordings:

The recordings were conducted sitting on a chair in front of a table.
The writing surface was horizontal.
Normal, white paper sheets (about 80 g/m²) were used to write upon.
The sheet was padded by five additional sheets.
There was no guideline concerning the size of the handwriting. The subjects were asked to use a size that is natural for them.
There was no guideline concerning the way of holding the pen. The subjects were asked to use a position that is natural for them.
The volunteers were asked to make sure the STABILO logo faces up, to avoid differing pen orientations.
Participants could choose freely between print and cursive writing styles.
Only right-handed recordings are released during this challenge.
