Data

The Dataset

The challenge is already finished, but you can download a very similar dataset: the onHW-Dataset.

Stage 1

In stage 1 of the open competition, your task is to classify the 26 upper-case letters. We recorded 100 volunteers, who provided 13102 letters in total.

This is what you will be able to download:

  • A folder containing 100 subfolders. Each subfolder contains one recording of one person. Each recording consists of three files:
    • sensor_data.csv: Contains the raw time series data of a complete recording from two 3D accelerometers, a 3D gyroscope, a 3D magnetometer and a 1D force sensor, all sampled at 100 Hz. The first column contains a timestamp indicating when the sample was transmitted to the recording device via Bluetooth, and the last column is a simple sample counter. More info on the sensors.
    • labels.csv: Specifies which letter was written at which point in time. The values in the start and stop columns of labels.csv refer to the Millis column in sensor_data.csv.
    • calibration.txt: Contains parameters obtained from the last calibration procedure performed with this pen. More info on the calibration.
  • A Python 3 script split_characters.py that splits the sensor_data.csv file according to the timestamps given in the corresponding labels.csv file. It also saves the extracted letters and their labels in a pickle file for easier future use. See the script's source code for the details of how this works.
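The splitting step described above can be sketched as follows. This is a minimal illustration, not the code of split_characters.py: the function name split_by_labels, the Label, Force and Counter column names, and the inclusive start/stop bounds are assumptions made for the example. Only the Millis column and the start and stop columns are taken from the description above.

```python
import csv
import io


def split_by_labels(sensor_rows, label_rows, time_col="Millis"):
    """Return one (label, samples) pair per labelled letter.

    Assumption: a sample belongs to a letter if its timestamp lies
    within [start, stop], both bounds inclusive.
    """
    segments = []
    for lab in label_rows:
        start, stop = float(lab["start"]), float(lab["stop"])
        samples = [
            row for row in sensor_rows
            if start <= float(row[time_col]) <= stop
        ]
        # "Label" is a hypothetical column name for the written letter.
        segments.append((lab["Label"], samples))
    return segments


# Tiny synthetic stand-ins for sensor_data.csv and labels.csv.
# The real files contain many more sensor columns.
sensor_csv = """Millis,Force,Counter
0,0.0,0
10,0.5,1
20,0.7,2
30,0.1,3
40,0.6,4
50,0.0,5
"""
labels_csv = """Label,start,stop
A,10,20
B,40,50
"""

sensor_rows = list(csv.DictReader(io.StringIO(sensor_csv)))
label_rows = list(csv.DictReader(io.StringIO(labels_csv)))
segments = split_by_labels(sensor_rows, label_rows)

for label, samples in segments:
    print(label, len(samples))
```

For real recordings, the in-memory strings would be replaced by the files in one subfolder, and the column names should be checked against the actual CSV headers.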

Stage 2

For stage 2, we added lower-case letters to the dataset. Overall, 26179 letters are provided, and the task is to predict 52 classes. The data format stays the same.
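One way to turn the 52 letter classes into integer targets for a classifier is sketched below. The ordering (upper case first, then lower case) and the names CLASSES, LABEL_TO_INDEX and encode_labels are assumptions for illustration. The sketch also assumes each label is stored as a single character.

```python
import string

# Assumed class order: 'A'..'Z' followed by 'a'..'z' (52 classes).
CLASSES = string.ascii_uppercase + string.ascii_lowercase
LABEL_TO_INDEX = {letter: index for index, letter in enumerate(CLASSES)}


def encode_labels(labels):
    """Map single-character letter labels to integer class indices."""
    return [LABEL_TO_INDEX[letter] for letter in labels]


encoded = encode_labels(["A", "Z", "a", "z"])
```

Restricting CLASSES to string.ascii_uppercase gives the 26-class setup of stage 1.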

Data Acquisition

To obtain the sensor data provided for this challenge, we implemented a recording app that connects to a DigiPen and tells the volunteers which letter to write. The following are some of the conditions under which the recordings were made:

  • The volunteers sat on a chair at a table while writing.
  • The writing surface was horizontal.
  • The volunteers wrote on normal white paper sheets (about 80 g/m²).
  • The sheet was padded with five additional sheets.
  • There was no guideline concerning the size of the handwriting. The subjects were asked to use a size that is natural for them.
  • There was no guideline concerning the way of holding the pen. The subjects were asked to use a position that is natural for them.
  • The volunteers were asked to make sure the STABILO logo faced up, to avoid differing pen orientations.
  • Participants could choose freely between print and cursive writing styles.
  • Only right-handed recordings are released during this challenge.