This article was written by Stefano Mauceri, who made it to this year’s finals at the DatSci Awards 2017 as a “Data Science Student Of The Year” finalist. The awards are proudly supported by Core Media. Stefano gave us some insight into his recent research on whether a person can be identified by analysing data from wrist-worn accelerometers, and the results look promising!
We are excited to see Stefano and all the other contestants at the second DatSci Awards in Croke Park on the 21st of September and we would like you to be there as well to celebrate data science. Get your ticket here!
Stefano Mauceri is entering his second year as a PhD student at the UCD Natural Computing Research & Applications Group. His research interests include anomaly detection and bio-inspired algorithms. Prior to the UCD PhD programme, he graduated with the UCD MSc in Business Analytics. Previously he graduated in Economics for Tourism (MSc, Bicocca University, Milan) and Management (BSc, Bocconi University, Milan).
Stefano applied for the “Data Science Student Of The Year” competition at DatSci Awards 2017 and was shortlisted as a finalist for his joint master’s thesis project, titled “An heuristic approach for the Green Vehicle Routing Problem”. This work will be presented by the two authors on 17th August 2017 in Croke Park (Dublin).
Here’s what Stefano had to say:
I’m a PhD student at the UCD Natural Computing Research & Applications Group, and in the following lines I’ll introduce you to a research question I’m currently investigating: can we identify a person by means of wrist-worn accelerometer data?
The popularity of wrist-worn devices continues to increase, not only among consumers but also in medical research. Researchers see such devices as an inexpensive and unobtrusive diagnostic tool. Two central applications are the assessment of physical activity and the study of movement disorders.
Data integrity is of utmost importance in medical research, so scientists want to confirm that a device is being worn only by the intended person for the entire data collection period. This is to prevent fraud or other misconduct that could invalidate their studies. In other words: given a daily time series of accelerometer data, we want to be able to tell whether or not it was generated by the intended person.
Wrist-worn devices can record several different variables; however, in my study I focus only on triaxial acceleration. Other sensors commonly embedded in such devices are a gyroscope, which estimates rotation in space, and a magnetometer, which determines orientation with respect to Earth’s magnetic field. Sometimes there is also a light sensor and a thermometer. These sensors, along with specialised algorithms, can provide information about body posture, body temperature, energy expenditure, heart rate, light intensity, physical activity, sleep-wake cycles, and steps taken.
It is not hard to see why wearable devices are a remarkable source of data for medical researchers. However, dealing with this data is not an easy task. The problem at hand and the large volume of data involved are outside the expertise of most medical researchers.
Once again, data science shows its versatility. Solving the problem of subject identification through wrist-worn accelerometer data does not require medical expertise, but it does require familiarity with data science fundamentals. To address the problem, I have had to deal with missing values, find the most appropriate data resolution, identify a data representation strategy, test algorithms and classification approaches, and measure performance.
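To make these steps a little more concrete, here is a minimal Python sketch of the first three: handling missing values, changing the data resolution, and building a simple representation. It is only an illustration under stated assumptions, not the pipeline used in the study. The function names (fill_missing, downsample, represent), the forward-fill strategy, the epoch length and the four summary statistics are all hypothetical choices made for this example.

```python
import math
from statistics import mean, pstdev


def fill_missing(samples):
    """Forward-fill missing readings (None).

    A leading gap takes the first valid reading. Assumes at least one
    reading is present. Forward-fill is just one possible strategy.
    """
    last = next(s for s in samples if s is not None)
    filled = []
    for s in samples:
        if s is not None:
            last = s
        filled.append(last)
    return filled


def magnitude(x, y, z):
    """Collapse a triaxial reading into a single acceleration magnitude."""
    return math.sqrt(x * x + y * y + z * z)


def downsample(values, epoch_len):
    """Lower the resolution by averaging non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped.
    """
    return [
        mean(values[i:i + epoch_len])
        for i in range(0, len(values) - epoch_len + 1, epoch_len)
    ]


def represent(values):
    """Summarise a series as a small feature vector (an assumed choice)."""
    return [mean(values), pstdev(values), min(values), max(values)]


# Toy "day" of triaxial readings (in g) with two missing samples.
raw = [(0.0, 0.0, 1.0), None, (0.0, 3.0, 4.0),
       (1.0, 0.0, 0.0), None, (0.0, 0.0, 2.0)]

filled = fill_missing(raw)
magnitudes = [magnitude(*reading) for reading in filled]
epochs = downsample(magnitudes, epoch_len=2)
features = represent(epochs)
print(features)
```

Each day of data ends up as one short feature vector, which is the kind of input a classifier can work with. In practice the choice of epoch length and features would need to be tested, which is exactly the "data resolution" and "data representation" question mentioned above.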
My preliminary results show that with the right data representation a simple algorithm like logistic regression can achieve 85% accuracy.
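For readers who have not met logistic regression before, the sketch below shows the general idea on made-up data: each "day" is a two-number feature vector, and the model learns to separate days from the intended wearer (label 1) from days generated by someone else (label 0). Everything here is an assumption for illustration: the data is synthetic, the features are assumed to be already standardised, and the helper names are hypothetical. It does not reproduce the study's data, representation or the 85% figure.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=300):
    """Fit weights and bias with plain batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (intended wearer) or 0 (someone else)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(w, b, X, y):
    """Fraction of days classified correctly."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def make_synthetic_days(rng, centre, n_days):
    """Hypothetical per-day feature vectors scattered around a centre."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n_days)]


rng = random.Random(0)  # fixed seed so the run is repeatable
intended = make_synthetic_days(rng, (2.0, 2.0), 100)
other = make_synthetic_days(rng, (-2.0, -2.0), 100)

# First 70 days of each group for training, the remaining 30 for testing.
X_train, y_train = intended[:70] + other[:70], [1] * 70 + [0] * 70
X_test, y_test = intended[70:] + other[70:], [1] * 30 + [0] * 30

w, b = train_logistic_regression(X_train, y_train)
test_accuracy = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The synthetic groups are deliberately easy to separate, so the accuracy printed here says nothing about real accelerometer data. The point is only that, once each day is reduced to a good feature vector, the classifier itself can be very simple.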
Even though it is at an early stage, my research shows promising performance. Accelerometer data gathered through wrist-worn devices can outline a unique representation of a subject and therefore allow his/her identification. I’m now investigating advanced bio-inspired learning methods including genetic programming (GP) and neural networks.