I was recently pointed to this excellent blog post, which argues that the ideas of ‘big data’ and ‘quantified self’ do not fit well together. The title here comes directly from that post: “Big Data and Quantified Self, just like chocolate and champagne, do not pair together well.” In the true spirit of online blogging, I thought I’d reply here instead of via e-mail.
The key idea is that ‘big data’ tends to focus on the ‘average’ person: the aggregate of many noisy data points that, when put together, give an indication of behaviour that is the sum of everyone, but manifested by no one. Self-tracking, or quantified-self, data comes from a self-selecting sample of the population and therefore is not representative of everyone: “self-trackers are different from other people with regard to mentality, psychological traits, lifestyles, behaviors, etc. So even if we derive a certain pattern based on a data from a hundred, thousand or even five thousand self-trackers with diabetes, that pattern won’t necessarily hold for all other people with diabetes.”
I mostly agree with this: my thoughts only differ in terms of the conclusions.
First, this problem is increasingly recognised and actively discussed across ‘big data’ research. Studying how people move around cities based on Foursquare check-ins only looks at people who like Foursquare, researching how Twitter predicts elections only looks at the sample of people who use Twitter, and 96% of brain research has been conducted on westerners. Psychologists agree that they have been mostly studying people who are WEIRDos (Western, Educated, Industrialized, Rich, and Democratic). While something certainly has to be done to address this, I would posit that throwing away everything we have learned is not the answer: there are many domains (take, for example, medicine) where ‘small’ tests have led to methods that have successfully scaled to all. Instead, we need to be more aware of how self-selecting our sample is when we draw our conclusions.
By being full of people, ‘big data’ also has one key advantage: it can help overcome the data sparsity that any single self-tracker will face, and finding links between people’s behaviours is the only way to do that. While tracking my mood, I know that I cannot accurately record it every minute, since I am otherwise engaged. However, your actions and mood may have something to teach me.
Mathematically speaking (see the other blog post), I’m saying that while Y_me = f_me(X_me) and Y_you = f_you(X_you), since we are all human there are bound to be some people in the world for whom f_me ~ f_you, and we can learn from one another. So one of the goals of the quantified self movement should be to facilitate this process: putting people together in a room where they each talk about their lessons learned is a first step in this direction.
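To make the f_me ~ f_you idea a little more concrete, here is a toy sketch in Python. It is only an illustration under some strong assumptions of my own: that each person's f can be approximated by a straight line, that "similar" means the fitted slope and intercept are close, and that the numbers below (hours of sleep as X, a 1-10 mood score as Y) are made up. The function names are hypothetical too; nothing here comes from the other post.

```python
from math import hypot


def fit_line(points):
    """Least-squares fit of y = a*x + b to one person's (x, y) observations.

    Assumes at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def most_similar(me, others):
    """Return the name of the person whose fitted line is closest to mine."""
    a_me, b_me = fit_line(me)

    def distance(name):
        a, b = fit_line(others[name])
        # Assumed similarity measure: Euclidean distance between (slope, intercept).
        return hypot(a - a_me, b - b_me)

    return min(others, key=distance)


# Made-up observations: (hours of sleep, mood score).
me = [(6, 4), (7, 5), (8, 6)]
others = {
    "alice": [(5, 3), (6, 4), (9, 7)],  # mood rises with sleep, like mine
    "bob": [(5, 8), (7, 6), (9, 4)],    # mood falls with sleep
}

print(most_similar(me, others))
```

In this toy data, alice's fitted line matches mine, so her records at 5 and 9 hours of sleep could hint at what my own mood might look like in ranges I never logged. A real version would need far more care about noise, sample size and which variables actually matter.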
The only difference I see between QS and big data? By looking at your own data, QS seems to encode the ideas of mindfulness (beyond just self-experimentation). When I look at my QS data, I stop and think about my life. When I’m running my ‘big data’ experiments, I don’t!