A recent talk that I gave at Cambridge's Dept of Psychiatry and Institute of Public Health, and at UCL's e-Health unit.

Since launching Emotion Sense, I have received a number of e-mails from people who would like to conduct studies or build apps that collect similar kinds of data:

  • Momentary Survey Responses: answers to questions about the moment that the person is in. How happy are you? How focused are you? Where are you? Or, indeed, any of the other kinds of questions that are typical of the experience sampling method.
  • Smartphone Sensor Data: it is increasingly well-known that smartphones have a variety of sensors that can be used to characterise a person’s behaviour and environment. Here is a paper (pdf) that I wrote about that very topic. What are a participant’s GPS coordinates? What are their calling and texting patterns like? And so on. (A rough sketch of both kinds of records follows this list.)
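
To make these two kinds of data concrete, here is a minimal sketch in Python of what a single survey response and a single sensor sample might look like; the field names are my own illustrative assumptions, not Emotion Sense’s actual schema.

    from dataclasses import dataclass

    @dataclass
    class SurveyResponse:
        """One answer from an experience sampling prompt (illustrative fields only)."""
        participant_id: str
        timestamp: float   # Unix time when the prompt was answered
        question: str      # e.g. "How happy are you right now?"
        answer: int        # e.g. a response on a 1-5 scale

    @dataclass
    class SensorSample:
        """One passively collected sensor reading (illustrative fields only)."""
        participant_id: str
        timestamp: float
        sensor: str        # e.g. "gps", "call_log", "accelerometer"
        value: dict        # e.g. {"lat": 52.205, "lon": 0.119} for a GPS fix

    # One example record of each type
    response = SurveyResponse("p01", 1388534400.0, "How happy are you right now?", 4)
    sample = SensorSample("p01", 1388534460.0, "gps", {"lat": 52.205, "lon": 0.119})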

The main challenge, however, is building this kind of software. It takes time, money, and a lot of programming/design effort: precious resources that are not widely available. Moreover, it is impossible to ‘use’ the Emotion Sense app for anything other than what it was built to do: collect data about positive/negative mood (via the very specific questions that we have chosen) while taking users who download the app on what we have called a ‘journey of discovery,’ in which they unlock feedback screens as they continue to use the app.

In essence, Emotion Sense isn’t useful for others who are interested in doing similar kinds of research. There are two directions that I can take this: (a) building new, specific apps on a case-by-case basis, or (b) building a new, generic app that tries to suit the broad needs behind doing this kind of research. While I am pursuing both of these paths, this post is about the second, (b).

I have spent a few months working on a new app: Easy M. The app essentially takes all of the pieces of the Emotion Sense puzzle (indeed, it uses the same open-source code) and reshuffles them into an app that can suit others’ needs. The idea behind this was to build an app much like the fantastic MyExperience toolkit, which unfortunately is only available for ‘old’ Windows Mobile devices.

The Easy M app is now online on the app store, and we are soon going to launch the first studies that use this tool. The way it works is (hopefully) simple: your users download the app, put in a code for your study, review what it is they are joining (in terms of data collection), and give their consent; the app then automatically reconfigures itself to do everything you need: ask the questions that you want, and collect the sensor data that you would like to focus on. Once your study has run its course, the app logs your users out of the study and makes sure that any pending data gets sent up to the servers.

The online documentation includes most of the details about what this app can and cannot do. In particular, there is currently no way for you to use the tool without getting in touch with me, so that I can set up the appropriate configuration files for your needs; the data also comes to our servers to then be transferred to you. As I develop the tool, I hope to add many of the missing bits — for now, I’m interested in hearing from you if this is the kind of tool that you would like to use in your research.
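
Purely as a hypothetical illustration (the real configuration format is not public, so every field name below is an assumption of mine), a study configuration needs to capture roughly three things: the study’s code and consent text, the questions to ask, and the sensors to sample. In Python terms, something like:

    # Hypothetical Easy M study configuration; field names are illustrative only.
    study_config = {
        "study_code": "DEMO2014",          # the code participants type in when joining
        "consent_text": "This study collects survey answers and sensor data ...",
        "end_date": "2014-06-30",          # after this date, the app logs users out
        "surveys": [
            {
                "trigger": "random",       # prompt at random times during the day
                "times_per_day": 4,
                "questions": [
                    {"text": "How happy are you right now?", "scale": [1, 5]},
                    {"text": "Where are you?", "choices": ["Home", "Work", "Other"]},
                ],
            }
        ],
        "sensors": {
            "gps": {"interval_minutes": 15},
            "call_log": {"hash_numbers": True},           # never store clear-text numbers
            "microphone": {"store": "ambient_volume_only"},
        },
    }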

Slides from a talk I’ve given at a few different places recently, covering the design and deployment of Emotion Sense and the initial design of Easy M.

 

2013 was a long, long year for me; here’s my attempt to summarise it as succinctly as possible.

  • I released Emotion Sense. To date, it has been downloaded approximately 30,000 times. In the days after the release, the press coverage was very intense and exciting. Since then, there has been a lot of admin to keep the app going, working, and updated, and to keep our servers in shape. The research is on its way, too!
  • I continued my data science consulting work. I worked with banks, hotels, data science groups, and a charity. It is like getting a breath of fresh non-academic air. It’s always an eye-opener to see what data-related problems pop up in the so-called ‘real’ world.
  • I worked with Kiran Rachuri on the open-source Android sensing library that we have now released (along with a data manager and trigger library). A bunch of students at Birkbeck College tested the library as part of their mobile dev course; we wrote about it in a workshop paper.
  • I was a guest lecturer on the Coursera Introduction to Recommender Systems course; I got to talk with Michael about the temporal issues that emerge in online recommender systems, which I studied during my PhD.
  • I wrote a chapter for the upcoming Handbook of Human Computation. The other chapters look very interesting, I’m looking forward to reading them.
  • We published a study (at ACM Ubicomp), conducted in late 2012, that used the libraries above in an app that asks people about their feelings and context (a precursor to Emotion Sense). The paper has the enigmatic title “Contextual Dissonance…”; in the final days before submission, it was being edited from 3 different time zones (which turned out to be surprisingly efficient).
  • I was Ubicomp’s social media chair. That meant running their Twitter, Facebook, and Google+ accounts, which was a bit of fun. I also co-organised the conference’s Workshop on Pervasive Urban Applications.
  • I worked with Jagadeesh Gorla and colleagues to publish some work on group recommender systems at WWW. I didn’t get to go to Brazil, but I hear the work contributed to the founding of a Cambridge-based recommender system company.
  • What will smartphone-based behaviour change interventions look like in the future? We wrote a paper about that for IEEE Pervasive Computing’s Special Issue on Understanding and Changing Behaviour (pdf).
  • Some early work on a smoking cessation app received funding from the MRC. I’m looking forward to that kicking off in 2014 too.
  • I built the Android Easy M app. It’s a bit like Emotion Sense under the hood, but it’s for other researchers who want to run sensor-collecting experience sampling studies with their own setup. The first few studies will be kicking off in the beginning of 2014.
  • I gave talks at Birkbeck College, Ghent University, Sussex University, the Ubhave Conference, DrinkAware’s Workshop, Ubicomp (and its workshops) and at a Government Social Research Workshop. All black slides ftw.
  • I reviewed papers for a range of different conferences (ICWSM, SIGIR, CIKM) and workshops. I realised that I have reached the stage where I probably read more unpublished work than published papers. I certainly spend more time reading papers that I have to review than those that I don’t!

With all this manic stuff going on, I think that 2014 is going to be a year of prioritising and getting things finished.

Recently, we launched the Emotion Sense app for Android on the Google Play Market. This app combines experience sampling with the passive data that modern smartphones can collect, and gives people feedback about how their reported mood compares to this data. Of course, this app was designed and built to support our research into how daily mood relates to the behavioural signals that phones can capture.

To support the launch, this press release was published, which has since been picked up by a number of newspapers and blogs (a sample of them is listed here). Overall, the response has been overwhelming and we have learned a number of lessons which I should dedicate a separate blog post to.

Naturally, since this app collects data from a wide variety of sensors, there have been a number of concerns about privacy. This issue has been, and continues to be, very important to us. However, a number of comments that have been made are misleading, so this post aims to clarify how and why we collect sensor data.

Why do we collect sensor data? What happens to this data?

The aim of collecting sensor data is to support academic research into daily moods and smartphone sensors. We are academic researchers, not marketers: we are ethically bound and fully committed to never sell or share the data the app collects. We are not interested in advertising or making money off your data: we are interested in progressing the state-of-the-art in Computer Science (sensing and data mining) and Psychology (studying daily life). If any of us leave the University, to move on to other projects, we will leave the data behind. We will also never make the data available to anyone, except the person who generated it. If you want your data, you can simply get in touch.

Our research is not about prying into individuals’ lives. We are looking for broad patterns, which emerge from many people doing the same thing (e.g., using an app). We really have no interest in looking at anything other than aggregate patterns in the data.

How is the data anonymised?

The app does a fair bit of data collection on your phone, but then anonymises it: we do not receive your raw data. In particular:

  • The app does not send us conversations, or any audio recordings at all. All it does is measure the ambient volume, which is a number (e.g., “23”). We do not and cannot track the web sites you visit, your eye movements, or how you touch your phone screen (in fact, other researchers have shown that this is impossible on a number of Android devices!).
  • The app does not record any text message content or clear-text phone numbers. In fact, it uses a one-way hash function to convert a phone number into an indecipherable string. So, for example, we will see that a phone texted another phone, identified as “abdjasdfkjqwercsdsdsaqt2”, and sent 3 words. (A minimal sketch of this kind of hashing follows this list.)
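
For readers curious about what a one-way hash does, here is a minimal Python sketch of the general idea; it is not the app’s actual code, and the salt and algorithm are illustrative assumptions.

    import hashlib

    # Illustrative only: a real app would generate and keep a secret salt on the device,
    # so that nobody else can recompute the mapping from numbers to identifiers.
    SALT = "a-per-installation-secret"

    def anonymise_number(phone_number: str) -> str:
        """Map a phone number to a stable but indecipherable identifier.

        The same number always maps to the same string, so interaction patterns
        can be counted, but the hash cannot be reversed to recover the number.
        """
        digest = hashlib.sha256((SALT + phone_number).encode("utf-8")).hexdigest()
        return digest[:24]

    # The servers only ever see something like 'f3a1c0...' (24 hex characters)
    print(anonymise_number("+44 7700 900123"))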

How is this research funded?

We have not paid anyone to write/blog about our work, and our project has no commercial partners. Our work is funded by the Engineering and Physical Sciences Research Council: details of the project are available here.

We are aware that people may still have some questions about how the app works. If you or any of your readers has any questions, please feel free to contact me: neal.lathia@cl.cam.ac.uk

Or visit our FAQs page.

Update: Potential Reasons for this Confusion

As was pointed out to me, some of the confusion about what we are doing with data may be due to this earlier research paper, which was published in 2010 and used the same Emotion Sense name. The app used in that paper is very different from the app that we released to the general public.

The publicly available app does not perform any speaker recognition or emotion detection. This is for a number of reasons:

  • Privacy. Since recording audio requires informed consent, and not everyone around your phone may have given it, we chose not to store any recorded audio beyond (as above) the ambient volume.
  • Technical Challenges. While audio processing research is progressing, an earlier trial of the techniques we used in previous research was inconclusive and overly cumbersome: the sounds that people are surrounded with when their phones are in their bags, pockets, etc. go well beyond the controlled conditions of the trial that was done in the lab.
  • Moving Beyond the Microphone. There is, naturally, more to a smartphone than its microphone. The design of the publicly available app therefore seeks to learn about daily life using a more holistic approach (i.e., combining the data from different sensors).

I was recently pointed to this excellent blog post that argues that the ideas of ‘big data’ and ‘quantified self’ do not fit well together. The title here comes directly from that post: “Big Data and Quantified Self, just like chocolate and champagne, do not pair together well.” In the true spirit of online blogging, I thought I’d reply here instead of via e-mail.

The key idea is that ‘big data’ tends to focus on the ‘average’ person: the aggregate of many noisy data points that, when put together, give an indication of behaviour that is the sum of everyone, but manifested by no one. Self-tracking, or quantified-self, data comes from a self-selecting sample of the population and therefore is not representative of everyone: “self-trackers are different from other people with regard to mentality, psychological traits, lifestyles, behaviors, etc. So even if we derive a certain pattern based on a data from a hundred, thousand or even five thousand self-trackers with diabetes, that pattern won’t necessarily hold for all other people with diabetes.”

I mostly agree with this: my thoughts only differ in terms of the conclusions.

First, this problem is increasingly emerging/actively discussed in all ‘big data’ research. Studying how people move around cities based on foursquare check-ins only looks at people who like foursquare, researching how twitter predicts elections only looks at the sample of people who use twitter, and 96% of brain research has been conducted on westerners. Psychologists agree that they have been mostly studying people who are WEIRDos (Western, Educated, Industrialized, Rich, and Democratic). While something certainly has to be done to address this, I would posit that throwing away everything we have learned is not one of those things: there are many domains (take, for example, medicine) where ‘small’ tests have led to methods that have successfully scaled to all. Instead, we need to increase our awareness about how much of a sub/self-selecting-sample we are dealing with when making our conclusions.

By being full of people, ‘big data’ also has one key advantage: it can help overcome the data sparsity that any single self-tracker will face, and finding links between people’s behaviours is the only way to do that. While tracking my mood, I know that I cannot accurately record it every minute, since I am otherwise engaged. However, your actions and mood may have something to teach me.

Mathematically speaking (see the other blog post), I’m saying that while Y_me = f_me(X_me) and Y_you = f_you(X_you), since we are all human there are bound to be some people in the world for whom f_me ~ f_you: and we can learn from one another. So one of the goals of the quantified self movement should be to facilitate this process: putting people together in a room where they each talk about their lessons learned is a first step in this direction.
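
To make that concrete, here is a minimal Python sketch of the idea, under the strong and purely illustrative assumption that “f_me ~ f_you” can be approximated by finding the person whose mood correlates most with mine on the moments we both happened to record, and borrowing their values to fill my gaps.

    import numpy as np

    def borrow_from_similar_user(my_mood, others_mood):
        """Fill gaps (NaNs) in my sparse mood series using the most similar other user."""
        best_other, best_corr = None, -np.inf
        for other in others_mood:
            both = ~np.isnan(my_mood) & ~np.isnan(other)
            if both.sum() < 3:
                continue  # not enough overlapping observations to compare
            a, b = my_mood[both], other[both]
            if a.std() == 0 or b.std() == 0:
                continue  # constant series: correlation is undefined
            corr = np.corrcoef(a, b)[0, 1]
            if corr > best_corr:
                best_other, best_corr = other, corr
        filled = my_mood.copy()
        if best_other is not None:
            gaps = np.isnan(filled) & ~np.isnan(best_other)
            filled[gaps] = best_other[gaps]
        return filled

    # Hourly mood scores; NaN marks the hours when I was 'otherwise engaged'
    me = np.array([3.0, np.nan, 4.0, np.nan, 2.0, np.nan])
    others = [np.array([3.0, 3.5, 4.0, 3.0, 2.0, 2.5]),
              np.array([1.0, 5.0, 1.0, 5.0, 1.0, 5.0])]
    print(borrow_from_similar_user(me, others))  # gaps filled from the first (similar) user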

The only difference I see between QS and big data? By looking at your own data, QS seems to encode the ideas of mindfulness (beyond just self-experimentation). When I look at my QS data, I stop and think about my life. When I’m running my ‘big data’ experiments, I don’t!

A couple of weeks ago I was invited to participate in a workshop at NYU’s CUSP, or Center for Urban Science and Progress. As they describe themselves:

The Center for Urban Science + Progress (CUSP) is a unique public-private research center that uses New York as its laboratory and classroom to help cities around the world become more productive, livable, equitable, and resilient. CUSP observes, analyzes, and models cities to optimize outcomes, prototype new solutions, formalize new tools and processes, and develop new expertise/experts. These activities will make CUSP the world’s leading authority in the emerging field of “Urban Informatics.”

The theme of the workshop was ‘mobile sensing’ – with, of course, a particular focus on how it may support urban science.

Talks. The invited speakers were from diverse backgrounds and institutions, making for a very interesting line-up. I did not take any notes, so my summary here is vastly unfair to each talk:

  • Rob van Kranenburg (@robvank) naturally spoke about the Internet of Things, and how it fits into the broader ecosystem of cities.
  • Mischa Dohler (@mischadohler)’s talk covered urban sensors, and gave rise to a big debate about where the boundary between crowd-sourcing and urban sensing should lie.
  • Jacqueline Lu (NYC Parks and Recreation) spoke about how data is supporting efforts to maintain and promote green spaces in the city. This was particularly interesting since it made me realise how a seemingly ‘trivial’ problem (maintaining trees) is actually vastly complex when placed into urban settings.
  • Vivek Singh (MIT) spoke about ongoing mobile sensing experiments that investigate how behaviours can be promoted between social groups.
  • Margaret Martonosi (Princeton) spoke about her work using Call Detail Record data (e.g., see this paper). The data gives fantastic geographical coverage and potential to study many facets of mobility, while presenting very difficult challenges with regards to inference and privacy.
  • Weisi Guo (Warwick University), the workshop organiser, spoke about his research about understanding cities through mobile sensors.
  • Jarlath O’Neil-Dunne (University of Vermont, @jarlathond) gave a talk about geographic analysis using satellite data – I learned about how LiDAR data (e.g., this blog post) can be used to, for example, find trees that would otherwise be hidden by shade: a very challenging data feature extraction task.
  • Raz Schwartz (Rutgers, @razsc) is the co-creator of the Livehoods project, which is a great example of attempts to uncover the structure of the city via social media analysis.
  • Eiman Kanjo (King Saud University) discussed her work with mobility and affective sensing using smartphones (see her publications here)
  • Andrew Eckford (York University – Canada not UK!) gave a very interesting talk about molecular communication for harsh environments (say, flooded subway tunnels!) – something I had never heard of.
  • Graham Cormode (AT&T) discussed his work on distributed data monitoring and mining. See his personal page here.
  • Lin Zhang (Tsinghua University) talked about his work with sensors on Beijing’s taxi cabs for pollution monitoring (MobiSys paper here). The dataset is available on request.

Finally, I briefly talked (with very sparse slides) about open challenges in mobile sensing – ranging from energy efficiency to data inference and behaviour change measurement.

Open Ideas. The fact that this broad range of researchers all agreed to come to a workshop on ‘urban sensing’ shows how this field is still in its infancy; I very much enjoyed the fact that everyone spoke about very different things. In fact, we even differed on the basics:

  • What is urban? It is one of those words that could mean tunnels in a subway, parking sensors on a road, or community-driven tree maintenance. The ‘where’ of all the research above certainly agreed on cities: but within this context, there is a hierarchy that ranges from metropolitan-scale analysis down to individual citizens’ sensors.
  • What is sensing? It seems that ‘sensor’ is quickly becoming a term that means ‘a source of data;’ while this is consistent with the past, my gut tells me that historically this would not have been the case. Tweets, accelerometers, and satellites are all sensors, albeit very different in nature: and there is ample space for research both within the scope of individual sensors and in finding links/building systems that bridge between them.

 

I was invited to Gent recently to give a talk at a workshop on mobile research. Smartphones are at the intersection of a variety of domains… from hardware to usability and machine learning. The slides below tried to capture this, and look at some recent lessons and application areas that my research is supporting.

I recently attended and presented at a workshop on shared bicycle systems in Paris, France. The workshop, called “Spatio-temporal Data Mining for a Better Understanding of People’s Mobility: The Bicycle Sharing System (BSS) Case Study” was organised by Latifa Oukhellou from IFSTTAR; the program and slides are online here (and the presentation that I gave on my recent paper is embedded below).

I personally think that it was an excellent opportunity to bring together a variety of people who have been working with data from shared bicycle systems, particularly since the work spanning this ‘nascent’ field is from people with very diverse academic backgrounds. The day uncovered surprising similarities in the techniques that people are using to analyse a variety of cities’ data (e.g., clustering stations), and, more broadly, in what the few (but growing number of) researchers in this field have been trying to solve.

The day also really served to expose a number of key problems:

  1. Data Acquisition. There is a blatant tension between researchers who have studied, and want to continue studying, these systems and those practitioners who run shared bicycle system web sites. A vast majority of researchers in the group have obtained their data by regularly crawling a shared bicycle map, like this one of London. This has allowed researchers to collect time-varying station capacity data, which is useful for training algorithms that seek to predict how many bikes a station will have. However, this is clearly not an ideal way to collect data, and I hope to see a closer collaboration between transport operators and researchers in this domain in the future. (A minimal sketch of this kind of polling script follows this list.)
  2. Data Quality. The data that is collected by scraping web sites is prone to inaccuracies and noise, and this can lead to errors in our analysis. For example, Ollie (who has made all the great online maps of shared bike systems) pointed out that one of the differences that I uncovered in my recent work was not due to a change in activity, but to the fact that the station that I thought had changed patterns had, instead, simply been moved closer to a train station. While, in hindsight, I don’t think this completely undermines the work I did (!), I wonder how much of the broader research is somewhat affected by similar hidden changes (or, at least, changes that a web scraper would not be seeking).
  3. Data Granularity. More importantly, web-scraped data does not capture important features of the system, such as origin-destination pairs or the actual habits of the systems’ users. As researchers, we know that the value of data is often proportional to its granularity. For example, all of the recent work that I have done using Oyster card data would not have been possible if all I had was station gate counts, which is the rough equivalent of the data that most shared bicycle researchers have. How are people responding to incentives? What is the variety of behaviours that the system users are exhibiting? All these questions are currently beyond our (data’s) reach.
  4. Limits of the Data. A very important point was raised during the day: any data that the transport authority holds will inherently only capture the “satisfied” part of the travel demand. Public transport operators do not currently have a means of gauging how many passengers they have failed to transport, whether that be because a person has made the (healthy) choice to walk, or (in the shared bicycle case) has found an empty station when they sought a bicycle, or a station where all the bikes’ tyres have been punctured.
  5. Motivation for Mining Shared Bike Data. As researchers, I don’t think that we have fully uncovered the entire family of problems that data from shared bicycle systems can address; I felt that some propositions were lacking a grounded motivation. There is a wide range of problems that could be addressed, if the right data were at hand. For example, can the data be used to discover bicycles that are broken? Can real-time data mining guide a load-balancing truck to best suit current and predicted travel demand? This is where perhaps a closer relationship with transport operators may again be helpful.
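
To make the data acquisition point (1) concrete, here is a minimal Python sketch of the kind of polling script many of us have resorted to; the URL and JSON field names are placeholders of mine, not any real operator’s API.

    import json
    import time
    import urllib.request

    FEED_URL = "https://example.org/bike-share/stations.json"  # placeholder, not a real feed
    POLL_INTERVAL_SECONDS = 120

    def poll_once():
        """Fetch the current snapshot of every station and keep the fields we care about."""
        with urllib.request.urlopen(FEED_URL) as response:
            stations = json.load(response)
        snapshot_time = int(time.time())
        # The field names below are assumptions about what such a feed might expose.
        return [(snapshot_time, s["id"], s["bikes_available"], s["docks_available"])
                for s in stations]

    if __name__ == "__main__":
        while True:
            for row in poll_once():
                print(",".join(str(x) for x in row))  # a real crawler would append to a file
            time.sleep(POLL_INTERVAL_SECONDS)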

Overall, I think it was a great workshop, and I encourage you to look at the presentations that are online. If you have an interest in this area, I would also encourage you to join the Google Group that I set up for researchers in this domain to share their findings.

London Shared Bicycles: Measuring Intervention Impact from Neal Lathia

Data Science London hosted a meetup on recommender systems at the end of the Strata London Conference. To kick off the presentations, I was asked to give an overview of what is happening in the #recsys research community, with a particular focus on what happened recently in Dublin at ACM RecSys 2012 (my talk was then followed by talks by Tamas Jambor, Dinesh Vadhia, and Sean Owen). It was certainly a daunting task to give a 15-minute summary of a week-long conference, so I chose to do so by giving as many pointers to people and research topics as possible, within some kind of coherent story line.

This is what I went for (slides below):

  1. Why do we need recommender systems? While the older papers in the field talk about information overload, a recent alternative idea is that the web, which has facilitated the quick and large-scale publication and distribution of all kinds of goods, has removed the financial, editorial, or other kinds of filters that we previously used (filter failure). But, really, this is old news too: we now implement recommender systems to foster engagement and community, and the web has become an ecosystem of personalisation (see Daniel Tunkelang’s talk about LinkedIn recommendations).
  2. What are recommender systems? They are collaborative, query-less discovery engines. They are machine learning applied to preference signals. And while the Netflix prize always comes to mind in this context, there is actually a thriving and growing research community that meets annually at the ACM RecSys conference (and I managed to find a photo of me at RecSys 2007!)
  3. Don’t Reinvent the Wheel. Many times, when I talk to start-ups that are building recommender systems, they tell me about problems they are having that have been visited over and over by the rich research literature (cold-start, scalability, etc.). So, what are some of the problems that the research community is looking at today? I made up 5 points, based on looking at the sessions and program from this year’s conference:
  4. Problem 1: Predictions. The research community has become very aware of the fact that there is more to recommendation than predicting ratings. There was an entire workshop dedicated to evaluation beyond accuracy (proceedings are now online here). How can you make recommendations novel, diverse and serendipitous? How do you deal with conflicting objectives? (A minimal sketch of one such beyond-accuracy metric follows this list.)
  5. Problem 2: Algorithms. Related to the above, there was a nice discussion on the balance between the effort required of (imposed on) users to rate things in order to improve recommendations vs. improving algorithms that can deal with few ratings. This topic fits well into the general theme of defining just what algorithms need to do: as above, while the traditional focus has been on prediction, recent shifts (including the best paper at the conference) were towards ranking.
  6. Problem 3: Users and Ratings. The traditional mode of thinking about recommender systems has been “users” and “items,” who are linked by “ratings.” This paradigm is slowly being shown to be incomplete. What about context? What about groups of users? What about the platform you are delivering recommendations on (tablets, mobiles, PCs, televisions)? There was a related discussion at the mobile workshop I co-organised: is there a difference between capturing preference (what I like) vs. capturing intent (what I want)? As a side note, many times when I hear the Netflix prize come up in conversation, people echo the widely publicised fact that the challenge solutions were not implemented due to their engineering constraints. But it is worth reinforcing a broader point that Xavier presented: Netflix has moved on to “other issues” that are more important.
  7. Problem 4: Items. The idea of having tangible “things” that you recommend is also slowly shifting. There was a whole workshop dedicated to recommender systems for lifestyle change, which I sadly missed. If “items” can now subsume decisions, behaviours, and processes – what are they, and are they worth thinking about as items?
  8. Problem 5: Measurement. The most recurring conversation at ACM RecSys is about understanding how to measure progress. I’ve already touched on it above; however, this year there were three clear groups: (a) algorithm-people, who present their results with empirical metrics performed on offline experiments, (b) usability-people, who perform experiments by means of user studies, and (c) the industry – which was clearly advocating online, large-scale A/B testing (see this great keynote). Sadly, academic researchers don’t have access to (c). Moreover, the real problem is that nobody really knows how (a), (b), and (c) relate to one another.
  9. So, to end: 3 key take-aways. First, recommender systems are an ensemble… of disciplines. This is clearly recognised as not being an exclusively machine-learning topic. Second, the idea of black-box recommenders is slowly fading. Long live the domain! (and check out Paul’s keynote on music). Finally, the recsys research community clearly differentiates itself from others by having always been highly involved with the industry and start-ups who are building and running these systems, and there are tons of great open-source projects (e.g., MyMediaLite and Lenskit), backed by open, intelligent, and collaborative people, that are there for you to explore and learn from.
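
Since the ‘beyond accuracy’ point (Problem 1 above) comes up so often, here is a minimal Python sketch of one such metric, intra-list diversity: the average pairwise dissimilarity of the items in a recommendation list. The genre vectors below are made up purely for illustration.

    import numpy as np

    def intra_list_diversity(item_vectors):
        """Average pairwise (1 - cosine similarity) across a recommendation list.

        Higher values mean a more diverse list; item_vectors could be genre
        indicators, topic distributions, or learned embeddings.
        """
        n = len(item_vectors)
        if n < 2:
            return 0.0
        total, pairs = 0.0, 0
        for i in range(n):
            for j in range(i + 1, n):
                a, b = item_vectors[i], item_vectors[j]
                cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
                total += 1.0 - cosine
                pairs += 1
        return total / pairs

    # Three recommended films described by (made-up) binary genre vectors
    films = [np.array([1, 0, 1, 0]),   # action + comedy
             np.array([1, 0, 1, 0]),   # a near-duplicate of the first
             np.array([0, 1, 0, 1])]   # something quite different
    print(intra_list_diversity(films))  # ~0.67: the duplicate pair pulls diversity down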