One evening last week, I attended a very interesting round-table discussion about big data in the UK transport scene. I was invited to kick-start the evening by saying a few words, and did so in front of a small but diverse group of people hailing from many corners of the UK transport sector. Here’s a quick round-up of how the evening went.
First, what did I talk about? I put down five points in my notes:
- The Big Data Trend. The term “big data” is spreading like wildfire; it is heard everywhere nowadays. But, of course, today’s big data is tomorrow’s small data. Yesterday’s big data is hardly data at all. A rough analogy I devised to show what really matters: if you have a lot of gold, you are rich. But if you have a lot of data, it’s more like having a lot of time to live your life: the real value is what you do with it (not just having it in the first place).
- The Two Worlds. A notable trend in data is the divergence between how web-based companies have been using it (it is their lifeblood) and how “physical-world” companies have been tackling data (which seems to be more of a “challenge,” a “problem,” or an “untapped opportunity”). There are therefore valuable lessons to be learned from how web companies view and reason about their data that may be easily transferred to the offline domain.
- Brief words on stuff I’ve been doing. Look elsewhere (or in other blog posts) for this info.
- Some lessons: what should you be using “big data” for? Well, this comes down to the basics of being a data scientist (a term that we also discussed in its own right).
  - Challenging assumptions. Every time that I have an idea about how people use a system (e.g., the tube), data analysis shows that there are a variety of behaviours and uses that I had no idea would ever emerge. But they do. Lots of data teaches you to be a bit humble before throwing around your views of how the world should work.
  - Discovery. Oftentimes, I don’t know what problem(s) need solving until I’ve put my hands on some data. This has been my experience with TfL ticket purchasing data.
  - Measuring and deciding. Data is not only useful for observing how people behave, but also as a means for measuring the outcome of interventions on their behaviours.
  - Building and engaging. A powerful aspect of big data is its cross-applicability. Just as Google uses clicks to improve search, transport authorities could use fare data to build personalised services, or evaluate transport policy.
- Some discussion points. The travel information “market” is very much open. I claimed that, if transport operators and authorities don’t learn to quickly capitalise on their own data, others will. Moreover, the (e.g. smart card) data they hold is hardly unique: mobile phone operators and social media feeds are quickly beginning to contain the same (or richer) data about how people navigate cities. What are the challenges to adopting big data? Why aren’t we there already?
The ensuing conversation covered a lot of ground; I wasn’t taking notes throughout and had to wait until I got back home to jot down some of the themes that emerged. But here are some highlights:
- Transportation vs. customer experience. The transportation industry’s core business is getting people from A to B. Progress in the public transport domain has been marred by the notorious lack of competition (you either take our bus or find your own way…). As such, transport authorities have been (are?) less concerned with providing a “customer experience” than with maintaining and improving infrastructure. Can this be changed by a fresh perspective on data?
- Data modeling, application developing. Should transport operators be in the business of making mobile phone applications? Again, that steers them away from their “core business.” However, they already do data modeling in-house, so should they also take the next step? Or should they not be doing data science in-house at all, and instead seek aid externally from those who may do it better?
- Open, valuable, private data. There is a continuous cry for transport operators to release their data. However, I argued that the data they hold falls into two camps:
  - Open-able data. Includes train timetables, fares, aggregate usage statistics, etc. A daydream for the frenzied travel-app maker or data journalist.
  - Private data. The implicit profiles that they can build of all their passengers: where people go, when, etc. I argued that, if they are ever to find new value in their data, it would be in this category (but return to theme #1 above…).
Overall, a great evening. Other tiny highlights: I recommended Predictably Irrational as well as The Wisdom of Crowds, told everyone what Foursquare is, and received a book on Managing Business Risk.