Creating a brand new lingo

We have enormous amounts of data that are essential for creating great products. But that data is not always easy to use. Big Brain is a Schibsted project to get the data to speak the same language.

“What gets measured gets improved”. You’ve probably come across this quote by Peter Drucker, or one of its variants, more than once. You could argue that some things can be improved even though they cannot be measured. The truth is, it will certainly be easier to tell that you are improving if you can measure progress.

It goes further than that: without measuring, how would you even know if something needs improvement? How would you know what you should be focusing on?

The lean startup loop takes it to the next level and makes measurement a key step without which you cannot learn. Ideas and opinions help you elaborate relevant hypotheses, but only facts and figures allow you to validate them. It is by measuring experiment results that you learn and decide which changes are worth implementing and which are not.

At Schibsted, data scientists and product analysts are not just working closely with product teams, they are part of the team from the very start. Embedded in product teams along with UX experts, their role is to help formulate hypotheses based on data exploration, set up experiments, implement analytics tools and, most importantly, translate results into key learnings to support decision making.

At the same time, because they are part of a wider analytics community within Schibsted, data scientists and product analysts draw on their peers’ ideas, skills and experience. And the challenge is considerable. It’s not just about monitoring product performance based on global benchmarks; it’s about discovering strategic and actionable insights to support product decisions and grow our business, through advanced data exploration and analysis.


Understanding how our users interact with our products, what drives their engagement, what makes them come back or stop using us, fuels new ideas to grow our user base through acquisition and retention. Data analysis, combined with UX user research, also provides a better and deeper understanding of who our users are.

Knowing how specific groups of users behave makes it possible to predict the likelihood of new users becoming active. We can then adapt our communication and product experience accordingly.

Now to the complicated part: all of this makes a lot of sense, but it is easier said than done. Facts on what users do (user events) are collected from different platforms, external and internal, following different formats and logic. It is almost impossible to tie data points back together to get the entire picture. Data cleansing and formatting make up about 90 percent of the work of a data analyst.

For the same reasons, business intelligence experts in the past decade have strongly advocated for one single version of the truth. However, diversity is not necessarily a bad thing. Quite the opposite. Keeping data in its richest form allows us to use it for different purposes at different moments in time. As business and product change, what is true one day might not be true the next. Data you thought useless at some point might become crucial. Striking the right balance between data consistency and comparability on the one hand, and local, real-time flexibility on the other, is a big deal.

With operations in more than 30 countries, speaking almost as many different languages, how is Schibsted keeping this balance while answering the growing need for data throughout the organization? Big Brain. We started working on this project a little less than two years ago. The idea is simple: build a single data platform and a single data warehouse for the whole group, where data is stored following a common data model but where we also keep raw data to allow further data discovery.

In recent years, thanks to the much talked about “Big Data” technologies, the amount of data we can capture, store and process has fortunately exploded. We are now able to collect events from across a wide range of internal and third-party platforms.

Today we have two types of data. Behavioral data consists of user events triggered from our apps and sites (visits, page views, the opening of a form to contact a seller, etc.). Content data covers every piece of data related to content inserted into our transactional databases (for example, classified ad details such as category, price, title and text, or purchase details of a premium feature such as quantity, type and payment mode).

Behavioral data is captured using Pulse, our internal tracking system, and stored in a cloud-based data platform. The data format combines common events and customized events, which gives us the best of both worlds. Critical events are captured in a similar way across all operations and markets, providing us with comparable data, while different teams have enough flexibility to track their own specific events as they wish.
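To make the idea of mixing common and customized events more concrete, here is a minimal sketch in Python. It is not the actual Pulse format: the field names, the market codes and the list of common event types are all assumptions made up for illustration. It only shows how a shared envelope can carry free-form, team-specific details while still allowing comparable counts across markets.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical event types that every market would track the same way.
COMMON_EVENT_TYPES = {"visit", "page_view", "contact_form_open"}


@dataclass
class Event:
    """Illustrative event envelope: shared fields plus a free-form payload."""
    event_type: str
    market: str
    user_id: str
    timestamp: str
    # Team-specific details live here and are not constrained by the common model.
    custom: dict[str, Any] = field(default_factory=dict)

    @property
    def is_common(self) -> bool:
        return self.event_type in COMMON_EVENT_TYPES


def comparable_counts(events: list[Event]) -> dict[tuple[str, str], int]:
    """Count only common events, keyed by (market, event_type)."""
    counts: dict[tuple[str, str], int] = {}
    for event in events:
        if event.is_common:
            key = (event.market, event.event_type)
            counts[key] = counts.get(key, 0) + 1
    return counts
```

In this sketch a custom event such as a hypothetical "saved_search" is still stored with its payload, but it is left out of the cross-market counts because not every market tracks it.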

Trailing the user journey

For content data, we first need to import content from our local databases before we can process it and add a common logic to it. Local databases have to evolve according to local and specific product needs, which makes them hard to homogenize. Once raw data is stored in the data platform, we start cleaning and processing it and send it to Big Brain. When aggregating the data, we maintain both local and global dictionaries, so it is always possible for local teams to follow their own metrics and keep track of historical data locally.
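The role of the local and global dictionaries can be sketched as follows. This is an illustration only, under stated assumptions: the market names, category labels and field names are invented, and the real Big Brain data model is not described here. The point is that each record keeps its local value next to the global one, and the raw record is retained so it can be reprocessed later.

```python
from typing import Any

# Hypothetical dictionaries mapping each market's local categories
# to a shared, global vocabulary.
LOCAL_TO_GLOBAL_CATEGORY = {
    "market_a": {"bilar": "vehicles", "bostad": "real_estate"},
    "market_b": {"coches": "vehicles", "pisos": "real_estate"},
}


def normalize_ad(market: str, raw_ad: dict[str, Any]) -> dict[str, Any]:
    """Attach a global category to a classified ad without losing local detail."""
    local_category = raw_ad["category"]
    mapping = LOCAL_TO_GLOBAL_CATEGORY.get(market, {})
    return {
        "market": market,
        "local_category": local_category,  # lets local teams keep their own metrics
        "global_category": mapping.get(local_category, "unknown"),
        "price": raw_ad.get("price"),
        "raw": raw_ad,  # raw record kept for later reprocessing
    }
```

A category that has no global mapping yet is labeled "unknown" rather than dropped, so the record can be reclassified once the dictionary is updated.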

Basically, raw data is available at any time, so we can always reprocess it. At the same time, because processed data follows a single format and is therefore comparable across markets and products, we can run common analyses and get useful benchmarks. Suddenly it becomes possible to trail the entire user journey, starting from a user’s very first interaction with our different apps and sites.

None of this would have been possible without strong collaboration with our local operations. Any top-down approach would have failed to scale and provide benefits from day one. BI literature is full of gloomy tales of big corporate data warehouse endeavors. It is worth mentioning that when we started this project, the level of data maturity in our local operations varied widely: while some had very complete and well-implemented data warehouses, other, smaller operations had nothing at all.

It would have been unrealistic and counterproductive to set out to replace existing data warehouses and stop local BI efforts. Right from the start we took an “open source” approach to the development of Big Brain. A core team would develop the basic functionalities needed by all, without disrupting local road maps too much, and local teams would develop the specific functionalities most urgent to them while making sure they could be reused by other operations. Like building blocks, Big Brain functionalities, or “modules” as we call them, are assembled.


Focusing on data visualization

What’s next? We still have a long road ahead of us. We want to increase data collection: not all sites currently feed data into Big Brain, and we will continue our effort to unlock access to relevant data at all levels of the organization. An increasing area of focus for our team now is data visualization. As more data becomes available, we can answer more complex business questions, but we still need to find the best way to share findings throughout the organization. In that context, data visualization is no luxury! Having the right charts and graphs can save precious time in understanding data and turning it into action.

Schibsted has an amazing amount of data on user needs and behavior. This is a great foundation for creating successful digital products and services. It is also a huge challenge. To be able to use all the data, it needs to be comparable.

With a common platform, common data formats and common logic, it is possible to trail the entire user journey through our products and act to make users stay and come back. This is why data is essential and how it can be used to create real engagement.