How We Can Help You Make Sense of Your Raw Data

Unstructured data can be a mess. By analogy, if you’ve ever tried to organize a messy drawer, or worse, a garage filled with “unstructured” items, you know that the process can be more challenging than you might think.

Luckily, in digital publishing, there’s a wealth of structured data readily available — no spring-cleaning initiative required. Your CMS, which was built to help publish content to the web, creates a massive amount of structured data, almost as an accidental byproduct. Layered atop this is metadata required for good display in search engines, social networks, and lately, distributed content initiatives like Facebook Instant Articles.

By default, most publishers have a plethora of categorizations built into their sites, including things that you can see easily like sections, titles, and author names. But there is additional information about things you might not be able to see — the word count of an article, the date of any modifications, visitor geography, engaged time per session, type of device, paid campaign tracking variables, and much more.

Many media companies do not understand how to take advantage of these rich data sets — though they may try.

We have seen our savviest customers developing in-house data analyst and data scientist teams, and they want not only to understand audience data better but also to audit the data collected from their websites for compliance with privacy policies.

Andrew Montalenti, Co-founder and CTO

That’s why we have decided to open up our own data, in raw form, to our customers.

Introducing the Raw Data Pipeline

The Raw Data Pipeline collects engagement data, such as visitors, views, engaged time, and traffic sources, via a tracking pixel. This is the same tracking methodology that powers our real-time dashboards. The pipeline then streams the data in real time to a customer-secure Amazon S3 bucket and Kinesis stream. These are modern, secure data storage and access services from Amazon Web Services (AWS), the world’s largest cloud hosting provider. We also provide your team with open source code recipes for easy data access, which can be used to load your data into hosted data warehouses such as Amazon Redshift and Google BigQuery. If that sounds like a lot of technical jargon, don’t worry: your engineering team will know that these are among the most modern and scalable tools available for making sense of large data sets like these.
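For engineers wondering what working with the raw files might look like, here is a minimal Python sketch. It assumes, hypothetically, that events arrive as gzipped newline-delimited JSON (for example, files copied down from the S3 bucket) and that each event carries `action`, `url`, and `engaged_time_inc` fields. These field names and the file layout are illustrative assumptions, not a documented schema, and `read_events`, `read_gzipped_events`, and `summarize_by_url` are hypothetical helpers.

```python
import gzip
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event dict per non-blank line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_gzipped_events(path: str) -> Iterator[dict]:
    """Stream events from a gzipped JSON-lines file (hypothetical layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_events(handle)


def summarize_by_url(events: Iterable[dict]) -> Dict[str, dict]:
    """Count pageviews and total engaged seconds per URL.

    Assumes hypothetical fields: "action" is "pageview" or "heartbeat",
    and heartbeats report newly engaged seconds in "engaged_time_inc".
    """
    summary = defaultdict(lambda: {"views": 0, "engaged_seconds": 0})
    for event in events:
        stats = summary[event["url"]]
        if event.get("action") == "pageview":
            stats["views"] += 1
        elif event.get("action") == "heartbeat":
            stats["engaged_seconds"] += event.get("engaged_time_inc", 0)
    return dict(summary)
```

Because `read_events` is a generator, a large file is processed one line at a time rather than loaded into memory all at once. At full scale, the same per-URL rollup is the kind of query you would more likely run inside Redshift or BigQuery.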

Our Raw Data Pipeline allows digital publishers to conduct more complex analyses of the dataset than the existing dashboard supports. For example: What are people who use Samsung Galaxy S5 devices reading? How much traffic was generated in the first hour of any content that received Facebook referrals? Are visitors from New York City more loyal to my site than those from the rest of the country?
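As a rough illustration of the first two questions, the sketch below works over a list of parsed events. It assumes hypothetical `action`, `url`, `device`, `ts` (an ISO 8601 timestamp), and `referrer` fields, and it approximates a piece’s “first hour” as the hour after its earliest recorded pageview, since the publish time may not be present in the event stream. The function names are illustrative only.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def top_urls_for_device(events: Iterable[dict], device: str, n: int = 5) -> List[Tuple[str, int]]:
    """Most-viewed URLs among pageviews from one device model."""
    counts = Counter(
        e["url"]
        for e in events
        if e.get("action") == "pageview" and e.get("device") == device
    )
    return counts.most_common(n)


def first_hour_views_with_facebook(events: Iterable[dict]) -> Dict[str, int]:
    """Pageviews in each URL's first hour, for URLs with any Facebook referral.

    "First hour" is approximated as the hour after the URL's earliest pageview.
    """
    pageviews = [
        (e["url"], datetime.fromisoformat(e["ts"]), e.get("referrer", ""))
        for e in events
        if e.get("action") == "pageview"
    ]
    first_seen: Dict[str, datetime] = {}
    facebook_urls = set()
    for url, ts, referrer in pageviews:
        if url not in first_seen or ts < first_seen[url]:
            first_seen[url] = ts
        if "facebook.com" in referrer:
            facebook_urls.add(url)
    # Count each Facebook-referred URL's pageviews before its one-hour cutoff.
    return {
        url: sum(
            1
            for u, ts, _ in pageviews
            if u == url and ts < first_seen[url] + timedelta(hours=1)
        )
        for url in facebook_urls
    }
```

In practice you would likely express the same logic as SQL in your warehouse. The third question follows a similar pattern: group events by a visitor identifier and a location field, then compare return-visit rates.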

The Raw Data Pipeline also allows digital publishers to build a “product” on top of the infrastructure, using Business Intelligence tools to create custom visualizations and reports. You can get 100 percent of your data into tools like Looker and Tableau, with advanced segmentation capabilities that extend far beyond the usual real-time dashboard. In this way, you can immediately deliver real business value to your team, as opposed to rebuilding existing data infrastructure.

Benefits of the Raw Data Pipeline

The Raw Data Pipeline helps digital publishers see the entire universe of audience interaction, tracking users from anonymous visit to email newsletter signup to paid subscription, and beyond. This helps unify all business units around a single stream of data, so that they no longer rely on disparate technology to find information.

Understanding and presenting site view data is critical to our business, but the particular work of our data science team often requires access to the more granular data that underpins dashboards and reports. Getting this access is mostly painful. Excruciatingly painful, bad memories, scars and all. [The team] has gone out of its way to make this data readily available and queryable, working with us directly to implement a data pipeline that requires no maintenance on our end. Essentially, they carry the architectural burden, and we can quickly get down to crafting audience profiles and article recommendations. That easy pathway to calculation is invaluable to us.

Haile Owusu

Chief Data Scientist, Mashable

Is your media company or brand looking for an event analytics platform it can trust, one backed by our powerful data infrastructure and aligned with our existing web dashboard, iOS app, browser plugin, and easy-access APIs? We may not be able to help you organize your messy garage, but we can help you make sense of your unstructured data, so you can finally get back to your most important work.

Contact us today to get started.