How to prepare for the Parse.ly Data Pipeline v2.3.0 release

We are making improvements to the Parse.ly Data Pipeline!

This release adds new functionality to the Data Pipeline, notably identifiers for pageloads, support for filtering to specific channels, and Apple News data for customers with an enabled Apple News channel. We are also versioning our schema going forward.

Some of these changes might require changes to your ingest code or to query code that uses Parse.ly data. Please read below for more information.

New data:

Parse.ly now supports Apple News real-time analytics. This is disabled by default for all Parse.ly Data Pipeline customers. Send us a message at support@parsely.com to learn more about enabling Apple News in the Data Pipeline.

New columns:

  • channel: The Parse.ly-defined channel the event came in on. Can be strings like fbia (Facebook Instant Articles), apln-rta (Apple News), amp (Accelerated Mobile Pages), or website. If we add a new channel, that value will appear here. (Note that your Parse.ly account must be integrated with AMP, Facebook Instant Articles, or Apple News for those values to appear.)
  • pageview_id: A unique identifier for the pageview associated with an event. It remains consistent across all events for a given pageview, which allows for sophisticated aggregation in your data warehouse, such as correlating all heartbeat events with their pageview event. The column will always appear; its value will be either null or a long integer like 17542680.
  • pageload_id: A unique identifier for the pageload of an event. This is useful for single-page apps where there may be multiple calls to trackPageview for a single page load. The column will always appear; its value will be either null or a long integer like 17542680.
  • videostart_id: A unique identifier for a given videostart event, allowing you to correlate other events, like vheartbeat, with their originating videostart. The column will always appear; its value will be either null or a long integer like 17542680.
  • schema_version: A new field that indicates the matching schema version from PyPI and the parsely_raw_data repo. For example, this new release will be 2.3.0. The other versions that are on PyPI can be found here: https://pypi.org/project/parsely_raw_data/ We will maintain our documentation by schema version, enabling back-referencing of older versions for historical data.
  • Note: if pageview_id or pageload_id is null, you may be on an older tracking version. Please message support@parsely.com to upgrade your tracker.
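As a rough illustration of the aggregation that pageview_id and channel make possible, here is a minimal Python sketch that counts heartbeat events per pageview, optionally restricted to one channel. It assumes events have already been loaded as dictionaries and that the event type lives in a field named action; that field name, the function name, and the sample records are assumptions made for this example and are not part of this announcement.

```python
from collections import Counter


def heartbeats_per_pageview(events, channel=None):
    """Count heartbeat events per pageview_id, optionally for one channel.

    `events` is an iterable of dicts. The "action" key is an assumed name
    for the event-type field; adjust it to match your own ingest code.
    """
    counts = Counter()
    for event in events:
        pageview_id = event.get("pageview_id")
        if pageview_id is None:
            # Null pageview_id: possibly an older tracker version, so skip.
            continue
        if channel is not None and event.get("channel") != channel:
            continue
        if event.get("action") == "heartbeat":
            counts[pageview_id] += 1
    return dict(counts)


# Hypothetical sample records, for illustration only.
sample_events = [
    {"action": "pageview", "pageview_id": 17542680, "channel": "website"},
    {"action": "heartbeat", "pageview_id": 17542680, "channel": "website"},
    {"action": "heartbeat", "pageview_id": 17542680, "channel": "website"},
    {"action": "pageview", "pageview_id": 17542681, "channel": "amp"},
    {"action": "heartbeat", "pageview_id": 17542681, "channel": "amp"},
    {"action": "heartbeat", "pageview_id": None, "channel": "website"},
]

print(heartbeats_per_pageview(sample_events))
print(heartbeats_per_pageview(sample_events, channel="amp"))
```

The same grouping could be applied to videostart_id to tie vheartbeat events back to their originating videostart. In a data warehouse, the equivalent would typically be a GROUP BY on the identifier column.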

Altered column types and default behavior:

  • Any column that is missing from an event will receive a null value. This logic looks at the schema that we have defined and auto-populates missing columns with null. All columns, whether null or not, will now be available for every event. This means that each event will have 122 columns, with null values populated where appropriate.
  • flags_is_amp: This field has existed for a few years. However, it has typically been null instead of False. It will now be of type Boolean and either True or False.
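If your ingest code also handles historical data that predates this release, you may want both old and new events to share one shape. The following Python sketch, under stated assumptions, pads missing columns with None and coerces a null flags_is_amp to False. The EXPECTED_COLUMNS list is a shortened, hypothetical stand-in for the full 122-column schema, and normalize_event is an illustrative name, not a function from the parsely_raw_data package.

```python
# Shortened, hypothetical stand-in for the full schema column list.
EXPECTED_COLUMNS = [
    "action",
    "channel",
    "pageview_id",
    "pageload_id",
    "videostart_id",
    "schema_version",
    "flags_is_amp",
]


def normalize_event(event, columns=EXPECTED_COLUMNS):
    """Return a copy of `event` with every expected column present.

    Missing columns are filled with None, mirroring the new default
    behavior. A null flags_is_amp from older data is treated as False.
    """
    normalized = {column: None for column in columns}
    normalized.update(event)
    # Assumes flags_is_amp is already a Python bool or None after parsing.
    normalized["flags_is_amp"] = bool(normalized.get("flags_is_amp"))
    return normalized


# Hypothetical pre-2.3.0 event with several columns absent.
old_event = {"action": "pageview", "flags_is_amp": None}
print(normalize_event(old_event))
```

Events produced by the new release should pass through such a step unchanged, apart from any columns your own list omits, so it can be applied uniformly to old and new data.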