Concepts

Basics

API Keys

Access to the Parse.ly API is granted when our developers issue you an API Key. An API Key is a unique string your application uses to identify itself to Parse.ly’s servers. To request an API Key, contact us.

Authentication

To comply with the emerging OAuth web standard, all requests to the Parse.ly API that access sensitive user profile data must use OAuth for authentication. We use a form of OAuth called 2-legged OAuth, aka “Signed Fetch”. More information can be found in our OAuth Background document.

Implementation

OAuth exists primarily to provide an open standard for delegating API access across services. For example, if a user wants to authorize Facebook.com to download information about that user’s favorite music from Last.fm, Facebook.com would request authorization tokens from Last.fm via the OAuth protocol. The user would then approve Facebook’s access to Last.fm via an interactive web-based screen. This spares the user from having to share their Last.fm login credentials with Facebook.com.

This three-step procedure is also known as 3-legged OAuth, and is often lovingly referred to as “the OAuth dance”. The 2-legged version used by Parse.ly skips the interactive step. The differences are well-described in the presentation, Wherefore Art Thou, OAuth?

Also, OAuth should not be confused with OpenID, a complementary but very different standard. Whereas OpenID provides a “single sign-on” mechanism to users for numerous web-based services, OAuth is a protocol that is used to authorize access to data associated with user accounts.
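
To make the 2-legged variant more concrete, here is a minimal sketch of how a request could be signed with OAuth 1.0 HMAC-SHA1 using only the Python standard library. The endpoint URL, the consumer key and secret, and the function names are illustrative assumptions, not part of the Parse.ly API. The signing steps follow the general OAuth 1.0 rules: because there is no user token in the 2-legged flow, the signing key is the consumer secret followed by a bare "&".

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # OAuth 1.0 uses RFC 3986 encoding: only unreserved characters stay literal.
    return quote(value, safe="-._~")


def sign_request(method, url, params, consumer_key, consumer_secret, nonce, timestamp):
    """Return the request parameters plus OAuth fields and an HMAC-SHA1 signature.

    All argument values here are hypothetical; only the signing steps are standard.
    """
    all_params = dict(params)
    all_params.update({
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp),
        "oauth_version": "1.0",
    })
    # Normalize: percent-encode every key and value, then sort the pairs.
    pairs = sorted(
        (percent_encode(str(k)), percent_encode(str(v))) for k, v in all_params.items()
    )
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(normalized)]
    )
    # 2-legged OAuth: the token secret is empty, so the key ends with "&".
    key = percent_encode(consumer_secret) + "&"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    all_params["oauth_signature"] = base64.b64encode(digest).decode("ascii")
    return all_params


signed = sign_request(
    "GET",
    "https://api.example.com/profile",  # hypothetical endpoint
    {"user": "john.smith"},
    consumer_key="my-api-key",          # hypothetical credentials
    consumer_secret="my-api-secret",
    nonce="abc123",
    timestamp=1300000000,
)
```

In a real client the nonce would be random and the timestamp current; they are fixed here only so the sketch is reproducible.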

HTTP, REST and JSON

The Parse.ly API is exposed as a set of HTTP resources, following the RESTful pattern of API design (see REST). Each resource supports a set of operations, which are mapped to URLs and HTTP methods. All request and response formats use JavaScript Object Notation, aka JSON (see JSON). If you prefer not to use the HTTP API directly, we also provide language bindings, described later on in this documentation.

Note

For simplicity, Parse.ly only provides JSON as an output format for its API. If other formats – like YAML, XML, Protocol Buffers, or Thrift – would make more sense for your application, let us know.
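
As a rough illustration of the resource-plus-method pattern, the sketch below builds an HTTP request for a resource and decodes a JSON body, using only the Python standard library. The API root, the resource paths, and the helper names are assumptions made up for this example; consult the API reference for the actual URLs. Nothing is sent over the network here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.example.com/v1"  # hypothetical; not the real endpoint


def build_request(resource: str, method: str = "GET",
                  query: Optional[dict] = None,
                  body: Optional[dict] = None) -> Request:
    """Map a resource path and an HTTP method onto a urllib Request."""
    url = f"{API_ROOT}/{resource.strip('/')}"
    if query:
        url += "?" + urlencode(query)
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        # Request bodies are JSON documents, like the responses.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def decode_response(raw: bytes) -> dict:
    """Decode a JSON response body into a Python dictionary."""
    return json.loads(raw.decode("utf-8"))


read_req = build_request("profiles/john.smith", query={"page": 2})
write_req = build_request("profiles", method="POST", body={"id": "john.smith"})
```

A request built this way could be passed to `urllib.request.urlopen`, with the response bytes handed to `decode_response`.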

Account Management

The purpose of integrating the Parse.ly API into your site is to learn about the trending topics on your site and the connection between those topics and your audience. This data can then be used to make personalized, targeted recommendations to your users, based on the interactions they have with your content. Parse.ly’s servers store and analyze a cached copy of content from your site. In many ways, our store of your site is similar to a search engine’s, but it is much more structured. For large sites, we also regularly do custom work to link our data store with an existing content management system’s metadata.

User Profiles

A UserProfile resource is associated with a single unique user. Profiles are generally used in one or both of the following patterns:

  • Identifiable Profile: If your site already features a user account system, you can integrate UserProfile with identifiable user accounts in your site. For example, if you have a user named “John Smith” with the user ID “john.smith”, you can link that account with a corresponding Parse.ly UserProfile, and ensure his data follows him whenever he is logged in to your site.
  • Anonymous Profile: If you want to offer Parse.ly recommendations to anonymous visitors (very common to start out), you need to utilize our JavaScript Tracker, which identifies anonymous visitors in a way similar to systems like Google Analytics. Our tracker will automatically create UserProfile resources that correspond to your unique visitors, and use metrics from their browsing experience to power content recommendations, without requiring the user to log in at all.

UserProfile resources can also include personal information used to enhance or otherwise link that profile with other information sources. We refer to this as Profile Metadata. However, the main purpose of a UserProfile is to:

  • Capture any explicit interests (e.g., topics, people, events) the user cares about
  • Keep a history of the posts the user has visited on your site
  • Capture data about the user’s interactions with content on your site

As a Parse.ly API user, you own this data. However, you derive value from it primarily by using our Query API, described later in this document. The Query API is what allows you to recommend articles to specific users based on their reading behavior and tastes.

Interests

Interest resources specify topics that resonate with a user. These can be programmatically generated to populate the UserProfile with information about a user’s interests. Users with largely overlapping Interest sets are given similar recommendations.

Interest resources also include a rank that allows you to specify which interests are more or less important to a user. Ranks influence the ordering of results in the Query API.
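
The relationship between profiles and interests can be pictured with a small in-memory model. This is only a sketch: the field names, the convention that a larger rank means a more important interest, and the Jaccard overlap measure are all assumptions chosen for illustration. They are not the actual resource schema, and the overlap function is not the similarity measure Parse.ly uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interest:
    topic: str
    rank: int = 0  # assumption: a larger rank means a more important interest


@dataclass
class UserProfile:
    profile_id: str
    anonymous: bool = False  # True for tracker-created profiles (assumed flag)
    interests: List[Interest] = field(default_factory=list)
    history: List[str] = field(default_factory=list)  # links of visited posts

    def ranked_topics(self) -> List[str]:
        """Topics ordered from most to least important."""
        ordered = sorted(self.interests, key=lambda i: i.rank, reverse=True)
        return [i.topic for i in ordered]


def interest_overlap(a: UserProfile, b: UserProfile) -> float:
    """Jaccard overlap of two profiles' topic sets (illustrative only)."""
    topics_a = {i.topic for i in a.interests}
    topics_b = {i.topic for i in b.interests}
    union = topics_a | topics_b
    return len(topics_a & topics_b) / len(union) if union else 0.0


john = UserProfile("john.smith", interests=[Interest("baseball", 1), Interest("python", 3)])
jane = UserProfile("jane.doe", interests=[Interest("python", 2), Interest("cooking", 1)])
overlap = interest_overlap(john, jane)  # one shared topic out of three distinct topics
```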

Data Sources

Once the JavaScript tracker is installed on your site, Parse.ly will automatically start indexing your content. Optionally, you can contact the team to add RSS/Atom feeds, which can be crawled in near-real-time to update content as it is published on your site.

Implementation

How does Parse.ly do near-real-time delivery of content from your RSS/Atom feeds? Parse.ly embraces an emerging standard known as the PubSubHubbub protocol. We integrate with our PubSubHubbub provider – currently, Superfeedr – via the XMPP / XEP-0060 standard. In fact, one of our engineers has open sourced our Superfeedr XMPP wrapper for Python; check out the sfpy project.

Channels

For multi-property online companies, we offer the ability to segment your content into one or more Channels. For example, Gawker.com, LifeHacker.com and Gizmodo.com are all run by Gawker Media, but it may be useful to be able to see analytics/recommendations for a single property rather than have all the data jumbled together. Channels provide this level of content segmentation.

Queries

Once you have your resources set up and have started to see some UserProfile and Interest instances, you are ready to get to one of the core values of our API: retrieving content recommendations via queries.

Queries are executed against a single UserProfile, and return personalized content recommendations based on that profile’s interests and sources. Our recommendations are powered by a proprietary combination of naive Bayesian inference and collaborative filtering (see Algorithms and Technical Background). Results are scored based on how closely the article matches that user profile’s interests and reading behavior. The higher the score, the more likely that content will resonate with that user, and thus the more likely that user is to click on that article and enjoy reading it.

The result of a Query API call is a paginated list of Item resources, where each Item corresponds roughly to an article, with fields common to RSS/Atom feeds, like title and summary and link. An Item also includes fields that expose Parse.ly’s analysis results, like score_explanation.

You are free to display the results from our API any way you like, subject to our Terms of Use. We also provide a Whitelabel JavaScript Widget which can be used for standard integration use cases.

Our standard result set is sorted by score and recency; however, other sorting orders and search methods are available. For example, we make it possible to do full-text searches and dated queries.
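
The sketch below shows one way a client might turn a page of query results into Item objects and order them. The fields title, summary, link, and score_explanation come from the description above; the `items` and `next_page` envelope keys, the `score` and `published` field names, and the exact tie-breaking rule are assumptions made for this example, since the response schema is not specified here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Item:
    title: str
    summary: str
    link: str
    score: float            # assumed field name for the relevance score
    published: str          # assumed ISO 8601 timestamp, used for recency
    score_explanation: str


def parse_page(raw: str) -> Tuple[List[Item], Optional[int]]:
    """Parse one page of results; return the items and the next page number, if any."""
    doc = json.loads(raw)
    items = [
        Item(
            title=entry["title"],
            summary=entry.get("summary", ""),
            link=entry["link"],
            score=float(entry["score"]),
            published=entry.get("published", ""),
            score_explanation=entry.get("score_explanation", ""),
        )
        for entry in doc["items"]  # hypothetical envelope key
    ]
    # One possible reading of "score and recency": highest score first,
    # with newer items winning ties.
    items.sort(key=lambda item: (item.score, item.published), reverse=True)
    return items, doc.get("next_page")


sample = json.dumps({
    "items": [
        {"title": "Older", "link": "http://example.com/a", "score": 0.4,
         "published": "2010-01-01T00:00:00Z"},
        {"title": "Best", "link": "http://example.com/b", "score": 0.9,
         "published": "2010-01-02T00:00:00Z", "score_explanation": "matches interest"},
        {"title": "Newer", "link": "http://example.com/c", "score": 0.4,
         "published": "2010-01-03T00:00:00Z"},
    ],
    "next_page": 2,
})
items, next_page = parse_page(sample)
```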

Reading Behavior

As users interact with content on your site, they are giving you valuable information about what articles, topics, and areas of interest resonate with them. Parse.ly allows you to tap into this valuable user data to connect your users with content they’ll love.

However, for our system to work properly, we need to be able to track what a user does with your content and analyze that content for clues about the user’s interests. We offer two ways of informing our system about your users’ reading behavior, known as implicit actions and explicit actions.

  • Implicit: by installing the JavaScript Tracker, our system monitors what parts of your site the user is visiting, what links they are clicking on, and how long they are spending on each piece of content. We analyze this content to build up statistical text profiles for each user (see our algorithms). We use these metrics and data to power a model of the user’s interest based on their implicit behavior on your site. Without the user even realizing he is using a content recommendation system, his past behavior on your site is creating virtuous circles of valid recommendations / positive browsing experiences from the Parse.ly API. Our recommendations also downplay content that would not resonate with that user and avoid boring and unengaging browsing experiences.
  • Explicit: if you use our Whitelabel JavaScript Widget in its advanced form, or if you want to build a specialized reading interface powered by Parse.ly recommendations, you will prefer our explicit reading behavior model. In this model, you actually make API calls to Parse.ly to tell it what a user has done with a piece of content on your site. The actions currently available are: Starred, Shared, Marked Read, Marked Unread, Archived, and Deleted. These actions act as a form of user training, but are meant to be seamlessly integrated into your site.

Our users often mix the implicit and explicit models for best results, but good results can often be achieved with the implicit model alone. We are constantly improving each to make it work even better.
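
For the explicit model, a client needs to name one of the six actions listed above and attach it to a profile and a piece of content. The sketch below models the actions as an enumeration and builds a JSON payload for one of them. The wire values (such as `marked_read`), the payload keys, and the function name are hypothetical; only the list of actions comes from this document.

```python
import json
from enum import Enum


class Action(Enum):
    # The six explicit actions; the string values are assumed wire names.
    STARRED = "starred"
    SHARED = "shared"
    MARKED_READ = "marked_read"
    MARKED_UNREAD = "marked_unread"
    ARCHIVED = "archived"
    DELETED = "deleted"


def action_payload(profile_id: str, item_link: str, action: Action) -> str:
    """Build a JSON body reporting what a user did with a piece of content."""
    if not isinstance(action, Action):
        raise ValueError(f"unsupported action: {action!r}")
    return json.dumps(
        {"profile": profile_id, "item": item_link, "action": action.value},
        sort_keys=True,
    )


payload = action_payload("john.smith", "http://example.com/post/1", Action.STARRED)
```

Restricting the action to an enumeration means a misspelled action name fails in the client rather than being sent to the server.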

Real-Time Updates

Parse.ly’s backend infrastructure is powered by message queues and “real-time” protocols like XMPP and PubSubHubbub. We therefore process data in “near-real-time” as that data flows into our system. An obvious use case for Parse.ly’s API is in real-time applications, where updates are delivered to your client as they are made available rather than you having to poll our service for new information.

However, at this time, our API does not expose a real-time interface. Every query to the Parse.ly API gives back a JSON document which represents a “point-in-time” snapshot of the latest data we have for that user.
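
Because each response is a point-in-time snapshot, a client that wants fresh recommendations has to poll. The following sketch polls a caller-supplied `fetch` function and yields only items it has not seen before, identified by their link. The `fetch` callable, the polling interval, and the use of `link` as a unique key are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Iterator, List


def poll_new_items(fetch: Callable[[], List[Dict]], interval: float, rounds: int,
                   sleep: Callable[[float], None] = time.sleep) -> Iterator[Dict]:
    """Call fetch() `rounds` times and yield each item the first time it appears."""
    seen = set()
    for round_no in range(rounds):
        for item in fetch():  # each call returns one snapshot
            link = item["link"]
            if link not in seen:
                seen.add(link)
                yield item
        if round_no < rounds - 1:
            sleep(interval)  # wait before requesting the next snapshot


# Simulated snapshots standing in for real API responses.
snapshots = iter([
    [{"link": "http://example.com/a"}],
    [{"link": "http://example.com/a"}, {"link": "http://example.com/b"}],
])
new_items = list(poll_new_items(lambda: next(snapshots), interval=0, rounds=2,
                                sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.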

We are exploring different avenues for implementing a real-time interface, for example by offering an XMPP API ourselves, or by offering a PubSubHubbub hub. If the real-time use case is particularly important to you and you would like to sponsor our development of a real-time content recommendations API, please contact us!