Metadata

#What forms of metadata does Parse.ly accept? What should I include in my metadata that I can see in the Parse.ly dashboard?

We accept the three formats listed in our metadata documentation.

Please note that there should be a value for each field (as seen in the example on that page). Complete metadata gives you the most value and data within the Parse.ly dashboard.

#What are the most common metadata mistakes?

#The Parse.ly Canonical URL

This is the URL that Parse.ly considers to be the source of truth for the metadata about a particular post or page. It is specified as the “url” property in a JSON-LD tag.

A single piece of content may have multiple URLs associated with it, but Parse.ly will only retrieve content from the one designated as the canonical URL. For example, you might have the URL example.com/article on your desktop site, while the mobile version is at m.example.com/article. That’s fine, as long as the URL in the metadata is the same text string for both. Similarly, you might serve two different versions of an article: one with an “http” URL, and one with “https.” Again, no problem; just make sure they both have the same canonical URL in their metadata.

The canonical URL allows us to aggregate data together across all URLs that share a common canonical URL. For more details about how this works, read about how the Parse.ly Crawler works.

One of the most common mistakes is omitting or incorrectly specifying the canonical URL. Any variations in the canonical URL, such as:

  • http vs https
  • URLs with and without a trailing /
  • different URLs for website vs AMP pages

will result in duplicate posts and skew your data in the Parse.ly dashboard.
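
One way to avoid these variations is to build the canonical URL in a single place and reuse that exact string in the metadata of every version of the page. The following is a minimal Python sketch, not Parse.ly's own logic. The function name canonical_url, the choice of https, the example.com host, and the decision to drop the trailing slash are all assumptions made for illustration; AMP paths would need their own mapping.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices for this sketch: one scheme and one host for the whole site.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "example.com"


def canonical_url(page_url: str) -> str:
    """Return one consistent URL string for every variant of the same page."""
    parts = urlsplit(page_url)
    # Drop a trailing slash, but keep the root path as "/".
    path = parts.path.rstrip("/") or "/"
    # Query string and fragment are discarded here; keep them if they identify content.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, "", ""))


variants = [
    "http://example.com/article",
    "https://example.com/article/",
    "https://m.example.com/article",
]
# Every variant maps to the same string, so the metadata "url" never diverges.
assert len({canonical_url(u) for u in variants}) == 1
```

Whatever rules you pick matter less than applying the same rules everywhere the metadata is rendered.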

Note that the criteria Parse.ly uses to identify the canonical URL differ from the common usage of the term, in that we rarely rely on the value of the <link rel="canonical"> tag. For information on how to properly set the Parse.ly canonical URL, please see our metadata documentation.

Note that if you include the optional post-id value in your metadata, it will take precedence over the canonical URL value, and we will group your articles by post-id instead. However, we recommend avoiding post-id values whenever possible. Grouping articles by canonical URL is a simpler and more reliable implementation.

#Metadata

Invalid metadata as a result of small errors is another common problem. Our documentation outlines your metadata formatting options (we recommend JSON-LD). A single error in the metadata tag such as:

  • a missing quotation mark
  • an unescaped special character
  • a field name in the wrong case
  • a relative URL

may prevent an article from registering its metadata properly, causing it to show incorrectly as an index or no-metas page in your dashboard. You should escape double-quotes within your metadata values.
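
Most of these errors can be caught before a page is published. The sketch below is one possible approach, assuming the metadata is produced as JSON-LD: it serializes values with json.dumps, which escapes double quotes automatically, and runs a few basic checks on the result. The function names build_metadata and check_metadata are hypothetical, and the fields other than "url" are illustrative; consult the metadata documentation for the full set of fields.

```python
import json
from urllib.parse import urlsplit


def build_metadata(headline: str, url: str) -> str:
    """Serialize metadata with json.dumps so quotes and special characters are escaped."""
    return json.dumps({
        "@context": "https://schema.org",  # illustrative field values
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
    })


def check_metadata(raw: str) -> list[str]:
    """Return a list of problems found in a JSON-LD string (empty if none were found)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Missing quotation marks and unescaped characters end up here.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["metadata is not a JSON object"]
    problems = []
    url = data.get("url")
    if not isinstance(url, str):
        # A key such as "URL" or "Url" does not count as "url".
        problems.append('missing "url" field (check the case of the field name)')
    elif not urlsplit(url).scheme or not urlsplit(url).netloc:
        problems.append("url is relative; use an absolute URL")
    return problems


good = build_metadata('He said "hello"', "https://example.com/article")
assert check_metadata(good) == []
```

These checks only cover the mistakes listed above; they do not confirm that every field is present.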

#Article Section Value

You can only list one value for an article’s section, though you can list up to 100 values in the tags/keywords field. The tags/keywords field should be formatted as an array of strings. You can also list multiple author values, again as an array of strings. These values are case-sensitive, which is important to remember if you’re trying to pull data from our API. There are examples here for reference.

Many publishers want to track subsections in the tags field. The best way to do that is to separate the section/subsections with a colon in a single tag; for example “sports:football” or “sports:basketball:wnba”.
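
As a rough illustration of the colon convention, the Python sketch below joins a section and its subsections into one tag and keeps the list within the 100-value limit. The helper name subsection_tag and the constant MAX_TAGS are hypothetical; the case of each value is left untouched because the values are case-sensitive.

```python
MAX_TAGS = 100  # limit on the tags/keywords field stated above


def subsection_tag(*levels: str) -> str:
    """Join a section and its subsections into a single colon-separated tag."""
    return ":".join(level.strip() for level in levels if level.strip())


tags = [
    subsection_tag("sports", "football"),
    subsection_tag("sports", "basketball", "wnba"),
]
# The field is an array of strings, capped at MAX_TAGS entries.
tags = tags[:MAX_TAGS]
print(tags)  # ['sports:football', 'sports:basketball:wnba']
```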

#Publication Date and UTC Time

An article’s publication date should be listed in your metadata in ISO 8601 format, in UTC with no offset; for example: "pub_date": "2013-08-15T13:00:00Z". We’ll display those dates and times in your dashboard using your local timezone.
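
If your CMS stores publication times in a local timezone, they need to be converted before they are written to the metadata. A minimal Python sketch follows, assuming the stored time is timezone-aware; the function name to_pub_date and the UTC-4 offset in the usage example are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def to_pub_date(published: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    if published.tzinfo is None:
        # A naive datetime has no known offset, so it cannot be converted safely.
        raise ValueError("publication time must be timezone-aware")
    return published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 9:00 AM at UTC-4 is 13:00 in UTC.
local_time = datetime(2013, 8, 15, 9, 0, 0, tzinfo=timezone(timedelta(hours=-4)))
print(to_pub_date(local_time))  # 2013-08-15T13:00:00Z
```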
