What are the most common mistakes publishers make with their article metadata?
The most common mistakes publishers make involve the canonical URL. An article’s canonical URL is denoted by the “url” value in JSON-LD or by “parsely-link” in repeated meta tags. The canonical URL matters because Parse.ly uses it to group traffic from different versions of the same article in one place. For example, you might serve an article at example.com/article on your desktop site and at m.example.com/article on mobile. That’s fine, as long as the url in the metadata is the same text string for both. Similarly, you might serve one version of an article over “http” and another over “https.” Again, no problem; just make sure both have the same canonical url in their metadata. We recommend using the desktop version of a URL, prefixed with “http://”, as your canonical.
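One way to keep the canonical string identical across variants is to derive it in a single place before writing it into your metadata. The sketch below is only an illustration: canonical_url is a hypothetical helper, example.com stands in for your desktop host, and dropping the query string and fragment is an assumption that may not suit sites that identify articles by query parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(page_url: str, desktop_host: str = "example.com") -> str:
    """Map any served variant of an article URL to one canonical string.

    Hypothetical helper: forces the "http" scheme and the desktop host,
    and drops the query string and fragment (an assumption, see above).
    """
    parts = urlsplit(page_url)
    return urlunsplit(("http", desktop_host, parts.path, "", ""))


# The mobile/https and desktop/http variants resolve to the same string.
mobile = canonical_url("https://m.example.com/article")
desktop = canonical_url("http://example.com/article")
print(mobile)             # http://example.com/article
print(mobile == desktop)  # True
```

Whatever rule you choose, the point is that every template that renders the article emits exactly the same string.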
Note that post-id is an optional metadata value. If you include it, it takes precedence over the canonical URL, and we will group your articles by post-id instead. However, we recommend omitting post-id whenever possible. Grouping articles by canonical URL is a simpler and more reliable implementation.
Invalid metadata caused by small errors is another common problem. Our documentation outlines your metadata formatting options (we recommend JSON-LD). A single error in the metadata tag (a missing quotation mark, an unescaped special character, a field name in the wrong case) may prevent an article from registering its metadata properly, causing it to show incorrectly as an index or no-metas page in your dashboard. You should escape double-quotes within your metadata values.
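Quoting errors are easiest to avoid by letting a JSON serializer produce the tag instead of assembling it by hand. The following is a minimal sketch, not a complete tag: render_json_ld is a hypothetical helper, and apart from “url” the field names shown (“@context”, “@type”, “headline”) are illustrative assumptions that you should check against the metadata documentation.

```python
import json


def render_json_ld(metadata: dict) -> str:
    """Serialize metadata into a JSON-LD script tag (hypothetical helper)."""
    # json.dumps escapes double quotes and backslashes inside values.
    body = json.dumps(metadata, ensure_ascii=False)
    # A literal "</" inside a value could close the script element early;
    # "\/" is a valid JSON escape for "/", so the parsed value is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


metadata = {
    "@context": "http://schema.org",        # assumed field
    "@type": "NewsArticle",                 # assumed field
    "headline": 'Mayor says "no" to tax',   # assumed field; contains quotes
    "url": "http://example.com/article",
}
print(render_json_ld(metadata))
```

Because the serializer handles the escaping, a headline that contains quotation marks no longer needs special treatment in your templates.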
You can only list one value for an article’s section, though you can list as many values as you want in the tags/keywords field (formatted as an array of strings). You can also list multiple author values, again as an array of strings. These values are case-sensitive, which is important to remember if you’re trying to pull data from our API.
Many publishers want to track subsections in the tags field. The best way to do that is to join the section and its subsections with colons in a single tag; for example, “sports:football” or “sports:basketball:wnba”.
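If your CMS stores the section hierarchy as separate fields, the colon-separated tag can be built when the metadata is rendered. This is a small sketch under that assumption; section_tag is a hypothetical helper, and it leaves letter case untouched because the values are case-sensitive.

```python
def section_tag(*levels: str) -> str:
    """Join a section and its subsections into one colon-separated tag."""
    # Trim stray whitespace and skip empty levels; do not change case.
    cleaned = [level.strip() for level in levels if level.strip()]
    return ":".join(cleaned)


# The hierarchy goes into a single tag, alongside any ordinary tags.
tags = [section_tag("sports", "basketball", "wnba"), "playoffs"]
print(tags)  # ['sports:basketball:wnba', 'playoffs']
```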
An article’s publication date should be listed in your metadata in UTC, with no offset; for example: "pub_date": "2013-08-15T13:00:00Z". The dashboard, however, displays it in your local timezone.
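If your CMS stores publication times in local time, convert them to UTC before writing the metadata. The sketch below assumes the stored time is timezone-aware; to_utc_timestamp is a hypothetical helper, and the fixed UTC-4 offset is only an example.

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(local_time: datetime) -> str:
    """Format a timezone-aware datetime as UTC with a trailing "Z"."""
    if local_time.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted safely.
        raise ValueError("publication time must be timezone-aware")
    return local_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 9:00 a.m. at UTC-4 is 13:00 UTC.
example_offset = timezone(timedelta(hours=-4))
published = datetime(2013, 8, 15, 9, 0, 0, tzinfo=example_offset)
print(to_utc_timestamp(published))  # 2013-08-15T13:00:00Z
```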