Frontpage API Documentation

Request

To use the Frontpage API, perform an HTTP GET request on the following endpoint:

http://www.diffbot.com/api/frontpage?token=...&url=...

Provide the following arguments:

Parameter	Description
`token`	Developer token
`url`	Frontpage URL from which to extract items (URL encoded)
Optional parameters
`timeout`	Specify a value in milliseconds (e.g., `&timeout=15000`) to override the default API timeout of 5000ms.
`format`	Format the response output in `xml` (default) or `json`
`all`	Returns all content from the page, including navigation and similar links that the Diffbot visual processing engine considers less important or non-core.

Basic authentication

To access pages that require a login/password (using basic access authentication), include the username and password in your `url` parameter, e.g.: `url=http%3A%2F%2FUSERNAME:PASSWORD@www.diffbot.com`
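
As a rough illustration of the GET request described above, the following Python sketch assembles the request URL from the documented parameters. The helper name `build_frontpage_url`, the placeholder token `YOUR_TOKEN`, and the target page `http://www.example.com/` are assumptions made for this example and are not part of the API; the `all` parameter is left out because its expected value is not specified here.

```python
from urllib.parse import urlencode

# Endpoint as given in this documentation.
ENDPOINT = "http://www.diffbot.com/api/frontpage"


def build_frontpage_url(token, page_url, timeout_ms=None, fmt=None):
    """Build a Frontpage API GET URL (hypothetical helper, not an official client)."""
    params = {"token": token, "url": page_url}
    if timeout_ms is not None:
        # Overrides the default API timeout; the value is in milliseconds.
        params["timeout"] = str(timeout_ms)
    if fmt is not None:
        if fmt not in ("xml", "json"):
            raise ValueError("format must be 'xml' or 'json'")
        params["format"] = fmt
    # urlencode percent-encodes each value, so the page URL is URL encoded.
    return ENDPOINT + "?" + urlencode(params)


request_url = build_frontpage_url(
    "YOUR_TOKEN",               # placeholder developer token
    "http://www.example.com/",  # placeholder frontpage URL
    timeout_ms=15000,
    fmt="json",
)
print(request_url)
# http://www.diffbot.com/api/frontpage?token=YOUR_TOKEN&url=http%3A%2F%2Fwww.example.com%2F&timeout=15000&format=json
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(request_url)`.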

Alternatively, you can POST the content to analyze directly to the same endpoint. Specify the Content-Type header as either text/plain or text/html.
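
A POST request of this kind might be prepared as in the sketch below. It only constructs the request object and does not send it. The helper name `build_frontpage_post`, the placeholder token, and the sample HTML are hypothetical; the sketch also assumes that `token` and `url` are still passed in the query string and that the body is sent as UTF-8, neither of which is stated above.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://www.diffbot.com/api/frontpage"


def build_frontpage_post(token, page_url, content, content_type="text/html"):
    """Prepare a POST request carrying the content to analyze (hypothetical helper)."""
    if content_type not in ("text/plain", "text/html"):
        raise ValueError("Content-Type must be text/plain or text/html")
    # Assumption: token and url remain query-string arguments for POST requests.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        ENDPOINT + "?" + query,
        data=content.encode("utf-8"),  # assumption: UTF-8 encoded body
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_frontpage_post(
    "YOUR_TOKEN",                # placeholder developer token
    "http://www.example.com/",   # placeholder URL of the posted page
    "<html><body><a href='/a'>A</a></body></html>",
)
print(req.get_method(), req.full_url)
```

To actually submit the request, pass `req` to `urllib.request.urlopen`.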

Response

DML (Diffbot Markup Language) is an XML format for encoding the structural information extracted from the page. A DML document consists of a single info section and a list of items.

Info field	Type	Description
`id`	long	DMLID of the URL
`title`	string	Extracted title of the page
`sourceURL`	url	The URL this was extracted from
`icon`	url	A link to a small icon/favicon representing the page
`numItems`	int	The number of items in this DML document

Some of the fields found in Items

Item field	Type	Description
`id`	long	Unique hashcode/id of item
`title`	string	Title of item
`description`	string	innerHTML content of item
`xroot`	xpath	XPath of where the item was found on the page
`pubDate`	timestamp	Timestamp when item was detected on page
`link`	URL	Extracted permalink (if applicable) of item
`type`	{IMAGE,LINK,STORY,CHUNK}	Extracted type of the item: whether the item represents an image, permalink, story (image+summary), or HTML chunk.
`img`	URL	Extracted image from item
`textSummary`	string	A plain-text summary of the item
`sp`	double in [0,1]	Spam score - the probability that the item is spam/ad
`sr`	double in [1,5]	Static rank - the quality score of the item on a 1 to 5 scale
`fresh`	double in [0,1]	Fresh score - the percentage of the item that has changed compared to the previous crawl
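
The item scores above lend themselves to simple filtering and ranking. The sketch below shows one way this might look for a response requested with `format=json`. The top-level layout (an `info` object and an `items` array) is an assumption that mirrors the DML description; the exact JSON shape is not specified here. The sample payload is made up for illustration and is not real API output, and `top_items` is a hypothetical helper.

```python
import json

# Illustrative payload only: the field names come from the tables above,
# but the values and the overall JSON layout are assumptions.
SAMPLE_RESPONSE = """
{
  "info": {"title": "Example front page", "numItems": 3},
  "items": [
    {"id": 1, "title": "A", "type": "STORY", "sp": 0.1, "sr": 3.0},
    {"id": 2, "title": "B", "type": "LINK",  "sp": 0.9, "sr": 4.5},
    {"id": 3, "title": "C", "type": "STORY", "sp": 0.2, "sr": 4.0}
  ]
}
"""


def top_items(doc, max_spam=0.5):
    """Drop items whose spam score exceeds max_spam, then sort by static rank."""
    kept = [item for item in doc.get("items", []) if item.get("sp", 0.0) <= max_spam]
    # Higher static rank (sr) means higher quality, so sort in descending order.
    return sorted(kept, key=lambda item: item.get("sr", 1.0), reverse=True)


doc = json.loads(SAMPLE_RESPONSE)
ranked_titles = [item["title"] for item in top_items(doc)]
print(ranked_titles)  # ['C', 'A']
```

The 0.5 spam threshold is arbitrary; a suitable cutoff depends on how aggressively ads should be filtered.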
