Batch API

The Diffbot Batch API lets developers submit up to 50 individual API calls in a single HTTP request. It is optimized for mobile and mobile-like environments, where the latency of individual responses is problematic. Results are streamed back in real time, in the order they are processed.
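Because a single request carries at most 50 calls, a longer list of calls has to be split across several batch requests. The following is a minimal sketch of that split. The helper name `chunk_calls` and the constant `MAX_CALLS_PER_BATCH` are hypothetical and not part of the Diffbot API; only the limit of 50 comes from the text above.

```python
from typing import Iterator, List, Sequence

# Limit stated in the documentation above; the constant name is an assumption.
MAX_CALLS_PER_BATCH = 50


def chunk_calls(calls: Sequence[dict],
                size: int = MAX_CALLS_PER_BATCH) -> Iterator[List[dict]]:
    """Yield successive groups of at most `size` calls, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(calls), size):
        yield list(calls[start:start + size])
```

Each yielded group can then be submitted as one batch request.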

Request

To make a batch request, build a JSON object for each individual API operation and POST to http://www.diffbot.com/api/batch.

Provide the following arguments:

| Argument | Description |
|---|---|
| token | Developer token |
| batch | JSON array of individual requests |

Optional Arguments

| Argument | Description |
|---|---|
| timeout | Amount of time in milliseconds to wait for a return response. By default, the batch API will wait until all individual HTTP requests have returned. |

Required individual HTTP request Parameters

| Parameter | Description |
|---|---|
| method | GET or POST |
| relative_url | Corresponds to the appropriate API request after http://www.diffbot.com, and must include any required parameters of the individual API, including your token and the individual url. Each relative URL needs to be fully URL encoded. |

The following sample request makes two calls to the Diffbot Article API. Note that the full data object in our sample is URL encoded, because cURL sends -d data with a default "Content-Type" of application/x-www-form-urlencoded. This is why the & inside each relative_url appears as %26: a bare & would be read as a separator between form fields. The url value should always be URL encoded.

curl \
    -d 'token=...' \
    -d 'batch=[
            {"method": "GET", "relative_url": "/api/article?token=...%26url=http%3A%2F%2Fblogs.wsj.com%2Fventurecapital%2F2012%2F05%2F31%2Finvestors-back-diffbots-visual-learning-robot-for-web-content%2F%3Fmod%3Dgoogle_news_blog"},
            {"method": "GET", "relative_url": "/api/article?token=...%26url=http%3A%2F%2Fgigaom.com%2Fcloud%2Fsilicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot"}
        ]' \
    http://www.diffbot.com/api/batch
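The same kind of request body can be assembled programmatically. The sketch below uses only the Python standard library and builds the form-encoded body without sending it. The function names `article_call` and `build_batch_body` are hypothetical. The sketch also makes an assumption about encoding: the target page URL is percent-encoded once as a query parameter of the relative URL, and the form encoder then encodes the whole `batch` field for transport. Whether the service expects exactly this level of encoding should be verified against a live request.

```python
import json
from typing import Iterable, Optional
from urllib.parse import urlencode

BATCH_ENDPOINT = "http://www.diffbot.com/api/batch"  # endpoint named in the text above


def article_call(token: str, page_url: str) -> dict:
    """Describe one GET call to the Article API as a batch entry (hypothetical helper)."""
    # urlencode percent-encodes page_url so it is safe inside the query string.
    query = urlencode({"token": token, "url": page_url})
    return {"method": "GET", "relative_url": "/api/article?" + query}


def build_batch_body(token: str,
                     calls: Iterable[dict],
                     timeout_ms: Optional[int] = None) -> bytes:
    """Build the application/x-www-form-urlencoded body for a batch POST."""
    fields = {"token": token, "batch": json.dumps(list(calls))}
    if timeout_ms is not None:
        fields["timeout"] = str(timeout_ms)  # optional argument, in milliseconds
    # The form encoder escapes the JSON text, including any & characters in it.
    return urlencode(fields).encode("ascii")
```

The resulting bytes can be passed as the `data` argument of `urllib.request.urlopen(BATCH_ENDPOINT, data=body)`, which issues a POST and, when no Content-Type header is set, defaults to application/x-www-form-urlencoded.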

Response

The Batch API will stream its responses as soon as they are individually available from each Diffbot server. Note: responses will not necessarily be returned in the order submitted.

The response is a JSON array. Each object in the array will contain the following components:

| Parameter | Description |
|---|---|
| code | HTTP response code. In the event of a timeout, this will be 500. |
| method | Method of the initial request, either GET or POST. |
| relative_url | Full relative URL, as submitted in the initial request. |
| body | Content of the response. In a successful call this will be the full Diffbot output as per the individual API documentation. |
| headers | JSON array of the HTTP headers. |
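Since responses may arrive in a different order than the calls were submitted, a client typically needs to match each result back to its call. The sketch below indexes a fully received response by `relative_url` and decodes each `body`. It assumes that `body` is delivered as a JSON-encoded string, as the example response suggests; the function name `index_batch_results` is hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def index_batch_results(raw: str) -> Dict[str, Tuple[int, Any]]:
    """Map each relative_url to a (code, body) pair.

    For code 200 the body is decoded from JSON when possible; otherwise
    the body is kept as received.
    """
    results: Dict[str, Tuple[int, Any]] = {}
    for item in json.loads(raw):
        code = item["code"]
        body = item.get("body")
        if code == 200 and isinstance(body, str):
            try:
                body = json.loads(body)
            except ValueError:
                pass  # not JSON: keep the raw text
        results[item["relative_url"]] = (code, body)
    return results
```

Entries with a code other than 200, such as the 500 reported on timeout, can then be retried individually. If the service returns the relative URL in a different form than it was submitted, the lookup keys would need to be normalized first.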

Example Response

[
	{"headers":[
		{ "name":null,
		  "value":"HTTP\/1.1 200 OK"},
		{ "name":"Transfer-Encoding",
		  "value":"chunked"},
		{ "name":"Vary",
		  "value":"Accept-Encoding"},
		{ "name":"Date",
		  "value":"Tue, 17 Jul 2012 22:23:57 GMT"},
		{ "name":"Content-Type",
		  "value":"application\/json;charset=UTF-8"},
		{ "name":"Server",
		  "value":"Apache-Coyote\/1.1"}
	],
	"method":"GET",
	"code":200,
	"relative_url":"\/api\/article?url=http:\/\/blogs.wsj.com\/venturecapital\/2012\/05\/31\/investors-back-diffbots-visual-learning-robot-for-web-content&token=...",
	"body":"{
		\"icon\": \"http:\\\/\\\/s.wsj.net\\\/favicon.ico\",
		\"author\": \"Lizette Chapman\",
		\"text\": \"Mike Tung wants to take the web apart and re-build it for a new audience: computers.\\nA graduate of Stanford University\\u2019s Artificial Intelligence program, Tung created Diffbot with the goal of creating a visual learning robot that extracts and analyzes web content the same way humans do.\\nThe idea, which has gathered $2 million in seed funding, is simple.\\nTung determined that all content on the web can be categorized into 18 or so different \\u201Cpage types\\u201D (like a home page, social networking profile, review etc.) that can be visually analyzed using layout and contextual cues.\\nSo, just like people, Diffbot looks at a webpage and instantly identifies the important objects on the page while stripping out other components like advertising banners and privacy policies that are not core to the person\\u2019s reason for visiting the page.\\nDiffbot makes its APIs available to developers\\u2013the Palo Alto, Calif.-based start-up is now processing 100 million API calls each month\\u2013who are using it for to transform the web into a usable database. They then use that data for applications they\\u2019re developing to mobilize websites, migrate content management systems, generate tags and aggregate articles, among other things.\\nTung, who heads the six-person team, said he waited as long as he could to raise the seed round, which closed this month.\\n\\u201CWe\\u2019re at a point now where our servers are going down,\\u201D he said, referring to the increasing demand by developers for the technology. \\u201CWe need to expand our offering.\\u201D\\nTung said that along with expanding the team with a few key hires, Diffbot will expand from the current two categories in news and home pages it now covers to the other 16 or so.\\nDiffbot investors include Stanford accelerator StartX, where Tung is still an entrepreneur in residence, as well as Matrix Partners. Individual investors include Sky Dayton, founder of EarthLink; Andy Bechtolsheim, co-founder of Sun Microsystems; Joi Ito, director of the MIT Media Lab, Brad Garlinghouse, CEO of YouSendIt; and executives from Facebook, Twitter and Yahoo.\\nDayton, who visits the start-up about once a week, has helped with hiring and in talking to customers. He pointed to Diffbot being used with within apps at AOL, OnSwype and others.\\n\\u201CI make customer intros, but it kind of sells itself,\\u201D said Mr. Dayton.\",
		\"title\": \"Investors Back Diffbot\\u2019s \\u2018Visual Learning Robot\\u2019 for Web Content\",
		\"url\": \"http:\\\/\\\/blogs.wsj.com\\\/venturecapital\\\/2012\\\/05\\\/31\\\/investors-back-diffbots-visual-learning-robot-for-web-content\"
	}"},
	{"headers":[
		{ "name":null,
		  "value":"HTTP\/1.1 200 OK"},
		{ "name":"Transfer-Encoding",
		  "value":"chunked"},
		{ "name":"Vary",
		  "value":"Accept-Encoding"},
		{ "name":"Date",
		  "value":"Tue, 17 Jul 2012 22:23:52 GMT"},
		{ "name":"Content-Type",
		  "value":"application\/json;charset=UTF-8"},
		{ "name":"Server",
		  "value":"Apache-Coyote\/1.1"}
	],
	"method":"GET",
	"code":200,
	"relative_url":"\/api\/article?url=http:\/\/gigaom.com\/cloud\/silicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot&token=...",
	"body":"{
		\"icon\": \"http:\\\/\\\/s1.wp.com\\\/wp-content\\\/themes\\\/vip\\\/gigaom\\\/img\\\/apple-touch-icon.png?m=1294760499g\",
		\"author\": \"Barb Darrow\",
		\"text\": \"What do tech luminaries Andy Bechtolsheim, Sky Dayton, Joi Ito and Brad Garlinghouse have in common? They\\u2019re all backing Diffbot, the startup that\\u2019s building visual robot technology that parses web site content to make it easier to reuse.\\nDiffbot, the first company funded out of Stanford\\u2019s StartX accelerator program, makes its APIs available to users wanting to extract the components of web pages in a way that makes that content reusable and easier to mash up into apps, Diffbot founder and CEO Michael Tung told me this week. It\\u2019s identified 18 web page types and the API handles two of them \\u2014 front page and article \\u2014 to date and is building support for the others. GigaOM\\u2019s Ryan Kim covered the launch of Diffbot\\u2019s first APIs last fall.\\nUnlocking web content\\n\\u201CWe\\u2019ve got this great thing, the Internet, full of web pages, the problem is they\\u2019re made for human beings to read and understand, particularly people in front of a browser \\u2026 but that\\u2019s inaccessible to software applications, hundreds of thousands of apps like Siri, that only work with a handful of APIs that they\\u2019re hard-coded for,\\u201D Tung said.\\n\\u201CYelp is great for searching places, Flipboard is great for discovering news. Our main insight is the web can be broken down into 18 types of pages, news, people,places, photos, etc. and our goal is to teach a machine to understand all that,\\u201D Tung said. The company is working on more APIs to bring all that content into its reach.\\nAt a recent hackathon, one participant built a web reader for his blind father using Diffbot\\u2019s APIs. \\u201CFor a blind person, using the web is miserable. [Today's] screen readers read all the text starting at the top, including the nav bar and scroll down. Diffbot analyses that page, determines the title, author, text and can read it in a more natural way,\\u201D Tung said.\\nDiffbot can look at web pages created for human beings and analyze them visually so the app can treat the web as a big data base. It is now processing more 100 million API calls monthly for software developers using the service for Web site mobilization, tag generation and other functions.\\nA-list backers\\nBechtolsheim, the founder of Sun Microsystems; Sky Dayton, founder of Earthlink and Boingo; Joi Ito, director of the MIT Media Lab: Brad Garlinghouse, a former Yahoo exec and now CEO of YouSendIt (see disclosure) all invested in this $2 million seed round as did Jonathan Heiliger, the Facebook vet now at North Bridge Venture Capital Partners.\\nThe company is using a freemium model, encouraging developers and others to submit URLs to the system for content extraction. The service is free up to a certain number of API calls. \\u201DWe want to apply Diffbot to the entire web, but it\\u2019s expensive to build a web crawler; we only analyze the URLs that people send us,\\u201D Tung said.\\nJohn Davi, Diffbot\\u2019s VP of product and a Cisco veteran, said the submissions in themselves will be valuable. \\u201COur long-term vision is to avail ourselves of the cream of the content that comes out. We\\u2019ll be able to see the important pages \\u2014 the articles and recipes that people submit \\u2014 and we think there\\u2019s value in knowing that.\\u201D\\nDisclosure: YouSendIt is backed by Alloy Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media.\\nRelated research and analysis from GigaOM Pro:\\nSubscriber content. Sign up for a free trial.\",
		\"title\": \"Silicon Valley stars pony up $2M to scale Diffbot's visual learning robot\",
		\"date\": \"31, 2012, 6:00am\",
		\"url\": \"http:\\\/\\\/gigaom.com\\\/cloud\\\/silicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot\\\/\"
	}"}
]