For Immediate Release: Jul 31, 2013

Diffbot’s Revolutionary Product API Automatically Extracts Data from Any Product Web Page

First-of-Its-Kind API Uses Computer Vision to Turn Any E-Commerce Site Into a Product Database

PALO ALTO -- Diffbot, inventor of computer vision technology that sees web pages as humans do, today announced the release of its Product API, which automatically identifies and extracts product data from any shopping web page.

Diffbot also announced updates to its Crawlbot spidering service, which can accurately determine which pages on a shopping site are product pages. Diffbot now offers a turnkey solution for retrieving the entire catalog from any e-commerce site -- without the need for a published API or any action on the part of the retailer.

Developed over the course of two years, the Product API’s pioneering algorithm is built on Diffbot’s core vision technology, which has accurately extracted structured data from billions of web pages. The API advances Diffbot’s machine learning, natural-language processing and computer vision systems to identify and structure information regardless of a site’s design, layout, markup or even its (human) language.

The Product API automatically makes available data such as price, discount/savings, shipping cost, product description, images, SKU and manufacturer's product number. The technology allows developers to immediately use product data from any e-commerce site in their web or mobile applications.

The Product API will enable developers to rapidly build applications that can:

  • track and compare prices from any site
  • augment user bookmark or clipping data with product pricing and other information
  • track merchandise availability across multiple storefronts
  • migrate entire shopping sites to new platforms without the need for back-end integration
  • deploy entire APIs on-the-fly for partner and other integrations
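
As an illustration of the first use case above, a minimal Python sketch of price comparison might look like the following. The ProductRecord type, its field names and the example storefronts are hypothetical stand-ins chosen for illustration; they are not Diffbot's actual response format, and the sketch assumes product data has already been extracted from each site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProductRecord:
    """Hypothetical shape for one extracted product page (not Diffbot's schema)."""
    site: str
    sku: str
    price: float
    shipping_cost: Optional[float] = None  # None when the page lists no shipping cost


def total_cost(record: ProductRecord) -> float:
    """Price plus shipping, treating a missing shipping cost as zero."""
    return record.price + (record.shipping_cost or 0.0)


def cheapest(records: List[ProductRecord]) -> ProductRecord:
    """Return the record with the lowest total cost."""
    if not records:
        raise ValueError("no records to compare")
    return min(records, key=total_cost)


# Made-up offers for the same SKU on three hypothetical storefronts.
offers = [
    ProductRecord(site="store-a.example", sku="ABC-123", price=19.99, shipping_cost=4.00),
    ProductRecord(site="store-b.example", sku="ABC-123", price=21.50),
    ProductRecord(site="store-c.example", sku="ABC-123", price=18.75, shipping_cost=5.25),
]

best = cheapest(offers)
print(best.site, round(total_cost(best), 2))
```

Comparing on total cost rather than list price alone reflects the fact that shipping cost is among the fields the release names; a real application would also need to decide how to treat pages where shipping is not stated.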

"E-commerce is one of the most popular activities on the web. With 28% of US internet users shopping on a daily basis, we figured we should teach our robot how to understand products," said Mike Tung, CEO of Diffbot.[1] "The Product API represents our latest advances in pushing the capabilities of automated page extraction. We are one step closer to the imminent goal of making the entire web machine-readable."

Last year, Diffbot conducted a study that found that 8% of links shared on Twitter point to product pages -- a total of more than eight million product links per day.[2] [3] [4] As with news articles, consumers and businesses alike need intelligent automation to help sift through the vast quantities of products offered and shared online.

The Product API joins Diffbot's existing computer vision APIs, including the Frontpage API (for extracting content from home pages), the Article API (for extracting news article and blog post content), the Image API, and the Page Classifier API, which automatically determines the page type of any web link.

About Diffbot:

Diffbot is a robot that examines the Web using artificial intelligence, computer vision, machine learning and natural language processing, and provides developers with robust tools to find, extract and understand the objects from any Web page for use in their applications. Thousands of developers use Diffbot APIs to create consumer-friendly applications that use visual interpretation of the Web to re-imagine search, the mobile web and hundreds of other consumer applications. Customers include AOL, Betaworks (Digg/Instapaper), CBS Interactive, Salesforce and StumbleUpon. It is based in Palo Alto, CA.

To learn more, visit www.diffbot.com.

[1] http://pewinternet.org/Trend-Data-(Adults)/Online-Activities-Daily.aspx
[2] http://www.diffbot.com/products/automatic/classifier/#t-infographic
[3] https://blog.twitter.com/2013/celebrating-twitter7
[4] http://techcrunch.com/2010/09/14/twitter-seeing-90-million-tweets-per-day/