For Immediate Release: Aug 25, 2011
Diffbot Enables Software Applications To Look At the Web With A Human Set Of Eyes
Production API Lets Developers Use Visual Learning Technology to Create New Applications Tying Web Content to Context, Structure, and Action
PALO ALTO, Calif.-- Diffbot Corp, developers of Diffbot visual content and layout recognition (VCLR™) technology, today released the production build of their new set of visual learning APIs. Diffbot, a learning robot that offers visual understanding of Web content, provides APIs enabling software developers to easily create applications that use computer vision algorithms to understand the layout and meaning of Web content, and look at the Web similar to the way people do. Developers use Diffbot to create rich user experiences around content simply by submitting a link to the Diffbot API.
The API’s computer vision technology perceives context and visual layout much as people do. By understanding common page elements (like headlines, bylines and articles), contextual keywords, and content changes buried deep within pages, it enables applications to follow websites, observe when changes occur, and display that content in a variety of media. Diffbot is the foundation of a new way to develop around and consume Web-based content.
“At its core, Diffbot is an enabling technology, targeted to developers that aggregate content or have their own need to offer personalization,” said Diffbot Co-Founder Mike Tung. “I first came up with the idea in my college dorm where I needed to be instantly notified when a new assignment was posted on class websites. I built Diffbot to stay ahead of my classmates and quickly realized the potential commercial applications.”
The Diffbot technology currently consists of two types of APIs:
1. On-Demand - The Diffbot On-Demand API is divided into page types: “Frontpage” and “Article.” The Frontpage API is designed for analyzing home pages and index pages using common layout markers (like headlines, bylines, images, articles, ads, and more), while the Article API is used to extract clean article text, pictures, and tags from news article webpages.
2. Follow - The Follow API is used to follow the changes or updates made to any webpage. Diffbot automatically determines the part of the page that the developer likely wants to follow, extracts metadata such as the title, image and text summary, scores, and segments the page into structurally meaningful sections.
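As a rough illustration of what "submitting a link" to one of these APIs might look like from a developer's side, the sketch below builds a request URL for a chosen API type. This release does not specify the actual endpoint addresses, parameter names, or authentication scheme, so the base URL, the paths, and the `token` and `url` parameters here are hypothetical placeholders, not Diffbot's real interface.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only: the release does not give the
# real endpoint, parameter names, or authentication scheme.
API_BASE = "https://api.example.invalid"
ENDPOINTS = {
    "frontpage": "/frontpage",  # home pages and index pages
    "article": "/article",      # news article pages
    "follow": "/follow",        # pages to watch for changes
}


def build_request_url(api: str, page_url: str, token: str) -> str:
    """Return a GET URL asking the given API type to analyze page_url."""
    if api not in ENDPOINTS:
        raise ValueError(f"unknown API type: {api!r}")
    # urlencode percent-escapes the page URL so it can travel as a parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{API_BASE}{ENDPOINTS[api]}?{query}"


if __name__ == "__main__":
    print(build_request_url("article",
                            "http://www.example.com/news/story.html",
                            "DEMO_TOKEN"))
```

The only input that changes between calls is the link itself, which is the point of the design described above: the application supplies a URL and chooses which kind of analysis it wants.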
Diffbot allows developers to build applications that can:
- Extract and analyze information displayed on an article page
- Understand key words and phrases in the context of the larger article and generate tags to allow developers to categorize, sort and personalize content
- Analyze homepages and index pages to understand when content has been changed
- Generate an RSS feed enabling an application to follow anything on the Internet
- Display or use the raw components of an article page in any manner
- Convert any webpage into a mobile format
- Create snapshots of sites derived from embedded links
One well-known application using the Diffbot API is Editions by AOL. The recently launched news magazine for the iPad, whose tagline reads “the magazine that reads you,” uses Diffbot to identify and extract relevant content tags from news sources on the Web, helping enable the personalization aspect of the app. Diffbot supplies natural language processing to cross-reference against Wikipedia, determine relevance by context and deliver keyword tags. For example, Diffbot can determine that an article about Barack Obama is related to “politics” even though the word doesn’t appear in the article, or that an article about a new computer is about Apple the technology company, and not apple the fruit.
“We are impressed with the Diffbot team and look forward to collaborating with them on future releases of Editions,” said Sol Lipman, Senior Director of AOL’s Mobile First Division. “The API is easy-to-use and understands webpage structure and content better than any other technology we’ve seen.”
About Diffbot:
Diffbot is a learning robot that offers visual understanding of Web content. Their VCLR technology provides developers automated, visual understanding of Web content that is as easy to implement into applications as submitting a URL. The company offers an innovative API that enables developers to easily create applications that apply computer vision algorithms for the purpose of extracting information and understanding the visual layouts of various webpages. Simply provide a link to a webpage and Diffbot analyzes the content. Diffbot lets any application look at Web content with a human set of eyes; rather than simply seeing text, links, and pictures, Diffbot interprets layout, contextual keywords, common sections, and changes to content in a way that lets developers easily break out that content, organize it and present it to users for direct action. With hundreds of developers currently using their API, Diffbot is the foundation of a new way to develop around and consume Web-based content.
Based in Palo Alto, Diffbot is part of Stanford University’s accelerator StartX, a non-profit organization whose mission is to provide an entrepreneurial education to startups founded by Stanford students. StartX provides peer community, mentorship, real-time and customized educational content, and infrastructure resources.
For more information visit: http://www.diffbot.com
Useful Links:
- Diffbot website: http://www.diffbot.com
- Editions website: http://www.editions.com
- StartX website: http://startx.stanford.edu