
Zanran is a search engine for finding graphs, tables and charts on the Internet. Put more simply: Zanran is Google for data.
How it works… technology overview
Unlike existing search engines, Zanran doesn't work by spotting wording in the text and then looking for images - it's the other way round. The system examines millions of images and decides for each one whether it's a graph, chart or table - that is, whether it has numerical content.
The core technology is made up of patented computer vision algorithms that decide whether an image is numerical. They are about 98% accurate.
Programs then take suitable text near that image or table and build the search engine around that text. At present, Zanran extracts tables and images from HTML, PDF and Excel files.
Zanran's indexing of the numerical content on the web is possible thanks to access to the vast amounts of storage and processing power available through cloud computing.
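The flow described above (classify each image, then index the text found near the numerical ones) can be illustrated with a small sketch. This is not Zanran's implementation: the patented classifier is replaced here by a hypothetical precomputed `is_numerical` flag, and the record fields (`url`, `nearby_text`), the function names and the sample data are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def looks_numerical(image):
    # Stand-in for the computer vision classifier (assumption):
    # here we simply read a flag that was computed elsewhere.
    return image["is_numerical"]


def tokenize(text):
    # Lower-case the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(images):
    # Inverted index: word -> set of URLs of numerical images
    # whose nearby text contains that word.
    index = defaultdict(set)
    for image in images:
        if not looks_numerical(image):
            continue  # non-numerical images are not indexed
        for word in tokenize(image["nearby_text"]):
            index[word].add(image["url"])
    return index


def search(index, query):
    # Return the URLs whose nearby text contains every query word.
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample records.
images = [
    {"url": "report.pdf#page=3",
     "nearby_text": "Figure 2: Reported offences by year",
     "is_numerical": True},
    {"url": "logo.png",
     "nearby_text": "Company logo",
     "is_numerical": False},
    {"url": "sales.xls",
     "nearby_text": "Quarterly sales by region",
     "is_numerical": True},
]

index = build_index(images)
print(search(index, "reported offences"))
```

The point of the design is that a query is matched against the text around a graph or table, not against the whole page, so only images already judged to be numerical can appear in the results.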
The Founders
Zanran is based in London in the United Kingdom. The two founders are Dr Jonathan Goldhill and Dr Yves Dassas. Jon Goldhill has a PhD in chemistry and an MBA from London Business School, while Yves Dassas studied engineering in Belgium and has a PhD in electrochemistry from Columbia University in New York. They have worked together for many years and sold their main telecom business in 2003-2004.
Data Search on the Internet
On the Internet, numerical information can be found in:
* unstructured text (e.g. ‘...there were 2,914 reported offences...’)
* structured tables - Excel, and tables in HTML
* images - JPGs of graphs etc., and vector graphic images in PDFs
* dynamically generated tables or graphs - from databases
Zanran has concentrated on finding images - the millions of graphs, charts and tables found predominantly in PDFs. Zanran has also included structured tables, to broaden the range of results.
Why the emphasis on images? Partly because they make up a huge volume of otherwise neglected data. And partly because images tend to convey a wider and deeper story: ‘a picture tells a thousand words’.
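For the structured-table case, one simple way to tell a data table from a layout table in HTML is to measure what share of its cells are numbers. The sketch below is an assumption-laden illustration using only Python's standard library, not a description of how Zanran does it. The `TableCellCollector` class, the `numeric_fraction` helper, the number pattern and the sample HTML are all hypothetical, and nested tables are not handled.

```python
import re
from html.parser import HTMLParser

# Matches plain numbers such as 2009, 2,914, -3.5 or 12.5% (illustrative only).
NUMBER = re.compile(r"^[-+]?[\d,]*\.?\d+%?$")


class TableCellCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of cell strings per <table>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag in ("td", "th") and self.tables:
            self._in_cell = True
            self.tables[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self.tables[-1][-1] += data


def numeric_fraction(cells):
    # Share of non-empty cells that look like numbers.
    cells = [c.strip() for c in cells if c.strip()]
    if not cells:
        return 0.0
    return sum(bool(NUMBER.match(c)) for c in cells) / len(cells)


# Hypothetical page with one data table and one navigation table.
html = """
<table><tr><th>Year</th><th>Offences</th></tr>
<tr><td>2009</td><td>2,914</td></tr>
<tr><td>2010</td><td>3,102</td></tr></table>
<table><tr><td>Home</td><td>About</td><td>Contact</td></tr></table>
"""

collector = TableCellCollector()
collector.feed(html)
fractions = [numeric_fraction(cells) for cells in collector.tables]
print(fractions)
```

A threshold on this fraction (say, above one half) would keep the first table and discard the second; the right cut-off is a tuning choice, not something stated in the text above.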
Other companies in Data Search:
* WolframAlpha provides search on curated data with a strong emphasis on mathematical and scientific information.
* Google Public Data provides a graphical interface to structured data from 27 public sites (as at April 2011).
* Infochimps is a repository of user-submitted data.
Zanran's vision of the data internet is of a diverse, fast-growing, unstructured and uncurated sea of images - with rich information content - and with islands of numerical tables and databases set in it.