Datafari is open-source enterprise search software built on Apache Solr, a project of the Apache Software Foundation. It is a packaged search engine in the sense that it provides connection to data sources, indexing, search, and graphical administration of the system. The software has a distributed architecture based on SolrCloud. Datafari uses Apache Solr for the indexing and search phases, and it combines Apache ManifoldCF, Apache Solr and Apache Cassandra. Its user interface is based on HTML5, CSS3 and jQuery.

Background

Datafari was created by France Labs. France Labs was looking for open-source search software on which to build its R&D work on a new intranet relevancy algorithm. Finding nothing that was well maintained and available under an Apache License, the team created Datafari. The project later became independent of the algorithm research, as the team considered it valuable as a search product in its own right. The first open-source version, 1.0, was released on March 4, 2015. In May 2015, France Labs won the Big Data trophy at the IT Innovation Forum for Datafari. Version 2.0 was released on September 7, 2015; the major version change reflected the technical migration from Solr 4 to Solr 5.

France Labs was founded at the end of 2011 by Cédric Ulmer, Olivier Tavard and Aurélien Mazoyer; its headquarters are in Nice, France.

Functionalities

The main functionalities of Datafari 2.1 are:

For users of the search engine
* Textual search, including boolean operators;
* A crawler based on Apache ManifoldCF that can index CMSs (Alfresco, SharePoint, ...), websites, file shares (NetApp, Samba, Windows), emails, databases and HDFS (the ManifoldCF website has a more complete list);
* "Full text" analysis, and a plugin system for adding transformation filters at the indexing and search phases;
* Multilingual management and automatic recognition of more than 20 languages;
* A REST API for configuration and search, provided by Apache Solr and Apache ManifoldCF;
* A fully configurable search relevancy algorithm;
* A responsive graphical interface in HTML5 and JavaScript, built from HTML widgets;
* Use of Apache Tika to analyse and extract content and metadata from many document types (MS Office, OpenOffice, HTML, XML, PDF, RTF, TXT, ZIP, EXIF, MP3, ...);
* Likes and favorites (v2.1), for liking results and saving them to review later;
* An email alert system that notifies users of new results in push mode (information is sent to the user) rather than pull mode (the user actively queries);

For administrators of the search engine
* A graphical analytics tool for users' search queries;
* An administration tool for the Solr instance used in Datafari;
* A tool to analyse query performance and relevancy computation;
* A security administration tool with connection to Active Directory (AD) or LDAP;
* A tool to manage synonyms;
* A tool to manage promolinks, which display content that is not in the index for specified keywords;
* A tool to manage crawling connectors, with several off-the-shelf data sources (SharePoint, file shares, emails, websites, CMIS, ...) and the ability to create new ones.

Development
* The source code of Datafari is available on GitHub.
* The technical documentation of Datafari is available on Confluence.

Partners
* Datafari is part of Ubuntu's Charm Partner Program.
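On the search side, the REST API listed under the functionalities for users is Apache Solr's HTTP interface. As a minimal sketch, assuming direct access to a Solr node at the hypothetical address http://localhost:8983/solr and a hypothetical collection named documents (a real Datafari deployment's endpoint and collection names may differ), the Python code below builds a request to Solr's standard /select handler, passing a boolean query in the q parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values, for illustration only: adjust to your own Solr node and collection.
SOLR_BASE_URL = "http://localhost:8983/solr"
COLLECTION = "documents"


def build_select_url(base_url: str, collection: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select search handler.

    `q` holds the query (Solr's standard query syntax accepts AND, OR, NOT),
    `rows` caps the number of returned results, and `wt=json` requests JSON output.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def search(query: str, rows: int = 10) -> list:
    """Send the query and return the matching documents (requires a running Solr node)."""
    url = build_select_url(SOLR_BASE_URL, COLLECTION, query, rows)
    with urlopen(url) as response:
        payload = json.load(response)
    # Solr's JSON response lists the matching documents under response.docs.
    return payload["response"]["docs"]


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_select_url(SOLR_BASE_URL, COLLECTION, "report AND (budget OR forecast)"))
```

Calling search("report AND (budget OR forecast)") against a live node would return the matching documents. Direct Solr access is shown only to illustrate the query format; in a real installation, Datafari's own front end and security layer may sit between users and Solr.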