In 2005 the Department of Justice, the Department of Homeland Security, the Department of Defense, and the General Services Administration strongly requested full-text indexing and search of content through XML standards conforming to the Global Justice XML Data Model (Global JXDM) and the National Information Exchange Model (NIEM). Search technologies, XML (eXtensible Markup Language) as an open standard, and the World Wide Web have evolved together into a set of standards that lets web content be constructed dynamically from lightweight queries and results. The Department of Justice and the Department of Homeland Security asked Bohica Associates Corporation (BAC) for a search tool that leverages these standards and exposes a simple interface for indexing and searching a variety of content with industry-standard tools.

Description of BACEngine

A calling application constructs an XML string that encapsulates the query. It then uses either a socket connection or a web service to initiate a search session and pass the query to the engine. The BACEngine is multi-threaded, so it supports multiple simultaneous queries. It returns results as an XML stream that contains the indexed fields for each hit plus the normalized score associated with that hit. In the .NET Framework, this XML stream can be bound to a DataSet and displayed through a control on a web page. A SOAP web service interface is also available if desired, and this option is commonly used with Apache web servers.

A Boolean thesaurus is available to expand or limit query terms. For example, the thesaurus entry agriculture=farming or crop expands the term agriculture to the Boolean union of agriculture, farming, and crop. This lets each search index be tuned to the vocabulary of the application it supports.
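The thesaurus expansion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not BAC's implementation: the entry syntax follows the agriculture=farming or crop example, while the in-memory inverted index and the function names (parse_thesaurus, expand_term, matching_docs) are hypothetical. Only expansion is modeled; limiting terms is not shown.

```python
import re


def parse_thesaurus(lines):
    """Parse entries such as 'agriculture=farming or crop' into {term: [synonyms]}."""
    thesaurus = {}
    for raw in lines:
        term, sep, rhs = raw.strip().partition("=")
        if not sep:
            continue  # skip blank lines and lines without an '=' entry
        # Synonyms on the right-hand side are separated by the word "or".
        synonyms = [s.strip().lower()
                    for s in re.split(r"\s+or\s+", rhs, flags=re.IGNORECASE)
                    if s.strip()]
        thesaurus[term.strip().lower()] = synonyms
    return thesaurus


def expand_term(term, thesaurus):
    """Return the term followed by its synonyms, with duplicates removed."""
    key = term.lower()
    return list(dict.fromkeys([key] + thesaurus.get(key, [])))


def matching_docs(term, index, thesaurus):
    """Boolean union: a document matches if it contains the term or any synonym."""
    docs = set()
    for t in expand_term(term, thesaurus):
        docs |= index.get(t, set())
    return docs


if __name__ == "__main__":
    th = parse_thesaurus(["agriculture=farming or crop"])
    # Hypothetical inverted index: term -> set of document IDs.
    index = {"agriculture": {1}, "farming": {2, 3}, "crop": {3, 4}, "finance": {5}}
    print(expand_term("Agriculture", th))                    # ['agriculture', 'farming', 'crop']
    print(sorted(matching_docs("Agriculture", index, th)))  # [1, 2, 3, 4]
```

Expanding at query time, rather than at index time, means the thesaurus can be edited for a given application without rebuilding the index.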
Indexing is performed by utilities that provide incremental or full indexing of the underlying application data while the system is live. The reindex is often run on a schedule and automated with the Windows at command or the UNIX cron utility. In BAC's timing tests, indexing 200,000 SQL Server records on a Windows machine completed in a few seconds.

BAC has also used this approach to drive voice interface solutions and services. Any data application that already has a web interface can be exposed through a speech recognition engine and a toll-free telephone number, so that transactions can be processed over the telephone. In this case, the XML is translated to VoiceXML so that commercial text-to-speech engines can synthesize the spoken output. These engines can issue HTTP GET and POST requests, which lets the BACEngine act as the gateway between a speech engine and a data repository, with only a small amount of custom web page development needed to format the XML queries. Online catalogue and credit card processing modules have been developed and are available for delivery as a quick-start service package.

The BACEngine has been deployed on CD-ROM and DVD media to index, compress, and distribute content as a stand-alone web application, which can be customized with Mono, the open-source implementation of the .NET Framework (www.go-mono.com). BAC tests and distributes a build of Mono for each platform the client requests, together with a platform-specific launcher application. The solution is tested for security vulnerabilities using Nessus and other leading penetration-testing tool suites.
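The XML-to-VoiceXML translation used in the voice interface can be sketched with Python's standard ElementTree module. This is a hypothetical example: the result-stream shape (a results element holding hit elements with a score attribute and a title field) and the function name results_to_vxml are assumptions, since the actual BACEngine output schema is not given here. The sketch orders hits by normalized score and reads the top few aloud through a VoiceXML 2.1 prompt.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.x namespace


def results_to_vxml(results_xml, spoken_field="title", max_hits=3):
    """Translate a result stream into a VoiceXML 2.1 document that reads the top hits.

    Assumed (hypothetical) input shape:
      <results><hit score="0.92"><title>...</title></hit>...</results>
    """
    results = ET.fromstring(results_xml)
    # Order hits by their normalized score, highest first.
    hits = sorted(results.findall("hit"),
                  key=lambda h: float(h.get("score", "0")),
                  reverse=True)

    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")

    if not hits:
        prompt.text = "No matching records were found."
    else:
        noun = "record" if len(hits) == 1 else "records"
        sentences = [f"Found {len(hits)} matching {noun}."]
        for rank, hit in enumerate(hits[:max_hits], start=1):
            value = hit.findtext(spoken_field, default="an untitled record")
            sentences.append(f"Result {rank}: {value}.")
        prompt.text = " ".join(sentences)

    # tostring() escapes &, < and > in record text, keeping the output well-formed.
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    sample = """<results>
      <hit score="0.41"><title>Farm loan rates</title></hit>
      <hit score="0.92"><title>Crop insurance FAQ</title></hit>
    </results>"""
    print(results_to_vxml(sample))
```

Building the document with ElementTree instead of string concatenation keeps the VoiceXML well-formed when record text contains markup characters. A real gateway would also need VoiceXML input handling (fields and grammars) to accept the caller's query, which this sketch omits.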