DMelt:Text/2 Searching in text using Lucene

From jWork.ORG

Searching in text using Lucene

DMelt includes the Apache Lucene library ([lucene.apache.org]), a Java-based indexing and search technology that can also be used for spell checking, hit highlighting and advanced analysis/tokenization. DMelt ships Lucene version 2.3.2, an older, flagship version of Lucene. The main advantages of this version are its simplicity and its robust C++ support via the CLucene project ([clucene.sourceforge.net]), a high-performance, scalable C++ counterpart of Java Lucene. CLucene is a port of this very Java Lucene 2.3.2, and many parts of the DMelt web page are powered by the C++ version of Lucene.

As usual in DMelt, any Java statement can be recast in one of the Java scripting languages.

Let us consider a simple example. Assume we have a list of text, where each sentence is one entry of the list. Let us make a search engine in Python that creates an index in memory and then uses it to search for a given word, printing the score of each result. A score close to 1 indicates the most relevant match for the given word.
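To make the idea of "index first, then search and score" concrete before turning to Lucene, here is a toy sketch in plain Python. It is not Lucene and does not reproduce Lucene's scoring: it builds an inverted index (each term mapped to the documents that contain it, with term counts) in memory and ranks matching sentences with a simple tf-idf-style score divided by the best score, so the top hit gets exactly 1.0. The functions tokenize, build_index and search and the sample sentences are hypothetical; Lucene's real formula has further factors such as field-length normalization.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters. This is a crude
    # stand-in for a real analyzer: no stop words, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sentences):
    # Inverted index: term -> {document id: term frequency}.
    index = defaultdict(dict)
    for doc_id, sentence in enumerate(sentences):
        for term in tokenize(sentence):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, n_docs, word):
    # Score each matching document with sqrt(tf) * idf, then divide by the
    # best score so the top hit gets 1.0. float() keeps the division
    # correct under Python 2 / Jython as well.
    postings = index.get(word.lower(), {})
    if not postings:
        return []
    idf = 1.0 + math.log(float(n_docs) / len(postings))
    raw = dict((doc, math.sqrt(tf) * idf) for doc, tf in postings.items())
    best = max(raw.values())
    hits = [(score / best, doc) for doc, score in raw.items()]
    # Highest score first; ties broken by document id.
    return sorted(hits, key=lambda item: (-item[0], item[1]))


sentences = [
    "Lucene is a search library",
    "DMelt includes the Lucene library and uses Lucene for search",
    "CLucene is written in C++",
]
index = build_index(sentences)
for score, doc in search(index, len(sentences), "lucene"):
    print("%.3f  %s" % (score, sentences[doc]))
# 1.000  DMelt includes the Lucene library and uses Lucene for search
# 0.707  Lucene is a search library
```

Note that "CLucene" is a different token from "lucene" here, so the third sentence does not match; a real analyzer behaves the same way for whole-word queries.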

In this example we use org.apache.lucene.index.IndexWriter to write the index into memory, and then org.apache.lucene.search.IndexSearcher to search it for a given word. The simple search engine looks as follows:


More information on this topic can be found in the DMelt books.