What is Lucene?

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform ones. Lucene can index plain text and integers, as well as content such as PDFs and Office documents once their text has been extracted.

How Does Lucene Enable Faster Search?

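Lucene creates something called an inverted index. Normally we map a document to the terms it contains (document -> terms). Lucene does the reverse: it builds an index that maps each term to the list of documents containing it (term -> documents). A search for a term then becomes a direct lookup rather than a scan of every document, which is what makes it fast.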
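
The term -> documents mapping can be sketched in a few lines of plain Java. This is only a toy illustration of the idea, not Lucene's actual data structures or API: the class name InvertedIndexSketch, its methods, and the simple split-on-non-alphanumerics tokenization are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not a Lucene class): term -> sorted set of document IDs. */
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Lowercases the text, splits it on non-alphanumeric characters, and records docId under each term. */
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks the term up directly in the map instead of scanning every document. */
    Set<Integer> search(String term) {
        TreeSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is a search library");
        index.addDocument(2, "An inverted index makes search fast");
        index.addDocument(3, "Lucene builds an inverted index");

        System.out.println(index.search("lucene"));  // [1, 3]
        System.out.println(index.search("search"));  // [1, 2]
        System.out.println(index.search("missing")); // []
    }
}
```

The cost of a lookup here depends on the number of distinct terms, not on the number or size of the documents, which is the property the inverted index is built for.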

Different types of documents need different parsers, which run before Lucene's analyzer. For example, an HTML parser does preprocessing such as filtering out HTML tags, and outputs the text content. The Lucene analyzer then extracts tokens and related information, such as token frequency, from that text. Finally, the tokens and related information are written into Lucene's index files.
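
The parse-then-analyze flow can be sketched roughly as follows, again in plain Java. The class and method names (AnalysisPipelineSketch, stripHtmlTags, termFrequencies) are hypothetical, the regex-based tag removal is a crude stand-in for a real HTML parser, and the tokenization is far simpler than what a Lucene analyzer does.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical two-step pipeline: parse HTML to text, then extract tokens and their frequencies. */
final class AnalysisPipelineSketch {

    /** Parser step: crude tag removal. A real HTML parser also handles entities, scripts, and so on. */
    static String stripHtmlTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Analyzer step: lowercases the text, splits it into tokens, and counts each token's frequency. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene</h1><p>Lucene indexes text.</p></body></html>";
        String text = stripHtmlTags(html);
        System.out.println(termFrequencies(text)); // {lucene=2, indexes=1, text=1}
    }
}
```

In a real application the resulting tokens and frequencies would be handed to the index writer rather than printed.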

Core Lucene Classes

Directory, FSDirectory, RAMDirectory
Description: Directory holds the index. FSDirectory is a file-system-based index directory; RAMDirectory is a memory-based index directory.
Example: Directory indexDirectory = FSDirectory.open(new File("c://lucene//nodes"));

IndexWriter
Description: Handles writing to the index: addDocument, updateDocument, deleteDocuments, merge, etc.
Example: IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), new MaxFieldLength(1010101));

IndexSearcher
Description: Searches the index through an IndexReader: search(query, int).
Example: IndexSearcher searcher = new IndexSearcher(indexDirectory);

Document
Description: The DTO used for indexing and searching.
Example: Document document = new Document();

Field
Description: Each document contains multiple fields. A field has two parts: a name and a value.
Example: new Field("id", "1", Store.YES, Index.NOT_ANALYZED)

Term
Description: A word from the text, used in searches. It has two parts: the field to search and the value to search for.
Example: Term term = new Term("id", "1");

Query
Description: The base of all query types: TermQuery, BooleanQuery, PrefixQuery, RangeQuery, WildcardQuery, PhraseQuery, etc.
Example: Query query = new TermQuery(term);

Analyzer
Description: Builds tokens from text and helps build index terms from it.
Example: new StandardAnalyzer(Version.LUCENE_30)

The Lucene Directory

A Directory is the data space on which Lucene operates. It can be backed by the file system or by memory.

The most commonly used Directory implementations are:

FSDirectory
Description: File-system-based Directory.
Example: Directory directory = FSDirectory.open(file); // file is a java.io.File pointing to the directory path

RAMDirectory
Description: Memory-based Lucene Directory.
Example: Directory directory = new RAMDirectory();
Directory directory = new RAMDirectory(dir); // loads a file-based Directory into memory
