Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform ones. Lucene can index plain text, integers, PDF files, Office documents, etc.
How Does Lucene Enable Faster Search?
Lucene creates something called an inverted index. Normally we map document -> terms in the document. Lucene does the reverse: it creates an index that maps term -> list of documents containing the term. A search for a term then becomes a lookup in the index rather than a scan of every document, which makes searching faster.
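The term -> documents mapping can be sketched in plain Java. This is only a toy illustration of the idea, not Lucene's actual on-disk structure; the class name `InvertedIndexSketch` and its methods are hypothetical, and tokenization is reduced to lower-casing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Lucene's real data structure.
class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Split the text into lower-cased terms and record the document id under each term.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // One map lookup replaces a scan over every document.
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is a search library");
        index.addDocument(2, "Search engines use an inverted index");
        index.addDocument(3, "Lucene builds an inverted index");
        System.out.println(index.search("lucene")); // ids of documents containing "lucene"
        System.out.println(index.search("missing")); // empty list: no document has the term
    }
}
```

The cost of a lookup depends on the number of distinct terms rather than on the total amount of text, which is the reason the reversed mapping speeds up search.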
Lucene uses a different parser for each type of document. For example, an HTML parser does some preprocessing, such as filtering out HTML tags. The HTML parser outputs the text content, and the Lucene analyzer extracts tokens and related information, such as token frequency, from that text. The analyzer then writes the tokens and related information into Lucene's index files.
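The parse-then-analyze flow described above can be approximated with a few lines of plain Java. This sketch is an assumption-laden stand-in: `AnalysisSketch`, `stripTags`, and `termFrequencies` are hypothetical names, the tag filter is a naive regular expression rather than a real HTML parser, and nothing is written to an index file.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: mimics the parser and analyzer steps without using Lucene.
class AnalysisSketch {
    // Parser step: replace anything that looks like an HTML tag with a space.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Analyzer step: lower-case the text, split it into tokens, and count each token.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading separators
            }
            frequencies.merge(token, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene</h1><p>Lucene indexes text.</p></body></html>";
        // Tokens and their frequencies, in order of first appearance.
        System.out.println(termFrequencies(stripTags(html)));
    }
}
```

A real analyzer typically does more than this, for example removing stop words, but the shape of the pipeline (raw document, extracted text, tokens with frequencies) is the same.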
Core Lucene Classes
|Directory, FSDirectory, RAMDirectory||Directory containing the index. FSDirectory: file system based index dir. RAMDirectory: memory based index dir.||Directory indexDirectory = FSDirectory.open(new File("c://lucene//nodes"));|
|IndexWriter||Handles writing to the index – addDocument, updateDocument, deleteDocuments, merge etc.||IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), new MaxFieldLength(1010101));|
|IndexSearcher||Search using indexReader – search(query, int)||IndexSearcher searcher = new IndexSearcher(indexDirectory);|
|Document||DTO used to index and search||Document document = new Document();|
|Field||Each document contains multiple fields. A field has 2 parts: name and value.||new Field("id", "1", Store.YES, Index.NOT_ANALYZED)|
|Term||A word from the text. Used in search. Has 2 parts: the field to search and the value to search for.||Term term = new Term("id", "1");|
|Query||Base of all types of queries – TermQuery, BooleanQuery, PrefixQuery, RangeQuery, WildcardQuery, PhraseQuery etc.||Query query = new TermQuery(term);|
|Analyzer||Builds tokens from text, and helps in building index terms from text||new StandardAnalyzer(Version.LUCENE_30)|
The Lucene Directory
A Directory is the data space on which Lucene operates. It can be backed by the file system or by memory.
Below are the commonly used Directory implementations:
|FSDirectory||File system based Directory||Directory directory = FSDirectory.open(file); // file: a java.io.File for the index directory path|
|RAMDirectory||Memory based Lucene Directory||Directory directory = new RAMDirectory(); Directory directory = new RAMDirectory(dir); // loads a file based Directory into memory|