Implementing the language model
Lucene implemented two language models, LMDirichletSimilarity and LMJelinekMercerSimilarity, based on different distribution smoothing methods. Smoothing is a technique that adds a constant weight so that the zero query term frequency on partially matched documents does not result in a zero score where it's useless in ranking. We will look at these two implementations and see how their weight distributions affect scoring.
How to do it…
We will take a look at LMDirichletSimilarity first and we will reuse our test case from the previous section, but will revert the extended second sentence input:
StandardAnalyzer analyzer = new StandardAnalyzer(); Directory directory = new RAMDirectory(); IndexWriterConfig config = new ...
Get Lucene 4 Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.