How do I do query autocomplete/suggestions in Lucene? (Lucene query internals)

25-02-21 12

This article covers how to do query autocomplete/suggestions in Lucene and collects several related questions: Algolia - autocomplete allowing items other than query suggestions; angularjs – ngTagsInput – only allow autocomplete suggestions to be selected?; c# – searching a TokenStream field in Lucene; c# – how to search a Field.Index.NOT_ANALYZED field in Lucene.NET?

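Contents:
How do I do query autocomplete/suggestions in Lucene? (Lucene query internals)
Algolia - autocomplete allowing items other than query suggestions
angularjs – ngTagsInput – only allow autocomplete suggestions to be selected?
c# – Searching a TokenStream field in Lucene
c# – How to search a Field.Index.NOT_ANALYZED field in Lucene.NET?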
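How do I do query autocomplete/suggestions in Lucene? (Lucene query internals)
I'm looking for a way to do query autocomplete/suggestions in Lucene. I've Googled around and experimented a bit, but all of the examples I've seen set up filters in Solr. We don't use Solr and aren't planning to move to it in the near future, and Solr is just a wrapper around Lucene anyway, so I imagine there must be a way to do it!
I've looked into using EdgeNGramFilter, and I realise that I'd have to run the filter on the index fields, get the tokens out, and then compare them against the entered query. I'm just struggling to turn the connection between the two into code, so help is much appreciated!
To be clear about what I'm looking for (I realise I wasn't very clear): I want a solution where searching for a term returns a list of suggested queries. Typing "inter" into the search field should come back with a list of suggested queries such as "internet", "international", etc.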

Answer 1
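Based on @Alexandre Victor's answer, I wrote a small class based on the Lucene Spellchecker in the contrib package (and using the LuceneDictionary included in it) that does exactly what I want.
It re-indexes from a single source index with a single field and provides suggestions for terms. Results are sorted by the number of documents in the original index that contain the term, so more popular terms come up first. Seems to work pretty well :)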
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter.Side;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Search term auto-completer, works for single terms (so use on the last term
 * of the query).
 * <p>
 * Returns more popular terms first.
 *
 * @author Mat Mannion, M.Mannion@warwick.ac.uk
 */
public final class Autocompleter {

    private static final String GRAMMED_WORDS_FIELD = "words";
    private static final String SOURCE_WORD_FIELD = "sourceWord";
    private static final String COUNT_FIELD = "count";

    private static final String[] ENGLISH_STOP_WORDS = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "i",
        "if", "in", "into", "is", "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"
    };

    private final Directory autoCompleteDirectory;
    private IndexReader autoCompleteReader;
    private IndexSearcher autoCompleteSearcher;

    public Autocompleter(String autoCompleteDir) throws IOException {
        this.autoCompleteDirectory = FSDirectory.getDirectory(autoCompleteDir, null);
        reOpenReader();
    }

    public List<String> suggestTermsFor(String term) throws IOException {
        // get the top 5 terms for query
        Query query = new TermQuery(new Term(GRAMMED_WORDS_FIELD, term));
        Sort sort = new Sort(COUNT_FIELD, true);
        TopDocs docs = autoCompleteSearcher.search(query, null, 5, sort);
        List<String> suggestions = new ArrayList<String>();
        for (ScoreDoc doc : docs.scoreDocs) {
            suggestions.add(autoCompleteReader.document(doc.doc).get(SOURCE_WORD_FIELD));
        }
        return suggestions;
    }

    @SuppressWarnings("unchecked")
    public void reIndex(Directory sourceDirectory, String fieldToAutocomplete)
            throws CorruptIndexException, IOException {
        // build a dictionary (from the spell package)
        IndexReader sourceReader = IndexReader.open(sourceDirectory);
        LuceneDictionary dict = new LuceneDictionary(sourceReader, fieldToAutocomplete);

        // code from
        // org.apache.lucene.search.spell.SpellChecker.indexDictionary(Dictionary)
        IndexReader.unlock(autoCompleteDirectory);

        // use a custom analyzer so we can do EdgeNGramFiltering
        IndexWriter writer = new IndexWriter(autoCompleteDirectory, new Analyzer() {
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream result = new StandardTokenizer(reader);
                result = new StandardFilter(result);
                result = new LowerCaseFilter(result);
                result = new ISOLatin1AccentFilter(result);
                result = new StopFilter(result, ENGLISH_STOP_WORDS);
                result = new EdgeNGramTokenFilter(result, Side.FRONT, 1, 20);
                return result;
            }
        }, true);

        writer.setMergeFactor(300);
        writer.setMaxBufferedDocs(150);

        // go through every word, storing the original word (incl. n-grams)
        // and the number of times it occurs
        Map<String, Integer> wordsMap = new HashMap<String, Integer>();
        Iterator<String> iter = (Iterator<String>) dict.getWordsIterator();
        while (iter.hasNext()) {
            String word = iter.next();
            int len = word.length();
            if (len < 3) {
                continue; // too short we bail but "too long" is fine...
            }
            if (wordsMap.containsKey(word)) {
                throw new IllegalStateException("This should never happen in Lucene 2.3.2");
                // wordsMap.put(word, wordsMap.get(word) + 1);
            } else {
                // use the number of documents this word appears in
                wordsMap.put(word, sourceReader.docFreq(new Term(fieldToAutocomplete, word)));
            }
        }

        for (String word : wordsMap.keySet()) {
            // ok index the word
            Document doc = new Document();
            doc.add(new Field(SOURCE_WORD_FIELD, word, Field.Store.YES,
                    Field.Index.UN_TOKENIZED)); // orig term
            doc.add(new Field(GRAMMED_WORDS_FIELD, word, Field.Store.YES,
                    Field.Index.TOKENIZED)); // grammed
            doc.add(new Field(COUNT_FIELD, Integer.toString(wordsMap.get(word)),
                    Field.Store.NO, Field.Index.UN_TOKENIZED)); // count
            writer.addDocument(doc);
        }

        sourceReader.close();

        // close writer
        writer.optimize();
        writer.close();

        // re-open our reader
        reOpenReader();
    }

    private void reOpenReader() throws CorruptIndexException, IOException {
        if (autoCompleteReader == null) {
            autoCompleteReader = IndexReader.open(autoCompleteDirectory);
        } else {
            // reopen() returns the refreshed reader rather than updating in
            // place, so keep the returned instance and close the stale one
            IndexReader reopened = autoCompleteReader.reopen();
            if (reopened != autoCompleteReader) {
                autoCompleteReader.close();
            }
            autoCompleteReader = reopened;
        }
        autoCompleteSearcher = new IndexSearcher(autoCompleteReader);
    }

    public static void main(String[] args) throws Exception {
        Autocompleter autocomplete = new Autocompleter("/index/autocomplete");

        // run this to re-index from the current index, shouldn't need to do
        // this very often
        // autocomplete.reIndex(FSDirectory.getDirectory("/index/live", null),
        //         "content");

        String term = "steve";
        System.out.println(autocomplete.suggestTermsFor(term));
        // prints [steve, steven, stevens, stevenson, stevenage]
    }
}
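
The idea behind the class above (index each word under its front edge n-grams, then look up the typed prefix as an exact term and order the hits by popularity) can be illustrated without Lucene. The following is only a plain-JDK sketch under stated assumptions, not the answer's implementation: the class EdgeNGramSketch, its method names, and the sample counts are hypothetical, and a HashMap stands in for the autocomplete index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-JDK illustration of the edge n-gram idea; hypothetical, not Lucene code. */
final class EdgeNGramSketch {
    // Maps each front n-gram (prefix) to the source words that produced it.
    private final Map<String, List<String>> wordsByGram = new HashMap<>();
    // Popularity per word; stands in for docFreq in the source index.
    private final Map<String, Integer> countByWord = new HashMap<>();

    /** Front edge n-grams of a word, from minGram up to maxGram characters. */
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, word.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    /** Indexes one lowercased word under all of its front n-grams (1 to 20 chars). */
    void add(String word, int count) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (countByWord.put(lower, count) != null) {
            return; // already indexed; only the count was updated
        }
        for (String gram : edgeNGrams(lower, 1, 20)) {
            wordsByGram.computeIfAbsent(gram, g -> new ArrayList<>()).add(lower);
        }
    }

    /** Exact lookup of the typed prefix, most popular words first. */
    List<String> suggest(String prefix, int limit) {
        List<String> found = wordsByGram.get(prefix.toLowerCase(Locale.ROOT));
        if (found == null) {
            return Collections.emptyList();
        }
        List<String> hits = new ArrayList<>(found);
        hits.sort((a, b) -> {
            int byCount = Integer.compare(countByWord.get(b), countByWord.get(a));
            return byCount != 0 ? byCount : a.compareTo(b); // ties: alphabetical
        });
        return new ArrayList<>(hits.subList(0, Math.min(limit, hits.size())));
    }

    public static void main(String[] args) {
        EdgeNGramSketch index = new EdgeNGramSketch();
        index.add("internet", 42);      // counts are made-up sample data
        index.add("international", 17);
        index.add("interval", 5);
        index.add("index", 30);
        System.out.println(index.suggest("inter", 5));
        // prints [internet, international, interval]
    }
}
```

The point of the design is that all prefix work happens at index time: the lookup for "inter" is a single exact-match probe, which corresponds to the TermQuery on the grammed field in the answer's code.
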
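Algolia - autocomplete allowing items other than query suggestions

We have implemented InstantSearch and autocomplete on the same page. This works fine as long as we select an item from the autosuggest list, but when we type a word that is not in the suggestion list, the search does not work. We need this because our query suggestions list is currently small. How can we achieve this...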

Solution

No working solution has been found for this question yet.


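angularjs – ngTagsInput – only allow autocomplete suggestions to be selected?

If you are pulling from a source such as

[{text: 'a'}, {text: 'ab'}, {text: 'abc'}]

and the user types 'abcd', how do you prevent the user from creating the tag 'abcd'?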

Solution

Just set addFromAutocompleteOnly to true, so that only suggestions from the autocomplete popup can be added as tags.

<tags-input ng-model="tags" add-from-autocomplete-only="true">
  <auto-complete source="loadTags($query)"></auto-complete>
</tags-input>


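c# – Searching a TokenStream field in Lucene

I'm just getting started with Lucene, and I feel like I must have a fundamental misunderstanding of it, but I can't figure this out from the samples and the documentation.

I can't seem to get Lucene to return results for a field that is initialized with a TokenStream, whereas fields initialized with a string work fine. I'm using Lucene.NET 2.9.2 RC2.

[Edit] I've also tried this with the latest Java version (3.0.3) and see the same behavior, so it isn't some quirk of the port.

Here is a basic example: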

Directory index = new RAMDirectory();
Document doc = new Document();
doc.Add(new Field("fieldName", new StandardTokenizer(new StringReader("Field Value Goes Here"))));
IndexWriter iw = new IndexWriter(index, new StandardAnalyzer());
iw.AddDocument(doc);
iw.Commit();
iw.Close();

Query q = new QueryParser("fieldName", new StandardAnalyzer()).Parse("value");
IndexSearcher searcher = new IndexSearcher(index, true);
Console.WriteLine(searcher.Search(q).Length());

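(I realize this uses APIs deprecated as of 2.9, but that is only for brevity. Pretend the version arguments are there and that I use one of the new Search overloads.)

This returns no results.

However, if I replace the line that adds the field with

doc.Add(new Field("fieldName", "Field Value Goes Here", Field.Store.NO, Field.Index.ANALYZED));

then the query returns a hit, as I would expect. It also works if I use the TextReader version.

Both fields are indexed and tokenized with (I think) the same tokenizer/analyzer (I've tried others as well), and neither is stored, so my intuition is that they should behave the same. What am I missing?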

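Solution

I found the answer: it is letter casing.

The token stream created by StandardAnalyzer includes a LowerCaseFilter, whereas creating a StandardTokenizer directly applies no such filter. The field was therefore indexed with its original capitalization ("Value"), while the query term was lowercased ("value"), so the two never matched.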
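
To make the mismatch concrete, here is a small plain-JDK sketch. It does not use Lucene at all: tokenizeOnly and tokenizeAndLowercase are hypothetical stand-ins for a bare tokenizer and for a tokenizer followed by a lowercase filter, and whitespace splitting is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-JDK illustration of the casing mismatch; hypothetical, not Lucene code. */
final class CasingMismatchSketch {

    /** Stand-in for a bare tokenizer: splits on whitespace and keeps the case. */
    static List<String> tokenizeOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Stand-in for an analyzer chain: tokenize, then lowercase every token. */
    static List<String> tokenizeAndLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : tokenizeOnly(text)) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String fieldValue = "Field Value Goes Here";
        // The query text passes through the analyzer, so its term is lowercased.
        String queryTerm = tokenizeAndLowercase("value").get(0);

        // Indexed from the bare tokenizer: terms keep their capitals, no match.
        System.out.println(tokenizeOnly(fieldValue).contains(queryTerm));         // false
        // Indexed through the lowercasing chain: the term matches.
        System.out.println(tokenizeAndLowercase(fieldValue).contains(queryTerm)); // true
    }
}
```

In Lucene itself, one likely way to get the same effect would be to wrap the tokenizer in a LowerCaseFilter before passing it to the Field, as the Autocompleter analyzer earlier on this page does in its filter chain.
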

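c# – How to search a Field.Index.NOT_ANALYZED field in Lucene.NET?

I'm new to Lucene.NET. I am adding fields to a Lucene document as Field.Index.NOT_ANALYZED. There is also a default field that is added to the document as Field.Index.ANALYZED.

I have no difficulty searching the default field, but when I search a specific field, Lucene returns 0 documents. However, if I change Field.Index.NOT_ANALYZED to Field.Index.ANALYZED, things work fine. I think it has something to do with the Analyzer. Can anyone guide me on how to search a Field.Index.NOT_ANALYZED field?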

Here is how I create the query parser:

QueryParser parser = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));

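Solution

ANALYZED just means that the value is passed through an Analyzer before being indexed, while NOT_ANALYZED means that the value is indexed as-is. The latter means that a value like "hello world" is indexed as exactly the single term "hello world". However, the syntax of the QueryParser class treats whitespace as a term separator, so it creates two terms, "hello" and "world", and neither of them equals the indexed term.

If you create var q = new TermQuery(new Term(field, "hello world")) instead of calling var q = queryParser.Parse("hello world"), you will be able to match the field.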
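
The difference can be shown with a small plain-JDK sketch. It does not call Lucene: indexAsIs, parsedQueryMatches, and singleTermMatches are hypothetical helpers that model a NOT_ANALYZED field as a set holding one term, a parsed query as whitespace-separated terms, and a TermQuery as one exact term.

```java
import java.util.Collections;
import java.util.Set;

/** Plain-JDK illustration of NOT_ANALYZED matching; hypothetical, not Lucene code. */
final class NotAnalyzedSketch {

    /** NOT_ANALYZED stand-in: the whole value becomes one indexed term. */
    static Set<String> indexAsIs(String value) {
        return Collections.singleton(value);
    }

    /** Parsed-query stand-in: whitespace splits the text into separate terms. */
    static boolean parsedQueryMatches(Set<String> indexedTerms, String queryText) {
        for (String term : queryText.trim().split("\\s+")) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** Single-term stand-in: the whole string is looked up as one term. */
    static boolean singleTermMatches(Set<String> indexedTerms, String exactValue) {
        return indexedTerms.contains(exactValue);
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = indexAsIs("hello world");
        // "hello" and "world" are looked up separately; neither is in the index.
        System.out.println(parsedQueryMatches(indexedTerms, "hello world")); // false
        // The exact value is one term, which is what the index holds.
        System.out.println(singleTermMatches(indexedTerms, "hello world"));  // true
    }
}
```
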

Tags:
Lucene
Query autocomplete
Suggestions
Lucene query internals