
Elasticsearch fuzzy matching: max_expansions and min_similarity



This article explains how max_expansions and min_similarity work in Elasticsearch fuzzy matching and answers common questions about fuzzy matching. It also collects related material: Building a Recipe Search Site with Angular and Elasticsearch; Elasticsearch fuzzy matching with exact matches shown first; Elasticsearch document_missing_exception; and the Elasticsearch indexing error mapper_parsing_exception.

Elasticsearch fuzzy matching: max_expansions and min_similarity

How do max_expansions and min_similarity work in Elasticsearch fuzzy matching?

min_similarity is a value between zero and one. From the Lucene documentation:

For example, for a minimumSimilarity of 0.5 a term of the same length 
as the query term is considered similar to the query term if the edit 
distance between both terms is less than length(term)*0.5

The "edit distance" mentioned here is the Levenshtein distance.
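
To make the quoted rule concrete, here is a minimal sketch of the Levenshtein distance and a similarity score derived from it. It assumes similarity is computed as 1 - distance / min(length), which agrees with the quoted example at 0.5 for terms of equal length; the function names are hypothetical and this is not Lucene's actual implementation.

```javascript
// Classic dynamic-programming Levenshtein distance:
// the minimum number of single-character inserts, deletes and substitutions.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed formula (not taken from the source): 1 - distance / shorter length.
// A term would count as a match when similarity(...) > min_similarity.
function similarity(queryTerm, term) {
  const shorter = Math.min(queryTerm.length, term.length);
  if (shorter === 0) {
    return queryTerm.length === term.length ? 1 : 0;
  }
  return 1 - levenshtein(queryTerm, term) / shorter;
}
```

Under this assumption, "samvel" against "samval" has distance 1 and a similarity of about 0.83, so it would pass a min_similarity of 0.5, while "samvel" against "xxxxel" (distance 4) would not.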

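Internally, the query works like this:

  • Taking min_similarity into account, it finds every term in the index that could match the search term.
  • It then searches for all of those terms.

You can imagine how heavy this query can be!

To address this, you can set the max_expansions parameter to specify the maximum number of matching terms that should be considered.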
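
The two steps above, and the effect of capping the expansion, can be sketched as follows. This is purely illustrative: the real engine walks its term dictionary far more efficiently, and the function names, the maxEdits threshold and the "closest terms first" ordering are assumptions made for this example.

```javascript
// Minimal Levenshtein distance, repeated here so the sketch is self-contained.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Step 1: collect the index terms that are close enough to the query term,
// keeping at most maxExpansions of them (closest first; an assumption).
// Step 2 (not shown) would run a search for every term that was kept.
function expandFuzzyTerm(queryTerm, indexTerms, maxEdits, maxExpansions) {
  return indexTerms
    .map((term) => ({ term: term, distance: editDistance(queryTerm, term) }))
    .filter((c) => c.distance <= maxEdits)
    .sort((x, y) => x.distance - y.distance || x.term.localeCompare(y.term))
    .slice(0, maxExpansions)
    .map((c) => c.term);
}

// Hypothetical index vocabulary.
const terms = ['samvel', 'samuel', 'samvelian', 'samwell', 'sample'];
console.log(expandFuzzyTerm('samvel', terms, 2, 50));
console.log(expandFuzzyTerm('samvel', terms, 2, 2));
```

The point of the sketch is that max_expansions limits how many candidate terms are searched, not how far each candidate may be from the query term.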

Original question

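I use fuzzy matching in my project, mainly to find misspellings and different spellings of the same name. I need to understand exactly how Elasticsearch's fuzzy matching works and how it uses the two parameters mentioned in the title.

As I understand it, min_similarity is the percentage by which the queried string matches the string in the database. I could not find an exact description of how this value is calculated.

As I understand it, max_expansions is the Levenshtein distance within which the search should be performed. If it really were the Levenshtein distance, it would be the ideal solution for me. In any case, it does not work that way. For example, I have the word "Samvel":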

queryStr      max_expansions           matches?
samvel        0                        Should not be 0. error (but Levenshtein distance can be 0!)
samvel        1                        Yes
samvvel       1                        Yes
samvvell      1                        Yes (but it shouldn't have)
samvelll      1                        Yes (but it shouldn't have)
saamvelll     1                        No (but for some weird reason it matches with Samvelian)
saamvelll     anything bigger than 1   No

The documentation says something I do not really understand:

Add max_expansions to the fuzzy query allowing to control the maximum number 
of terms to match. Default to unbounded (or bounded by the max clause count in 
boolean query).

So could anyone please explain exactly how these parameters affect the search results?

Building a Recipe Search Site with Angular and Elasticsearch

https://www.sitepoint.com/building-recipe-search-site-angular-elasticsearch/


By Adam Bard, April 15, 2014


Have you ever wanted to build a search feature into an application? In the old days, you might have found yourself wrangling with Solr, or building your own search service on top of Lucene, if you were lucky. But since 2010, there’s been an easier way: Elasticsearch.

Elasticsearch is an open-source storage engine built on Lucene. It’s more than a search engine; it’s a true document store, albeit one emphasizing search performance over consistency or durability. This means that, for many applications, you can use Elasticsearch as your entire backend. Applications such as…

Building a Recipe Search Engine

In this article, you’ll learn how to use Elasticsearch with AngularJS to create a search engine for recipes, just like the one at OpenRecipeSearch.com. Why recipes?

  1. OpenRecipes exists, which makes our job a lot easier.
  2. Why not?

OpenRecipes is an open-source project that scrapes a bunch of recipe sites for recipes, then provides them for download in a handy JSON format. That’s great for us, because Elasticsearch uses JSON too. However, we have to get Elasticsearch up and running before we can feed it all those recipes.

Download Elasticsearch and unzip it into whatever directory you like. Next, open a terminal, cd to the directory you just unzipped, and run bin/elasticsearch (bin/elasticsearch.bat on Windows). Ta-da! You’ve just started your very own Elasticsearch instance. Leave that running while you follow along.

One of the great features of Elasticsearch is its out-of-the-box RESTful backend, which makes it easy to interact with from many environments. We’ll be using the JavaScript driver, but you could use whichever one you like; the code is going to look very similar either way. If you like, you can refer to this handy reference (disclaimer: written by me).

Now, you’ll need a copy of the OpenRecipes database. It’s just a big file full of JSON documents, so it’s straightforward to write a quick Node.js script to get them in there. You’ll need to get the JavaScript Elasticsearch library for this, so run npm install elasticsearch. Then, create a file named load_recipes.js, and add the following code.

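var fs = require('fs');
var es = require('elasticsearch');
var client = new es.Client({
  host: 'localhost:9200'
});

fs.readFile('recipeitems-latest.json', {encoding: 'utf-8'}, function(err, data) {
  if (err) { throw err; }

  // Build up a giant bulk request for elasticsearch.
  var bulk_request = data.split('\n').reduce(function(bulk_request, line) {
    var obj, recipe;

    try {
      obj = JSON.parse(line);
    } catch(e) {
      console.log('Done reading');
      return bulk_request;
    }

    // Rework the data slightly
    recipe = {
      id: obj._id.$oid, // Was originally a mongodb entry
      name: obj.name,
      source: obj.source,
      url: obj.url,
      recipeYield: obj.recipeYield,
      ingredients: obj.ingredients.split('\n'), // One ingredient per line in the source data
      prepTime: obj.prepTime,
      cookTime: obj.cookTime,
      datePublished: obj.datePublished,
      description: obj.description
    };

    // Each document takes two entries: an action line, then the document itself.
    bulk_request.push({index: {_index: 'recipes', _type: 'recipe', _id: recipe.id}});
    bulk_request.push(recipe);
    return bulk_request;
  }, []);

  // A little voodoo to simulate synchronous insert
  var busy = false;
  var callback = function(err, resp) {
    if (err) { console.log(err); }

    busy = false;
  };

  // Recursively whittle away at bulk_request, 1000 at a time.
  var perhaps_insert = function() {
    if (!busy) {
      busy = true;
      client.bulk({
        body: bulk_request.slice(0, 1000)
      }, callback);
      bulk_request = bulk_request.slice(1000);
      console.log(bulk_request.length);
    }

    if (bulk_request.length > 0) {
      setTimeout(perhaps_insert, 10);
    } else {
      console.log('Inserted all records.');
    }
  };

  perhaps_insert();
});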

Next, run the script using the command node load_recipes.js. 160,000 records later, we have a full database of recipes ready to go. Check it out with curl if you have it handy:

$ curl -XPOST http://localhost:9200/recipes/recipe/_search -d '{"query": {"match": {"_all": "cake"}}}'

You might be OK using curl to search for recipes, but if the world is going to love your recipe search, you’ll need to…

Build a Recipe Search UI

This is where Angular comes in. I chose Angular for two reasons: because I wanted to, and because Elasticsearch’s JavaScript library comes with an experimental Angular adapter. I’ll leave the design as an exercise to the reader, but I’ll show you the important parts of the HTML.

Get your hands on Angular and Elasticsearch now. I recommend Bower, but you can just download them too. Open your index.html file and insert them wherever you usually put your JavaScript (I prefer just before the closing body tag myself, but that’s a whole other argument):

<script src="path/to/angular/angular.js"></script>
<script src="path/to/elasticsearch/elasticsearch.angular.js"></script>

Let’s stop to think about how our app is going to work:

  1. The user enters a query.
  2. We send the query as a search to Elasticsearch.
  3. We retrieve the results.
  4. We render the results for the user.

The following code sample shows the key HTML for our search engine, with Angular directives in place. If you’ve never used Angular, that’s OK. You only need to know a few things to grok this example:

  1. HTML attributes starting with ng are Angular directives.
  2. The dynamic parts of your Angular applications are enclosed with an ng-app and an ng-controller. ng-app and ng-controller don’t need to be on the same element, but they can be.
  3. All other references to variables in the HTML refer to properties on the $scope object that we’ll meet in the JavaScript.
  4. The parts enclosed in {{}} are template variables, like in Django/Jinja2/Liquid/Mustache templates.

<div ng-app="myOpenRecipes" ng-controller="recipeCtrl">

  <!-- The search box puts the term into $scope.searchTerm and calls $scope.search() on submit -->
  <section class="searchField">
    <form ng-submit="search()">
      <input type="text" ng-model="searchTerm">
      <input type="submit" value="Search for recipes">
    </form>
  </section>

  <!-- In results, we show a message if there are no results, and a list of results otherwise. -->
  <section class="results">
    <div class="no-recipes" ng-hide="recipes.length">No results</div>

    <!-- We show one of these elements for each recipe in $scope.recipes. The ng-cloak directive prevents our templates from showing on load. -->
    <article class="recipe" ng-repeat="recipe in recipes" ng-cloak>
      <h2><a ng-href="{{recipe.url}}">{{recipe.name}}</a></h2>
      <ul>
        <li ng-repeat="ingredient in recipe.ingredients">{{ ingredient }}</li>
      </ul>

      <p>
        {{recipe.description}}
        <a ng-href="{{recipe.url}}">... more at {{recipe.source}}</a>
      </p>
    </article>

    <!-- We put a link that calls $scope.loadMore to load more recipes and append them to the results. -->
    <div class="load-more" ng-hide="allResults" ng-cloak>
      <a ng-click="loadMore()">More...</a>
    </div>
  </section>
</div>

Next comes the JavaScript, starting with the module, which the markup above names myOpenRecipes (via the ng-app attribute).

/**
 * Create the module. Set it up to use html5 mode.
 */
window.MyOpenRecipes = angular.module('myOpenRecipes', ['elasticsearch'],
  ['$locationProvider', function($locationProvider) {
    $locationProvider.html5Mode(true);
  }]
);

For those new to Angular, the ['$locationProvider', function($locationProvider) {...}] business is our way of telling Angular that we’d like it to pass $locationProvider to our handler function so we can use it. This system of dependency injection removes the need for us to rely on global variables (except the global angular and the MyOpenRecipes we just created).

The controller, named recipeCtrl, comes next. We need to make sure to initialize the recipes, allResults, and searchTerm variables used in the template, as well as providing search() and loadMore() as actions.

/** Create a controller to interact with the UI. */
MyOpenRecipes.controller('recipeCtrl',
  ['recipeService', '$scope', '$location', function(recipes, $scope, $location) {
  // Provide some nice initial choices
  var initChoices = [
      "rendang", "nasi goreng", "pad thai", "pizza", "lasagne", "ice cream", "schnitzel", "hummous"
  ];
  var idx = Math.floor(Math.random() * initChoices.length);

  // Initialize the scope defaults.
  $scope.recipes = [];        // An array of recipe results to display
  $scope.page = 0;            // A counter to keep track of our current page
  $scope.allResults = false;  // Whether or not all results have been found.

  // And, a random search term to start if none was present on page load.
  $scope.searchTerm = $location.search().q || initChoices[idx];

  // A fresh search. Reset the scope variables to their defaults, set
  // the q query parameter, and load more results.
  $scope.search = function() {
    $scope.page = 0;
    $scope.recipes = [];
    $scope.allResults = false;
    $location.search({'q': $scope.searchTerm});
    $scope.loadMore();
  };

  // Load the next page of results, incrementing the page counter.
  // When query is finished, push results onto $scope.recipes and decide
  // whether all results have been returned (i.e. were 10 results returned?)
  $scope.loadMore = function() {
    recipes.search($scope.searchTerm, $scope.page++).then(function(results) {
      if (results.length !== 10) {
        $scope.allResults = true;
      }

      var ii = 0;

      for (; ii < results.length; ii++) {
        $scope.recipes.push(results[ii]);
      }
    });
  };

  // Load results on first run
  $scope.loadMore();
}]);

You should recognize everything on the $scope object from the HTML. Notice that our actual search query relies on a mysterious object called recipeService. A service is Angular’s way of providing reusable utilities for doing things like talking to outside resources. Unfortunately, Angular doesn’t provide recipeService, so we’ll have to write it ourselves. Here’s what it looks like:

MyOpenRecipes.factory('recipeService',
  ['$q', 'esFactory', '$location', function($q, elasticsearch, $location) {
  var client = elasticsearch({
    host: $location.host() + ':9200'
  });

  // Given a term and an offset, load another round of 10 recipes. Returns a promise.
  var search = function(term, offset) {
    var deferred = $q.defer();
    var query = {
      match: {
        _all: term
      }
    };

    client.search({
      index: 'recipes',
      type: 'recipe',
      body: {size: 10, from: (offset || 0) * 10, query: query}
    }).then(function(result) {
      var hits_in = (result.hits || {}).hits || [];
      var hits_out = [];

      for (var ii = 0; ii < hits_in.length; ii++) {
        hits_out.push(hits_in[ii]._source);
      }

      deferred.resolve(hits_out);
    }, deferred.reject);

    return deferred.promise;
  };

  // Since this is a factory method, we return an object representing the actual service.
  return {
    search: search
  };
}]);

Our service is quite barebones. It exposes a single method, search(), that allows us to send a query to Elasticsearch, searching across all fields for the given term. You can see that in the query passed in the body of the call to search: {"match": {"_all": term}}. _all is a special keyword that lets us search all fields. If instead our query was {"match": {"title": term}}, we would only see recipes that contained the search term in the title.

The results come back in order of decreasing “score”, which is Elasticsearch’s guess at the document’s relevance based on keyword frequency and placement. For a more complicated search, we could tune the relative weights of the score (i.e. a hit in the title is worth more than in the description), but the default seems to do pretty well without it.
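
As a rough illustration of what such tuning could look like, the sketch below builds a multi_match query body in which a hit in one field counts more than a hit in another, using the field^weight notation. The helper name, the choice of the name and description fields, and the weight of 3 are assumptions made for this example, not part of the tutorial's code.

```javascript
// Hypothetical helper: build a search body that weights the `name` field
// three times as heavily as `description`. The weight is arbitrary.
function buildWeightedQuery(term) {
  return {
    query: {
      multi_match: {
        query: term,
        fields: ['name^3', 'description']
      }
    }
  };
}

console.log(JSON.stringify(buildWeightedQuery('cake'), null, 2));
```

The returned object could be passed as the body of a search request in place of the plain match query.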

You’ll also notice that the search accepts an offset argument. Since the results are ordered, we can use this to fetch more results if requested by telling Elasticsearch to skip the first n results.
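
The paging arithmetic can be isolated in a small sketch. The helper names below are hypothetical; the page size of 10 and the "fewer than a full page means we are done" rule follow the service and controller described in this article.

```javascript
// Translate a zero-based page number into Elasticsearch's from/size pair.
function pageWindow(page, pageSize = 10) {
  return { from: (page || 0) * pageSize, size: pageSize };
}

// The controller treats any page that is not completely full as the last one.
function isLastPage(results, pageSize = 10) {
  return results.length !== pageSize;
}

console.log(pageWindow(0)); // first page
console.log(pageWindow(3)); // fourth page
```

One consequence of this rule is that a result set whose size is an exact multiple of 10 needs one extra, empty request before the "More..." link disappears.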

Some Notes on Deployment

Deployment is a bit beyond the scope of this article, but if you want to take your recipe search live, you need to be careful. Elasticsearch has no concept of users or permissions. If you want to prevent just anyone from adding or deleting recipes, you’ll need to find some way to prevent access to those REST endpoints on your Elasticsearch instance. For example, OpenRecipeSearch.com uses Nginx as a proxy in front of Elasticsearch to prevent outside access to all endpoints but recipes/recipe/_search.

Congratulations,You’ve Made a Recipe Search

If you open index.html in a browser, you should see an unstyled list of recipes, since our controller fetches some randomly for you on page load. If you enter a new search, you’ll get 10 results relating to whatever you searched for, and if you click “More…” at the bottom of the page, some more recipes should appear (if there are indeed more recipes to fetch).

That’s all there is to it! You can find all the necessary files to run this project on GitHub.

Elasticsearch fuzzy matching with exact matches shown first

How can I use fuzzy matching in Elasticsearch while showing exact matches first?

In the end I did not use fuzzy matching to solve my problem; I used ngrams instead.

/**
 * Map - Create a new index with property mapping
 */
public function map()
{
    $params['index'] = self::INDEX;

    $params['body']['settings'] = array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'product_analyzer' => array(
                        'type'      => 'custom',
                        'tokenizer' => 'whitespace',
                        'filter'    => array('lowercase', 'product_ngram'),
                    ),
                ),
                'filter' => array(
                    'product_ngram' => array(
                        'type'     => 'nGram',
                        'min_gram' => 3,
                        'max_gram' => 5,
                    ),
                )
            ),

        )
    );

    //all the beans
    $mapping = array(
        '_source'    => array(
            'enabled' => true
        ),
        'properties' => array(
            'id'          => array(
                'type' => 'string',
            ),
            'name'        => array(
                'type'     => 'string',
                'analyzer' => 'product_analyzer',
                'boost'    => '10',
            ),
            'brand'       => array(
                'type'     => 'string',
                'analyzer' => 'product_analyzer',
                'boost'    => '5',
            ),
            'description' => array(
                'type' => 'string',
            ),
            'barcodes'    => array(
                'type' => 'string'
            ),
        ),
    );

    $params['body']['mappings'][self::TYPE] = $mapping;

    $this->_client->indices()->create($params);
}


public function search($query)
{
    $return = $this->_client->search(
        array(
            'index' => self::INDEX,
            'type'  => self::TYPE,
            'body'  => array(
                'query' => array(
                    'multi_match' => array(
                        'query'  => $query,
                        'fields' => array('id', 'name', 'brand', 'description', 'barcodes'),
                    ),
                ),
                'size' => '5000',
            ),
        )
    );

    $productIds = array();

    if (!empty($return['hits']['hits'])) {
        foreach ($return['hits']['hits'] as $hit) {
            $productIds[] = $hit['_id'];
        }
    }

    return $productIds;
}

The results are exactly what I wanted: matches are built from the ngram fragments contained in the search query.
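
To see why ngrams tolerate small spelling differences, the sketch below generates the 3-to-5 character fragments that a filter configured like product_ngram would be expected to produce for a single token. It is an approximation written for illustration (the function name is hypothetical, and the real filter may emit fragments in a different order).

```javascript
// Produce every substring of `token` whose length is between minGram and maxGram.
function ngrams(token, minGram = 3, maxGram = 5) {
  const grams = [];
  for (let start = 0; start < token.length; start++) {
    for (let len = minGram; len <= maxGram && start + len <= token.length; len++) {
      grams.push(token.slice(start, start + len));
    }
  }
  return grams;
}

// Fragments shared by a correctly spelled and a misspelled token.
const indexed = new Set(ngrams('samvel'));
const shared = ngrams('samvvel').filter((g) => indexed.has(g));
console.log(ngrams('samvel'));
console.log(shared);
```

Because the misspelled token still shares several fragments with the indexed one, a plain match query finds it without any fuzzy logic, and documents sharing more fragments tend to score higher.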

Original question

I want to use fuzzy matching in my query but show exact matches at the top of the results.

I have tried the following.

$return = $this->_client->search(
    array(
        'index' => self::INDEX, 'type' => self::TYPE,
        'body'  => array(
            'query' => array(
                'bool' => array(
                    'must' => array(
                        'multi_match' => array(
                            'query'  => $query,
                            'fields' => array('name', 'brand', 'description'),
                            'boost'  => 10,
                        ),
                        'fuzzy_like_this' => array('like_text' => $query, 'fuzziness' => 1),
                    ),
                ),
            ),
            'size' => '5000',
        ),
    )
);

This does not work: it fails with a malformed query error.

Any ideas?
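
For comparison with the ngram answer, one possible way to express the intent of the attempt above as a well-formed query is a bool query with two should clauses: a boosted plain match and a fuzzy match. This is a sketch under assumptions, not the accepted solution: it swaps fuzzy_like_this for the fuzziness option of multi_match, and the helper name, the boost of 10 and the 'AUTO' setting are illustrative choices.

```javascript
// Hypothetical helper: documents matching exactly satisfy both clauses, so
// their scores add up and they tend to rank above fuzzy-only matches.
function buildExactFirstQuery(text, fields) {
  return {
    query: {
      bool: {
        should: [
          // Non-fuzzy clause, boosted so exact hits dominate the score.
          { multi_match: { query: text, fields: fields, boost: 10 } },
          // Fuzzy clause that also admits near misses.
          { multi_match: { query: text, fields: fields, fuzziness: 'AUTO' } }
        ],
        minimum_should_match: 1
      }
    }
  };
}

console.log(JSON.stringify(buildExactFirstQuery('samvel', ['name', 'brand']), null, 2));
```

Note that each clause is its own element of the should array; the attempt in the question placed two query types as keys of a single must array, which is likely what made it malformed.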

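Elasticsearch document_missing_exception

Yes, you are right: document_missing_exception is raised only because the requested document does not exist in ES. You can confirm this by looking at the ES source code; the exception appears to be raised only on the UpdateRequest path, and the comment on this method explains it well: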

From the ES code:

    /**
     * Sets the index request to be used if the document does not exists. Otherwise, a
     * {@link org.elasticsearch.index.engine.DocumentMissingException} is thrown.
     */
    public UpdateRequest upsert(IndexRequest upsertRequest) {
        this.upsertRequest = upsertRequest;
        return this;
    }
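
The same idea can be expressed at the REST level: an update request body may carry an upsert document that is indexed when the target document is missing, which avoids the exception. The sketch below only builds such request bodies; the helper names and the sample fields are hypothetical, while doc, upsert and doc_as_upsert are fields of the update request body.

```javascript
// Partial update plus a fallback document to index if none exists yet.
function buildUpsertBody(partialDoc, initialDoc) {
  return { doc: partialDoc, upsert: initialDoc };
}

// Variant that reuses the partial document itself as the fallback.
function buildDocAsUpsertBody(partialDoc) {
  return { doc: partialDoc, doc_as_upsert: true };
}

console.log(JSON.stringify(buildUpsertBody({ views: 2 }, { views: 1 })));
console.log(JSON.stringify(buildDocAsUpsertBody({ views: 2 })));
```

Without either fallback, updating an id that is not in the index is the situation in which document_missing_exception is reported.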

Elasticsearch indexing error: mapper_parsing_exception

Recently, while indexing documents into ES through the Java API, I got the following error:

{"took":150,"errors":true,"items":[{"index":{"_index":"test","_type":"type1","_id":"794719072","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}}}}

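Investigation showed that, during indexing, the Java code had mistakenly submitted the string 1 ("1") as a document. In other words, it sent data in the following format: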

{"index":{"_index":"test","_type":"type1"}}
"1"

This produces the error above: the submitted data is malformed, so the mapper cannot parse it.
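
A cheap guard against this class of mistake is to check a bulk payload before sending it. The sketch below is a hypothetical pre-flight check, not part of any Elasticsearch client: it flags every line of a newline-delimited bulk body that is not a JSON object, which covers the bare string "1" shown above.

```javascript
// Return a list of problems found in a newline-delimited bulk payload.
// Every non-empty line (action or document source) must be a JSON object.
function findInvalidBulkLines(ndjson) {
  const problems = [];
  ndjson.split('\n').forEach((line, i) => {
    if (line.trim() === '') {
      return; // ignore blank lines such as the trailing newline
    }
    let value;
    try {
      value = JSON.parse(line);
    } catch (e) {
      problems.push({ line: i + 1, reason: 'not valid JSON' });
      return;
    }
    if (value === null || typeof value !== 'object' || Array.isArray(value)) {
      problems.push({ line: i + 1, reason: 'not a JSON object' });
    }
  });
  return problems;
}

const payload = '{"index":{"_index":"test","_type":"type1"}}\n"1"\n';
console.log(findInvalidBulkLines(payload));
```

Running such a check on the client side makes the faulty record easy to locate before the bulk response reports a per-item 400.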
