Elasticsearch：使用文档中的自定义分数字段进行影响力评分（elasticsearch修改分片数量）

25-02-20 12

对于Elasticsearch：使用文档中的自定义分数字段进行影响力评分感兴趣的读者，本文将提供您所需要的所有信息，我们将详细讲解elasticsearch修改分片数量，并且为您提供关于elastic

对于Elasticsearch：使用文档中的自定义分数字段进行影响力评分感兴趣的读者，本文将提供您所需要的所有信息，我们将详细讲解elasticsearch修改分片数量，并且为您提供关于elasticsearch Mapping使用自定义分词器、Elasticsearch 为什么不支持对分组字段以外的字段进行排序、elasticsearch-将嵌套字段与文档中的另一个字段进行比较、Elasticsearch-片段在文档中的位置的宝贵知识。

本文目录一览：

Elasticsearch：使用文档中的自定义分数字段进行影响力评分（elasticsearch修改分片数量）
elasticsearch Mapping使用自定义分词器
Elasticsearch 为什么不支持对分组字段以外的字段进行排序
elasticsearch-将嵌套字段与文档中的另一个字段进行比较
Elasticsearch-片段在文档中的位置

Elasticsearch：使用文档中的自定义分数字段进行影响力评分（elasticsearch修改分片数量）

我有一组通过NLP算法从文本中提取的单词，以及每个文档中每个单词的相关分数。

例如：

document 1: {  "vocab": [ {"wtag":"James Bond", "rscore": 2.14 },                           {"wtag":"world", "rscore": 0.86 },                           ....,                           {"wtag":"somemore", "rscore": 3.15 }                        ]             }document 2: {  "vocab": [ {"wtag":"hiii", "rscore": 1.34 },                           {"wtag":"world", "rscore": 0.94 },                          ....,                           {"wtag":"somemore", "rscore": 3.23 }                         ]             }

我希望每个文档中rscore的match
wtag都可以影响_scoreES给它的给定值，或者乘以或加到上_score，以影响_score结果文档的最终（依次，顺序）。有什么办法可以做到这一点？

答案1

小编典典

解决此问题的另一种方法是使用嵌套文档：

首先设置映射以创建vocab一个嵌套文档，这意味着每个wtag/ rscore文档将在内部作为单独的文档建立索引：

curl -XPUT "http://localhost:9200/myindex/" -d''{  "settings": {"number_of_shards": 1},   "mappings": {    "mytype": {      "properties": {        "vocab": {          "type": "nested",          "fields": {            "wtag": {              "type": "string"            },            "rscore": {              "type": "float"            }          }        }      }    }  }}''

然后索引您的文档：

curl -XPUT "http://localhost:9200/myindex/mytype/1" -d''{  "vocab": [    {      "wtag": "James Bond",      "rscore": 2.14    },    {      "wtag": "world",      "rscore": 0.86    },    {      "wtag": "somemore",      "rscore": 3.15    }  ]}''curl -XPUT "http://localhost:9200/myindex/mytype/2" -d''{  "vocab": [    {      "wtag": "hiii",      "rscore": 1.34    },    {      "wtag": "world",      "rscore": 0.94    },    {      "wtag": "somemore",      "rscore": 3.23    }  ]}''

并运行nested查询以匹配所有嵌套文档，并rscore为每个与之匹配的嵌套文档求和：

curl -XGET "http://localhost:9200/myindex/mytype/_search" -d''{  "query": {    "nested": {      "path": "vocab",      "score_mode": "sum",      "query": {        "function_score": {          "query": {            "match": {              "vocab.wtag": "james bond world"            }          },          "script_score": {            "script": "doc[\"rscore\"].value"          }        }      }    }  }}''

elasticsearch Mapping使用自定义分词器

创建索引及配置分析器

PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":       "mapping",
                    "mappings": [ "& => and "]
            }},
            "filter": {
                "my_stopwords": {
                    "type":       "stop",
                    "stopwords": [ "the", "a" ]
            }},
            "analyzer": {
                "my_analyzer": {
                    "type":         "custom",
                    "char_filter":  [ "html_strip", "&_to_and" ],
                    "tokenizer":    "standard",
                    "filter":       [ "lowercase", "my_stopwords" ]
            }}
        } 
    }
}

创建索引类型与Mapping使用分析器

PUT /my_index/_mapping/_doc
{
	"_doc": {
		"properties": {
			"title": {
				"type": "text",
				"analyzer": "my_analyzer",
				"search_analyzer": "my_analyzer",
				"search_quote_analyzer": "my_analyzer"
			}
		}
	}
	
}

插入数据

POST /my_index/_doc/1

{
"title":"the a <a>你好</a> & "
}

检索

POST /my_index/_search

{
	"query": {
	    "match": {
	      "title": "你好"
	    }
	}
}

&替换为and

POST /my_index/_search

{
	"query": {
	    "match": {
	      "title": "and"
	    }
	}
}

the a过滤停止词

POST /my_index/_search

{
	"query": {
	    "match": {
	      "title": "the a"
	    }
	}
}

Elasticsearch 为什么不支持对分组字段以外的字段进行排序

SQL举例子：

select login_user_id,login_user_name,login_time from login group by login_time order by user_level

我觉得这是一个很正常的需求

为什么es 不支持呢

elasticsearch-将嵌套字段与文档中的另一个字段进行比较

我需要比较同一文档中的2个字段，实际值无关紧要。考虑以下文档：

_source: {    id: 123,    primary_content_type_id: 12,    content: [        {            id: 4,            content_type_id: 1            assigned: true        },        {            id: 5,            content_type_id: 12,            assigned: false        }    ]}

我需要找到所有未分配主要内容的文档。我无法找到一种方法来比较primary_content_type_id和嵌套的content.content_type_id以确保它们是相同的值。这是我使用脚本尝试过的。我认为我不了解脚本，但这可能是解决此问题的一种方式：

{    "filter": {        "nested": {            "path": "content",            "filter": {                "bool": {                    "must": [                        {                            "term": {                                "content.assigned": false                            }                        },                        {                            "script": {                                "script": "primary_content_type_id==content.content_type_id"                            }                        }                    ]                }            }        }    }}

请注意，如果我删除过滤器的脚本部分，并用另一个术语过滤器替换为，并在过滤器的脚本部分content_type_id =12添加了另一个过滤器，则会很好地工作primary_content_id =12。问题在于，我将不知道（或对我的用例来说也无关紧要）primary_content_type_idor
的值是什么content.content_type_id。只不过与content_type_id匹配的内容所分配的false无关紧要primary_content_type_id。

Elasticsearch是否可以进行此检查？

答案1

小编典典

对于嵌套搜索，您要搜索没有父对象的嵌套对象。不幸的是，没有可以与nested对象一起应用的隐藏联接。

至少在当前，这意味着您不会在脚本中同时收到“父”文档和嵌套文档。您可以通过以下两种方式替换脚本并测试结果来确认这一点：

# Parent Document does not exist"script": {  "script": "doc[''primary_content_type_id''].value == 12"}# Nested Document should exist"script": {  "script": "doc[''content.content_type_id''].value == 12"}

您可以
通过在objects上循环来以低于性能的方式执行此操作（而不是天生就让ES使用来为您执行此操作nested）。这意味着您必须将文档和nested文档重新索引为单个文档才能正常工作。考虑到您尝试使用它的方式，这可能并没有太大不同，甚至可能会表现得更好（特别是在缺少替代方法的情况下）。

# This assumes that your default scripting language is Groovy (default in 1.4)# Note1: "find" will loop across all of the values, but it will#  appropriately short circuit if it finds any!# Note2: It would be preferable to use doc throughout, but since we need the#  arrays (plural!) to be in the _same_ order, then we need to parse the#  _source. This inherently means that you must _store_ the _source, which#  is the default. Parsing the _source only happens on the first touch."script": {  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",  "_cache" : true}

我缓存的结果，因为没有动态发生在这里（例如，不比较日期now为实例），所以它是很安全的高速缓存，从而使未来的查找多
快。默认情况下，大多数过滤器都是缓存的，但是脚本是少数例外之一。

由于必须比较两个值以确保找到正确的内部对象，因此您正在重复一些
工作，但这实际上是不可避免的。拥有term过滤器最有可能胜过没有过滤器的情况。

Elasticsearch-片段在文档中的位置

我正在执行类似下面的短语查询。它返回给我按相关性排序的突出显示的片段。自然，我希望用户单击一个片段，然后将文档滚动到相应的位置。但是，我在Elasticsearch中看不到任何方法来找出片段在原始文档中的位置。有任何想法吗？

GET documents/doc/_search{   "query": {        "match_phrase": {            "text": {                "query": "hello world",                "slop":  10            }        }    },     "highlight" : {        "order" : "score",        "fields" : {            "text" : {"fragment_size" : 100, "number_of_fragments" : 10}        }    }}

答案1

小编典典

在此期间，我们找不到合适的解决方案，并遭到了以下黑客攻击（对我们而言非常有效）：在索引之前，我们用“ [index]”注释文本中的每个单词，以便“
一些要索引的文本 ”变成“ some [00]文本[01]到[02]索引[03]
”。然后，我们使用char过滤器，如下所示。当突出显示返回时，我们从突出显示文本中解析出单词位置。

"settings": {    "analysis": {      "char_filter": {        "remove_annotation": {          "type": "pattern_replace",          "pattern": "\\[[0-9]+\\]",          "replacement": ""        }      },      "analyzer": {        "annotated_english_language_analyzer": {          "type": "custom",          "char_filter": [            "remove_annotation"          ],          ...

请注意，注释索引应填充到log10(text_length)+1数字，以使找到的突出显示（删除注释后）的宽度不取决于发现的位置（从开始到结束）。

关于Elasticsearch：使用文档中的自定义分数字段进行影响力评分和elasticsearch修改分片数量的介绍现已完结，谢谢您的耐心阅读，如果想了解更多关于elasticsearch Mapping使用自定义分词器、Elasticsearch 为什么不支持对分组字段以外的字段进行排序、elasticsearch-将嵌套字段与文档中的另一个字段进行比较、Elasticsearch-片段在文档中的位置的相关知识，请在本站寻找。

本文标签：