In this article we walk through Elasticsearch: creating an index and searching with the new Elasticsearch Java client 8.0, and we answer common questions about building indices in elasticsearch. Several related posts are bundled along the way: 46. elasticsearch (search engine): writing Scrapy data into Elasticsearch; upgrading to Elasticsearch 7.4.2 on CentOS 7.5 via rolling upgrade and fresh install; ElasticSearch (1): ElasticSearch application scenarios and why choose ElasticSearch; and ElasticSearch study notes 02: operating elasticSearch from Spring Boot with JestClient.
Contents:
- Elasticsearch: creating an index and searching with the new Elasticsearch Java client 8.0
- 46. elasticsearch (search engine): writing Scrapy data into Elasticsearch
- Upgrading to Elasticsearch 7.4.2 on CentOS 7.5 via rolling upgrade and fresh install
- ElasticSearch (1): ElasticSearch application scenarios and why choose ElasticSearch?
- ElasticSearch study notes 02: operating elasticSearch from Spring Boot with JestClient
Elasticsearch: creating an index and searching with the new Elasticsearch Java client 8.0
In this post, I describe in detail how to use the new Elasticsearch Java client 8.0 to create an index and search it.
Prerequisites
- Java 8 or a later version
- A JSON object-mapping library that lets your application classes integrate seamlessly with the Elasticsearch API. The Java client supports Jackson, or a JSON-B library such as Eclipse Yasson.
Releases of the client are hosted on Maven Central.
Why a new Java client?
Many developers may wonder why a new client is needed at all: wasn't the old High Level REST Client fine? The old High Level REST Client API had the following problems:
- It shared a lot of code with the Elasticsearch server
- It pulled in a heavy dependency tree (30+ MB), much of which was never used
- It was easy to misuse: the API exposed many internals of the Elasticsearch server
- The API was written by hand
- The API was sometimes inconsistent across versions
- It required a lot of maintenance work (400+ endpoints)
- There was no integrated JSON/object mapping
- you had to do the mapping yourself with byte buffers
The new Java client API has the following advantages:
- The API is code-generated
- based on the official Elasticsearch API specification
- The Java client is the first of a new generation of Elasticsearch clients; clients for other languages will follow
- 99% of the code is generated automatically
- It is an opportunity to offer a more modern API:
- fluent functional builders
- a hierarchical DSL that stays close to the Elasticsearch JSON format
- automatic mapping to and from application classes
- Java 8 compatibility is preserved
Installation
If you have not installed Elasticsearch and Kibana yet, see my earlier articles:
- How to install Elasticsearch on Linux, MacOS, and Windows
- Kibana: how to install Kibana from the Elastic Stack on Linux, MacOS, and Windows
- Elasticsearch: securing the Elastic accounts
If you want to try this out on Elastic Stack 8.0, refer to my article on setting that up.
Demo
For today's demo I use a Maven project, although Gradle works just as well. To make the walkthrough easier to follow, I have uploaded the project to GitHub: GitHub - liu-xiao-guo/ElasticsearchJava-search8
First, our pom.xml file looks like this:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>ElasticsearchJava-search8</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<elastic.version>8.0.1</elastic.version>
</properties>
<dependencies>
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>${elastic.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.3</version>
</dependency>
<!-- Needed only if you use the spring-boot Maven plugin -->
<dependency>
<groupId>jakarta.json</groupId>
<artifactId>jakarta.json-api</artifactId>
<version>2.0.1</version>
</dependency>
</dependencies>
</project>
As shown above, we use version 8.0.1. You can also use the latest release listed on Maven Central Repository Search, which is 8.1.1 at the time of writing.
Next, we create a file called Product.java:
Product.java
public class Product {
private String id;
private String name;
private int price;
public Product() {
}
public Product(String id,String name,int price) {
this.id = id;
this.name = name;
this.price = price;
}
public String getId() {
return id;
}
public String getName() {
return name;
}
public int getPrice() {
return price;
}
public void setId(String id) {
this.id = id;
}
public void setName(String name) {
this.name = name;
}
public void setPrice(int price) {
this.price = price;
}
@Override
public String toString() {
return "Product{" +
"id='" + id + '\'' +
",name='" + name + '\'' +
",price=" + price +
'}';
}
}
Next, we create the ElasticsearchJava.java file:
import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.query_dsl.*;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.elasticsearch.core.search.TotalHits;
import co.elastic.clients.elasticsearch.core.search.TotalHitsRelation;
import co.elastic.clients.json.JsonData;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import java.io.IOException;
import java.util.List;
public class ElasticsearchJava {
private static ElasticsearchClient client = null;
private static ElasticsearchAsyncClient asyncClient = null;
private static synchronized void makeConnection() {
// Create the low-level client
final CredentialsProvider credentialsProvider =
new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,new UsernamePasswordCredentials("elastic","password"));
RestClientBuilder builder = RestClient.builder(
new HttpHost("localhost",9200))
.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(
HttpAsyncClientBuilder httpClientBuilder) {
return httpClientBuilder
.setDefaultCredentialsProvider(credentialsProvider);
}
});
RestClient restClient = builder.build();
// Create the transport with a Jackson mapper
ElasticsearchTransport transport = new RestClientTransport(
restClient, new JacksonJsonpMapper());
// And create the API client
client = new ElasticsearchClient(transport);
asyncClient = new ElasticsearchAsyncClient(transport);
}
public static void main(String[] args) throws IOException {
makeConnection();
// Index data to an index products
Product product = new Product("abc","Bag",42);
IndexRequest<Object> indexRequest = new IndexRequest.Builder<>()
.index("products")
.id("abc")
.document(product)
.build();
client.index(indexRequest);
Product product1 = new Product("efg",42);
client.index(builder -> builder
.index("products")
.id(product1.getId())
.document(product1)
);
// Search for a data
TermQuery query = QueryBuilders.term()
.field("name")
.value("bag")
.build();
SearchRequest request = new SearchRequest.Builder()
.index("products")
.query(query._toQuery())
.build();
SearchResponse<Product> search =
client.search(
request,Product.class
);
for (Hit<Product> hit: search.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
// Match search
String searchText = "bag";
SearchResponse<Product> response1 = client.search(s -> s
.index("products")
.query(q -> q
.match(t -> t
.field("name")
.query(searchText)
)
),Product.class
);
TotalHits total1 = response1.hits().total();
boolean isExactResult = total1.relation() == TotalHitsRelation.Eq;
if (isExactResult) {
System.out.println("There are " + total1.value() + " results");
} else {
System.out.println("There are more than " + total1.value() + " results");
}
List<Hit<Product>> hits1 = response1.hits().hits();
for (Hit<Product> hit: hits1) {
Product pd2 = hit.source();
System.out.println("Found product " + pd2.getId() + ",score " + hit.score());
}
// Term search
SearchResponse<Product> search1 = client.search(s -> s
.index("products")
.query(q -> q
.term(t -> t
.field("name")
.value(v -> v.stringValue("bag"))
)),Product.class);
for (Hit<Product> hit: search1.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
// Splitting complex DSL
TermQuery termQuery = TermQuery.of(t ->t.field("name").value("bag"));
SearchResponse<Product> search2 = client.search(s -> s
.index("products")
.query(termQuery._toQuery()),Product.class
);
for (Hit<Product> hit: search2.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
// Search by product name
Query byName = MatchQuery.of(m -> m
.field("name")
.query("bag")
)._toQuery();
// Search by max price
Query byMaxPrice = RangeQuery.of(r -> r
.field("price")
.gte(JsonData.of(10))
)._toQuery();
// Combine name and price queries to search the product index
SearchResponse<Product> response = client.search(s -> s
.index("products")
.query(q -> q
.bool(b -> b
.must(byName)
.should(byMaxPrice)
)
),Product.class
);
List<Hit<Product>> hits = response.hits().hits();
for (Hit<Product> hit: hits) {
Product product2 = hit.source();
System.out.println("Found product " + product2.getId() + ",score " + hit.score());
}
// Creating aggregations
SearchResponse<Void> search3 = client.search( b-> b
.index("products")
.size(0)
.aggregations("price-histo",a -> a
.histogram(h -> h
.field("price")
.interval(50.0)
)
),Void.class
);
long firstBucketCount = search3.aggregations()
.get("price-histo")
.histogram()
.buckets().array()
.get(0)
.docCount();
System.out.println("doc count: " + firstBucketCount);
}
}
The code above is fairly straightforward. We connect to Elasticsearch with the following:
private static synchronized void makeConnection() {
    // Create the low-level client
    final CredentialsProvider credentialsProvider =
            new BasicCredentialsProvider();
    credentialsProvider.setCredentials(AuthScope.ANY,
            new UsernamePasswordCredentials("elastic", "password"));

    RestClientBuilder builder = RestClient.builder(
            new HttpHost("localhost", 9200))
            .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(
                        HttpAsyncClientBuilder httpClientBuilder) {
                    return httpClientBuilder
                            .setDefaultCredentialsProvider(credentialsProvider);
                }
            });
    RestClient restClient = builder.build();

    // Create the transport with a Jackson mapper
    ElasticsearchTransport transport = new RestClientTransport(
            restClient, new JacksonJsonpMapper());

    // And create the API clients
    client = new ElasticsearchClient(transport);
    asyncClient = new ElasticsearchAsyncClient(transport);
}
Above, we connect as the elastic superuser, whose password here is password. In a real deployment, set these according to your own environment.
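The makeConnection() method also builds an ElasticsearchAsyncClient, which this demo never actually exercises. As a minimal sketch of what it offers (the product id "hij" below is made up for illustration; everything else reuses the classes already defined in this project), an asynchronous index call returns a CompletableFuture instead of blocking:

// Hypothetical use of the asyncClient created in makeConnection();
// the async API mirrors the blocking one but returns CompletableFuture.
Product asyncProduct = new Product("hij", "Bag", 42);
asyncClient.index(b -> b
        .index("products")
        .id(asyncProduct.getId())
        .document(asyncProduct)
).whenComplete((resp, exception) -> {
    if (exception != null) {
        exception.printStackTrace();  // indexing failed
    } else {
        System.out.println("Indexed with version " + resp.version());
    }
});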
Below, we write data into the products index in two equivalent styles:
// Index data to an index products
Product product = new Product("abc", "Bag", 42);
IndexRequest<Object> indexRequest = new IndexRequest.Builder<>()
        .index("products")
        .id("abc")
        .document(product)
        .build();
client.index(indexRequest);

Product product1 = new Product("efg", "Bag", 42);
client.index(builder -> builder
        .index("products")
        .id(product1.getId())
        .document(product1)
);
The writes above are similar to entering the following command in Kibana:
PUT products/_doc/abc
{
  "id": "abc",
  "name": "Bag",
  "price": 42
}
We can check the result in Kibana:
GET products/_search
The command above returns:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "abc",
        "_score" : 1.0,
        "_source" : {
          "id" : "abc",
          "name" : "Bag",
          "price" : 42
        }
      },
      {
        "_index" : "products",
        "_id" : "efg",
        "_score" : 1.0,
        "_source" : {
          "id" : "efg",
          "name" : "Bag",
          "price" : 42
        }
      }
    ]
  }
}
Clearly, both writes succeeded.
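When many documents need to be written, issuing one index request per document is wasteful. The same client also exposes the _bulk API; here is a small sketch of it (not part of the original demo, and the product list is made up), using the BulkRequest and BulkResponse classes from the already-imported co.elastic.clients.elasticsearch.core package:

// A minimal bulk-indexing sketch; requires import java.util.Arrays.
List<Product> products = Arrays.asList(
        new Product("p1", "Bag", 42),
        new Product("p2", "Shoe", 25));

BulkRequest.Builder br = new BulkRequest.Builder();
for (Product p : products) {
    br.operations(op -> op           // one operation per document
            .index(idx -> idx
                    .index("products")
                    .id(p.getId())
                    .document(p)));
}

BulkResponse result = client.bulk(br.build());
if (result.errors()) {
    System.out.println("Bulk indexing reported errors");
}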
Next, I search in the following two styles:
// Search for a data
TermQuery query = QueryBuilders.term()
.field("name")
.value("bag")
.build();
SearchRequest request = new SearchRequest.Builder()
.index("products")
.query(query._toQuery())
.build();
SearchResponse<Product> search =
client.search(
request,Product.class
);
for (Hit<Product> hit: search.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
SearchResponse<Product> search1 = client.search(s -> s
.index("products")
.query(q -> q
.term(t -> t
.field("name")
.value(v -> v.stringValue("bag"))
)),Product.class);
for (Hit<Product> hit: search1.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
This search is equivalent to:
GET products/_search
{
"query": {
"term": {
"name": {
"value": "bag"
}
}
}
}
The search above returns:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "abc",
        "_score" : 0.18232156,
        "_source" : {
          "id" : "abc",
          "name" : "Bag",
          "price" : 42
        }
      },
      {
        "_index" : "products",
        "_id" : "efg",
        "_score" : 0.18232156,
        "_source" : {
          "id" : "efg",
          "name" : "Bag",
          "price" : 42
        }
      }
    ]
  }
}
The Java code prints:
Product{id='abc',name='Bag',price=42}
Product{id='efg',name='Bag',price=42}
Product{id='abc',name='Bag',price=42}
Product{id='efg',name='Bag',price=42}
We can use the following code to split a complex DSL into reusable pieces:
// Splitting complex DSL
TermQuery termQuery = TermQuery.of(t ->t.field("name").value("bag"));
SearchResponse<Product> search2 = client.search(s -> s
.index("products")
.query(termQuery._toQuery()),Product.class
);
for (Hit<Product> hit: search2.hits().hits()) {
Product pd = hit.source();
System.out.println(pd);
}
Again, the output is:
Product{id='abc',name='Bag',price=42}
Product{id='efg',name='Bag',price=42}
We then use the following code:
// Search by product name
Query byName = MatchQuery.of(m -> m
.field("name")
.query("bag")
)._toQuery();
// Search by max price
Query byMaxPrice = RangeQuery.of(r -> r
.field("price")
.gte(JsonData.of(10))
)._toQuery();
// Combine name and price queries to search the product index
SearchResponse<Product> response = client.search(s -> s
.index("products")
.query(q -> q
.bool(b -> b
.must(byName)
.should(byMaxPrice)
)
),Product.class
);
List<Hit<Product>> hits = response.hits().hits();
for (Hit<Product> hit: hits) {
Product product2 = hit.source();
System.out.println("Found product " + product2.getId() + ",score " + hit.score());
}
to implement the following search:
GET products/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "bag"
}
}
],"should": [
{
"range": {
"price": {
"gte": 10
}
}
}
]
}
}
}
It returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.287682,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "abc",
        "_score" : 1.287682,
        "_source" : {
          "id" : "abc",
          "name" : "Bag",
          "price" : 42
        }
      }
    ]
  }
}
And the Java output is:
Found product abc,score 1.287682
Finally, we run an aggregation:
// Creating aggregations
SearchResponse<Void> search3 = client.search(b -> b
        .index("products")
        .size(0)
        .aggregations("price-histo", a -> a
            .histogram(h -> h
                .field("price")
                .interval(50.0)
            )
        ), Void.class
);
long firstBucketCount = search3.aggregations()
        .get("price-histo")
        .histogram()
        .buckets().array()
        .get(0)
        .docCount();
System.out.println("doc count: " + firstBucketCount);
The aggregation above is equivalent to this request:
GET products/_search
{
  "size": 0,
  "aggs": {
    "price-histo": {
      "histogram": {
        "field": "price",
        "interval": 50
      }
    }
  }
}
Its response is:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price-histo" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 2
        }
      ]
    }
  }
}
And our Java code prints:
doc count: 2
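The demo reads only the first bucket. If you want all of them, the same response object can be walked bucket by bucket; a small sketch (assuming an extra import of co.elastic.clients.elasticsearch._types.aggregations.HistogramBucket, which is not in the original listing):

// Iterate over every bucket of the price-histo aggregation
for (HistogramBucket bucket : search3.aggregations()
        .get("price-histo")
        .histogram()
        .buckets().array()) {
    System.out.println("key: " + bucket.key() + ", doc count: " + bucket.docCount());
}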
46. elasticsearch (search engine): writing Scrapy data into Elasticsearch
In the earlier elasticsearch posts, operations such as create, delete, update, and search were all written as raw elasticsearch commands, much like SQL statements. Elastic also provides an official Python interface to elasticsearch, and the elasticsearch-dsl-py module wraps it in an ORM-like layer, the way SQLAlchemy wraps a database: instead of writing commands, you work with a Python class.
elasticsearch-dsl-py downloads
Download: https://github.com/elastic/el...
Documentation: http://elasticsearch-dsl.read...
First, install the elasticsearch-dsl-py module.
1. How the elasticsearch-dsl module is used
create_connection(hosts=['127.0.0.1']): connects to the elasticsearch server(s); multiple hosts can be listed
class Meta: sets the index name and doc-type name
IndexClass.init(): creates the index, doc type, and fields
instance.save(): writes the data into elasticsearch
elasticsearch_orm.py — the elasticsearch operations module
#!/usr/bin/env python
# -*- coding:utf8 -*-
from datetime import datetime
from elasticsearch_dsl import DocType, Date, Nested, Boolean, \
    analyzer, InnerObjectWrapper, Completion, Keyword, Text, Integer

# see the earlier mapping notes for more field types
from elasticsearch_dsl.connections import connections  # helper for connecting to the elasticsearch server

connections.create_connection(hosts=['127.0.0.1'])

class lagouType(DocType):  # custom class inheriting DocType
    # Text fields are analyzed; ik_max_word is the Chinese analyzer
    title = Text(analyzer="ik_max_word")        # field name = field type; Text is analyzed into an inverted index
    description = Text(analyzer="ik_max_word")
    keywords = Text(analyzer="ik_max_word")
    url = Keyword()   # Keyword is a plain string type, not analyzed
    riqi = Date()     # Date field

    class Meta:  # Meta is a fixed convention
        index = "lagou"    # index name (comparable to a database name)
        doc_type = 'biao'  # doc-type name (comparable to a table name)

if __name__ == "__main__":  # only runs when this file is executed directly, not when imported
    lagouType.init()  # create the elasticsearch index, doc type, and fields

# Usage:
# import this module wherever you need to write to elasticsearch
# lagou = lagouType()      # instantiate the class
# lagou.title = 'value'    # field = value
# lagou.description = 'value'
# lagou.keywords = 'value'
# lagou.url = 'value'
# lagou.riqi = 'value'
# lagou.save()             # write the data into elasticsearch
2. Writing data into elasticsearch from Scrapy
The spider file:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from adc.items import LagouItem, LagouItemLoader  # import the item container class and the ItemLoader class
import time

class LagouSpider(CrawlSpider):  # the spider class
    name = 'lagou'                          # spider name
    allowed_domains = ['www.luyin.org']     # allowed domain
    start_urls = ['http://www.luyin.org/']  # start url
    custom_settings = {
        "AUTOTHROTTLE_ENABLED": True,  # overrides the same setting in settings.py
        "DOWNLOAD_DELAY": 5
    }
    rules = (
        # rule for crawling list pages
        Rule(LinkExtractor(allow=('ggwa/.*')), follow=True),
        # rule for crawling content pages
        Rule(LinkExtractor(allow=('post/\d+.html.*')), callback='parse_job', follow=True),
    )

    def parse_job(self, response):  # callback; note: CrawlSpider defines parse() itself, so never name your callback parse
        atime = time.localtime(time.time())  # current system time
        dqatime = "{0}-{1}-{2} {3}:{4}:{5}".format(
            atime.tm_year,
            atime.tm_mon,
            atime.tm_mday,
            atime.tm_hour,
            atime.tm_min,
            atime.tm_sec
        )  # assemble the time fields into one formatted date string
        url = response.url
        item_loader = LagouItemLoader(LagouItem(), response=response)  # fill the LagouItem defined in items.py
        item_loader.add_xpath('title', '/html/head/title/text()')
        item_loader.add_xpath('description', '/html/head/meta[@name="Description"]/@content')
        item_loader.add_xpath('keywords', '/html/head/meta[@name="keywords"]/@content')
        item_loader.add_value('url', url)
        item_loader.add_value('riqi', dqatime)
        article_item = item_loader.load_item()
        yield article_item
The items.py file:
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

# items.py receives the data the spider scrapes; it acts as the container file
import scrapy
from scrapy.loader.processors import MapCompose, TakeFirst
from scrapy.loader import ItemLoader  # ItemLoader fills the item container class with data
from adc.models.elasticsearch_orm import lagouType  # import the elasticsearch module configured above

class LagouItemLoader(ItemLoader):  # custom loader inheriting ItemLoader; the spider uses it to fill the Item
    default_output_processor = TakeFirst()  # loaded values are lists; TakeFirst() extracts the first element

def tianjia(value):  # custom pre-processing function
    return value  # return the processed value to the Item

class LagouItem(scrapy.Item):  # container class for the scraped data
    title = scrapy.Field(  # receives the scraped title
        input_processor=MapCompose(tianjia),  # pass the pre-processing function to MapCompose; its value parameter automatically receives the title field
    )
    description = scrapy.Field()
    keywords = scrapy.Field()
    url = scrapy.Field()
    riqi = scrapy.Field()

    def save_to_es(self):
        lagou = lagouType()  # instantiate the elasticsearch document class
        lagou.title = self['title']  # field name = value
        lagou.description = self['description']
        lagou.keywords = self['keywords']
        lagou.url = self['url']
        lagou.riqi = self['riqi']
        lagou.save()  # write the data into elasticsearch
        return
The pipelines.py file:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
from adc.models.elasticsearch_orm import lagouType  # import the elasticsearch module

class AdcPipeline(object):
    def process_item(self, item, spider):
        # you could also write to elasticsearch here; the downside is that every item is handled the same way
        # lagou = lagouType()
        # lagou.title = item['title']
        # lagou.description = item['description']
        # lagou.keywords = item['keywords']
        # lagou.url = item['url']
        # lagou.riqi = item['riqi']
        # lagou.save()
        item.save_to_es()  # call save_to_es() from items.py to write the data into elasticsearch
        return item
The settings.py file — register the pipeline:
# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'adc.pipelines.AdcPipeline': 300,
}
main.py — the spider launcher:
#!/usr/bin/env python
# -*- coding:utf8 -*-
from scrapy.cmdline import execute  # helper for running scrapy commands
import sys
import os

sys.path.append(os.path.join(os.getcwd()))  # add the directory containing main.py to the Python module path
execute(['scrapy', 'crawl', 'lagou', '--nolog'])  # run the scrapy command
# execute(['scrapy', 'crawl', 'lagou'])  # run the scrapy command with logging
Run the spider.
The scraped data is written into elasticsearch.
Appendix: create, delete, update, and query with elasticsearch-dsl
The lagouType document class defined in elasticsearch_orm.py above is reused throughout this appendix.
1. Adding data
from adc.models.elasticsearch_orm import lagouType  # import the elasticsearch module configured above

lagou = lagouType()  # instantiate the document class
lagou._id = 1  # custom ID; important, since later operations look the document up by ID
lagou.title = self['title']  # field name = value
lagou.description = self['description']
lagou.keywords = self['keywords']
lagou.url = self['url']
lagou.riqi = self['riqi']
lagou.save()  # write the data into elasticsearch
2. Deleting a specific document
from adc.models.elasticsearch_orm import lagouType  # import the elasticsearch module configured above

sousuo_orm = lagouType()  # instantiate
sousuo_orm.get(id=1).delete()  # delete the document whose id is 1
3. Updating a specific document
from adc.models.elasticsearch_orm import lagouType  # import the elasticsearch module configured above

sousuo_orm = lagouType()  # instantiate
sousuo_orm.get(id=1).update(title='123456789')  # update the document whose id is 1
Everything above uses the elasticsearch-dsl module.
Note that the code below uses the low-level elasticsearch module instead.
Deleting an index removes it entirely, comparable to dropping a database.
Delete a given index with the low-level elasticsearch module:
from elasticsearch import Elasticsearch  # low-level elasticsearch client

client = Elasticsearch(hosts=settings.Elasticsearch_hosts)  # connect to elasticsearch

# delete the given index with the low-level module
# guard against errors: deleting a missing index raises an exception
try:
    client.indices.delete(index='jxiou_zuopin')
except Exception as e:
    pass
A raw query:
from elasticsearch import Elasticsearch  # low-level elasticsearch client

client = Elasticsearch(hosts=Elasticsearch_hosts)  # connect to elasticsearch
response = client.search(  # the low-level search() method accepts raw elasticsearch query DSL
    index="jxiou_zuopin",  # index name
    doc_type="zuopin",     # doc-type name
    body={                 # the elasticsearch query body
        "query": {
            "multi_match": {         # multi_match query
                "query": sousuoci,   # the search keyword
                "fields": ["title"]  # the fields to search
            }
        },
        "from": (page - 1) * tiaoshu,  # offset of the first hit to return
        "size": tiaoshu,               # number of hits to return
        "highlight": {                 # highlight the search keyword
            "pre_tags": ['<span >'],   # opening highlight tag
            "post_tags": ['</span>'],  # closing highlight tag
            "fields": {                # highlight settings
                "title": {}            # the highlighted field
            }
        }
    }
)

# extract the results
total_nums = response["hits"]["total"]  # total number of matching documents
hit_list = []  # list of hits handed back to the html page
for hit in response["hits"]["hits"]:  # iterate over the hits
    hit_dict = {}  # dict holding one hit
    if "title" in hit["highlight"]:  # if the highlight block contains a title
        hit_dict["title"] = "".join(hit["highlight"]["title"])  # use the highlighted title
    else:
        hit_dict["title"] = hit["_source"]["title"]  # otherwise fall back to the plain title
    hit_dict["id"] = hit["_source"]["nid"]  # the returned nid
    # encrypt the sample-audio address
    hit_dict["yangsrc"] = jia_mi(str(hit["_source"]["yangsrc"]))  # the returned yangsrc
    hit_list.append(hit_dict)
Upgrading to Elasticsearch 7.4.2 on CentOS 7.5 via rolling upgrade and fresh install
Background:
Production makes heavy use of elasticsearch clusters, and different businesses run different elasticsearch versions.
es regularly has serious vulnerabilities disclosed, so versions need upgrading; we also enable x-pack's basic authentication features to keep user data from leaking.
Features of the free x-pack tier:
basic TLS, to encrypt communications
file and native realms, for creating and managing users
role-based access control, to govern user access to cluster APIs and indices
multi-tenancy in Kibana through the Kibana Spaces security features
The two upgrade strategies:
1. Rolling upgrade: upgrade one node at a time without interrupting service
2. Deploy the new version from scratch, then migrate the data into the new es cluster
Both approaches restore data into the new es cluster, so take a snapshot backup first.
1. Before upgrading, back up the old elasticsearch data with a snapshot
How it works: snapshot the old es cluster's data into the /opt/esback directory, which is NFS-mounted from one server (the machine acting as the NFS server needs plenty of disk space, ideally on fast disks). Both the old and the new cluster set path.repo: ["/opt/esback/"] in their config,
so the new cluster can also operate on this directory; once the new cluster is up, restore the files in /opt/esback straight into its indices.
Share the directory with an NFS mount (reachable from every es cluster node):
Goal: mount the local backup directory /opt/esback onto the NFS share /opt/es_snapshot, so that every node can reach the shared directory during the restore.
// create the shared directory on 10.10.18.92
// this directory serves as the nfs export
mkdir /opt/es_snapshot

// the directory the local backup is written to
/opt/esback

# create the /opt/esback directory on every cluster node as the target for the es data backup
# pick one of the es machines to act as the nfs server

# on the nfs server:
# vim /etc/exports
# note: anonuid and anongid here must match the user that runs the es process
# add a user with the required uid and gid
groupadd -g 1000 elastic
useradd -u 1000 -g elastic elastic
# example commands for changing the gid and uid to 500:
usermod -u 500 es
groupmod -g 500 es

/opt/es_snapshot *(insecure,rw,no_root_squash,sync,anonuid=1000,anongid=1000)
// export and list the shared directory
yum install -y exportfs
exportfs -rv
// on the nfs server, adjust the nfs config
vim /etc/sysconfig/nfs
Change the following:
RPCNFSDARGS="-N 2 -N 3"   -----> enable
# Turn off v4 protocol support
RPCNFSDARGS="-N 4"        ----> enable
Restart for the change to take effect:
systemctl restart nfs
// on the client machines
yum install -y nfs-utils
// restart the NFS service on the new cluster machines
systemctl restart nfs
// mount the share on every es node
mount -t nfs 10.10.18.90:/opt/es_snapshot /opt/esback -o proto=tcp -o nolock
List the directories exported by the NFS server:
[root@sz_kp_wanghong_dev02_18_93:/home/wanxing]# showmount -e 10.10.18.90
Export list for 10.10.18.92:
/opt/es_snapshot *
// on the old machines, give ownership of the shared directory to the user that runs ES
chown elastic:elastic -R /opt/esback
2. Create the ES snapshot repository my_backup
Edit the config file:
vim elasticsearch.yml
# add the following on every node of the old cluster, then restart the cluster
path.repo: ["/opt/esback"]
Command to create the snapshot repository my_backup:
curl -H "Content-Type: application/json" -v -XPUT http://10.10.18.90:9200/_snapshot/my_backup -d ''
{
"type": "fs",
"settings": {
"location": "/opt/esback",
"compress": true
}
}
'
# response
{"acknowledged":true}
# handling an error
'RemoteTransportException[[ictr_node1][10.10.18.93:9300][internal:admin/repository/verify]]
# caused by insufficient permissions
chown -R es.es /opt/es_snapshot/
chown -R es.es /opt/esback_20191104/
# snapshot all indices
# curl -H "Content-Type: application/json" -v -XPUT http://10.10.18.90:9200/_snapshot/my_backup/snapshot20191107
{"accepted":true}
Check the backup:
[elastic@szyyelk01t slave02]$ curl -XGET http://10.10.18.90:9200/_snapshot/my_backup/snapshot20191107?pretty
{
"snapshots" : [
{
"snapshot" : "snapshot20191107",
"uuid" : "0_4SOntVS1GH-7irHjKBMQ",
"version_id" : 6030299,
"version" : "6.3.2",
"indices" : [
"support_faq_categorys",
"ticket_list",
"templates_search",
"site_page_search",
"support",
"templates_page_search",
"support_new_articles",
"article_version",
"blocks_version",
"search",
"version",
"article_search",
"templates",
"learn",
"templates_version",
"blocks_search",
"templates_page_version"
],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2019-11-07T01:35:00.811Z",
"start_time_in_millis" : 1573090500811,
"end_time" : "2019-11-07T01:35:03.702Z",
"end_time_in_millis" : 1573090503702,
"duration_in_millis" : 2891,
"failures" : [ ],
"shards" : {
"total" : 71,
"failed" : 0,
"successful" : 71
}
}
]
}
Upgrade path 1: rolling upgrade, elasticsearch 5.6.16 --> elasticsearch 6.8.4
1. Back up the data first, so you can roll back if something goes wrong
2. Upgrade to the new version, then install x-pack, and only then have the developers adapt their code
a. Download the new 6.8.4 release
① Disable shard allocation
curl -v -XPUT http://10.10.18.92:9200/_cluster/settings -d '{
"persistent": {
"cluster.routing.allocation.enable": "none"
}
}'
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]# curl -v -XPUT http://10.10.18.92:9200/_cluster/settings -d '{
> "persistent": {
> "cluster.routing.allocation.enable": "none"
> }
> }'
* Hostname was NOT found in DNS cache
* Trying 10.10.18.92...
* Connected to 10.10.18.92 (10.10.18.92) port 9200 (#0)
> PUT /_cluster/settings HTTP/1.1
> User-Agent: curl/7.36.0
> Host: 10.10.18.92:9200
> Accept: */*
> Content-Length: 73
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 73 out of 73 bytes
< HTTP/1.1 200 OK
< Warning: 299 Elasticsearch-5.6.15-fe7575a "Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header." "Tue, 05 Nov 2019 08:14:44 GMT"
< content-type: application/json; charset=UTF-8
< content-length: 106
<
* Connection #0 to host 10.10.18.92 left intact
{"acknowledged":true,"persistent":{"cluster":{"routing":{"allocation":{"enable":"none"}}}},"transient":{}}
② Temporarily disable non-essential indexing and perform a synced flush
curl -XPOST http://10.10.18.92:9200/_flush/synced
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]# curl -XPOST http://10.10.18.92:9200/_flush/synced
{"_shards":{"total":28,"successful":28,"failed":0},"channel_rel":{"total":4,"successful":4,"failed":0},".kibana":{"total":2,"successful":2,"failed":0},"channel":{"total":6,"successful":6,"failed":0},"video":{"total":4,"successful":4,"failed":0},"channel_list":{"total":6,"successful":6,"failed":0},"influecer":{"total":6,"successful":6,"failed":0}}
Note: if you are upgrading from a version earlier than 6.3, remove the X-Pack plugin before upgrading: run bin/elasticsearch-plugin remove x-pack
a. Back up the old elasticsearch directory, then unpack the new release.
b. If you use an external config path, point the ES_PATH_CONF environment variable at it; otherwise, copy the old config directory into the new elasticsearch directory.
c. Check that path.data points at the correct data directory.
d. Check that path.logs points at the correct log directory.
The new cluster's config file:
[es@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]$ more config/elasticsearch.yml
cluster.name: kp-dev-application
node.name: ictr_node2
node.master: true
node.attr.rack: r1
node.max_local_storage_nodes: 3
network.host: 10.10.18.92
http.port: 9200
transport.tcp.port: 9300
path.repo: ["/opt/esback_20191104"]
discovery.zen.minimum_master_nodes: 1
http.cors.enabled: true
http.cors.allow-origin: "*"
# the new cluster still points at the data directories of the old es 5.6.15 install
path.data: /opt/es-node/elasticsearch-5.6.15/data
path.logs: /opt/es-node/elasticsearch-5.6.15/logs
# enable security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
③ Shut the node down
④ Restart the node; be sure to switch to the es user first, do not run as root
chown -R es.es elasticsearch-6.8.4
[es@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]$ bin/elasticsearch -d
Repeat the process on the other nodes.
Start each upgraded node, then confirm it joined the cluster by watching the logs and running the commands below:
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-5.6.15]# curl http://10.10.18.92:9200/_cat/nodes
10.10.18.93 16 98 56 1.22 0.50 0.29 di - ictr_node1
10.10.18.92 16 88 8 0.08 0.26 0.31 mdi * ictr_node2
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-5.6.15]# curl http://10.10.18.92:9200/_cat/indices
yellow open channel vRFQoIhmT8WmSbDCfph0ag 3 1 53374 0 44.2mb 44.2mb
yellow open channel_rel ZeeBbkogT5KtxzziUYtu_Q 2 1 459528 0 168.8mb 168.8mb
yellow open channel_list 1dk8uH8bTeikez0lFR2mJg 3 1 5509390 78630 7gb 7gb
yellow open video HNhyt9ioSEayAotGVXRCVg 2 1 798369 228155 1.6gb 1.6gb
yellow open .kibana lY82G_-XSniyd_bnMOLuQg 1 1 15 1 146.3kb 146.3kb
yellow open influecer RQtQWXKIRE2UYyZlCvv7bA 3 1 148526 48641 272.8mb 272.8mb
Once the nodes have joined the cluster, remove the cluster.routing.allocation.enable setting to re-enable shard allocation and put the nodes back to work:
curl -H "Content-Type: application/json" -v -XPUT http://10.10.18.92:9200/_cluster/settings -d ''{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
Re-enabling allocation first failed with this error (the Content-Type header was missing):
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-5.6.15]# curl -v -XPUT http://10.10.18.92:9200/_cluster/settings -d '{
> "persistent": {
> "cluster.routing.allocation.enable": "true"
> }
> }'
* Hostname was NOT found in DNS cache
* Trying 10.10.18.92...
* Connected to 10.10.18.92 (10.10.18.92) port 9200 (#0)
> PUT /_cluster/settings HTTP/1.1
> User-Agent: curl/7.36.0
> Host: 10.10.18.92:9200
> Accept: */*
> Content-Length: 73
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 73 out of 73 bytes
< HTTP/1.1 406 Not Acceptable
< content-type: application/json; charset=UTF-8
< content-length: 97
<
* Connection #0 to host 10.10.18.92 left intact
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-5.6.15]# curl http://10.10.18.92:9200/_cluster/health?pretty
{
"cluster_name" : "kp-dev-application",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 14,
"active_shards" : 28,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
Install the new version of the ik Chinese analysis plugin:
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.4/elasticsearch-analysis-ik-6.8.4.zip
# unzip into the plugins directory and restart elasticsearch
cd /opt/es-node/elasticsearch-6.8.4/plugins
unzip -d elasticsearch-analysis-ik elasticsearch-analysis-ik-6.8.4.zip
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4/plugins]# curl http://10.10.18.92:9200/_xpack?pretty
{
"build" : {
"hash" : "bca0c8d",
"date" : "2019-10-16T06:19:49.319352Z"
},
"license" : {
"uid" : "4de9d1c1-59f6-4dfd-8d48-baefd0a583d0",
"type" : "basic",
"mode" : "basic",
"status" : "active"
},
"features" : {
"ccr" : {
"description" : "Cross Cluster Replication",
"available" : false,
"enabled" : true
},
"graph" : {
"description" : "Graph Data Exploration for the Elastic Stack",
"available" : false,
"enabled" : true
},
"ilm" : {
"description" : "Index lifecycle management for the Elastic Stack",
"available" : true,
"enabled" : true
},
"logstash" : {
"description" : "Logstash management component for X-Pack",
"available" : false,
"enabled" : true
},
"ml" : {
"description" : "Machine Learning for the Elastic Stack",
"available" : false,
"enabled" : true,
"native_code_info" : {
"version" : "6.8.4",
"build_hash" : "93ad89b02ff490"
}
},
"monitoring" : {
"description" : "Monitoring for the Elastic Stack",
"available" : true,
"enabled" : true
},
"rollup" : {
"description" : "Time series pre-aggregation and rollup",
"available" : true,
"enabled" : true
},
"security" : {
"description" : "Security for the Elastic Stack",
"available" : true,
"enabled" : false
},
"sql" : {
"description" : "SQL access to Elasticsearch",
"available" : true,
"enabled" : true
},
"watcher" : {
"description" : "Alerting, Notification and Automation for the Elastic Stack",
"available" : false,
"enabled" : true
}
},
"tagline" : "You know, for X"
}
3. Enable x-pack password authentication
# generate the certificates
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]# bin/elasticsearch-certutil ca
This tool assists you in the generation of X.509 certificates and certificate
signing requests for use with SSL/TLS in the Elastic stack.
The 'ca' mode generates a new 'certificate authority'
This will create a new X.509 certificate and private key that can be used
to sign certificate when running in 'cert' mode.
Use the 'ca-dn' option if you wish to configure the 'distinguished name'
of the certificate authority
By default the 'ca' mode produces a single PKCS#12 output file which holds:
* The CA certificate
* The CA's private key
If you elect to generate PEM format certificates (the -pem option), then the output will
Please enter the desired output file [elastic-stack-ca.p12]:
Enter password for elastic-stack-ca.p12 :
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]# ls
bin config elastic-stack-ca.p12 lib LICENSE.txt logs modules NOTICE.txt plugins README.textile
[root@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]# bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
This tool assists you in the generation of X.509 certificates and certificate
signing requests for use with SSL/TLS in the Elastic stack.
The 'cert' mode generates X.509 certificate and private keys.
* By default, this generates a single certificate and key for use
on a single instance.
* The '-multiple' option will prompt you to enter details for multiple
instances and will generate a certificate and key for each one
* The '-in' option allows for the certificate generation to be automated by describing
the details of each instance in a YAML file
* An instance is any piece of the Elastic Stack that requires an SSL certificate.
Depending on your configuration, Elasticsearch, Logstash, Kibana, and Beats
may all require a certificate and private key.
* The minimum required value for each instance is a name. This can simply be the
hostname, which will be used as the Common Name of the certificate. A full
distinguished name may also be used.
* A filename value may be required for each instance. This is necessary when the
name would result in an invalid file or directory name. The name provided here
is used as the directory name (within the zip) and the prefix for the key and
certificate files. The filename is required if you are prompted and the name
is not displayed in the prompt.
* IP addresses and DNS names are optional. Multiple values can be specified as a
comma separated string. If no IP addresses or DNS names are provided, you may
disable hostname verification in your SSL configuration.
* All certificates generated by this tool will be signed by a certificate authority (CA).
* The tool can automatically generate a new CA for you, or you can provide your own with the
-ca or -ca-cert command line options.
By default the 'cert' mode produces a single PKCS#12 output file which holds:
* The instance certificate
* The private key for the instance certificate
* The CA certificate
If you specify any of the following options:
* -pem (PEM formatted output)
* -keep-ca-key (retain generated CA key)
* -multiple (generate multiple certificates)
* -in (generate certificates from an input file)
then the output will be be a zip file containing individual certificate/key files
Enter password for CA (elastic-stack-ca.p12) :
Please enter the desired output file [elastic-certificates.p12]:
Enter password for elastic-certificates.p12 :
Certificates written to /opt/es-node/elasticsearch-6.8.4/elastic-certificates.p12
This file should be properly secured as it contains the private key for
your instance.
This file is a self contained file and can be copied and used 'as is'
For each Elastic product that you wish to configure, you should copy
this '.p12' file to the relevant configuration directory
and then follow the SSL configuration instructions in the product guide.
For client applications, you may only need to copy the CA certificate and
configure the client to trust this certificate.
# edit the config/elasticsearch.yml settings
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /usr/local/elasticsearch/config/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/local/elasticsearch/config/elastic-certificates.p12
# set the passwords
[es@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]$ bin/elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
Password: espass
[es@sz_kp_wanghong_dev01_18_92:/opt/es-node/elasticsearch-6.8.4]$ curl --user elastic:espass -XGET 'http://10.10.18.92:9200/_cat/indices'
green open channel_rel ZeeBbkogT5KtxzziUYtu_Q 2 1 459528 0 337.7mb 168.8mb
green open .security-6 iQHndFBqRe2Ss2o7KMxyFg 1 1 6 0 38.3kb 19.1kb
green open .kibana lY82G_-XSniyd_bnMOLuQg 1 1 15 1 292.6kb 146.3kb
green open influecer RQtQWXKIRE2UYyZlCvv7bA 3 1 148526 48641 545.6mb 272.8mb
green open channel vRFQoIhmT8WmSbDCfph0ag 3 1 53374 0 88.4mb 44.2mb
green open channel_list 1dk8uH8bTeikez0lFR2mJg 3 1 5522172 78630 14gb 7gb
green open video HNhyt9ioSEayAotGVXRCVg 2 1 798369 228155 3.3gb 1.6gb
Upgrade path 2: full cluster restart upgrade
That is, stand up a brand-new elasticsearch 7.4.2 cluster, then restore the data into it.
Download: wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.2-linux-x86_64.tar.gz
The old cluster's config:
# cms elasticsearch
[root@szyyelk01t opt]# egrep -v '^#|^$' elk-master/config/elasticsearch.yml
cluster.name: cms-uat-elastic
node.name: master
path.data: /opt/elk-master/data/data01,/opt/elk-master/data/data02
path.logs: /opt/elk-master/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 10.10.18.90
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
[root@szyyelk01t elk-slave]# egrep -v '^$|^#' slave01/config/elasticsearch.yml
cluster.name: cms-uat-elastic
node.name: slave01
path.data: /opt/elk-slave/slave01/data/data01,/opt/elk-slave/slave01/data/data02
path.logs: /opt/elk-slave/slave01/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 10.10.18.90
http.port: 8200
discovery.zen.ping.unicast.hosts: ["10.10.18.90"]
http.cors.enabled: true
http.cors.allow-origin: "*"
[root@szyyelk01t elk-slave]# egrep -v '^$|^#' slave02/config/elasticsearch.yml
cluster.name: cms-uat-elastic
node.name: slave02
path.data: /opt/elk-slave/slave02/data/data01,/opt/elk-slave/slave02/data/data02
path.logs: /opt/elk-slave/slave02/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 10.10.18.90
http.port: 8201
discovery.zen.ping.unicast.hosts: ["10.10.18.90"]
http.cors.enabled: true
http.cors.allow-origin: "*"
# reference config from a production cluster that has already been upgraded:
[root@eus_filmora_db01:/usr/local/elasticsearch-7.4.1]# egrep -v '^$|^#' config/elasticsearch.yml
cluster.name: UOS_CLUSTER_ES
node.name: uos_node_1
path.data: /data/elasticsearch_data/data
path.logs: /data/elasticsearch_data/logs
bootstrap.memory_lock: true
network.host: 172.20.103.199
http.port: 9200
transport.tcp.port: 9300
node.master: true
node.data: true
discovery.seed_hosts: ["172.20.103.199:9300", "172.20.73.200:9300", "172.20.73.212:9300"]
cluster.initial_master_nodes: ["172.20.103.199", "172.20.73.200", "172.20.73.212"]
gateway.recover_after_nodes: 2
transport.tcp.compress: true
path.repo: ["/data/bak_es"]
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /usr/local/elasticsearch/config/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/local/elasticsearch/config/elastic-certificates.p12
############## upgrading the cms system's elasticsearch 6 to 7.4.2
Overall plan:
1. Upgrade the cms test environment's es to 7.4.2 and adapt the code, then upgrade the internal production environment --> then the cms overseas environment (pick a quiet time for the overseas business and agree on it with the developers first)
Other people depend on the test environment, so the old and new versions must run side by side; the new es 7.4.2 uses its bundled OpenJDK 13.
1. Point the new elasticsearch at the JDK it ships with
# vim bin/elasticsearch
export JAVA_HOME=/opt/elk7_onenode/elasticsearch-7.4.2/jdk
export PATH=$JAVA_HOME/bin:$PATH
mkdir /opt/elk7_onenode/elasticsearch-7.4.2/data
# master node config
[elastic@szyyelk01t elasticsearch-7.4.2]$ more config/elasticsearch.yml
cluster.name: cms-uat-elastic7
node.name: cms_node01
node.master: true
node.data: true
discovery.seed_hosts: ["10.10.18.90:19300", "10.10.18.117:19300"]
cluster.initial_master_nodes: ["10.10.18.90"]
path.data: /opt/cms_elk7/elasticsearch-7.4.2/data
path.logs: /opt/cms_elk7/elasticsearch-7.4.2/logs
discovery.zen.minimum_master_nodes: 1
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 10.10.18.90
http.cors.enabled: true
http.cors.allow-origin: "*"
transport.tcp.compress: true
path.repo: ["/opt/esback/"]
gateway.recover_after_nodes: 1
# extra parameters so the head plugin can reach es
http.port: 19200
transport.tcp.port: 19300
gateway.recover_after_time: 8m
# the settings below reduce wasted disk io from shards being reshuffled when an es node restarts or is briefly down
discovery.zen.fd.ping_timeout: 300s
discovery.zen.fd.ping_retries: 8
discovery.zen.fd.ping_interval: 30s
discovery.zen.ping_timeout: 180s
# enable security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
# second node config
[elastic@cms-test:/opt/cms_elk7/elasticsearch-7.4.2]$ more config/elasticsearch.yml
cluster.name: cms-uat-elastic7
node.name: cms_node02
node.master: false
node.data: true
discovery.seed_hosts: ["10.10.18.90:19300", "10.10.18.117:19300"]
cluster.initial_master_nodes: ["10.10.18.90"]
path.data: /opt/cms_elk7/elasticsearch-7.4.2/data
path.logs: /opt/cms_elk7/elasticsearch-7.4.2/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 10.10.18.117
http.cors.enabled: true
http.cors.allow-origin: "*"
transport.tcp.compress: true
path.repo: ["/opt/esback/"]
gateway.recover_after_nodes: 1
# extra parameters so the head plugin can reach es
http.port: 19200
transport.tcp.port: 19300
gateway.recover_after_time: 8m
# the settings below reduce wasted disk io from shards being reshuffled when an es node restarts or is briefly down
discovery.zen.fd.ping_timeout: 300s
discovery.zen.fd.ping_retries: 8
discovery.zen.fd.ping_interval: 30s
discovery.zen.ping_timeout: 180s
# enable security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
# set the passwords
# to enable security, add only this option first
xpack.security.enabled: true
#xpack.security.transport.ssl.enabled: true
#xpack.security.transport.ssl.verification_mode: certificate
#xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
#xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
elastic password:
espass
Configuring TLS across the cluster:
If you run a single-node ES, you can skip this part.
1. Generate a CA certificate:
bin/elasticsearch-certutil ca
This produces the new file elastic-stack-ca.p12. The elasticsearch-certutil command also prompts for a password to protect the file and key; keep a copy of the file and remember its password. Here we leave it empty.
2. Generate a certificate and private key for every node in the cluster:
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
This produces the new file elastic-certificates.p12. You are prompted for a password again; you can protect the certificate and key with one, or press Enter to leave it empty. By default, elasticsearch-certutil generates certificates without hostname information, which means the same certificate can be used on every node in the cluster; hostname verification must then be turned off.
Copy the elastic-certificates.p12 file into the Elasticsearch config directory on every node.
There is no need to copy elastic-stack-ca.p12 there as well.
mkdir config/certs
mv elastic-certificates.p12 config/certs/
Configure each node to identify itself with its signed certificate and enable TLS on the transport layer.
Enable TLS and tell each node how to find its certificate by adding the following to every node's elasticsearch.yml:
xpack.security.enabled: true
3. Set the passwords
# the first attempt fails
[elastic@szyyelk01t elasticsearch-7.4.2]$ bin/elasticsearch-setup-passwords interactive
Failed to determine the health of the cluster running at http://10.10.18.90:19200
Unexpected response code [503] from calling GET http://10.10.18.90:19200/_cluster/health?pretty
Cause: master_not_discovered_exception
It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.
Do you want to continue with the password setup process [y/N]y
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Unexpected response code [503] from calling PUT http://10.10.18.90:19200/_security/user/apm_system/_password?pretty
Cause: Cluster state has not been recovered yet, cannot write to the [null] index
Possible next steps:
* Try running this tool again.
* Try running with the --verbose parameter for additional messages.
* Check the elasticsearch logs for additional error details.
* Use the change password API manually.
ERROR: Failed to set password for user [apm_system].
[elastic@szyyelk01t elasticsearch-7.4.2]$ bin/elasticsearch-setup-passwords interactive
Connection failure to: http://10.10.18.90:19200/_security/_authenticate?pretty failed: Connection refused
ERROR: Failed to connect to elasticsearch at http://10.10.18.90:19200/_security/_authenticate?pretty. Is the URL correct and elasticsearch running?
[elastic@szyyelk01t elasticsearch-7.4.2]$ bin/elasticsearch-setup-passwords interactive
Failed to determine the health of the cluster running at http://10.10.18.90:19200
Unexpected response code [503] from calling GET http://10.10.18.90:19200/_cluster/health?pretty
Cause: master_not_discovered_exception
It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.
Do you want to continue with the password setup process [y/N]^C[elastic@szyyelk01t elasticsearch-7.4.2]$ bin/elasticsearch-setup-passwords interactive
Failed to determine the health of the cluster running at http://10.10.18.90:19200
Unexpected response code [503] from calling GET http://10.10.18.90:19200/_cluster/health?pretty
Cause: master_not_discovered_exception
It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.
Do you want to continue with the password setup process [y/N]
Fix: configure only one initial master node: cluster.initial_master_nodes: ["10.10.18.90"]
# handling error 2
[2019-11-07T16:12:31,563][INFO ][o.e.c.c.JoinHelper ] [cms_node02] failed to join {cms_node01}{765pAegcS8S0Y3OrE9taMA}{Up16Gw9pQlyXg3n1wCHE8g}{10.10.18.90}{10.10.18.90:19300}{dilm}{ml.machine_memory=8362151936, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={cms_node02}{765pAegcS8S0Y3OrE9taMA}{ki1VVW27TnakEEFagCoDlg}{10.10.18.117}{10.10.18.117:19300}{dil}{ml.machine_memory=16853446656, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={cms_node02}{765pAegcS8S0Y3OrE9taMA}{ki1VVW27TnakEEFagCoDlg}{10.10.18.117}{10.10.18.117:19300}{dil}{ml.machine_memory=16853446656, xpack.installed=true, ml.max_open_jobs=20}, targetNode={cms_node01}{765pAegcS8S0Y3OrE9taMA}{Up16Gw9pQlyXg3n1wCHE8g}{10.10.18.90}{10.10.18.90:19300}{dilm}{ml.machine_memory=8362151936, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [cms_node01][10.10.18.90:19300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalArgumentException: can't add node {cms_node02}{765pAegcS8S0Y3OrE9taMA}{ki1VVW27TnakEEFagCoDlg}{10.10.18.117}{10.10.18.117:19300}{dil}{ml.machine_memory=16853446656, ml.max_open_jobs=20, xpack.installed=true}, found existing node {cms_node01}{765pAegcS8S0Y3OrE9taMA}{Up16Gw9pQlyXg3n1wCHE8g}{10.10.18.90}{10.10.18.90:19300}{dilm}{ml.machine_memory=8362151936, xpack.installed=true, ml.max_open_jobs=20} with the same id but is a different node instance
at org.elasticsearch.cluster.node.DiscoveryNodes$Builder.add(DiscoveryNodes.java:618) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.coordination.JoinTaskExecutor.execute(JoinTaskExecutor.java:147) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.coordination.JoinHelper$1.execute(JoinHelper.java:119) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.4.2.jar:7.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
# cause: the whole elasticsearch directory, data included, had been copied to the second node; delete everything under data on that node and restart to fix it
# the password setup finally succeeds
[elastic@szyyelk01t elasticsearch-7.4.2]$ bin/elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
# check the cluster status
[elastic@szyyelk01t elasticsearch-7.4.2]$ curl -H "Content-Type: application/json" -u elastic:espass http://10.10.18.90:19200/_cluster/health?pretty
{
"cluster_name" : "cms-uat-elastic7",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 1,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
# the all-index snapshot created earlier
# curl -H "Content-Type: application/json" -v -XPUT http://10.10.18.90:9200/_snapshot/my_backup/snapshot20191107
{"accepted":true}
# restore the full-index snapshot
# make sure the elasticsearch user owns the snapshot directory
chown -R elastic.elastic /opt/esback
# create the repository
curl -H "Content-Type: application/json" -XPUT -u elastic:espass http://10.10.18.90:19200/_snapshot/backup -d ''
{
"type":"fs",
"settings":{"location":"/opt/esback"}
}'
# query the full-index snapshot backups
$ curl -XGET -u elastic:espass "http://10.10.18.90:19200/_snapshot/backup/_all" | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 700 100 700 0 0 160k 0 --:--:-- --:--:-- --:--:-- 170k
{
"snapshots": [
{
"duration_in_millis": 2891,
"end_time": "2019-11-07T01:35:03.702Z",
"end_time_in_millis": 1573090503702,
"failures": [],
"include_global_state": true,
"indices": [
"support_faq_categorys",
"ticket_list",
"templates_search",
"site_page_search",
"support",
"templates_page_search",
"support_new_articles",
"article_version",
"blocks_version",
"search",
"version",
"article_search",
"templates",
"learn",
"templates_version",
"blocks_search",
"templates_page_version"
],
"shards": {
"failed": 0,
"successful": 71,
"total": 71
},
"snapshot": "snapshot20191107",
"start_time": "2019-11-07T01:35:00.811Z",
"start_time_in_millis": 1573090500811,
"state": "SUCCESS",
"uuid": "0_4SOntVS1GH-7irHjKBMQ",
"version": "6.3.2",
"version_id": 6030299
}
]
}
# restore the full-index snapshot
[elastic@szyyelk01t elasticsearch-7.4.2]$ curl -XPOST -u elastic:espass 'http://10.10.18.90:19200/_snapshot/backup/snapshot20191107/_restore'?wait_for_completion=true
{"snapshot":{"snapshot":"snapshot20191107","indices":["templates_page_search","article_search","blocks_version","learn","templates_page_version","templates","version","site_page_search","support_new_articles","support_faq_categorys","search","templates_search","blocks_search","ticket_list","article_version","support","templates_version"],"shards":{"total":71,"failed":0,"successful":71}}}
# confirm the restore succeeded
[elastic@szyyelk01t elasticsearch-7.4.2]$ curl -H "Content-Type: application/json" -u elastic:espass http://10.10.18.90:19200/_cat/indices
green open templates_page_search tUKh1vaHRla6QamphIByLQ 5 1 104 10 965.3kb 482.6kb
green open article_search _LE5n_-KRSGVH6Z3I1YLNQ 5 1 44 2 1.5mb 797.8kb
green open blocks_version VRmv8fyESY6iclBYkhKJ_w 5 1 9 0 145.5kb 72.7kb
green open learn W4RyJnkrStaRJwQgS4MAug 3 1 89 1 841.6kb 420.8kb
green open templates_page_version _hHckKOfRuCPEojviySxVw 5 1 945 0 1.5mb 777kb
green open templates 7iJqDoBwTbOEHcyEzPLHbA 5 1 138 0 2mb 1mb
green open version mLbfHoA7SAu4RWHSHM3vtw 3 1 1 0 39.9kb 19.9kb
green open support_new_articles HvGe-CklRU-iua-_T1pLNA 3 1 1534 170 12mb 6mb
green open site_page_search xxk8IetTSr2HF2tEe2Vc1w 5 1 516 2 1.5mb 817.2kb
green open .security-7 xdRnCeykQGGPcqM3-_WFCw 1 1 6 0 39.5kb 19.8kb
green open search fOteaZd0QfaU_2fKBaWPdA 3 1 0 0 1.5kb 783b
green open support_faq_categorys h61nZp5bSQqV1UGVyHL7WA 3 1 0 0 1.5kb 783b
green open templates_search ru8oFeQDTtKovOmkjP6A0w 5 1 111 3 1.5mb 802.8kb
green open blocks_search 8vMOY6ebTs-4iJIwM2VG0Q 5 1 0 0 2.5kb 1.2kb
green open article_version qcF3Nft6QMezKqtPHyYLlA 5 1 344 0 5mb 2.5mb
green open ticket_list xpvXuhlqRFq5Y_zugq0qKw 3 1 403 0 2.1mb 1mb
green open support LypmJq0pRDy428-TKOy6Yg 3 1 0 0 1.5kb 783b
green open templates_version gI28sYWJT3GVgfBeyJhSLg 5 1 220 0 4.2mb 2.1mb
ElasticSearch (1): ElasticSearch application scenarios and why choose ElasticSearch?
First, a look at how data is classified.
Structured data
Also called row data: data stored in a database that can be logically expressed with a two-dimensional table structure, in other words, data that can be represented with numbers or a uniform structure. Product stock in a table can be stored as an integer, price as a float, a user's gender as an enum; these are all structured data.
Unstructured data
Data that cannot be represented with numbers or a uniform structure, such as text, images, audio, and web pages.
In practice, structured records also contain unstructured data: product titles, descriptions, and article bodies are text, and text is unstructured. Unstructured data can therefore be called full-text data.
What is full-text search?
A retrieval method that matches a search term against all the text in files or a database is called full-text search.
The two approaches to full-text search
Sequential scan: scan every row in the table and every text field, keeping what matches; this is very slow!
Index scan: the basic idea behind full-text search: extract part of the information from the unstructured data and reorganize it so it gains some structure, then search that structured data, which makes searching comparatively fast. A small sketch of the data structure involved follows below.
The full-text search process:
First the index is built, then the index is searched.
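To make the index-scan idea concrete, here is a small illustrative sketch (mine, not from the original article) of the data structure behind it: an inverted index that maps each term to the set of documents containing it, so a query looks terms up directly instead of scanning every document.

import java.util.*;

// A toy inverted index: term -> ids of the documents containing it.
public class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize a document naively on whitespace and record each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Look a term up directly instead of scanning every document.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "red leather bag");
        index.add(2, "blue bag");
        index.add(3, "red shoes");
        System.out.println(index.search("bag"));  // [1, 2]
        System.out.println(index.search("red"));  // [1, 3]
    }
}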
Why choose ElasticSearch?
Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereafter Elastic) is currently the first choice among full-text search engines.
Under the hood, Elastic builds on the open-source library Lucene. You cannot use Lucene directly, though: you must write your own code against its interfaces. Elastic wraps Lucene and exposes a REST API, so it works out of the box.
It is a distributed real-time document store in which every field is indexed and searchable.
It is a distributed real-time analytics search engine.
It scales to hundreds of servers and petabytes of structured and unstructured data.
All features live in a single server process that you can talk to through a RESTful API, language clients, or even the command line.
It is easy to pick up, ships with sensible defaults, works out of the box, and has a low learning curve.
It is free to download, use, and modify.
Its configuration is flexible, far more so than Sphinx's.
ElasticSearch study notes 02: integrating JestClient with Spring Boot to operate elasticSearch


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.5.16.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.huarui</groupId>
<artifactId>sb_elasticsearch_jestclient</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>sb_elasticsearch_jestclient</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>5.3.3</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>5.6.7</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>spring-snapshots</id>
<name>Spring Snapshots</name>
<url>https://repo.spring.io/snapshot</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
<repository>
<id>spring-milestones</id>
<name>Spring Milestones</name>
<url>https://repo.spring.io/milestone</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>spring-snapshots</id>
<name>Spring Snapshots</name>
<url>https://repo.spring.io/snapshot</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</pluginRepository>
<pluginRepository>
<id>spring-milestones</id>
<name>Spring Milestones</name>
<url>https://repo.spring.io/milestone</url>
</pluginRepository>
</pluginRepositories>
</project>
The key dependencies are the Jest client and the matching elasticsearch core library:
<dependency>
<groupId>io.searchbox</groupId>
<artifactId>jest</artifactId>
<version>5.3.3</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>5.6.7</version>
</dependency>
The application.properties settings:
spring.elasticsearch.jest.uris = http://192.168.79.129:9200/
spring.elasticsearch.jest.read-timeout = 10000
spring.elasticsearch.jest.username =
spring.elasticsearch.jest.password =
The JUnit test class:
import com.huarui.entity.User;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.*;
import io.searchbox.indices.CreateIndex;
import io.searchbox.indices.DeleteIndex;
import io.searchbox.indices.mapping.GetMapping;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
@RunWith(SpringRunner.class)
@SpringBootTest
public class ElasticApplicationTests {
private static String indexName = "userindex";
private static String typeName = "user";
@Autowired
JestClient jestClient;
/**
* Insert a document
* @throws Exception
*/
@Test
public void insert() throws Exception {
User user = new User(1L, "张三", 20, "张三是个Java开发工程师","2018-4-25 11:07:42");
Index index = new Index.Builder(user).index(indexName).type(typeName).build();
try{
JestResult jr = jestClient.execute(index);
System.out.println(jr.isSucceeded());
}catch(IOException e){
e.printStackTrace();
}
}
}
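The test above only covers indexing. As a hedged sketch of the query side (a method you could add inside ElasticApplicationTests, assuming the User entity has a name field; all imports used here are already present in the test class), a Jest search can be built from the regular elasticsearch SearchSourceBuilder:

/**
 * Search documents by name (illustrative sketch)
 */
@Test
public void search() throws Exception {
    // Build the query with the standard elasticsearch builders on the classpath
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("name", "张三"));

    // Wrap it in a Jest Search action against our index and type
    Search search = new Search.Builder(sourceBuilder.toString())
            .addIndex(indexName)
            .addType(typeName)
            .build();

    SearchResult result = jestClient.execute(search);
    System.out.println(result.isSucceeded());

    // Map the hits back onto the entity class
    List<User> users = result.getSourceAsObjectList(User.class);
    for (User u : users) {
        System.out.println(u);
    }
}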
That concludes the walkthrough of Elasticsearch: creating an index and searching with the new Elasticsearch Java client 8.0, together with the related posts on writing Scrapy data into elasticsearch, upgrading to elasticsearch 7.4.2 on CentOS 7.5, ElasticSearch application scenarios, and the Spring Boot + JestClient integration. Thank you for reading.