This article centers on integrating Prometheus and Grafana with Docker to monitor Redis, and on using Prometheus to monitor Docker containers, and aims to serve as a detailed reference. It walks through the trade-offs of the Docker + Prometheus + Grafana approach to monitoring Redis, answers common questions about monitoring Docker containers with Prometheus, and also covers monitoring Docker with cAdvisor + Prometheus + Grafana, monitoring a MinIO cluster on CentOS (grafana + prometheus), deploying prometheus + grafana with docker stack, and a quick hands-on with Prometheus and Grafana under Docker.
Contents:
- Docker + Prometheus + Grafana monitoring Redis (Prometheus monitoring Docker containers)
- cAdvisor + Prometheus + Grafana monitoring Docker
- CentOS-Docker: monitoring a MinIO cluster (grafana + prometheus)
- Deploying prometheus + grafana with docker stack
- Prometheus and Grafana under Docker, part one: a quick hands-on
Docker + Prometheus + Grafana monitoring Redis (Prometheus monitoring Docker containers)
Overview: monitor Redis performance with Prometheus and Grafana in a Docker environment.
Environment: CentOS, Docker
I. Pull the images
docker pull prom/node-exporter
docker run -d -p 9100:9100 prom/node-exporter
docker pull grafana/grafana
docker run -d --name=grafana -p 3000:3000 grafana/grafana
docker pull prom/prometheus
1. Create the file prometheus.yml under /usr/local/src (any directory works): touch /usr/local/src/prometheus.yml
2. Edit it: vim prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'redis'
    static_configs:
      - targets: ['<server IP>:9121']
        labels:
          instance: redis
  - job_name: 'linux'
    static_configs:
      - targets: ['<server IP>:9100']
        labels:
          instance: node
3. Start the Prometheus container
sudo docker run -d -p 9090:9090 -v /usr/local/src/prometheus.yml:/usr/local/src/prometheus.yml prom/prometheus --config.file=/usr/local/src/prometheus.yml
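Prometheus 2.x exposes health and readiness endpoints, so once the container is up you can confirm it is serving before wiring in the exporters (assuming you run this on the Docker host itself):

curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9090/-/ready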
docker pull redis
docker run -d --name=redis -p 6379:6379 redis
docker pull oliver006/redis_exporter
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr redis://<redis host IP>:6379
(The Redis exporter is published on Docker Hub as oliver006/redis_exporter.)
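If the exporter is wired up correctly it will answer on port 9121; a quick sanity check from the Docker host:

curl -s http://localhost:9121/metrics | head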
II. Verify the containers started
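For example, docker ps should list all five containers (names match whatever was passed to --name):

docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}'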
III. Verify that Prometheus is up
IV. Log in to Grafana and create the data source (default credentials: admin/admin)
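The data source can also be created without clicking through the UI, via Grafana's HTTP API; a minimal sketch, assuming the default admin/admin credentials are still in place and substituting your server IP:

curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://localhost:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://<server IP>:9090","access":"proxy"}'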
cAdvisor + Prometheus + Grafana monitoring Docker
I. cAdvisor (install on every host you want to monitor)
Official repository: https://github.com/google/cadvisor
cAdvisor is an open-source tool from Google for analyzing the resource usage and performance characteristics of running containers. It runs as a daemon that collects, aggregates, processes, and exports information about running containers.
Note: some digging revealed a bug in the latest cAdvisor image at the time; switching to google/cadvisor:v0.24.1 fixed it. The host port mapping defaults to 8080 and can be changed.
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8090:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:v0.24.1
cAdvisor exposes a web UI at its port:
http://<hostname>:<port>/
The cAdvisor web UI (shown below) refreshes its data in real time but does not store history.
To view the raw metrics in Prometheus exposition format:
http://192.168.247.212:8090/metrics
II. Prometheus
Official site: https://prometheus.io/
With the rapid rise of container technology, Kubernetes has become the dominant container cluster manager. Prometheus, another key member of the Cloud Native Computing Foundation (CNCF) ecosystem, is second only to Kubernetes in activity and is now widely used to monitor Kubernetes clusters. This section briefly introduces Prometheus's components and core concepts, then demonstrates installing, configuring, and using it, so that developers and platform operators can get up to speed quickly.
Prometheus overview
Prometheus is an open-source systems monitoring and alerting framework. Inspired by Google's Borgmon monitoring system, it was started in 2012 by former Google engineers at SoundCloud, developed in the open as a community project, and officially released in 2015. In 2016 Prometheus joined the Cloud Native Computing Foundation, where its popularity is second only to Kubernetes.
As a new generation of monitoring framework, Prometheus has the following characteristics:
A powerful multi-dimensional data model:
- Time series are identified by metric name and key-value label pairs.
- Any metric can carry arbitrary multi-dimensional labels.
- The data model is free-form; there is no need to encode hierarchy into dot-separated strings.
- The model supports aggregation, slicing, and dicing.
- Values are double-precision floats, and labels can be full Unicode.
A flexible and powerful query language (PromQL): a single query can multiply, add, join, and take quantiles across multiple metrics.
Easy to operate: the Prometheus server is a single binary that works standalone, with no dependency on distributed storage.
Efficient: an average sample costs only about 3.5 bytes, and a single Prometheus server can handle millions of metrics.
Pull-based collection of time series, which simplifies local testing and prevents faulty services from pushing bad metrics.
Time series can also be pushed to the Prometheus server through a push gateway.
Targets can be discovered via service discovery or static configuration.
Multiple visualization front ends are available.
Easy to scale.
Note that because scrapes can miss data, Prometheus is not suited to cases that require 100% collection accuracy. For recording time series, however, it offers major query advantages, and it fits microservice architectures well.
Prometheus components and architecture
The Prometheus ecosystem comprises multiple components, many of them optional:
- Prometheus Server: collects and stores time series data.
- Client Library: generates metrics for the service being monitored and exposes them to the Prometheus server, returning the current values whenever the server pulls.
- Push Gateway: mainly for short-lived jobs. Because such jobs may disappear before Prometheus gets a chance to pull, they can push their metrics to the gateway instead. This path is meant for service-level metrics; machine-level metrics should use the node exporter.
- Exporters: expose metrics from existing third-party services to Prometheus.
- Alertmanager: receives alerts from the Prometheus server, deduplicates and groups them, and routes them to the matching receiver. Common receivers include email, PagerDuty, OpsGenie, and webhooks.
- Various supporting tools.
Prometheus architecture diagram
Installation steps:
wget https://github.com/prometheus/prometheus/releases/download/v2.8.0/prometheus-2.8.0.linux-amd64.tar.gz
tar -xf prometheus-2.8.0.linux-amd64.tar.gz
cd prometheus-2.8.0.linux-amd64

Edit the configuration file prometheus.yml and add the following:

static_configs:
  - targets: ['192.168.247.211:9090']
- job_name: 'docker'
  static_configs:
    - targets:
      - "192.168.247.211:8090"
      - "192.168.247.212:8090"

cp prometheus promtool /usr/local/bin/

Start it:

nohup prometheus --config.file=./prometheus.yml &
My complete, minimal prometheus.yml:
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.247.211:9090']
  - job_name: 'docker'
    static_configs:
      - targets:
        - "192.168.247.211:8090"
        - "192.168.247.212:8090"
Open http://192.168.247.211:9090 in a browser.
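With the targets up you can already query the cAdvisor series through Prometheus's HTTP API; a minimal sketch (container_cpu_usage_seconds_total is the standard cAdvisor metric; swap in your own host) that returns per-container CPU usage:

curl -G 'http://192.168.247.211:9090/api/v1/query' \
  --data-urlencode 'query=sum by (name) (rate(container_cpu_usage_seconds_total{image!=""}[5m]))'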
III. Grafana
Official site: https://grafana.com/
Installation steps:
wget https://dl.grafana.com/oss/release/grafana-6.0.1-1.x86_64.rpm
sudo yum localinstall grafana-6.0.1-1.x86_64.rpm -y
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
# Enable the systemd service so that Grafana starts at boot
sudo systemctl enable grafana-server.service
1. Open http://192.168.247.211:3000/login
Default credentials: admin/admin
2. Configure the Prometheus data source.
3. Download a dashboard template from https://grafana.com/dashboards (for cAdvisor-based Docker monitoring, the community "Docker monitoring" dashboard, ID 193, is a commonly used choice).
4. Import the template.
5. The finished dashboards.
CentOS-Docker: monitoring a MinIO cluster (grafana + prometheus)
I. Enable MinIO metrics
See also: the MinIO cluster setup guide linked in the original post.
Adjust the startup command on every node of the MinIO cluster (the addition, highlighted in red in the original, is the MINIO_PROMETHEUS_AUTH_TYPE export):
export MINIO_ACCESS_KEY=minio
export MINIO_SECRET_KEY=minio123
export MINIO_PROMETHEUS_AUTH_TYPE="public"

/minio/bin/minio server --address :9000 --config-dir /etc/minio \
  http://192.168.1.101/minio/data1 http://192.168.1.102/minio/data2 \
  http://192.168.1.103/minio/data3 http://192.168.1.104/minio/data4 > minio.log
After restarting, verify that the metrics endpoints respond (the output is raw metrics, not meant for a browser):
$ curl http://192.168.1.101:9000/minio/prometheus/metrics
$ curl http://192.168.1.102:9000/minio/prometheus/metrics
$ curl http://192.168.1.103:9000/minio/prometheus/metrics
$ curl http://192.168.1.104:9000/minio/prometheus/metrics
II. Configure prometheus + grafana monitoring
See also: the monitoring setup walkthrough referenced in the original post.
1. Create the configuration file:
$ vim /minio/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: minio-job
    metrics_path: /minio/prometheus/metrics
    scheme: http
    static_configs:
      - targets: ['192.168.1.101:19000']
        labels:
          group: minio
          instance: minio-101
      - targets: ['192.168.1.102:19000']
        labels:
          group: minio
          instance: minio-102
      - targets: ['192.168.1.103:19000']
        labels:
          group: minio
          instance: minio-103
      - targets: ['192.168.1.104:19000']
        labels:
          group: minio
          instance: minio-104
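Before starting the container it is worth validating the file; one way that avoids installing anything extra is to borrow promtool from the prom/prometheus image:

$ docker run --rm -v /minio/prometheus.yml:/etc/prometheus/prometheus.yml \
    --entrypoint promtool prom/prometheus check config /etc/prometheus/prometheus.yml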
2. Start the Prometheus data-source service:
$ docker run --restart=unless-stopped -d --name=prometheus-minio -p 9090:9090 -v /minio/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
3. Start the Grafana visualization service (default credentials: admin/admin):
$ docker run --restart=unless-stopped -d --name=grafana -p 3000:3000 grafana/grafana
3.1. Add the data source.
3.2. Import the monitoring dashboard.
Current dashboard: https://grafana.com/grafana/dashboards/13502
If the URL goes stale, check the MinIO monitoring guide on GitHub for the latest dashboard address.
III. View the monitoring data
Deploying prometheus + grafana with docker stack
Deploy prometheus, node-exporter, alertmanager, and grafana through docker stack. Prometheus version used here: 2.19.2.
Swarm cluster (a single node):
manager 192.168.30.135
mkdir -p /home/prom/{prometheus,prometheus/data,alertmanager,grafana}
chmod 777 /home/prom/{prometheus/data,grafana}
cd /home/prom

tree .
.
├── alertmanager
│   ├── alertmanager.yml
│   └── config.yml
├── docker-stack.yml
├── grafana
└── prometheus
    ├── alert-rules.yml
    ├── data
    └── prometheus.yml

4 directories, 5 files
Prometheus
vim /home/prom/prometheus/alert-rules.yml
groups:
- name: node-alert
  rules:
  - alert: NodeDown
    expr: up{job="node"} == 0
    for: 5m
    labels:
      severity: critical
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} down"
      description: "Instance: {{ $labels.instance }} has been down for 5 minutes"
      value: "{{ $value }}"
  - alert: NodecpuHigh
    expr: (1 - avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[5m]))) * 100 > 80
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} CPU usage too high"
      description: "CPU usage above 80%"
      value: "{{ $value }}"
  - alert: NodecpuIowaitHigh
    expr: avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="iowait"}[5m])) * 100 > 50
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} CPU iowait too high"
      description: "CPU iowait above 50%"
      value: "{{ $value }}"
  - alert: NodeLoad5High
    expr: node_load5 > (count by (instance) (node_cpu_seconds_total{job="node",mode='system'})) * 1.2
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} load(5m) too high"
      description: "load(5m) exceeds 1.2x the CPU core count"
      value: "{{ $value }}"
  - alert: NodeMemoryHigh
    expr: (1 - node_memory_MemAvailable_bytes{job="node"} / node_memory_MemTotal_bytes{job="node"}) * 100 > 90
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} memory usage too high"
      description: "memory usage above 90%"
      value: "{{ $value }}"
  - alert: NodediskRootHigh
    expr: (1 - node_filesystem_avail_bytes{job="node",fstype=~"ext.*|xfs",mountpoint="/"} / node_filesystem_size_bytes{job="node",fstype=~"ext.*|xfs",mountpoint="/"}) * 100 > 90
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk (/ partition) usage too high"
      description: "disk (/ partition) usage above 90%"
      value: "{{ $value }}"
  - alert: NodediskBootHigh
    expr: (1 - node_filesystem_avail_bytes{job="node",fstype=~"ext.*|xfs",mountpoint="/boot"} / node_filesystem_size_bytes{job="node",fstype=~"ext.*|xfs",mountpoint="/boot"}) * 100 > 80
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk (/boot partition) usage too high"
      description: "disk (/boot partition) usage above 80%"
      value: "{{ $value }}"
  - alert: NodediskReadHigh
    expr: irate(node_disk_read_bytes_total{job="node"}[5m]) > 20 * (1024 ^ 2)
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk read byte rate too high"
      description: "disk read byte rate above 20 MB/s"
      value: "{{ $value }}"
  - alert: NodediskWriteHigh
    expr: irate(node_disk_written_bytes_total{job="node"}[5m]) > 20 * (1024 ^ 2)
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk write byte rate too high"
      description: "disk write byte rate above 20 MB/s"
      value: "{{ $value }}"
  - alert: NodediskReadrateCountHigh
    expr: irate(node_disk_reads_completed_total{job="node"}[5m]) > 3000
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk read IOPS too high"
      description: "disk read IOPS above 3000"
      value: "{{ $value }}"
  - alert: NodediskWriterateCountHigh
    expr: irate(node_disk_writes_completed_total{job="node"}[5m]) > 3000
    for: 5m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk write IOPS too high"
      description: "disk write IOPS above 3000"
      value: "{{ $value }}"
  - alert: NodeInodeRootUsedPercentHigh
    expr: (1 - node_filesystem_files_free{job="node",fstype=~"ext4|xfs",mountpoint="/"} / node_filesystem_files{job="node",fstype=~"ext4|xfs",mountpoint="/"}) * 100 > 80
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk (/ partition) inode usage too high"
      description: "disk (/ partition) inode usage above 80%"
      value: "{{ $value }}"
  - alert: NodeInodeBootUsedPercentHigh
    expr: (1 - node_filesystem_files_free{job="node",fstype=~"ext4|xfs",mountpoint="/boot"} / node_filesystem_files{job="node",fstype=~"ext4|xfs",mountpoint="/boot"}) * 100 > 80
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} disk (/boot partition) inode usage too high"
      description: "disk (/boot partition) inode usage above 80%"
      value: "{{ $value }}"
  - alert: NodeFilefdAllocatedPercentHigh
    expr: node_filefd_allocated{job="node"} / node_filefd_maximum{job="node"} * 100 > 80
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} file descriptor usage too high"
      description: "file descriptor usage above 80%"
      value: "{{ $value }}"
  - alert: NodeNetworkNetinBitRateHigh
    expr: avg by (instance) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m]) * 8) > 20 * (1024 ^ 2) * 8
    for: 3m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} network receive bit rate too high"
      description: "network receive bit rate above 20 MB/s"
      value: "{{ $value }}"
  - alert: NodeNetworkNetoutBitRateHigh
    expr: avg by (instance) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m]) * 8) > 20 * (1024 ^ 2) * 8
    for: 3m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} network transmit bit rate too high"
      description: "network transmit bit rate above 20 MB/s"
      value: "{{ $value }}"
  - alert: NodeNetworkNetinPacketErrorRateHigh
    expr: avg by (instance) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m])) > 15
    for: 3m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} network receive error rate too high"
      description: "network receive errors above 15 per second"
      value: "{{ $value }}"
  - alert: NodeNetworkNetoutPacketErrorRateHigh
    expr: avg by (instance) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m])) > 15
    for: 3m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} network transmit error rate too high"
      description: "network transmit errors above 15 per second"
      value: "{{ $value }}"
  - alert: NodeProcessBlockedHigh
    expr: node_procs_blocked{job="node"} > 10
    for: 10m
    labels:
      severity: warning
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} too many blocked tasks"
      description: "more than 10 tasks currently blocked"
      value: "{{ $value }}"
  - alert: NodeTimeOffsetHigh
    expr: abs(node_timex_offset_seconds{job="node"}) > 3 * 60
    for: 2m
    labels:
      severity: info
      instance: "{{ $labels.instance }}"
    annotations:
      summary: "instance: {{ $labels.instance }} clock offset too large"
      description: "node clock offset exceeds 3 minutes"
      value: "{{ $value }}"
vim /home/prom/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

rule_files:
  - "*rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.30.135:9100']
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']
Alertmanager
vim /home/prom/alertmanager/config.yml
targets:
  webhook:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx # replace with your DingTalk robot webhook
    mention:
      all: true
vim /home/prom/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:465'  # SMTP server; port 465 usually means SSL
  smtp_from: 'alert@163.com'          # sender address
  smtp_auth_username: 'alert@163.com' # account name
  smtp_auth_password: 'password'      # account password or authorization code
  smtp_require_tls: false

route:
  receiver: 'default'
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1h
  group_by: ['alertname']

inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'instance']

receivers:
- name: 'default'
  email_configs:
  - to: 'receiver@163.com'
    send_resolved: true
  webhook_configs:
  - url: 'http://dingtalk:8060/dingtalk/webhook/send'
    send_resolved: true
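alertmanager.yml can be validated the same way; amtool is bundled in the prom/alertmanager image:

docker run --rm -v /home/prom/alertmanager/alertmanager.yml:/tmp/alertmanager.yml \
  --entrypoint amtool prom/alertmanager check-config /tmp/alertmanager.yml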
docker-stack.yml
vim /home/prom/docker-stack.yml
version: '3.7'

services:
  dingtalk:
    image: timonwong/prometheus-webhook-dingtalk:latest
    ports:
      - "8060:8060"
    configs:
      - source: dingtalk_config
        target: /etc/prometheus-webhook-dingtalk/config.yml
    networks:
      - prom
    deploy:
      mode: replicated
      replicas: 1
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
      - "9094:9094"
    configs:
      - source: alertmanager_config
        target: /etc/alertmanager/alertmanager.yml
    networks:
      - prom
    deploy:
      mode: replicated
      replicas: 1
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    configs:
      - source: prometheus_config
        target: /etc/prometheus/prometheus.yml
      - source: alert_rules
        target: /etc/prometheus/alert-rules.yml
    networks:
      - prom
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - type: volume
        source: grafana
        target: /var/lib/grafana
    networks:
      - prom
    deploy:
      mode: replicated
      replicas: 1

configs:
  dingtalk_config:
    file: ./alertmanager/config.yml
  alertmanager_config:
    file: ./alertmanager/alertmanager.yml
  prometheus_config:
    file: ./prometheus/prometheus.yml
  alert_rules:
    file: ./prometheus/alert-rules.yml

volumes:
  prometheus:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/prom/prometheus/data
  grafana:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/prom/grafana

networks:
  prom:
    driver: overlay
For local docker-compose use, there is also an equivalent docker-compose.yml that bind-mounts the config files, bundles node-exporter, and switches the network driver to bridge:

version: '3.7'

services:
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    networks:
      - prom
  dingtalk:
    image: timonwong/prometheus-webhook-dingtalk:latest
    volumes:
      - type: bind
        source: ./alertmanager/config.yml
        target: /etc/prometheus-webhook-dingtalk/config.yml
        read_only: true
    ports:
      - "8060:8060"
    networks:
      - prom
  alertmanager:
    depends_on:
      - dingtalk
    image: prom/alertmanager:latest
    volumes:
      - type: bind
        source: ./alertmanager/alertmanager.yml
        target: /etc/alertmanager/alertmanager.yml
        read_only: true
    ports:
      - "9093:9093"
      - "9094:9094"
    networks:
      - prom
  prometheus:
    depends_on:
      - alertmanager
    image: prom/prometheus:latest
    volumes:
      - type: bind
        source: ./prometheus/prometheus.yml
        target: /etc/prometheus/prometheus.yml
        read_only: true
      - type: bind
        source: ./prometheus/alert-rules.yml
        target: /etc/prometheus/alert-rules.yml
        read_only: true
      - type: volume
        source: prometheus
        target: /prometheus
    ports:
      - "9090:9090"
    networks:
      - prom
  grafana:
    depends_on:
      - prometheus
    image: grafana/grafana:latest
    volumes:
      - type: volume
        source: grafana
        target: /var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prom

volumes:
  prometheus:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/prom/prometheus/data
  grafana:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/prom/grafana

networks:
  prom:
    driver: bridge
docker stack deploy prom --compose-file docker-stack.yml

docker stack ls
NAME                SERVICES            ORCHESTRATOR
prom                4                   Swarm

docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                                           PORTS
f72uewsvc8os        prom_alertmanager   replicated          1/1                 prom/alertmanager:latest                        *:9093-9094->9093-9094/tcp
qonjcrm8pf8o        prom_dingtalk       replicated          1/1                 timonwong/prometheus-webhook-dingtalk:latest    *:8060->8060/tcp
u376krlzd9o6        prom_grafana        replicated          1/1                 grafana/grafana:latest                          *:3000->3000/tcp
kjj909up7ptd        prom_prometheus     replicated          1/1                 prom/prometheus:latest                          *:9090->9090/tcp

docker ps
CONTAINER ID        IMAGE                                           COMMAND                  CREATED              STATUS              PORTS               NAMES
daf3f972ceea        timonwong/prometheus-webhook-dingtalk:latest    "/bin/prometheus-web…"   About a minute ago   Up About a minute   8060/tcp            prom_dingtalk.1.76ick5qr2fquysl6noztepypa
bcd8f36c78dc        grafana/grafana:latest                          "/run.sh"                About a minute ago   Up About a minute   3000/tcp            prom_grafana.1.ybv3yqburoc6olwys0xh2pqlk
160b53a9f51e        prom/prometheus:latest                          "/bin/prometheus --c…"   About a minute ago   Up About a minute   9090/tcp            prom_prometheus.1.wo8gjnlqlup2nd0ejb88pca85
709ee8176696        prom/alertmanager:latest                        "/bin/alertmanager -…"   About a minute ago   Up About a minute   9093/tcp            prom_alertmanager.1.5beu8aeyt1towanyj9wixtggr
The containers start normally. Visit ip:9090; the node target shows as DOWN because docker-stack.yml does not include node-exporter. Visit ip:3000 and you can see that the other prometheus components are all healthy.
node-exporter
Start node-exporter separately through docker:
docker pull prom/node-exporter:latest
docker run -d -p 9100:9100 --name node-exporter prom/node-exporter:latest
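Assuming it started cleanly, its metrics endpoint answers immediately:

curl -s http://localhost:9100/metrics | grep '^node_load'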
To add monitored hosts later, edit prometheus.yml and then update the prometheus service:
docker service update prom_prometheus
(Because swarm configs are immutable, a plain update may not pick up the edited file; if the target list does not refresh, forcing a redeploy with docker service update --force prom_prometheus, or rotating the config, should do it.)
Test alerting
docker stop node-exporter

docker ps
CONTAINER ID        IMAGE                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
daf3f972ceea        timonwong/prometheus-webhook-dingtalk:latest    "/bin/prometheus-web…"   17 minutes ago      Up 17 minutes       8060/tcp            prom_dingtalk.1.76ick5qr2fquysl6noztepypa
bcd8f36c78dc        grafana/grafana:latest                          "/run.sh"                17 minutes ago      Up 17 minutes       3000/tcp            prom_grafana.1.ybv3yqburoc6olwys0xh2pqlk
160b53a9f51e        prom/prometheus:latest                          "/bin/prometheus --c…"   17 minutes ago      Up 17 minutes       9090/tcp            prom_prometheus.1.wo8gjnlqlup2nd0ejb88pca85
709ee8176696        prom/alertmanager:latest                        "/bin/alertmanager -…"   17 minutes ago      Up 17 minutes       9093/tcp            prom_alertmanager.1.5beu8aeyt1towanyj9wixtggr
The DingTalk and email alerts for the outage arrive.
docker start node-exporter

docker ps
CONTAINER ID        IMAGE                                           COMMAND                  CREATED             STATUS              PORTS                    NAMES
daf3f972ceea        timonwong/prometheus-webhook-dingtalk:latest    "/bin/prometheus-web…"   17 minutes ago      Up 17 minutes       8060/tcp                 prom_dingtalk.1.76ick5qr2fquysl6noztepypa
bcd8f36c78dc        grafana/grafana:latest                          "/run.sh"                17 minutes ago      Up 17 minutes       3000/tcp                 prom_grafana.1.ybv3yqburoc6olwys0xh2pqlk
160b53a9f51e        prom/prometheus:latest                          "/bin/prometheus --c…"   17 minutes ago      Up 17 minutes       9090/tcp                 prom_prometheus.1.wo8gjnlqlup2nd0ejb88pca85
709ee8176696        prom/alertmanager:latest                        "/bin/alertmanager -…"   17 minutes ago      Up 17 minutes       9093/tcp                 prom_alertmanager.1.5beu8aeyt1towanyj9wixtggr
95252704e558        prom/node-exporter:latest                       "/bin/node_exporter"     24 hours ago        Up 7 minutes        0.0.0.0:9100->9100/tcp   node-exporter
The DingTalk and email recovery notifications arrive as well.
The outage test passes; alerting works.
That completes the docker stack deployment of prometheus + grafana, carried out entirely on a swarm cluster. It is similar to docker-compose and far simpler than a traditional deployment.
Prometheus and Grafana under Docker, part one: a quick hands-on
The open-source monitoring tool Prometheus is in wide use, and paired with Grafana it visualizes monitoring data nicely. But standing up such a system takes a beginner some time, and sometimes you just want something working quickly. In this hands-on we rapidly build and explore a typical application monitoring stack in a Docker environment: prometheus, node-exporter, cadvisor, grafana, and a business backend service (springboot).
This is the first installment of the series "Prometheus and Grafana under Docker", which covers:
- The quick hands-on, i.e. this article;
- The Docker-side development details, explaining how the whole environment is built in Docker;
- Developing and configuring custom metrics;
Links to all three parts:
- "Prometheus and Grafana under Docker, part one: a quick hands-on";
- "Prometheus and Grafana under Docker, part two: the Docker orchestration in detail";
- "Prometheus and Grafana under Docker, part three: developing and configuring custom metrics";
Environment
The environment for this hands-on:
- OS: CentOS 7.6.1810
- Docker: 17.03.2-ce
- node-exporter: 0.17.0 (docker container, nothing to prepare)
- prometheus: 2.8.0 (docker container, nothing to prepare)
- cadvisor: 0.28.0 (docker container, nothing to prepare)
- grafana: 5.4.2 (docker container, nothing to prepare)
- springboot: 1.5.19.RELEASE (docker container, nothing to prepare)
The machine used here runs CentOS, with IP address 192.168.1.101.
Steps at a glance
The hands-on consists of:
- One command to create all the containers;
- Obtaining a Grafana API key, needed when creating the dashboards;
- One command to create all the dashboards;
- Exploring the monitoring system;
Let's get started.
Create the containers
Log on to the CentOS server, create a new directory, and run the following inside it to create all the containers in one go:
wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/prometheus.yml && \
wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/docker-compose.yml && \
docker-compose up -d
These commands download the container orchestration script and the prometheus configuration file, then create the containers. The console prints something like the following (if the images are not yet local you will also see download progress):
[root@hedy prometheus_blog]# wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/prometheus.yml && \
> wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/docker-compose.yml && \
> docker-compose up -d
--2019-03-09 23:05:28--  https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/prometheus.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.228.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.228.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1173 (1.1K) [text/plain]
Saving to: 'prometheus.yml'
100%[======================================>] 1,173       --.-K/s   in 0s
2019-03-09 23:05:28 (162 MB/s) - 'prometheus.yml' saved [1173/1173]
--2019-03-09 23:05:28--  https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/docker-compose.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.228.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.228.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1727 (1.7K) [text/plain]
Saving to: 'docker-compose.yml'
100%[======================================>] 1,727       --.-K/s   in 0s
2019-03-09 23:05:29 (108 MB/s) - 'docker-compose.yml' saved [1727/1727]
Creating network "prometheus_blog_default" with the default driver
Creating node-exporter ... done
Creating prometheusdemo ... done
Creating cadvisor ... done
Creating prometheus ... done
Creating grafana ... done
Obtain a Grafana API key
Next we need Grafana's authorization key; with it, a script can quickly create the data source and all the dashboards. Steps:
- Open the Grafana page in a browser: http://192.168.1.101:3000
- Username admin, password secret;
- In the settings menu on the left, click "API keys", as shown below:
- Click the "+New API Key" button, as shown below:
- On the creation page, enter admin as the key name, choose the Admin role, and click "Add":
- A pop-up shows the new API key. The content in red box 1 is the API key we will use later, so save it; the command in red box 2 fetches the home page and can be run once to confirm the key works:
Create the dashboards
Run the following to create the dashboards. Replace 192.168.1.101 with your CentOS machine's IP address (use 127.0.0.1 if you are working on that machine), and replace xxxxxxxxx with the API key obtained above, e.g. eyJrIjoieHpHeUVoODFiMDNiZE13TDdqUGxNS1NoN0dhVGVVc04iLCJuIjoiYWRtaW4iLCJpZCI6MX0=:

wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/import_dashboard.sh && \
chmod a+x import_dashboard.sh && \
./import_dashboard.sh 192.168.1.101 xxxxxxxxx
The console prints something like the following, confirming that the data source and dashboards were created:
[root@hedy 10]# wget https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/import_dashboard.sh && \
> chmod a+x import_dashboard.sh && \
> ./import_dashboard.sh 192.168.1.101 eyJrIjoiSEduM2g1clFpaVZRNjRXTTBSek50aUp3OFM3bXUzVUciLCJuIjoiYWRtaW4iLCJpZCI6MX0=
--2019-03-09 23:46:27--  https://raw.githubusercontent.com/zq2599/blog_demos/master/prometheusdemo/files/import_dashboard.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.108.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 113606 (111K) [text/plain]
Saving to: 'import_dashboard.sh'
100%[======================================>] 113,606     468KB/s   in 0.2s
2019-03-09 23:46:28 (468 KB/s) - 'import_dashboard.sh' saved [113606/113606]
grafana host [192.168.1.101]
api key [eyJrIjoiSEduM2g1clFpaVZRNjRXTTBSek50aUp3OFM3bXUzVUciLCJuIjoiYWRtaW4iLCJpZCI6MX0=]
start create datasource
{
"datasource":{
"id":2,"orgId":1,"name":"Prometheus","type":"prometheus","typeLogoUrl":"","access":"proxy","url":"http://prometheus:9090","password":"","user":"","database":"","basicAuth":false,"basicAuthUser":"","basicAuthPassword":"","withCredentials":false,"isDefault":false,"secureJsonFields":{
},"version":1,"readOnly":false},"id":2,"message":"Datasource added","name":"Prometheus"}
start create host dashboard
{
"id":3,"slug":"1-zhu-ji-ji-chu-jian-kong-cpu-nei-cun-ci-pan-wang-luo","status":"success","uid":"h1qcOBjmz","url":"/d/h1qcOBjmz/1-zhu-ji-ji-chu-jian-kong-cpu-nei-cun-ci-pan-wang-luo","version":1}
start create jvm dashboard
{
"id":4,"slug":"spring-boot-1-x","status":"success","uid":"MP3cOBjmz","url":"/d/MP3cOBjmz/spring-boot-1-x","version":1}
start create customize dashboard
{
"id":5,"slug":"ye-wu-zi-ding-yi-jian-kong","status":"success","uid":"Hl3cOBCmk","url":"/d/Hl3cOBCmk/ye-wu-zi-ding-yi-jian-kong","version":1}
At this point the whole environment is ready; let's take a tour.
Touring the environment
- Check Prometheus's web page to see whether the configured scrape targets are healthy, at http://192.168.1.101:9090/targets
The page shows four targets:
a. http://prometheusdemohost:8080/prometheus is the springboot web application;
b. http://127.0.0.1:9090/metrics is Prometheus's own data;
c. http://cadvisorhost:8080/metrics is cAdvisor's data, used to monitor the docker containers;
d. http://node-exporterhost:9100/metrics is the host machine's environment information;
- Check cAdvisor's web page for the basic state of docker, at http://192.168.1.101:8080
- Visit the business web service, a springboot application exposing a single web endpoint that returns a timestamped string, at http://192.168.1.101:8081/greet
- Note: every call to this endpoint reports a record to prometheus, so it is worth calling it several times; later you can see the endpoint's request-volume curve in Grafana. A loop like the sketch below works;
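A minimal sketch for generating that traffic, assuming the service is reachable at the IP above:

for i in $(seq 1 20); do curl -s http://192.168.1.101:8081/greet; echo; done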
- Check the Grafana dashboards at http://192.168.1.101:3000 ; as shown below, the data source has already been set to prometheus (done by the earlier shell script):
- Click "manage" (red box 1) and three entries appear in the dashboard list on the right:
- Start with the first, "1. 主机基础监控 (cpu,内存,磁盘,网络)" (host basics: CPU, memory, disk, network); all of the server's vital statistics are displayed; make sure the selectors in the red boxes match the screenshot:
- The second, "Spring Boot 1.x", covers the JVM and the Spring Boot environment:
- The third shows the data the business code itself reports to prometheus; it can be tied to specific business logic for more customized views (the development details come in a later article):
That wraps up our tour of the Prometheus + Grafana monitoring system. Hopefully it gave you a quick overview of the whole stack; the next installment explains how the setup was streamlined this far: see "Prometheus and Grafana under Docker, part two: the Docker orchestration in detail".
Originally shared by 程序员欣宸 on CSDN.