Neither percentile_cont nor percentile_disc computes the desired 75th percentile in PostgreSQL 9.6.3

25-02-06 14


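If you want to understand why neither percentile_cont nor percentile_disc computes the desired 75th percentile in PostgreSQL 9.6.3, this article is for you. It also collects material on: 50. the percentiles algorithm and website latency statistics; AggregationBuilders.percentiles explained; ES percentiles and percentile ranks for access-latency SLA statistics; and solutions to file_get_contents(url): failed to open stream.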

Contents:

Neither percentile_cont nor percentile_disc computes the desired 75th percentile in PostgreSQL 9.6.3
50. The percentiles algorithm and website latency statistics
AggregationBuilders.percentiles explained
ES percentiles and percentile ranks for access-latency SLA statistics
Solutions to file_get_contents(url): failed to open stream

Neither percentile_cont nor percentile_disc computes the desired 75th percentile in PostgreSQL 9.6.3

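I'm using the percentile functions but not getting the output I want. I'd say "incorrect", but the functions may well be working as intended and I'm just misunderstanding them.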

These are the numbers I'm using:

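My understanding of percentile_cont is that when the count is even, it takes the two middle numbers, adds them and divides by two. My understanding of percentile_disc is that when the count is even, it just picks the lower of the two.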

Here is my understanding of how to compute a percentile, using the 50th (the median) as an example:

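If the count (n) is odd, take the middle number; if it is even, take the average of the two middle numbers. Here there are 32 numbers, so the median = (358625 + 364999.92) / 2 = 361812.46. percentile_cont returns the correct value because it averages the two; percentile_disc returns the wrong value because it picks the lower of the two.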

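For other percentiles, such as the 10th, my understanding is that you multiply the percentile by the count (n) to get an index: .10 * 32 = 3.2 in this case. You then round up to the next whole number, and the value at that index is your percentile. If the index is already a whole number, you average the value at that index with the one immediately after it.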

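By that logic percentile_cont is wrong for the 10th, because it returns 251500, a number I can't even arrive at. The closest I can get is averaging 240000, 250000 and 265000, which gives 251666.67. percentile_disc returns the correct result, 250000.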

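But the real kicker is the 75th. By my calculation it should return 469250: index = (32 * .75) = 24, a whole number, so the value should be (463500 + 475000) / 2 = 469250. percentile_disc returns 463500;
percentile_cont returns 466375, which again I can't derive for the life of me.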
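The returned values follow from how the two functions are defined, not from whether n is odd or even. percentile_cont interpolates linearly at the 0-based position p × (n − 1); percentile_disc returns the first value whose position in the ordering reaches the fraction p, which is the ceil(p × n)-th value (1-based). Below is a minimal Python sketch of both definitions, written from the documented behaviour rather than PostgreSQL's source. Because the full list of 32 prices is not shown, the worked numbers use only the neighbouring prices quoted above, assuming 463500 and 475000 are the 24th and 25th values and 250000 and 265000 the 4th and 5th:

```python
import math

def percentile_cont(sorted_vals, p):
    """Linear interpolation at 0-based position p * (n - 1), as percentile_cont does."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_disc(sorted_vals, p):
    """First value whose cumulative position reaches p: the ceil(p * n)-th value (1-based)."""
    row = max(1, math.ceil(p * len(sorted_vals)))
    return sorted_vals[row - 1]

# Worked numbers for n = 32, using only the neighbouring prices quoted in the question.
n = 32
pos75 = 0.75 * (n - 1)  # 23.25 -> a quarter of the way from the 24th to the 25th value
print(463500 + (pos75 - math.floor(pos75)) * (475000 - 463500))  # 466375.0
pos10 = 0.10 * (n - 1)  # 3.1 -> a tenth of the way from the 4th to the 5th value
print(round(250000 + (pos10 - math.floor(pos10)) * (265000 - 250000), 2))  # 251500.0
print(math.ceil(0.75 * n), math.ceil(0.10 * n))  # rows used by percentile_disc: 24 4
```

For the median, p × (n − 1) = 15.5 lands exactly halfway between the 16th and 17th values, which is why percentile_cont agreed with the "average the two middle numbers" rule for the median but not for the other percentiles.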

Here is my query:

SELECT itemcode, COUNT(itemcode) AS n,
    PERCENTILE_DISC(0.10) WITHIN GROUP (ORDER BY price) AS "10th",
    PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY price) AS "25th",
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY price) AS median, AVG(price) AS mean,
    PERCENTILE_DISC(0.65) WITHIN GROUP (ORDER BY price) AS "65th",
    PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY price) AS "75th",
    PERCENTILE_DISC(0.90) WITHIN GROUP (ORDER BY price) AS "90th"
FROM items
WHERE itemcode = 26 AND removed IS NULL
GROUP BY itemcode;

Note: removed is NULL in every case.

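What do I need to do to make this work correctly and consistently? Do I need to write a function that checks n first and then picks percentile_disc or percentile_cont depending on whether n is odd or even?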

SQL Fiddle: http://sqlfiddle.com/#!17/aa09c/9
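The rule described in the question (i = p × n; round up if i is fractional, otherwise average the i-th and (i+1)-th values) is a third definition that matches neither built-in for every p, so checking whether n is odd or even will not make either function reproduce it. A minimal Python sketch of that rule; textbook_percentile is a hypothetical helper name, and the list 1..32 is an illustrative stand-in, not the asker's prices:

```python
import math
from fractions import Fraction

def textbook_percentile(sorted_vals, p):
    """Hypothetical helper for the rule in the question (not a PostgreSQL function).

    i = p * n (1-based). Fractional i: round up and take that value.
    Whole i: average the i-th and (i+1)-th values.
    """
    n = len(sorted_vals)
    i = Fraction(str(p)) * n  # exact arithmetic, so 0.1 * 30 is exactly 3
    if i.denominator != 1:
        return sorted_vals[math.ceil(i) - 1]
    k = int(i)
    if k == 0:   # p == 0: there is no 0th value, use the first
        return sorted_vals[0]
    if k >= n:   # p == 1: there is no (n+1)-th value, use the last
        return sorted_vals[-1]
    return (sorted_vals[k - 1] + sorted_vals[k]) / 2

prices = list(range(1, 33))  # illustrative stand-in for 32 sorted prices
print(textbook_percentile(prices, 0.75), textbook_percentile(prices, 0.10))  # 24.5 4
```

Applied to the question's 75th percentile, i = 24 is whole, so the rule averages the 24th and 25th prices, (463500 + 475000) / 2 = 469250, the value the asker expected. For the median it always coincides with percentile_cont, which explains why only the median looked right.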

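50. The percentiles algorithm and website latency statistics

Key points

How to use percentiles

Requirement: a website records how long each request takes, and we need to compute tp50, tp90 and tp99:

tp50: the time within which 50% of requests complete
tp90: the time within which 90% of requests complete
tp99: the time within which 99% of requests complete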

I. Prepare the data

1. Create the mappings

PUT /website
{
  "mappings": {
    "logs": {
      "properties": {
        "latency": {"type": "long"},
        "province": {"type": "keyword"},
        "timestamp": {"type": "date"}
      }
    }
  }
}

2. Bulk-insert the data

POST /website/logs/_bulk
{ "index": {}}
{ "latency" : 105, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 83, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 92, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 112, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 68, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 76, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 101, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 275, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 166, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 654, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 389, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 302, "province" : "新疆", "timestamp" : "2016-10-29" }

II. percentiles operations

1. Find tp50, tp90 and tp99

GET /website/logs/_search
{
  "size": 0,
  "aggs": {
    "latency_percentiles": {"percentiles": {"field": "latency", "percents": [50, 90, 99]}},
    "latency_late": {"avg": {"field": "latency"}}
  }
}

The result is:

"aggregations": {
  "latency_late": {
    "value": 201.91666666666666
  },
  "latency_percentiles": {
    "values": {
      "50.0": 108.5,
      "90.0": 380.3,
      "99.0": 624.8500000000001
    }
  }
}

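Note that tp50 and the others are not simply the maximum of the values involved: ES computes them, but I don't yet know exactly how that calculation works.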
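Every percentile value in this section can be reproduced by hand with plain linear interpolation between neighbouring sorted values at the 0-based position p/100 × (n − 1), the same rule PostgreSQL's percentile_cont uses; presumably TDigest keeps all 12 values exactly at this size. A minimal Python sketch, with interpolated_percentile as a hypothetical helper name:

```python
import math

# Latencies from the _bulk request above
latencies = [105, 83, 92, 112, 68, 76, 101, 275, 166, 654, 389, 302]

def interpolated_percentile(values, percent):
    """Interpolate between neighbouring sorted values at 0-based position percent/100 * (n - 1)."""
    s = sorted(values)
    pos = percent / 100 * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

for pct in (50, 90, 99):
    print(pct, round(interpolated_percentile(latencies, pct), 2))
```

It prints 108.5, 380.3 and 624.85 for tp50, tp90 and tp99, matching the ES response above (ES shows the last one as 624.8500000000001 because of floating-point rounding). tp50 is the average of the 6th and 7th fastest requests, 105 and 112, not a maximum.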

2. Look at each province

Find out which provinces are slower:

GET /website/logs/_search
{
  "size": 0,
  "aggs": {
    "group_by_province": {
      "terms": {"field": "province"},
      "aggs": {
        "latency_percentiles": {"percentiles": {"field": "latency", "percents": [50, 90, 99]}},
        "latency_late": {"avg": {"field": "latency"}}
      }
    }
  }
}

{
  "aggregations": {
    "group_by_province": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "新疆",
          "doc_count": 6,
          "latency_late": {
            "value": 314.5
          },
          "latency_percentiles": {
            "values": {
              "50.0": 288.5,
              "90.0": 521.5,
              "99.0": 640.75
            }
          }
        },
        {
          "key": "江苏",
          "doc_count": 6,
          "latency_late": {
            "value": 89.33333333333333
          },
          "latency_percentiles": {
            "values": {
              "50.0": 87.5,
              "90.0": 108.5,
              "99.0": 111.65
            }
          }
        }
      ]
    }
  }
}

The results show that the network in Xinjiang (新疆) is noticeably slower, so Xinjiang needs attention.
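The terms aggregation splits the documents into one bucket per province, and the percentiles sub-aggregation then runs inside each bucket. A minimal Python sketch of that two-step shape, using the standard library's statistics.quantiles with method="inclusive" (the same linear-interpolation rule) in place of ES's TDigest; percentiles_by_province is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import quantiles

# (province, latency) pairs from the _bulk request above
docs = [("江苏", 105), ("江苏", 83), ("江苏", 92), ("江苏", 112), ("江苏", 68), ("江苏", 76),
        ("新疆", 101), ("新疆", 275), ("新疆", 166), ("新疆", 654), ("新疆", 389), ("新疆", 302)]

def percentiles_by_province(docs, percents=(50, 90, 99)):
    buckets = defaultdict(list)  # "terms" step: one bucket per province
    for province, latency in docs:
        buckets[province].append(latency)
    result = {}
    for province, values in buckets.items():
        # "percentiles" sub-aggregation: the 99 cut points, cuts[k - 1] is the k-th percentile
        cuts = quantiles(values, n=100, method="inclusive")
        result[province] = {k: cuts[k - 1] for k in percents}
    return result

for province, pcts in percentiles_by_province(docs).items():
    print(province, pcts)
```

It yields 87.5 / 108.5 / 111.65 for 江苏 and 288.5 / 521.5 / 640.75 for 新疆, the same figures as the ES buckets above.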

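AggregationBuilders.percentiles explained

How should this aggregation in the Java ES API be understood?

AggregationBuilders.percentiles, for example:

AggregationBuilders
        .percentiles("percent")
        .field("field_to_check")   // placeholder for the numeric field being measured
        .percentiles(50.0);

Does this mean the values of the field that are greater than 50%?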

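ES percentiles and percentile ranks for access-latency SLA statistics

Reposted from: Elasticsearch (28): percentiles and percentile ranks for website access-latency SLA statistics

Requirement

A website records how long each request takes, and we need to compute tp50, tp90 and tp99:

tp50: the time within which 50% of requests complete
tp90: the time within which 90% of requests complete
tp99: the time within which 99% of requests complete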

Set up the index and prepare the data

PUT /website
{
    "mappings": {
        "logs": {
            "properties": {
                "latency": {
                    "type": "long"
                },
                "province": {
                    "type": "keyword"
                },
                "timestamp": {
                    "type": "date"
                }
            }
        }
    }
}
POST /website/logs/_bulk
{ "index": {}}
{ "latency" : 105, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 83, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 92, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 112, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 68, "province" : "江苏", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 76, "province" : "江苏", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 101, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 275, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 166, "province" : "新疆", "timestamp" : "2016-10-29" }
{ "index": {}}
{ "latency" : 654, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 389, "province" : "新疆", "timestamp" : "2016-10-28" }
{ "index": {}}
{ "latency" : 302, "province" : "新疆", "timestamp" : "2016-10-29" }

percentiles syntax

GET /website/logs/_search 
{
  "size": 0,
  "aggs": {
    "latency_percentiles": {
      "percentiles": {
        "field": "latency",
        "percents": [
          50,
          95,
          99
        ]
      }
    },
    "latency_avg": {
      "avg": {
        "field": "latency"
      }
    }
  }
}

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "latency_avg": {
      "value": 201.91666666666666
    },
    "latency_percentiles": {
      "values": {
        "50.0": 108.5,
        "95.0": 508.24999999999983,
        "99.0": 624.8500000000001
      }
    }
  }
}

Group by province and query each group

GET /website/logs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_province": {
      "terms": {
        "field": "province"
      },
      "aggs": {
        "latency_percentiles": {
          "percentiles": {
            "field": "latency",
            "percents": [
              50,
              95,
              99
            ]
          }
        },
        "latency_avg": {
          "avg": {
            "field": "latency"
          }
        }
      }
    }
  }
}

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_province": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "新疆",
          "doc_count": 6,
          "latency_avg": {
            "value": 314.5
          },
          "latency_percentiles": {
            "values": {
              "50.0": 288.5,
              "95.0": 587.75,
              "99.0": 640.75
            }
          }
        },
        {
          "key": "江苏",
          "doc_count": 6,
          "latency_avg": {
            "value": 89.33333333333333
          },
          "latency_percentiles": {
            "values": {
              "50.0": 87.5,
              "95.0": 110.25,
              "99.0": 111.65
            }
          }
        }
      ]
    }
  }
}

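percentile ranks syntax

SLA is the service standard we commit to. For our website's access latency, the SLA requires that 100% of requests complete within 200 ms; at large companies this is the usual requirement. If latency exceeds 1 s, the incident is escalated to a Level-A outage, meaning the site's performance and user experience have degraded sharply. Requirement: what percentage of requests complete within 200 ms, and what percentage within 1000 ms? That is what the percentile ranks metric computes.

percentile ranks is actually used even more often than percentiles. For example, group TVs by brand and compute the share priced at or below 1000, 2000 and 3000.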

GET /website/logs/_search 
{
  "size": 0,
  "aggs": {
    "group_by_province": {
      "terms": {
        "field": "province"
      },
      "aggs": {
        "latency_percentile_ranks": {
          "percentile_ranks": {
            "field": "latency",
            "values": [
              200,
              1000
            ]
          }
        }
      }
    }
  }
}

{
  "took": 38,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_province": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "新疆",
          "doc_count": 6,
          "latency_percentile_ranks": {
            "values": {
              "200.0": 29.40613026819923,
              "1000.0": 100
            }
          }
        },
        {
          "key": "江苏",
          "doc_count": 6,
          "latency_percentile_ranks": {
            "values": {
              "200.0": 100,
              "1000.0": 100
            }
          }
        }
      ]
    }
  }
}
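percentile_ranks answers the inverse question to percentiles: given a value, what percentage of documents falls at or below it. A minimal Python sketch of the exact calculation (percentile_rank is a hypothetical helper name) gives 33.33% for 新疆 at 200 ms, two of six requests, while ES reported about 29.41%; presumably the gap comes from ES estimating ranks from its TDigest instead of counting exactly:

```python
from bisect import bisect_right

def percentile_rank(values, threshold):
    """Exact percentage of values less than or equal to threshold."""
    s = sorted(values)
    return 100 * bisect_right(s, threshold) / len(s)

latency_by_province = {
    "新疆": [101, 275, 166, 654, 389, 302],  # Xinjiang, from the _bulk request above
    "江苏": [105, 83, 92, 112, 68, 76],      # Jiangsu
}
for province, latencies in latency_by_province.items():
    print(province, {t: round(percentile_rank(latencies, t), 2) for t in (200, 1000)})
```

The 200 ms rank is the SLA check: 江苏 meets "100% within 200 ms", 新疆 does not.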

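Optimizing percentiles

As with cardinality, computing percentiles requires an approximation algorithm. percentiles uses the TDigest algorithm; as with HyperLogLog, you don't need to understand every technical detail, but it is worth knowing the algorithm's properties:

Accuracy depends on how extreme the percentile is: the 1st or 99th percentile is more accurate than the 50th. This is simply a consequence of the data structure's internals, but a useful one, since most people care most about the extreme percentiles.
For small sets of values, percentiles are very accurate. If the data set is small enough, they may be 100% exact.
As the number of values in a bucket grows, the algorithm starts to estimate the percentiles, trading accuracy for memory. How inaccurate it gets is hard to summarize because it depends on the data distribution and the volume of data being aggregated. The compression parameter (default 100) limits the number of nodes to at most compression * 20 = 2000. The larger it is, the more memory is used and the more accurate, but slower, the computation becomes.

Each node takes 32 bytes, so 100 * 20 * 32 = 64 KB. If you want more accurate percentiles, you can set compression higher.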
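The memory rule of thumb above as a tiny Python helper. The factor 20 and the 32 bytes per node come from the text; tdigest_memory_bytes is a hypothetical name, not an ES setting:

```python
def tdigest_memory_bytes(compression=100, nodes_per_unit=20, bytes_per_node=32):
    """Upper bound from the text: at most compression * 20 nodes, 32 bytes each."""
    return compression * nodes_per_unit * bytes_per_node

for c in (100, 200, 500):
    print(c, tdigest_memory_bytes(c), "bytes")
```

Doubling compression doubles the worst-case memory (64000 bytes at the default of 100, 128000 at 200), which is the cost of the extra accuracy.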

Solutions to file_get_contents(url): failed to open stream

Source: www.2cto.com (excerpted from an English-language site)

Question:

Hello everyone,

I have a PHP file that gets the contents of a URL, and I am getting this failure message:

Warning: file_get_contents(http://xxxxxx): failed to open stream: HTTP request failed! in xxxx.php on line xx

I tried many online solutions but it's still not working. Here is the code:

$cryptpass = rawurlencode(crypt($pc['pcpassword']));

$url = "http://" . $pc['pcname'] . "/Reports/ReportList.php?&username={$pc['pcusername']}&cryptpass=$cryptpass&noredir=1";

$parsed_list = read_general_list($url, false);

function read_general_list($url, $make_assoc = false)
{
    $compressed_data = file_get_contents($url);
}

$compressed_data is always null and it throws an error:

Warning: file_get_contents(http://xxxxxx): failed to open stream: HTTP request failed! in xxxx.php on line xx

Any suggestions please?

Answer:

I am sorry, I never finished updating this question. Maybe someone is still looking for an answer.

This worked for me. It is an equivalent of file_get_contents, but it can handle large amounts of data. I found this solution online.

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);          // do not include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the data instead of printing it to the browser
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);                       // false on failure; curl_error($ch) explains why
    curl_close($ch);
    return $data;
}

Second answer:

As the error message says, the stream (URL) requested cannot be opened. There are many possible reasons for this:

1. The base URL ($pc['pcname']) is bad.

2. The username and/or password are bad.

3. The username/password do not have permission on the server.

4. Your system cannot reach the server (firewall, PHP permissions, ...).

5. ...

I would use the following strategy to debug:

1. Dump $url and write it down.

2. Use a browser with debug tools (e.g. Firefox/Firebug) and try to access that URL.

3. Look at the headers returned to see what error the server reports (if any).

4. Think about why that error is returned...

Cheers,

Peter

If this answers your question, vote and mark it accepted.
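Steps 2 and 3 of the second answer can also be done without a browser. A minimal Python sketch, not taken from either answer (probe is a hypothetical helper name), that requests a URL with the standard library and returns the status code and response headers, including for 4xx/5xx replies, or the low-level reason when no HTTP response arrives at all (DNS failure, refused connection, firewall):

```python
import urllib.error
import urllib.request

def probe(url, timeout=10):
    """Return (status, headers) for url, or (None, {"error": reason}) if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # The server answered with 4xx/5xx: its status and headers explain the failure
        return err.code, dict(err.headers)
    except urllib.error.URLError as err:
        # No HTTP response at all: bad host name, refused connection, timeout, ...
        return None, {"error": str(err.reason)}

status, info = probe("http://127.0.0.1:9/", timeout=2)  # a port with normally nothing listening
print(status, info)
```

Run it against the dumped $url from step 1: a status of 401 or 403 points at reasons 2 and 3, while None with an error message points at reason 4.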
