phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported（phoenix explain）

25-02-15 17

在本文中，我们将带你了解phoenix报错：typeorg.apache.phoenix.schema.types.PhoenixArrayisnotsupported在这篇文章中，我们将为您详细介绍

在本文中，我们将带你了解phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported在这篇文章中，我们将为您详细介绍phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported的方方面面，并解答phoenix explain常见的疑惑，同时我们还将给您一些技巧，以帮助您实现更有效的Apache Kylin 和 Phoenix的区别和性能对比、Apache Phoenix 4.10 发布，HBase 的 SQL 驱动、Apache Phoenix 4.11 发布，HBase 的 SQL 驱动、Apache Phoenix 4.13 发布，HBase 的 SQL 驱动。

本文目录一览：
phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported（phoenix explain）
Apache Kylin 和 Phoenix的区别和性能对比
Apache Phoenix 4.10 发布，HBase 的 SQL 驱动
Apache Phoenix 4.11 发布，HBase 的 SQL 驱动
Apache Phoenix 4.13 发布，HBase 的 SQL 驱动
phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported（phoenix explain）
今天用phoenix报如下错误：

主要原因：

　　hbase的表中某字段类型是array，phoenix目前不支持此类型

解决方法：

复制替换phoenix包的cursor文件

# Copyright 2015 Lukas Lalinsky # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import logging,re import collections from phoenixdb.types import TypeHelper from phoenixdb.errors import OperationalError, NotSupportedError, ProgrammingError, InternalError from phoenixdb.calcite import common_pb2 __all__ = [''Cursor'', ''ColumnDescription'', ''DictCursor''] logger = logging.getLogger(__name__) # TODO see note in Cursor.rowcount() MAX_INT = 2 ** 64 - 1 ColumnDescription = collections.namedtuple(''ColumnDescription'', ''name type_code display_size internal_size precision scale null_ok'') """Named tuple for representing results from :attr:`Cursor.description`.""" class Cursor(object): """Database cursor for executing queries and iterating over results. You should not construct this object manually, use :meth:`Connection.cursor() <phoenixdb.connection.Connection.cursor>` instead. """ arraysize = 1 """ Read/write attribute specifying the number of rows to fetch at a time with :meth:`fetchmany`. It defaults to 1 meaning to fetch a single row at a time. """ itersize = 2000 """ Read/write attribute specifying the number of rows to fetch from the backend at each network roundtrip during iteration on the cursor. The default is 2000. """ def __init__(self, connection, id=None): self._connection = connection self._id = id self._signature = None self._column_data_types = [] self._frame = None self._pos = None self._closed = False self.arraysize = self.__class__.arraysize self.itersize = self.__class__.itersize self._updatecount = -1 def __del__(self): if not self._connection._closed and not self._closed: self.close() def __enter__(self): return self def __exit__(self, exc_type, exc_value, traceback): if not self._closed: self.close() def __iter__(self): return self def __next__(self): row = self.fetchone() if row is None: raise StopIteration return row next = __next__ def close(self): """Closes the cursor. No further operations are allowed once the cursor is closed. If the cursor is used in a ``with`` statement, this method will be automatically called at the end of the ``with`` block. """ if self._closed: raise ProgrammingError(''the cursor is already closed'') if self._id is not None: self._connection._client.close_statement(self._connection._id, self._id) self._id = None self._signature = None self._column_data_types = [] self._frame = None self._pos = None self._closed = True @property def closed(self): """Read-only attribute specifying if the cursor is closed or not.""" return self._closed @property def description(self): if self._signature is None: return None description = [] for column in self._signature.columns: description.append(ColumnDescription( column.column_name, column.type.name, column.display_size, None, column.precision, column.scale, None if column.nullable == 2 else bool(column.nullable), )) return description def _set_id(self, id): if self._id is not None and self._id != id: self._connection._client.close_statement(self._connection._id, self._id) self._id = id def _set_signature(self, signature): self._signature = signature self._column_data_types = [] self._parameter_data_types = [] if signature is None: return for column in signature.columns: dtype = TypeHelper.from_class(column.column_class_name) self._column_data_types.append(dtype) for parameter in signature.parameters: dtype = TypeHelper.from_class(parameter.class_name) self._parameter_data_types.append(dtype) def _set_frame(self, frame): self._frame = frame self._pos = None if frame is not None: if frame.rows: self._pos = 0 elif not frame.done: raise InternalError(''got an empty frame, but the statement is not done yet'') def _fetch_next_frame(self): offset = self._frame.offset + len(self._frame.rows) frame = self._connection._client.fetch(self._connection._id, self._id, offset=offset, frame_max_size=self.itersize) self._set_frame(frame) def _process_results(self, results): if results: result = results[0] if result.own_statement: self._set_id(result.statement_id) self._set_signature(result.signature if result.HasField(''signature'') else None) self._set_frame(result.first_frame if result.HasField(''first_frame'') else None) self._updatecount = result.update_count def _transform_parameters(self, parameters): typed_parameters = [] for value, data_type in zip(parameters, self._parameter_data_types): field_name, rep, mutate_to, cast_from = data_type typed_value = common_pb2.TypedValue() if value is None: typed_value.null = True typed_value.type = common_pb2.NULL else: typed_value.null = False # use the mutator function if mutate_to is not None: value = mutate_to(value) typed_value.type = rep setattr(typed_value, field_name, value) typed_parameters.append(typed_value) return typed_parameters def execute(self, operation, parameters=None): if self._closed: raise ProgrammingError(''the cursor is already closed'') self._updatecount = -1 self._set_frame(None) if parameters is None: if self._id is None: self._set_id(self._connection._client.create_statement(self._connection._id)) results = self._connection._client.prepare_and_execute(self._connection._id, self._id, operation, first_frame_max_size=self.itersize) self._process_results(results) else: statement = self._connection._client.prepare(self._connection._id, operation) self._set_id(statement.id) self._set_signature(statement.signature) results = self._connection._client.execute(self._connection._id, self._id, statement.signature, self._transform_parameters(parameters), first_frame_max_size=self.itersize) self._process_results(results) def executemany(self, operation, seq_of_parameters): if self._closed: raise ProgrammingError(''the cursor is already closed'') self._updatecount = -1 self._set_frame(None) statement = self._connection._client.prepare(self._connection._id, operation, max_rows_total=0) self._set_id(statement.id) self._set_signature(statement.signature) for parameters in seq_of_parameters: self._connection._client.execute(self._connection._id, self._id, statement.signature, self._transform_parameters(parameters), first_frame_max_size=0) def _transform_row(self, row): """Transforms a Row into Python values. :param row: A ``common_pb2.Row`` object. :returns: A list of values casted into the correct Python types. :raises: NotImplementedError """ tmp_row = [] for i, column in enumerate(row.value): if column.has_array_value: # 修改的地方=============== column_value = str(column.value) if ''INTEGER'' in column_value: pattern = ''(\d+)'' elif ''string_value'' in column_value: pattern = ''string_value: "(.+)"'' else: raise NotImplementedError(''array types are not supported'') value = re.findall(pattern, str(column.value)) tmp_row.append(value) # ========================= elif column.scalar_value.null: tmp_row.append(None) else: field_name, rep, mutate_to, cast_from = self._column_data_types[i] # get the value from the field_name value = getattr(column.scalar_value, field_name) # cast the value if cast_from is not None: value = cast_from(value) tmp_row.append(value) return tmp_row def fetchone(self): if self._frame is None: raise ProgrammingError(''no select statement was executed'') if self._pos is None: return None rows = self._frame.rows row = self._transform_row(rows[self._pos]) self._pos += 1 if self._pos >= len(rows): self._pos = None if not self._frame.done: self._fetch_next_frame() return row def fetchmany(self, size=None): if size is None: size = self.arraysize rows = [] while size > 0: row = self.fetchone() if row is None: break rows.append(row) size -= 1 return rows def fetchall(self): rows = [] while True: row = self.fetchone() if row is None: break rows.append(row) return rows def setinputsizes(self, sizes): pass def setoutputsize(self, size, column=None): pass @property def connection(self): """Read-only attribute providing access to the :class:`Connection <phoenixdb.connection.Connection>` object this cursor was created from.""" return self._connection @property def rowcount(self): """Read-only attribute specifying the number of rows affected by the last executed DML statement or -1 if the number cannot be determined. Note that this will always be set to -1 for select queries.""" # TODO instead of -1, this ends up being set to Integer.MAX_VALUE if self._updatecount == MAX_INT: return -1 return self._updatecount @property def rownumber(self): """Read-only attribute providing the current 0-based index of the cursor in the result set or ``None`` if the index cannot be determined. The index can be seen as index of the cursor in a sequence (the result set). The next fetch operation will fetch the row indexed by :attr:`rownumber` in that sequence. """ if self._frame is not None and self._pos is not None: return self._frame.offset + self._pos return self._pos class DictCursor(Cursor): """A cursor which returns results as a dictionary""" def _transform_row(self, row): row = super(DictCursor, self)._transform_row(row) d = {} for ind, val in enumerate(row): d[self._signature.columns[ind].column_name] = val return d

复制替换phoenix包下的types.py文件

# Copyright 2015 Lukas Lalinsky # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import sys import time import datetime from decimal import Decimal from phoenixdb.calcite import common_pb2 __all__ = [ ''Date'', ''Time'', ''Timestamp'', ''DateFromTicks'', ''TimeFromTicks'', ''TimestampFromTicks'', ''Binary'', ''STRING'', ''BINARY'', ''NUMBER'', ''DATETIME'', ''ROWID'', ''BOOLEAN'', ''JAVA_CLASSES'', ''JAVA_CLASSES_MAP'', ''TypeHelper'', ''PhoenixArray'' ] def PhoenixArray(value): print(value) return value def Date(year, month, day): """Constructs an object holding a date value.""" return datetime.date(year, month, day) def Time(hour, minute, second): """Constructs an object holding a time value.""" return datetime.time(hour, minute, second) def Timestamp(year, month, day, hour, minute, second): """Constructs an object holding a datetime/timestamp value.""" return datetime.datetime(year, month, day, hour, minute, second) def DateFromTicks(ticks): """Constructs an object holding a date value from the given UNIX timestamp.""" return Date(*time.localtime(ticks)[:3]) def TimeFromTicks(ticks): """Constructs an object holding a time value from the given UNIX timestamp.""" return Time(*time.localtime(ticks)[3:6]) def TimestampFromTicks(ticks): """Constructs an object holding a datetime/timestamp value from the given UNIX timestamp.""" return Timestamp(*time.localtime(ticks)[:6]) def Binary(value): """Constructs an object capable of holding a binary (long) string value.""" return bytes(value) def time_from_java_sql_time(n): dt = datetime.datetime(1970, 1, 1) + datetime.timedelta(milliseconds=n) return dt.time() def time_to_java_sql_time(t): return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000 def date_from_java_sql_date(n): return datetime.date(1970, 1, 1) + datetime.timedelta(days=n) def date_to_java_sql_date(d): if isinstance(d, datetime.datetime): d = d.date() td = d - datetime.date(1970, 1, 1) return td.days def datetime_from_java_sql_timestamp(n): return datetime.datetime(1970, 1, 1) + datetime.timedelta(milliseconds=n) def datetime_to_java_sql_timestamp(d): td = d - datetime.datetime(1970, 1, 1) return td.microseconds // 1000 + (td.seconds + td.days * 24 * 3600) * 1000 class ColumnType(object): def __init__(self, eq_types): self.eq_types = tuple(eq_types) self.eq_types_set = set(eq_types) def __eq__(self, other): return other in self.eq_types_set def __cmp__(self, other): if other in self.eq_types_set: return 0 if other < self.eq_types: return 1 else: return -1 STRING = ColumnType([''VARCHAR'', ''CHAR'']) """Type object that can be used to describe string-based columns.""" BINARY = ColumnType([''BINARY'', ''VARBINARY'']) """Type object that can be used to describe (long) binary columns.""" NUMBER = ColumnType([''INTEGER'', ''UNSIGNED_INT'', ''BIGINT'', ''UNSIGNED_LONG'', ''TINYINT'', ''UNSIGNED_TINYINT'', ''SMALLINT'', ''UNSIGNED_SMALLINT'', ''FLOAT'', ''UNSIGNED_FLOAT'', ''DOUBLE'', ''UNSIGNED_DOUBLE'', ''DECIMAL'']) """Type object that can be used to describe numeric columns.""" DATETIME = ColumnType([''TIME'', ''DATE'', ''TIMESTAMP'', ''UNSIGNED_TIME'', ''UNSIGNED_DATE'', ''UNSIGNED_TIMESTAMP'']) """Type object that can be used to describe date/time columns.""" ROWID = ColumnType([]) """Only implemented for DB API 2.0 compatibility, not used.""" BOOLEAN = ColumnType([''BOOLEAN'']) """Type object that can be used to describe boolean columns. This is a phoenixdb-specific extension.""" # XXX ARRAY JAVA_CLASSES = { ''bool_value'': [ (''java.lang.Boolean'', common_pb2.BOOLEAN, None, None), ], ''string_value'': [ (''java.lang.Character'', common_pb2.CHARACTER, None, None), (''java.lang.String'', common_pb2.STRING, None, None), (''java.math.BigDecimal'', common_pb2.BIG_DECIMAL, str, Decimal), (''java.sql.Array'', common_pb2.ARRAY, None, None), ], ''number_value'': [ (''java.lang.Integer'', common_pb2.INTEGER, None, int), (''java.lang.Short'', common_pb2.SHORT, None, int), (''java.lang.Long'', common_pb2.LONG, None, long if sys.version_info[0] < 3 else int), (''java.lang.Byte'', common_pb2.BYTE, None, int), (''java.sql.Time'', common_pb2.JAVA_SQL_TIME, time_to_java_sql_time, time_from_java_sql_time), (''java.sql.Date'', common_pb2.JAVA_SQL_DATE, date_to_java_sql_date, date_from_java_sql_date), (''java.sql.Timestamp'', common_pb2.JAVA_SQL_TIMESTAMP, datetime_to_java_sql_timestamp, datetime_from_java_sql_timestamp), ], ''bytes_value'': [ (''[B'', common_pb2.BYTE_STRING, Binary, None), ], ''double_value'': [ # if common_pb2.FLOAT is used, incorrect values are sent (''java.lang.Float'', common_pb2.DOUBLE, float, float), (''java.lang.Double'', common_pb2.DOUBLE, float, float), ] } """Groups of Java classes.""" JAVA_CLASSES_MAP = dict( (v[0], (k, v[1], v[2], v[3])) for k in JAVA_CLASSES for v in JAVA_CLASSES[k] ) """Flips the available types to allow for faster lookup by Java class. This mapping should be structured as: { ''java.math.BigDecimal'': (''string_value'', common_pb2.BIG_DECIMAL, str, Decimal),), ... ''<java class>'': (<field_name>, <Rep enum>, <mutate_to function>, <cast_from function>), } """ class TypeHelper(object): @staticmethod def from_class(klass): """Retrieves a Rep and functions to cast to/from based on the Java class. :param klass: The string of the Java class for the column or parameter. :returns: tuple ``(field_name, rep, mutate_to, cast_from)`` WHERE ``field_name`` is the attribute in ``common_pb2.TypedValue`` ``rep`` is the common_pb2.Rep enum ``mutate_to`` is the function to cast values into Phoenix values, if any ``cast_from`` is the function to cast from the Phoenix value to the Python value, if any :raises: NotImplementedError """ if klass == ''org.apache.phoenix.schema.types.PhoenixArray'': klass = "java.sql.Array" if klass not in JAVA_CLASSES_MAP: raise NotImplementedError(''type {} is not supported''.format(klass)) return JAVA_CLASSES_MAP[klass]

改动后只支持 array里面的值是int、string例如 array(1,2,3),array(''12'',''a'',''b'')
Apache Kylin 和 Phoenix的区别和性能对比
Apache Kylin 和 Apache Phoenix 都能使用 Apache HBase 做数据存储和查询，那么，同为 HBase 上的 SQL 引擎，它们之间有什么不同呢？下面我们将从这两个项目的介绍开始为大家做个深度解读和比较。

1.Apache Kylin 和 Apache Phoenix介绍

1.1 Apache Kylin

Kylin 是一个分布式的大数据分析引擎，提供在 Hadoop 之上的 SQL 接口和多维分析能力（OLAP），可以做到在 TB 级的数据量上实现亚秒级的查询响应。

上图是 Kylin 的架构图，从图中可以看出，Kylin 利用 MapReduce/Spark 将原始数据进行聚合计算，转成了 OLAP Cube 并加载到 HBase 中，以 Key-Value 的形式存储。Cube 按照时间范围划分为多个 segment，每个 segment 是一张 HBase 表，每张表会根据数据大小切分成多个 region。Kylin 选择 HBase 作为存储引擎，是因为 HBase 具有延迟低，容量大，使用广泛，API完备等特性，此外它的 Hadoop 接口完善，用户社区也十分活跃。

1.2 Apache Phoenix

Phoenix 是一个 Hadoop 上的 OLTP 和业务数据分析引擎，为用户提供操作 HBase 的 SQL 接口，结合了具有完整 ACID 事务功能的标准 SQL 和 JDBC API，以及来自 NoSQL 的后期绑定，具有读取模式灵活的优点。

下图为 Phoenix 的架构图，从图中可以看出，Phoenix 分为 client 和 server，其中 client 又分为 thin（本质上是一个 JDBC 驱动，所依赖的第三方类较少）和非 thin （所依赖的第三方类较多）两种；server 是针对 thin client 而言的，为 standalone 模式，是由一台 Java 服务器组成，代表客户端管理 Phoenix 的连接，可以进行横向扩展，启动方式也很简单，通过 bin/queryserver.py start 即可。

2.Apache Kylin 和 Apache Phoenix 对比

2.1优缺点对比

Kylin 的优点主要有以下几点：

1. 支持雪花/星型模型；

2. 亚秒级查询响应；

3. 支持 ANSI-SQL，可通过 ODBC，JDBC 以及 RESTful API 进行访问；

4. 支持百亿、千亿甚至万亿级别交互式分析；

5. 无缝与 BI 工具集成；

6. 支持增量刷新；

7. 既支持历史数据也支持流式数据；

8. 易用的管理页面和 API。

Phoenix 的优点则主要是以下几点：

1. 支持明细和聚合查询；

2. 支持 insert, update, delete 操作，其使用 upsert 来代替 insert 和 update；

3. 较好的利用 HBase 的优点，如 row timestamp，将其与 HBase 原生的 row timestamp 映射起来，有助于 Phoenix 利用 HBase 针对存储文件的时间范围提供的多种优化和 Phoenix 内置的各式各样的查询优化；

4. 支持多种函数：聚合、String、时间和日期、数字、数组、数学和其它函数；

5. 支持具有完整 ACID 语义的跨行及跨表事务；

6. 支持多租户；

7. 支持索引（二级索引），游标。

当然，Kylin 和 Phoenix 也都有一些还有待提升的不足之处。Kylin 的不足主要是体现在首先由于 Kylin 是一个分析引擎，只读，不支持 insert，update，delete 等 SQL 操作，用户修改数据的话需要重新批量导入（构建）；其次，Kylin 用户需要预先建立模型后加载数据到 Cube 后才可进行查询；最后，使用 Kylin 的建模人员需要了解一定的数据仓库知识。

Phoenix 的不足则主要体现在：首先，其二级索引的使用有一定的限制，只有当查询中所有的列都在索引或覆盖索引中才生效且成本较高，在使用之前还需配置；其次，范围扫描的使用有一定的限制，只有当使用了不少于一个在主键约束中的先导列时才生效；最后，创建表时必须包含主键，对别名支持不友好。

2.2  phoenix和 Kylin存储格式对比

Kylin 将数据列区分成维度和度量：维度的顺序与 HBase 中的 Rowkey 建立关系从而将 Cube 数据存储，维度的值会被编码为字节，然后多个维度的值被拼接在一起组成 Rowkey，Rowkey 的格式为 Shard ID（2 字节）+ Cuboid ID（8 字节，标记有哪几个列）+ 维度值；度量的值会被序列化为字节数组，然后以 column 的方式存储；多个度量值可以放在同一个列簇中，也可以放在不同列簇中。如下图所示：

Phoenix 在列名与 HBase 列限定符之间引入了一个间接层，将 HBase 非关系型形式转换成关系型数据模型，在创建表时默认会将 PK 与 HBase 中表的 Rowkey 映射起来，PK 支持多字段组合，剩下的列可以根据需求进行选择，列簇如果未显式定义，则会被忽略，Qualifier 会转换成表的字段名。如下图所示：

2.3  phoenix和 Kylin查询方式对比

Kylin 查询时会将 SQL 通过 Apache Calcite 进行解析和优化，转化成对 HBase 的 RPC 访问。Kylin 会将计算逻辑下压到 HBase Region Server 中使用 Coprocessor 并行运行，每个 RS 返回过滤聚合后的数据给 Kylin 节点，Kylin 做最后的处理后返回给客户端。因为大量的计算在 Cube 生成的时候已经完成，因此 Kylin 的查询效率非常高，通常在毫秒到秒级。

Kylin 在 Insight 页面提供 SQL 查询窗口；也能够通过 REST API 发送请求的方式进行查询；还能够快速的与其他 BI 工具集成并使用 BI 工具自带的方式进行查询。

Phoenix 直接使用 HBase API，以及协处理器和自定义过滤器，从而使得查询的效率更好。对于查询，Phoenix 可以根据 region 的边界进行分块并在客户端并行运行以减少延迟。聚合操作将在服务器端的协处理器中完成（这点与 Kylin 类似），返回到客户端的数据量是进行过压缩的，而不是全部返回。

Phoenix 是通过命令行的方式进行查询（既可以输入单条 SQL 语句，也可以执行 SQL 文件）；也可以通过界面进行查询，但需额外安装 Squirrel。

2.4  phoenix和 Kylin查询优化方式对比

Kylin 查询优化方法比较多样，既有逻辑层的维度减枝优化（层级，必须，联合，推导等），编码优化，rowkey 优化等，也有存储层的优化，如按某个维度切 shard，region 大小划分优化，segment 自动合并等，具体可以参考 Kylin 的文档。用户可以根据自己的数据特征、性能需求使用不同的策略，从而在空间和时间之间找到一个平衡点。

为了使得查询效率更高，Phoenix 可以在表上加索引，不同的索引有不同的适用场景：全局索引适用于大量读取的场景，且要求查询中引用的所有列都包含在索引中；本地索引适用于大量写入，空间有限的场景。索引会将数据的值进行拷贝，额外增加了开销，且使用二级索引还需在 HBase 的配置文件中进行相应配置。数据总不会是完美分布的，HBase 顺序写入时（行键单调递增）可能会导致热点问题，这时可以通过加盐操作来解决，Phoenix 可以为 key 自动加盐。

从上述内容可以看出：

1）Kylin 和 Phoenix 虽然同为 Hadoop/HBase 上的 SQL 引擎，两者的定位不同，一个是 OLAP，另一个是 OLTP，服务于不同的场景；

2）Phoenix 更多的是适用于以往关系型数据库的相关操作，当查询语句是点查找和小范围扫描时，Phoenix 可以比较好地满足，而它不太适合大量 scan 类型的 OLAP 查询，或查询的模式较为灵活的场景；

3）Kylin 是一个只读型的分析引擎，不适合细粒度修改数据，但适合做海量数据的交互式在线分析，通常跟数据仓库以及 BI 工具结合使用，目标用户为业务分析人员。

下面我们做一个简单的性能测试，因为 Kylin 不支持数据写入，因此我们不得不测试数据的查询性能，使用相同 HBase 集群和数据集。

2.5 phoenix和 Kylin性能对比

我们准备的测试环境为 CDH 5.15.1，1个 Master，7个 Region Server，每个节点 8 核心 58G 内存，使用 Star Schema Benchmark 数据进行测试。其中单表 Lineorder 表数据量为 3 千万，大小为 8.70 GB。Phoenix 导入时间: 7mins 9sec，Kylin 导入时间: 32mins 8sec。多表 Lineorder 数据量 750 万，大小为 10 GB。具体的 SQL 语句参见：

单表的sql 1、select lo_custkey, sum(lo_revenue) from lineorder group by lo_custkey 2、select count(*) from lineorder 3、select LINEORDER.LO_ORDERDATE, sum(LINEORDER.LO_REVENUE) as sum_lo_revenue from lineorder where LINEORDER.LO_ORDERDATE > 19960105 and LINEORDER.LO_ORDERDATE < 19960305 group by LINEORDER.LO_ORDERDATE 4、select LINEORDER.LO_CUSTKEY, LINEORDER.LO_ORDERDATE, sum(LINEORDER.LO_REVENUE) as sum_lo_revenue from lineorder where LINEORDER.LO_ORDERDATE > 19960105 and LINEORDER.LO_ORDERDATE < 19960305 group by LINEORDER.LO_ORDERDATE, LINEORDER.LO_CUSTKEY having sum_lo_revenue > 55000000 order by sum_lo_revenue desc 多表的sql: 1、select sum(lo_revenue) as revenue       from lineorder       left join dates on lo_orderdate = d_datekey       where d_year = 1993       and lo_discount between 1 and 3       and lo_quantity < 25;        2、select sum(lo_revenue) as lo_revenue, d_year, p_brand       from lineorder       left join dates on lo_orderdate = d_datekey       left join part on lo_partkey = p_partkey       left join supplier on lo_suppkey = s_suppkey       where p_brand between ''MFGR#2221'' and ''MFGR#2228'' and s_region = ''ASIA''       group by d_year, p_brand       order by d_year, p_brand;

图5 单表对比图

图 5 是一个单表查询场景的分析，从上我们可以看出，针对于一张表的查询，Phoenix 查询的耗时是 Kylin 的几十甚至是几百倍，加入索引后，Phoenix 的查询速度有了较为显著的提升，但仍然是 Kylin 的十几倍甚至几十倍，因此单表查询 Kylin 具有明显优势。

图6 多表对比图

图6是一个多表 join 查询的场景，从上图可以看出，对于多表 join 的情况，Kylin 查询依旧非常快，因为 join 在 Cube 构建阶段已经完成了；Phoenix 加入索引后时间并没有较为显著的减少，耗时仍然是 Kylin 的几十倍甚至几百倍。
Apache Phoenix 4.10 发布，HBase 的 SQL 驱动
Apache Phoenix 4.10 发布了，Apache Phoenix 是 HBase 的 SQL 驱动。Phoenix 使得 HBase 支持通过 JDBC 的方式进行访问，并将你的 SQL 查询转成 HBase 的扫描和相应的动作。

4.x 版本与 HBase 0.98/1.1/1.2 兼容。

本次发布值得关注的更新：

通过列编码减少磁盘占用空间，并优化只写入一次的数据的存储格式

在 Phoenix/Spark 集成中支持 Apache Spark 2.0

通过 Phoenix 消耗 Apache Kafka 消息

通过跨集群分布执行来提高 UPSERT SELECT 性能

改进 Hive 集成

40+ bug 修复

下载地址 和 发布主页
Apache Phoenix 4.11 发布，HBase 的 SQL 驱动
Apache Phoenix 4.11 发布了，Apache Phoenix 是 HBase 的 SQL 驱动。Phoenix 使得 HBase 支持通过 JDBC 的方式进行访问，并将你的 SQL 查询转成 HBase 的扫描和相应的动作。

更新内容：

Support for HBase 1.3.1 and above

Local index hardening and performance improvements [1]

Atomic update of data and local index rows (HBase 1.3 only) [2]

Use of snapshots for MR-based queries and async index building [3][4]

Support for forward moving cursors [5]

Chunk commit data from client based on byte size and row count [6]

Reduce load on region server hosting SYSTEM.CATALOG [7]

50+ bug fixes [8]

下载地址：

http://phoenix.apache.org/download.html

Apache Phoenix 4.13 发布，HBase 的 SQL 驱动
Apache Phoenix 4.13 已发布，Apache Phoenix 是 HBase 的 SQL 驱动。Phoenix 使得 HBase 支持通过 JDBC 的方式进行访问，并将你的 SQL 查询转成 HBase 的扫描和相应的动作。

Phoenix 4.x 与 HBase 0.98 和 1.3 兼容。

更新亮点：

修复在连接时创建 SYSTEM.CATALOG 快照的 bug

关于行删除处理的大量错误修复

统计收集改进

新增 COLLATION_KEY 函数

详情可查阅发行说明

下载地址：

http://phoenix.apache.org/download.html
关于phoenix 报错：type org.apache.phoenix.schema.types.PhoenixArray is not supported和phoenix explain的介绍已经告一段落，感谢您的耐心阅读，如果想了解更多关于Apache Kylin 和 Phoenix的区别和性能对比、Apache Phoenix 4.10 发布，HBase 的 SQL 驱动、Apache Phoenix 4.11 发布，HBase 的 SQL 驱动、Apache Phoenix 4.13 发布，HBase 的 SQL 驱动的相关信息，请在本站寻找。

本文标签：

phoenix报错

type org.apache.phoenix.schema.types.PhoenixArray

not supported

phoenix explain