
MySQL EXISTS subqueries (MySQL subquery optimization)



If you're interested in MySQL EXISTS subqueries, this article is a good read. It covers MySQL EXISTS subqueries in detail, answers related questions about MySQL subquery optimization, and also offers useful material on: In C#, Directory.Exists and File.Exists do not throw when the program lacks access permissions; EXISTS (SELECT 1 ...) vs EXISTS (SELECT * ...): one or the other?; EXISTS and NOT EXISTS subqueries (Advanced Queries, Part 2); and Hive: LEFT SEMI JOIN / LEFT OUTER JOIN vs (IN / NOT IN), (EXISTS / NOT EXISTS) analysis.

Contents:

MySQL EXISTS subqueries (MySQL subquery optimization)

A subquery is also called a nested query.

The SELECT statement inside a subquery must not use an ORDER BY clause; ORDER BY can only sort the final query result.

1. Subqueries with IN

select * from emp where dep_id in (select id from dept); An ORDER BY id inside the subquery would have no effect on the final result.

2. Subqueries with ANY or ALL

select salary from emp where id in(2,5);

select * from emp where salary <ANY (select salary from emp where id in(2,5));

<ANY subquery (true if any value matches): the condition holds when salary is less than at least one value returned by the subquery, so a row of emp is returned whenever its salary is below the subquery's maximum value.

select * from emp where salary <ALL (select salary from emp where id in(2,5));

<ALL subquery (true only if all values match): the condition holds only when salary is less than every value returned by the subquery, so it keeps the rows whose salary is below the subquery's minimum value.

The other comparison operators (=, <>, <, >, <=, >=) behave analogously.

These forms are handy when the set of values returned by the subquery is not known up front and is small: rather than computing the extreme value yourself, you let the database do the matching.
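As a minimal sketch of that equivalence against the same emp table (note the two forms can differ when the subquery is empty or returns NULL):

-- salary <ANY (...) keeps rows below the subquery's maximum value
select * from emp where salary < (select max(salary) from emp where id in(2,5));

-- salary <ALL (...) keeps rows below the subquery's minimum value
select * from emp where salary < (select min(salary) from emp where id in(2,5));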

3. EXISTS

select * from emp where EXISTS (select id from dept where dept.id=emp.dep_id);

Usage: EXISTS must be followed by a subquery; it cannot be replaced by a value list such as (value1, value2). In WHERE EXISTS (subquery) there is no column being compared. The subquery after EXISTS returns no actual data, only true or false; when it returns true, the WHERE condition holds and the row is kept.

EXISTS (subquery): as long as the subquery's result set is non-empty, the WHERE condition evaluates to true.
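The negated form works the same way. As a small sketch against the same emp and dept tables, this keeps the emp rows whose dep_id matches no department:

select * from emp where NOT EXISTS (select 1 from dept where dept.id=emp.dep_id);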

 

In C#, Directory.Exists and File.Exists do not throw when the program lacks access permissions

Sometimes the account under which our C# application runs lacks permission to access certain folders and files. When we use Directory.Exists and File.Exists to test whether those folders and files exist, the methods do not throw an exception; they simply return false, indicating that the folder or file was not found. Even though the folder or file actually exists and the execution account merely lacks permission to access it, Directory.Exists and File.Exists still return false rather than throwing.

 

Here is what MSDN says about Directory.Exists and File.Exists; it mentions the insufficient-permission case as well:

Directory.Exists

Returns
true if path refers to an existing directory; false if the directory does not exist or an error occurs when trying to determine if the specified directory exists.

 

File.Exists

Returns
true if the caller has the required permissions and path contains the name of an existing file; otherwise, false. This method also returns false if path is null, an invalid path, or a zero-length string. If the caller does not have sufficient permissions to read the specified file, no exception is thrown and the method returns false regardless of the existence of path.

 

 

References

Directory.Exists
File.Exists

 

EXISTS (SELECT 1 ...) vs EXISTS (SELECT * ...): one or the other?

Whenever I need to check whether a row exists in a table, I tend to write the condition like this:

SELECT a,b,c
  FROM a_table
 WHERE EXISTS
       (SELECT *  -- This is what I normally write
          FROM another_table
         WHERE another_table.b = a_table.b
       )

Some other people write it like this:

SELECT a,c
  FROM a_table
 WHERE EXISTS
       (SELECT 1   --- This nice '1' is what I have seen other people use
          FROM another_table
         WHERE another_table.b = a_table.b
       )

When the condition is NOT EXISTS rather than EXISTS, in some cases I might write it with a LEFT JOIN plus an extra condition (sometimes called an antijoin):

SELECT a,c
  FROM a_table
       LEFT JOIN another_table ON another_table.b = a_table.b
 WHERE another_table.primary_key IS NULL

I try to avoid that form because I find its meaning less clear, especially when it is not obvious what primary_key is, or when the primary key or the join condition spans multiple columns (it is easy to forget one of them). But sometimes you maintain code written by someone else... and there it is.

  1. Is there any difference (other than style) between SELECT 1 and SELECT *?
    Are there corner cases where they behave differently?
  2. What I write is (AFAIK) standard SQL: do different databases or older versions differ here?
  3. Is there any advantage to writing the antijoin explicitly?
    Do contemporary planners/optimizers treat it differently from a NOT EXISTS clause?
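For reference, the NOT EXISTS form that question 3 contrasts with the explicit antijoin would read as follows (same tables as above):

SELECT a,c
  FROM a_table
 WHERE NOT EXISTS
       (SELECT 1
          FROM another_table
         WHERE another_table.b = a_table.b
       )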

EXISTS and NOT EXISTS subqueries (Advanced Queries, Part 2)

Subquery: a query statement embedded inside another query statement.

Notes on subqueries:

1. A subquery can be nested anywhere an expression may appear in a SQL statement.

2. A table that appears only in the subquery and not in the outer query cannot contribute to the output columns.
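For example, the following hypothetical query (using the tables defined below) violates rule 2 and is invalid, because grade appears only in the subquery:

-- invalid: gradeName belongs to grade, which appears only in the subquery
SELECT studentNo, gradeName
FROM student
WHERE gradeId IN (SELECT gradeID FROM grade);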

 

----- Student table
DROP TABLE IF EXISTS student;
CREATE TABLE `student`(
`studentNo` INT(4) NOT NULL COMMENT 'student number',
`loginPwd` VARCHAR(20) NOT NULL COMMENT 'password',
`studentName` VARCHAR(50) NOT NULL COMMENT 'student name',
`sex` CHAR(2) DEFAULT '男' NOT NULL COMMENT 'sex',
`gradeId` INT(4) UNSIGNED COMMENT 'grade number',
`phone` VARCHAR(50) COMMENT 'phone number',
`address` VARCHAR(255) COMMENT 'address',
`bornDate` DATETIME COMMENT 'date of birth',
`email` VARCHAR(50) COMMENT 'email account',
identityCard VARCHAR(18) COMMENT 'ID card number',
PRIMARY KEY(`studentNo`)
);

----- Grade table
DROP TABLE IF EXISTS grade;
CREATE TABLE `grade`(
gradeID INT(4) NOT NULL COMMENT 'grade number',
gradeName VARCHAR(50) NOT NULL COMMENT 'grade name'
);

----- Subject table
DROP TABLE IF EXISTS subject;
CREATE TABLE `subject`(
subjectNo INT(4) NOT NULL COMMENT 'subject number' PRIMARY KEY, # primary key identity column, increments by 1
subjectName VARCHAR(50) COMMENT 'subject name',
classHour INT(4) COMMENT 'class hours',
gradeID INT(4) COMMENT 'grade number'
);

----- Result (score) table
DROP TABLE IF EXISTS result;
CREATE TABLE `result`(
`studentNo` INT(4) NOT NULL COMMENT 'student number',
`subjectNo` INT(4) NOT NULL COMMENT 'subject number',
`examDate` DATETIME NOT NULL COMMENT 'exam date',
`studentResult` INT(4) NOT NULL COMMENT 'exam score'
);

----- Populate the grade table
INSERT INTO `grade` VALUES ('1', 'S1');
INSERT INTO `grade` VALUES ('2', 'S2');
INSERT INTO `grade` VALUES ('3', 'Y2');

----- Populate the result table
INSERT INTO `result` VALUES ('10000', '1', '2016-02-15 00:00:00', '71');
INSERT INTO `result` VALUES ('10000', '1', '2016-02-17 00:00:00', '60');
INSERT INTO `result` VALUES ('10001', '1', '2016-02-17 00:00:00', '46');
INSERT INTO `result` VALUES ('10002', '1', '2016-02-17 00:00:00', '83');
INSERT INTO `result` VALUES ('10003', '1', '2016-02-17 00:00:00', '60');
INSERT INTO `result` VALUES ('10004', '1', '2016-02-17 00:00:00', '60');
INSERT INTO `result` VALUES ('10005', '1', '2016-02-17 00:00:00', '95');
INSERT INTO `result` VALUES ('10006', '1', '2016-02-17 00:00:00', '93');
INSERT INTO `result` VALUES ('10007', '1', '2016-02-17 00:00:00', '23');

---- Populate the student table

INSERT INTO `student` VALUES ('10000', '123', '郭靖', '男', '1', '13645667783', '天津市河西区', '1990-09-08 00:00:00', null, null);
INSERT INTO `student` VALUES ('10001', '123', '李文才', '男', '1', '13645667890', '地址不详', '1994-04-12 00:00:00', null, null);
INSERT INTO `student` VALUES ('10002', '123', '李斯文', '男', '1', '13645556793', '河南洛阳', '1993-07-23 00:00:00', null, null);
INSERT INTO `student` VALUES ('10003', '123', '张萍', '女', '1', '13642345112', '地址不详', '1995-06-10 00:00:00', null, null);
INSERT INTO `student` VALUES ('10004', '123', '韩秋洁', '女', '1', '13812344566', '北京市海淀区', '1995-07-15 00:00:00', null, null);
INSERT INTO `student` VALUES ('10005', '123', '张秋丽', '女', '1', '13567893246', '北京市东城区', '1994-01-17 00:00:00', null, null);
INSERT INTO `student` VALUES ('10006', '123', '肖梅', '女', '1', '13563456721', '河北省石家庄市', '1991-02-17 00:00:00', null, null);
INSERT INTO `student` VALUES ('10007', '123', '秦洋', '男', '1', '13056434411', '上海市卢湾区', '1992-04-18 00:00:00', null, null);
INSERT INTO `student` VALUES ('10008', '123', '何晴晴', '女', '1', '13053445221', '广州市天河区', '1997-07-23 00:00:00', null, null);
INSERT INTO `student` VALUES ('20000', '123', '王宝宝', '女', '2', '13318877954', '地址不详', '1995-09-10 00:00:00', null, null);
INSERT INTO `student` VALUES ('20010', '123', '何小华', '女', '2', '13318877954', '地址不详', '1995-09-10 00:00:00', null, null);
INSERT INTO `student` VALUES ('30011', '123', '陈志强', '女', '3', '13689965430', '地址不详', '1994-09-27 00:00:00', null, null);
INSERT INTO `student` VALUES ('30012', '123', '李露露', '女', '3', '13685678854', '地址不详', '1992-09-27 00:00:00', null, null);

----- Populate the subject table
INSERT INTO `subject` VALUES ('1', 'Logic Java', '220', '1');
INSERT INTO `subject` VALUES ('2', 'HTML', '160', '1');
INSERT INTO `subject` VALUES ('3', 'Java OOP', '230', '2');

 

------------------------------- Check the most recent exam of the Logic Java course. If there are
-------------------------------- scores of 80 or above, show the top 5 students and their scores.
#1. Get the subject number of Logic Java
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java";


#2. Get the date of the most recent Logic Java exam
SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java");

#3. Find the records scoring above 80
SELECT subjectNo,studentResult
FROM result
WHERE subjectNo=(SELECT subjectNo FROM `subject` WHERE subjectName ='Logic Java')
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"))
AND studentResult >80;

 

#4. Show the top 5 scores in the most recent Logic Java exam
SELECT subjectNo,studentResult
FROM result
WHERE EXISTS(
SELECT subjectNo,studentResult
FROM result
WHERE subjectNo=(SELECT subjectNo FROM `subject` WHERE subjectName ='Logic Java')
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"))
AND studentResult >80
)
AND subjectNo =(SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java")
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"))
ORDER BY studentResult DESC
LIMIT 5;
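Note that the EXISTS subquery in step 4 is not correlated with the outer query: it merely gates the whole statement on whether any score above 80 exists. A correlated sketch of the same gate, reusing the SELECT 1 idiom discussed earlier, might look like this:

SELECT r.subjectNo, r.studentResult
FROM result r
WHERE EXISTS (
    SELECT 1 FROM result x   -- correlated on the same subject and exam date
    WHERE x.subjectNo = r.subjectNo
    AND x.examDate = r.examDate
    AND x.studentResult > 80
)
AND r.subjectNo = (SELECT subjectNo FROM `subject` WHERE subjectName = 'Logic Java')
AND r.examDate = (SELECT MAX(examDate) FROM result WHERE subjectNo = (
    SELECT subjectNo FROM `subject` WHERE subjectName = 'Logic Java'))
ORDER BY r.studentResult DESC
LIMIT 5;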

 

------------------------------- Check the most recent exam of the Logic Java course. If nobody passed
-------------------------------- (60 is the passing mark), the exam is considered too hard: compute that exam's average score plus 5 points.

#1. Find the records scoring 60 or above
SELECT subjectNo,studentResult
FROM result
WHERE subjectNo=(SELECT subjectNo FROM `subject` WHERE subjectName ='Logic Java')
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"))
AND studentResult >=60;

 

#2. Compute the average score plus 5 points (the NOT EXISTS subquery is empty exactly when
# nobody scored 60 or above, so the outer query produces a row only in that case)
SELECT AVG(studentResult)+5 AS 平均分
FROM result
WHERE NOT EXISTS(
SELECT subjectNo,studentResult
FROM result
WHERE subjectNo=(SELECT subjectNo FROM `subject` WHERE subjectName ='Logic Java')
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"))
AND studentResult >=60
)
AND subjectNo =(SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java")
AND examDate=(SELECT MAX(examDate) FROM result WHERE subjectNo=(
SELECT subjectNo FROM `subject` WHERE subjectName="Logic Java"));


----- ......
UPDATE result SET studentResult=50 WHERE subjectNo=1 AND examDate='2016-02-17';
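Assuming the UPDATE above is meant to force the all-fail case, step #2 should now return a row. A quick sanity check (a sketch, same tables) is to compute the adjusted average for that exam directly:

SELECT AVG(studentResult)+5 AS 平均分
FROM result
WHERE subjectNo=1 AND examDate='2016-02-17';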

Hive: LEFT SEMI JOIN / LEFT OUTER JOIN vs (IN / NOT IN), (EXISTS / NOT EXISTS) analysis

 

Reference article: https://blog.csdn.net/happyrocking/article/details/79885071

 

In this article we take a closer look at Hive's LEFT SEMI JOIN and at (IN / NOT IN), (EXISTS / NOT EXISTS) subquery clauses.

 

LEFT SEMI JOIN basics

First, let's get clear on what a LEFT SEMI JOIN is.

 

Characteristics

1. The restriction on LEFT SEMI JOIN is that the right-hand table may only be filtered in the ON clause; filtering it in the WHERE clause, the SELECT clause, or anywhere else is not allowed.

2. LEFT SEMI JOIN passes only the table's join key to the map phase, so the final SELECT of a LEFT SEMI JOIN may reference columns of the left table only.

3. Because LEFT SEMI JOIN has IN(keySet) semantics, when the right table contains duplicate keys the left table row is emitted once and then skipped, whereas JOIN keeps iterating. So with duplicates on the right, LEFT SEMI JOIN produces one row where JOIN produces several, which also makes LEFT SEMI JOIN faster.

For example, take tables A and B, run JOIN versus LEFT SEMI JOIN, and then select all columns; the results differ as described above (the comparison figure from the original post is omitted here).

Note: the column crossed out in blue in that figure does not actually exist in the LEFT SEMI JOIN output, because the final SELECT may only reference the left table.
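As a small sketch of that difference (hypothetical tables a and b, where b contains the same key twice):

-- JOIN: a row of a is emitted once per matching row of b,
-- so duplicate keys in b multiply the output
SELECT a.* FROM a JOIN b ON (a.id = b.id);

-- LEFT SEMI JOIN: each matching row of a is emitted at most once,
-- no matter how many rows of b match
SELECT a.* FROM a LEFT SEMI JOIN b ON (a.id = b.id);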

 

 

 

 

 

In fact, you can think of LEFT SEMI JOIN as the substitute for subquery-style (IN / NOT IN), (EXISTS / NOT EXISTS).

Before Hive 0.13, subqueries were not supported inside (IN / NOT IN), (EXISTS / NOT EXISTS), so LEFT SEMI JOIN had to be used instead.

The documentation reads as follows (screenshot omitted in the original):

 

Building basic test data

DROP TABLE IF EXISTS data_semi_a;
DROP TABLE IF EXISTS data_semi_b;
CREATE TABLE IF NOT EXISTS data_semi_a
(
user_id BIGINT
,sex_id BIGINT
);
CREATE TABLE IF NOT EXISTS data_semi_b
(
user_id BIGINT
,sex_id BIGINT
,age BIGINT
);
INSERT INTO TABLE data_semi_a VALUES
(NULL ,0)
,(1, 1)
,(1, 0)
,(2, 1)
,(3, 0)
,(4, 1)
;
INSERT INTO TABLE data_semi_b VALUES
(NULL, 0, 3)
,(1, 0, 12)
,(2, 1, 14)
;

Test data:

data_semi_a

+----------------------+---------------------+
| data_semi_a.user_id  | data_semi_a.sex_id  |
+----------------------+---------------------+
| NULL                 | 0                   |
| 1                    | 1                   |
| 1                    | 0                   |
| 2                    | 1                   |
| 3                    | 0                   |
| 4                    | 1                   |
+----------------------+---------------------+

     

data_semi_b

+----------------------+---------------------+------------------+
| data_semi_b.user_id  | data_semi_b.sex_id  | data_semi_b.age  |
+----------------------+---------------------+------------------+
| NULL                 | 0                   | 3                |
| 1                    | 0                   | 12               |
| 2                    | 1                   | 14               |
+----------------------+---------------------+------------------+

     

     

     

     

     

A single-condition LEFT SEMI JOIN is equivalent to IN

     

Note

LEFT SEMI JOIN is equivalent to IN; the underlying principle is that only the LEFT SEMI JOIN key is passed along.

So with A LEFT SEMI JOIN B, the SELECT list must not contain columns from B.

SELECT
 a.user_id
 ,a.sex_id
 ,b.age
FROM data_semi_a AS a
LEFT SEMI JOIN data_semi_b AS b
 ON a.user_id = b.user_id
;

     

    Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 4:1 Invalid table alias or column reference 'b': (possible column names are: user_id, sex_id) (state=42000,code=10004)

     

     

A single-condition LEFT SEMI JOIN is equivalent to IN, as in the following SQL.

SQL statement

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
LEFT SEMI JOIN data_semi_b AS b
ON a.user_id = b.user_id
;

The equivalent IN SQL

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
WHERE a.user_id IN (
SELECT b.user_id
FROM data_semi_b AS b
);

     

Let's compare the results of the two statements.

Result of the LEFT SEMI JOIN:

INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO : 2020-04-12 10:53:09,591 Stage-1 map = 0%, reduce = 0%
INFO : 2020-04-12 10:53:17,849 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 3.12 sec
INFO : 2020-04-12 10:53:22,975 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.84 sec
INFO : 2020-04-12 10:53:29,141 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.77 sec
INFO : MapReduce Total cumulative CPU time: 7 seconds 770 msec
INFO : Ended Job = job_1586423165261_0087
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 7.77 sec HDFS Read: 16677 HDFS Write: 135 SUCCESS
INFO : Total MapReduce CPU Time Spent: 7 seconds 770 msec
INFO : Completed executing command(queryId=hive_20200412105301_9f643e42-c966-4140-8c72-330be6bdd73c); Time taken: 28.939 seconds
INFO : OK
+------------+-----------+
| a.user_id  | a.sex_id  |
+------------+-----------+
| 1          | 0         |
| 1          | 1         |
| 2          | 1         |
+------------+-----------+
3 rows selected (29.073 seconds)

     

Result of IN:

INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO : 2020-04-12 10:37:26,143 Stage-1 map = 0%, reduce = 0%
INFO : 2020-04-12 10:37:33,376 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.71 sec
INFO : 2020-04-12 10:37:39,510 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.6 sec
INFO : 2020-04-12 10:37:44,680 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.41 sec
INFO : MapReduce Total cumulative CPU time: 7 seconds 410 msec
INFO : Ended Job = job_1586423165261_0085
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 7.41 sec HDFS Read: 16726 HDFS Write: 135 SUCCESS
INFO : Total MapReduce CPU Time Spent: 7 seconds 410 msec
INFO : Completed executing command(queryId=hive_20200412103717_2ab604da-f301-4fee-b9bd-9c22ad6e65a1); Time taken: 27.796 seconds
INFO : OK
+------------+-----------+
| a.user_id  | a.sex_id  |
+------------+-----------+
| 1          | 0         |
| 1          | 1         |
| 2          | 1         |
+------------+-----------+
3 rows selected (27.902 seconds)

     

Now let's compare the EXPLAIN output of the two statements:

EXPLAIN output for LEFT SEMI JOIN:

INFO : Starting task [Stage-3:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20200412105949_53e51917-8c04-4f6f-b9fd-32ab71a2888b); Time taken: 0.005 seconds
INFO : OK
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint) |
| sort order: + |
| Map-reduce partition columns: user_id (type: bigint) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| value expressions: sex_id (type: bigint) |
| TableScan |
| alias: b |
| filterExpr: user_id is not null (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| keys: _col0 (type: bigint) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 user_id (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
65 rows selected (0.136 seconds)

EXPLAIN output for IN:

INFO : Starting task [Stage-3:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20200412110229_81d9cf79-50e2-46f1-8152-a399038861c7); Time taken: 0.005 seconds
INFO : OK
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint) |
| sort order: + |
| Map-reduce partition columns: user_id (type: bigint) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| value expressions: sex_id (type: bigint) |
| TableScan |
| alias: b |
| filterExpr: user_id is not null (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| keys: _col0 (type: bigint) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 user_id (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
65 rows selected (0.127 seconds)

As you can see, the two are exactly identical in both their execution results and their EXPLAIN output.

In fact, IN is implemented internally as a LEFT SEMI JOIN.

     

     

     

     

Implementing NOT IN with LEFT OUTER JOIN

     

Note: LEFT SEMI JOIN cannot implement NOT IN.

The root cause: Hive does not support non-equi join conditions!

     

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
LEFT SEMI JOIN data_semi_b AS b
ON (a.user_id != b.user_id)
;

    Error: Error while compiling statement: FAILED: SemanticException [Error 10017]: Line 6:4 Both left and right aliases encountered in JOIN 'user_id' (state=42000,code=10017)

     

     

The correct way

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
WHERE a.user_id NOT IN (
SELECT b.user_id
FROM data_semi_b AS b
);

INFO : Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
INFO : 2020-04-12 23:02:26,751 Stage-2 map = 0%, reduce = 0%
INFO : 2020-04-12 23:02:33,938 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 1.76 sec
INFO : 2020-04-12 23:02:39,172 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 3.35 sec
INFO : 2020-04-12 23:02:47,688 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 7.88 sec
INFO : MapReduce Total cumulative CPU time: 7 seconds 880 msec
INFO : Ended Job = job_1586423165261_0106
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-4: Map: 1 Reduce: 1 Cumulative CPU: 6.49 sec HDFS Read: 8372 HDFS Write: 96 SUCCESS
INFO : Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 5.65 sec HDFS Read: 11974 HDFS Write: 96 SUCCESS
INFO : Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 7.88 sec HDFS Read: 14131 HDFS Write: 87 SUCCESS
INFO : Total MapReduce CPU Time Spent: 20 seconds 20 msec
INFO : Completed executing command(queryId=hive_20200412230117_fef818dc-e433-4880-9c8d-f6a9d28a08a9); Time taken: 91.471 seconds
INFO : OK
+------------+-----------+
| a.user_id  | a.sex_id  |
+------------+-----------+
+------------+-----------+
No rows selected (91.674 seconds)

     

The equivalent SQL. Note that NOT IN cannot be expressed with LEFT SEMI JOIN; we have to use LEFT OUTER JOIN. (The empty result above is expected: data_semi_b contains a NULL user_id, and x NOT IN (a set containing NULL) is never true.)

The equivalent LEFT OUTER JOIN SQL

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
LEFT OUTER JOIN data_semi_b AS b
ON a.user_id = b.user_id
AND b.user_id IS NULL
WHERE a.user_id IS NOT NULL
AND b.user_id IS NOT NULL
;

INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO : 2020-04-12 23:04:47,896 Stage-1 map = 0%, reduce = 0%
INFO : 2020-04-12 23:04:55,176 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.91 sec
INFO : 2020-04-12 23:05:00,288 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.53 sec
INFO : 2020-04-12 23:05:06,449 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.45 sec
INFO : MapReduce Total cumulative CPU time: 8 seconds 450 msec
INFO : Ended Job = job_1586423165261_0107
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 8.45 sec HDFS Read: 16358 HDFS Write: 87 SUCCESS
INFO : Total MapReduce CPU Time Spent: 8 seconds 450 msec
INFO : Completed executing command(queryId=hive_20200412230438_62ce326e-1b03-4c5a-a842-6816dc6feda3); Time taken: 28.871 seconds
INFO : OK
+------------+-----------+
| a.user_id  | a.sex_id  |
+------------+-----------+
+------------+-----------+
No rows selected (28.979 seconds)
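For comparison, the textbook antijoin pattern (a sketch, not the exact query run above) puts the equality in ON and the NULL test in WHERE, keeping the rows of a that have no match in b:

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
LEFT OUTER JOIN data_semi_b AS b
ON a.user_id = b.user_id
WHERE b.user_id IS NULL
;

Against this test data it would return the rows with user_id NULL, 3 and 4, which differs from NOT IN precisely because of the NULL handling noted above.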

     

Let's look at the execution plans of these two statements.

EXPLAIN output for NOT IN:

+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-4 is a root stage |
| Stage-1 depends on stages: Stage-4 |
| Stage-2 depends on stages: Stage-1 |
| Stage-0 depends on stages: Stage-2 |
| |
| STAGE PLANS: |
| Stage: Stage-4 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: b |
| filterExpr: user_id is null (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is null (type: boolean) |
| Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: count() |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col0 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| mode: mergepartial |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (_col0 = 0) (type: boolean) |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| keys: 0 (type: bigint) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| value expressions: user_id (type: bigint), sex_id (type: bigint) |
| TableScan |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 |
| 1 |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 73 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe |
| |
| Stage: Stage-2 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 6 Data size: 73 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: bigint) |
| TableScan |
| alias: b |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Outer Join0 to 1 |
| keys: |
| 0 _col0 (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col0, _col1, _col5 |
| Statistics: Num rows: 6 Data size: 80 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: _col5 is null (type: boolean) |
| Statistics: Num rows: 3 Data size: 40 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: bigint), _col1 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 3 Data size: 40 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 3 Data size: 40 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+

     

EXPLAIN output for LEFT OUTER JOIN:

+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: user_id is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint) |
| sort order: + |
| Map-reduce partition columns: user_id (type: bigint) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| value expressions: sex_id (type: bigint) |
| TableScan |
| alias: b |
| filterExpr: (user_id is null and user_id is not null) (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (user_id is null and user_id is not null) (type: boolean) |
| Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint) |
| sort order: + |
| Map-reduce partition columns: user_id (type: bigint) |
| Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Outer Join0 to 1 |
| keys: |
| 0 user_id (type: bigint) |
| 1 user_id (type: bigint) |
| outputColumnNames: _col0, _col1, _col5 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: _col5 is not null (type: boolean) |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: bigint), _col1 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
63 rows selected (0.143 seconds)

     

     

     

     

     

     

LEFT SEMI JOIN for multi-column IN, i.e. EXISTS

     

Note: IN can only be used with a single column; with multiple columns we need EXISTS.

     

The following IN SQL is invalid:

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
WHERE (a.user_id, a.sex_id) IN (
SELECT
b.user_id
,b.sex_id
FROM data_semi_b AS b
)
;

    Error: Error while compiling statement: FAILED: ParseException line 6:0 mismatched input 'SELECT' expecting ( near '(' in expression specification (state=42000,code=40000)

     

     

We need one of the following forms instead,

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
LEFT SEMI JOIN data_semi_b AS b
ON a.user_id = b.user_id
AND a.sex_id = b.sex_id
;

or

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
WHERE EXISTS (
SELECT 1
FROM data_semi_b AS b
WHERE
a.user_id = b.user_id
AND a.sex_id = b.sex_id
)
;
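For the negative case, the multi-column counterpart of NOT IN can be written with NOT EXISTS (a sketch; per the note above, Hive supports subqueries in (NOT) EXISTS from 0.13 onward):

SELECT
a.user_id
,a.sex_id
FROM data_semi_a AS a
WHERE NOT EXISTS (
SELECT 1
FROM data_semi_b AS b
WHERE
a.user_id = b.user_id
AND a.sex_id = b.sex_id
)
;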

Execution result

INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO : 2020-04-12 23:46:16,157 Stage-1 map = 0%, reduce = 0%
INFO : 2020-04-12 23:46:24,375 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 3.04 sec
INFO : 2020-04-12 23:46:28,545 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.84 sec
INFO : 2020-04-12 23:46:35,732 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.85 sec
INFO : MapReduce Total cumulative CPU time: 7 seconds 850 msec
INFO : Ended Job = job_1586423165261_0110
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 7.85 sec HDFS Read: 17951 HDFS Write: 119 SUCCESS
INFO : Total MapReduce CPU Time Spent: 7 seconds 850 msec
INFO : Completed executing command(queryId=hive_20200412234607_8b6acba0-54bb-420f-80df-a5efd5dc9ae5); Time taken: 29.286 seconds
INFO : OK
+------------+-----------+
| a.user_id  | a.sex_id  |
+------------+-----------+
| 1          | 0         |
| 2          | 1         |
+------------+-----------+
2 rows selected (29.379 seconds)

     

     

Let's look at the EXPLAIN output of the two approaches:

    LEFT SEMI JOIN

+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint), sex_id (type: bigint) |
| sort order: ++ |
| Map-reduce partition columns: user_id (type: bigint), sex_id (type: bigint) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| TableScan |
| alias: b |
| filterExpr: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: bigint), sex_id (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| keys: _col0 (type: bigint), _col1 (type: bigint) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint), _col1 (type: bigint) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: bigint), _col1 (type: bigint) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 user_id (type: bigint), sex_id (type: bigint) |
| 1 _col0 (type: bigint), _col1 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
64 rows selected (0.121 seconds)

     

    EXISTS 

+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: user_id (type: bigint), sex_id (type: bigint) |
| sort order: ++ |
| Map-reduce partition columns: user_id (type: bigint), sex_id (type: bigint) |
| Statistics: Num rows: 6 Data size: 19 Basic stats: COMPLETE Column stats: NONE |
| TableScan |
| alias: b |
| filterExpr: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (user_id is not null and sex_id is not null) (type: boolean) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: bigint), sex_id (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| keys: _col0 (type: bigint), _col1 (type: bigint) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint), _col1 (type: bigint) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: bigint), _col1 (type: bigint) |
| Statistics: Num rows: 3 Data size: 18 Basic stats: COMPLETE Column stats: NONE |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 user_id (type: bigint), sex_id (type: bigint) |
| 1 _col0 (type: bigint), _col1 (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 6 Data size: 20 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
64 rows selected (0.147 seconds)

     

As you can see, the execution plans of the two approaches are identical!

     

     

     

     

This concludes our introduction to MySQL EXISTS subqueries and MySQL subquery optimization. Thank you for reading. For more on "In C#, Directory.Exists and File.Exists do not throw when the program lacks access permissions", "EXISTS (SELECT 1 ...) vs EXISTS (SELECT * ...): one or the other?", "EXISTS and NOT EXISTS subqueries (Advanced Queries, Part 2)", and "Hive: LEFT SEMI JOIN / LEFT OUTER JOIN vs (IN / NOT IN), (EXISTS / NOT EXISTS) analysis", please search this site.
