如果您对SparkDataFrameTimestampType-如何从字段获取年,月,日值?和spark获取当前时间感兴趣,那么这篇文章一定是您不可错过的。我们将详细讲解SparkDataFrameT
如果您对Spark DataFrame TimestampType-如何从字段获取年,月,日值?和spark获取当前时间感兴趣,那么这篇文章一定是您不可错过的。我们将详细讲解Spark DataFrame TimestampType-如何从字段获取年,月,日值?的各种细节,并对spark获取当前时间进行深入的分析,此外还有关于Apache Spark DataFrames 入门指南:操作 DataFrame、Apache Spark DataFrames入门指南:操作DataFrame、Apache Spark:如何从DataFrame创建矩阵?、org.apache.hadoop.hbase.codec.prefixtree.timestamp.data.TestTimestampDataRepeats的实例源码的实用技巧。
本文目录一览:
Spark DataFrame TimestampType-如何从字段获取年,月,日值?(spark获取当前时间)
我有带有take(5)最上面几行的Spark DataFrame,如下所示:
[Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]
它的模式定义为:
elevDF.printSchema()root |-- date: timestamp (nullable = true) |-- hour: long (nullable = true) |-- value: double (nullable = true)
如何从“日期”字段中获取年,月,日值?
答案1
小编典典 从Spark 1.5开始,您可以使用许多日期处理功能:
pyspark.sql.functions.year
pyspark.sql.functions.month
pyspark.sql.functions.dayofmonth
pyspark.sql.functions.dayofweek()
pyspark.sql.functions.dayofyear
pyspark.sql.functions.weekofyear()
import datetime
from pyspark.sql.functions import year, month, dayofmonth
elevDF = sc.parallelize([
(datetime.datetime(1984, 1, 1, 0, 0), 1, 638.55),
(datetime.datetime(1984, 1, 1, 0, 0), 2, 638.55),
(datetime.datetime(1984, 1, 1, 0, 0), 3, 638.55),
(datetime.datetime(1984, 1, 1, 0, 0), 4, 638.55),
(datetime.datetime(1984, 1, 1, 0, 0), 5, 638.55)
]).toDF([“date”, “hour”, “value”])
elevDF.select(
year(“date”).alias(‘year’),
month(“date”).alias(‘month’),
dayofmonth(“date”).alias(‘day’)
).show()
+----+-----+—+
|year|month|day|
+----+-----+—+
|1984| 1| 1|
|1984| 1| 1|
|1984| 1| 1|
|1984| 1| 1|
|1984| 1| 1|
+----+-----+—+
您可以将simplemap
与其他任何RDD一起使用:
elevDF = sqlContext.createDataFrame(sc.parallelize([ Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55), Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]))(elevDF .map(lambda (date, hour, value): (date.year, date.month, date.day)) .collect())
结果是:
[(1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1)]
顺便说一句:datetime.datetime
无论如何都存储一个小时,所以分开保存似乎浪费了内存。

Apache Spark DataFrames 入门指南:操作 DataFrame
文章目录
- 1 二、操作 DataFrame
- 1.1 打印 DataFrame 里面的模式
- 1.2 对 DataFrame 里面的数据进行采样
- 1.3 查询 DataFrame 里面的列
- 1.4 根据条件过滤数据
- 1.5 对 DataFrame 里面的数据进行排序
- 1.6 对列进行重命名
- 1.7 将 DataFrame 看作是关系型数据表
- 1.8 对两个 DataFrame 进行 Join 操作
- 1.9 将 DataFrame 保存成文件
二、操作 DataFrame
在前面的文章中,我们介绍了如何创建 DataFrame。本文将介绍如何操作 DataFrame 里面的数据和打印出 DataFrame 里面数据的模式
打印 DataFrame 里面的模式
在创建完 DataFrame 之后,我们一般都会查看里面数据的模式,我们可以通过 printSchema
函数来查看。它会打印出列的名称和类型:
students.printSchema root |-- id : string (nullable = true ) |-- studentName : string (nullable = true ) |-- phone : string (nullable = true ) |-- email : string (nullable = true ) |
如果采用的是 load 方式参见 DataFrame 的,students.printSchema
的输出则如下:
root |-- id|studentName|phone|email : string (nullable = true ) |
对 DataFrame 里面的数据进行采样
打印完模式之后,我们要做的第二件事就是看看加载进 DataFrame 里面的数据是否正确。从新创建的 DataFrame 里面采样数据的方法有很多种。我们来对其进行介绍。
最简单的就是使用 show 方法,show 方法有四个版本:
(1)、第一个需要我们指定采样的行数 def show(numRows: Int);
(2)、第二种不需要我们指定任何参数,这种情况下,show 函数默认会加载出 20 行的数据 def show();
(3)、第三种需要指定一个 boolean 值,这个值说明是否需要对超过 20 个字符的列进行截取 def show(truncate: Boolean);
(4)、最后一种需要指定采样的行和是否需要对列进行截断 def show(numRows: Int, truncate: Boolean)
。实际上,前三个函数都是调用这个函数实现的。
Show 函数和其他函数不同的地方在于其不仅会显示需要打印的行,而且还会打印出头信息,并且会直接在默认的输出流打出 (console)。来看看怎么使用吧:
students.show() //打印出20行 +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| | 17 | Blossom| 1 - 154 - 406 - 9596 |Nunc.commodo.auct...| | 18 | Guy| 1 - 869 - 521 - 3230 |senectus.et.netus...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 20 | Edward| 1 - 711 - 710 - 6552 |lectus @ aliquetlib...| +---+-----------+--------------+--------------------+ only showing top 20 rows students.show( 15 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| +---+-----------+--------------+--------------------+ only showing top 15 rows students.show( true ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| | 17 | Blossom| 1 - 154 - 406 - 9596 |Nunc.commodo.auct...| | 18 | Guy| 1 - 869 - 521 - 3230 |senectus.et.netus...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 20 | Edward| 1 - 711 - 710 - 6552 |lectus @ aliquetlib...| +---+-----------+--------------+--------------------+ only showing top 20 rows students.show( false ) +---+-----------+--------------+-----------------------------------------+ |id |studentName|phone |email | +---+-----------+--------------+-----------------------------------------+ | 1 |Burke | 1 - 300 - 746 - 8446 |ullamcorper.velit.in @ ametnullaDonec.co.uk| | 2 |Kamal | 1 - 668 - 571 - 5046 |pede.Suspendisse @ interdumenim.edu | | 3 |Olga | 1 - 956 - 311 - 1686 |Aenean.eget.metus @ dictumcursusNunc.edu | | 4 |Belle | 1 - 246 - 894 - 6340 |vitae.aliquet.nec @ neque.co.uk | | 5 |Trevor | 1 - 300 - 527 - 4967 |dapibus.id @ acturpisegestas.net | | 6 |Laurel | 1 - 691 - 379 - 9921 |adipiscing @ consectetueripsum.edu | | 7 |Sara | 1 - 608 - 140 - 1995 |Donec.nibh @ enimEtiamimperdiet.edu | | 8 |Kaseem | 1 - 881 - 586 - 2689 |cursus.et.magna @ euismod.org | | 9 |Lev | 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsumdolor.com | | 10 |Maya | 1 - 271 - 683 - 2698 |accumsan.convallis @ ornarelectusjusto.edu | | 11 |Emi | 1 - 467 - 270 - 1337 |est @ nunc.com | | 12 |Caleb | 1 - 683 - 212 - 0896 |Suspendisse @ Quisque.edu | | 13 |Florence | 1 - 603 - 575 - 2444 |sit.amet.dapibus @ lacusAliquamrutrum.ca | | 14 |Anika | 1 - 856 - 828 - 7883 |euismod @ ligulaelit.co.uk | | 15 |Tarik | 1 - 398 - 171 - 2268 |turpis @ felisorci.com | | 16 |Amena | 1 - 878 - 250 - 3129 |lorem.luctus.ut @ scelerisque.com | | 17 |Blossom | 1 - 154 - 406 - 9596 |Nunc.commodo.auctor @ eratSed.co.uk | | 18 |Guy | 1 - 869 - 521 - 3230 |senectus.et.netus @ lectusrutrum.com | | 19 |Malachi | 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ estarcu.net | | 20 |Edward | 1 - 711 - 710 - 6552 |lectus @ aliquetlibero.co.uk | +---+-----------+--------------+-----------------------------------------+ only showing top 20 rows students.show( 10 , false ) +---+-----------+--------------+-----------------------------------------+ |id |studentName|phone |email | +---+-----------+--------------+-----------------------------------------+ | 1 |Burke | 1 - 300 - 746 - 8446 |ullamcorper.velit.in @ ametnullaDonec.co.uk| | 2 |Kamal | 1 - 668 - 571 - 5046 |pede.Suspendisse @ interdumenim.edu | | 3 |Olga | 1 - 956 - 311 - 1686 |Aenean.eget.metus @ dictumcursusNunc.edu | | 4 |Belle | 1 - 246 - 894 - 6340 |vitae.aliquet.nec @ neque.co.uk | | 5 |Trevor | 1 - 300 - 527 - 4967 |dapibus.id @ acturpisegestas.net | | 6 |Laurel | 1 - 691 - 379 - 9921 |adipiscing @ consectetueripsum.edu | | 7 |Sara | 1 - 608 - 140 - 1995 |Donec.nibh @ enimEtiamimperdiet.edu | | 8 |Kaseem | 1 - 881 - 586 - 2689 |cursus.et.magna @ euismod.org | | 9 |Lev | 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsumdolor.com | | 10 |Maya | 1 - 271 - 683 - 2698 |accumsan.convallis @ ornarelectusjusto.edu | +---+-----------+--------------+-----------------------------------------+ only showing top 10 rows |
我们还可以使用 head (n: Int) 方法来采样数据,这个函数也需要输入一个参数标明需要采样的行数,而且这个函数返回的是 Row 数组,我们需要遍历打印。当然,我们也可以使用 head () 函数直接打印,这个函数只是返回数据的一行,类型也是 Row。
students.head( 5 ).foreach(println) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] [ 2 ,Kamal, 1 - 668 - 571 - 5046 ,pede.Suspendisse @ interdumenim.edu] [ 3 ,Olga, 1 - 956 - 311 - 1686 ,Aenean.eget.metus @ dictumcursusNunc.edu] [ 4 ,Belle, 1 - 246 - 894 - 6340 ,vitae.aliquet.nec @ neque.co.uk] [ 5 ,Trevor, 1 - 300 - 527 - 4967 ,dapibus.id @ acturpisegestas.net] println(students.head()) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] |
除了 show、head 函数。我们还可以使用 first 和 take 函数,他们分别调用 head () 和 head (n)
println(students.first()) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] students.take( 5 ).foreach(println) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] [ 2 ,Kamal, 1 - 668 - 571 - 5046 ,pede.Suspendisse @ interdumenim.edu] [ 3 ,Olga, 1 - 956 - 311 - 1686 ,Aenean.eget.metus @ dictumcursusNunc.edu] [ 4 ,Belle, 1 - 246 - 894 - 6340 ,vitae.aliquet.nec @ neque.co.uk] [ 5 ,Trevor, 1 - 300 - 527 - 4967 ,dapibus.id @ acturpisegestas.net] |
查询 DataFrame 里面的列
正如你所看到的,所有的 DataFrame 里面的列都是有名称的。Select 函数可以帮助我们从 DataFrame 中选择需要的列,并且返回一个全新的 DataFrame,下面我将此进行介绍。
1、只选择一列。假如我们只想从 DataFrame 中选择 email 这列,因为 DataFrame 是不可变的 (immutable),所以这个操作会返回一个新的 DataFrame:
val emailDataFrame : DataFrame = students.select( "email" ) |
现在我们有一个名叫 emailDataFrame 全新的 DataFrame,而且其中只包含了 email 这列,让我们使用 show 来看看是否是这样的:
emailDataFrame.show( 3 ) +--------------------+ | email| +--------------------+ |ullamcorper.velit...| |pede.Suspendisse @ ...| |Aenean.eget.metus...| +--------------------+ only showing top 3 rows |
2、选择多列。其实 select 函数支持选择多列。
val studentEmailDF = students.select( "studentName" , "email" ) studentEmailDF.show( 5 ) +-----------+--------------------+ |studentName| email| +-----------+--------------------+ | Burke|ullamcorper.velit...| | Kamal|pede.Suspendisse @ ...| | Olga|Aenean.eget.metus...| | Belle|vitae.aliquet.nec...| | Trevor|dapibus.id @ acturp...| +-----------+--------------------+ only showing top 5 rows |
需要主要的是,我们 select 列的时候,需要保证 select 的列是有效的,换句话说,就是必须保证 select 的列是 printSchema
打印出来的。如果列的名称是无效的,将会出现 org.apache.spark.sql.AnalysisException
异常,如下:
val studentEmailDF = students.select( "studentName" , "iteblog" ) studentEmailDF.show( 5 ) Exception in thread "main" org.apache.spark.sql.AnalysisException : cannot resolve ''iteblog'' given input columns id, studentName, phone, email; |
根据条件过滤数据
现在我们已经知道如何在 DataFrame 中选择需要的列,让我们来看看如何根据条件来过滤 DataFrame 里面的数据。对应基于 Row 的数据,我们可以将 DataFrame 看作是普通的 Scala 集合,然后我们根据需要的条件进行相关的过滤,为了展示清楚,我在语句没后面都用 show 函数展示过滤的结果。
students.filter( "id > 5" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| +---+-----------+--------------+--------------------+ only showing top 10 rows students.filter( "studentName =''''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| +---+-----------+--------------+--------------------+ |
注意看第一个过滤语句,虽然 id 被解析成 String 了,但是程序依然正确地做出了比较。我们也可以对多个条件进行过滤:
students.filter( "studentName ='''' OR studentName = ''NULL''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 33 | NULL| 1 - 105 - 503 - 0141 |Donec @ Inmipede.co.uk| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| +---+-----------+--------------+--------------------+ |
我们还可以采用类 SQL 的语法对数据进行过滤:
students.filter( "SUBSTR(studentName,0,1) =''M''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 24 | Marsden| 1 - 477 - 629 - 7528 |Donec.dignissim.m...| | 37 | Maggy| 1 - 910 - 887 - 6777 |facilisi.Sed.nequ...| | 61 | Maxine| 1 - 422 - 863 - 3041 |aliquet.molestie....| | 77 | Maggy| 1 - 613 - 147 - 4380 | pellentesque @ mi.net| | 97 | Maxwell| 1 - 607 - 205 - 1273 |metus.In @ musAenea...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
对 DataFrame 里面的数据进行排序
使用 sort 函数我们可以对 DataFrame 中指定的列进行排序:
students.sort(students( "studentName" ).desc).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 50 | Yasir| 1 - 282 - 511 - 4445 |eget.odio.Aliquam...| | 52 | Xena| 1 - 527 - 990 - 8606 |in.faucibus.orci @ ...| | 86 | Xandra| 1 - 677 - 708 - 5691 |libero @ arcuVestib...| | 43 | Wynter| 1 - 440 - 544 - 1851 |amet.risus.Donec @ ...| | 31 | Wallace| 1 - 144 - 220 - 8159 | lorem.lorem @ non.net| | 66 | Vance| 1 - 268 - 680 - 0857 |pellentesque @ netu...| | 41 | Tyrone| 1 - 907 - 383 - 5293 |non.bibendum.sed @ ...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
也可以对多列进行排序:
students.sort( "studentName" , "id" ).show( 10 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| | 91 | Abel| 1 - 530 - 527 - 7467 | urna @ veliteu.edu| | 69 | Aiko| 1 - 682 - 230 - 7013 |turpis.vitae.puru...| | 47 | Alma| 1 - 747 - 382 - 6775 | nec.enim @ non.org| | 26 | Amela| 1 - 526 - 909 - 2605 | in @ vitaesodales.edu| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| +---+-----------+--------------+--------------------+ only showing top 10 rows |
从上面的结果我们可以看出,默认是按照升序进行排序的。我们也可以将上面的语句写成下面的:
students.sort(students( "studentName" ).asc, students( "id" ).asc).show( 10 ) |
这两个语句运行的效果是一致的。
对列进行重命名
如果我们对 DataFrame 中默认的列名不感兴趣,我们可以在 select 的时候利用 as 对其进行重命名,下面的列子将 studentName
重命名为 name,而 email 这列名字不变:
students.select(students( "studentName" ).as( "name" ), students( "email" )).show( 10 ) +--------+--------------------+ | name| email| +--------+--------------------+ | Burke|ullamcorper.velit...| | Kamal|pede.Suspendisse @ ...| | Olga|Aenean.eget.metus...| | Belle|vitae.aliquet.nec...| | Trevor|dapibus.id @ acturp...| | Laurel|adipiscing @ consec...| | Sara|Donec.nibh @ enimEt...| | Kaseem|cursus.et.magna @ e...| | Lev|Vivamus.nisi @ ipsu...| | Maya|accumsan.convalli...| +--------+--------------------+ only showing top 10 rows |
将 DataFrame 看作是关系型数据表
DataFrame 的一个强大之处就是我们可以将它看作是一个关系型数据表,然后在其上运行 SQL 查询语句,只要我们进行下面两步即可实现:
(1)、将 DataFrame 注册成一张名为 students 的表:
students.registerTempTable( "students" ) |
(2)、然后我们在其上用标准的 SQL 进行查询:
sqlContext.sql( "select * from students where studentName!='''' order by email desc" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 87 | Selma| 1 - 601 - 330 - 4409 |vulputate.velit @ p...| | 96 | Channing| 1 - 984 - 118 - 7533 |viverra.Donec.tem...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 78 | Finn| 1 - 213 - 781 - 6969 |vestibulum.massa @ ...| | 53 | Kasper| 1 - 155 - 575 - 9346 |velit.eget @ pedeCu...| | 63 | Dylan| 1 - 417 - 943 - 8961 |vehicula.aliquet @ ...| | 35 | Cadman| 1 - 443 - 642 - 5919 |ut.lacus @ adipisci...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
对两个 DataFrame 进行 Join 操作
前面我们已经知道如何将 DataFrame 注册成一张表,现在我们来看看如何使用普通的 SQL 对两个 DataFrame 进行 Join 操作。
1、内联:内联是默认的 Join 操作,它仅仅返回两个 DataFrame 都匹配到的结果,来看看下面的例子:
val students 1 = sqlContext.csvFile(filePath = "E:\\StudentPrep1.csv" , useHeader = true , delimiter = ''|'' ) val students 2 = sqlContext.csvFile(filePath = "E:\\StudentPrep2.csv" , useHeader = true , delimiter = ''|'' ) val studentsJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" )) studentsJoin.show(studentsJoin.count.toInt) +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 |BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 |KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 |BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 |LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ |
2、右外联:在内连接的基础上,还包含右表中所有不符合条件的数据行,并在其中的左表列填写 NULL ,来看看下面的实例:
val studentsRightOuterJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" ), "right_outer" ) studentsRightOuterJoin.show(studentsRightOuterJoin.count.toInt) +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 | BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 | KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 | BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 | LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | null | null | null | null | 999 |LevUniqueToSecondRDD| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ |
3、左外联:在内连接的基础上,还包含左表中所有不符合条件的数据行,并在其中的右表列填写 NULL ,同样我们来看看下面的实例:
val studentsLeftOuterJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" ), "left_outer" ) studentsLeftOuterJoin.show(studentsLeftOuterJoin.count.toInt) +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 |BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 |KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 |BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 |LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | iteblog| 999999 | iteblog @ iteblog.com| null | null | null | null | +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ |
将 DataFrame 保存成文件
下面我来介绍如何将 DataFrame 保存到一个文件里面。前面我们加载 csv 文件用到了 load 函数,与之对于的用于保存文件可以使用 save 函数。具体操作包括以下两步:
1、首先创建一个 map 对象,用于存储一些 save 函数需要用到的一些属性。这里我将制定保存文件的存放路径和 csv 的头信息。
val saveOptions = Map( "header" -> "true" , "path" -> "iteblog.csv" ) |
为了基于学习的态度,我们从 DataFrame 里面选择出 studentName 和 email 两列,并且将 studentName 的列名重定义为 name。
val copyOfStudents = students.select(students( "studentName" ).as( "name" ), students( "email" )) |
2、下面我们调用 save 函数保存上面的 DataFrame 数据到 iteblog.csv 文件夹中
copyOfStudents.write.format( "com.databricks.spark.csv" ).mode(SaveMode.Overwrite).options(saveOptions).save() |
mode 函数可以接收的参数有 Overwrite、Append、Ignore 和 ErrorIfExists。从名字就可以很好的理解,Overwrite 代表覆盖目录下之前存在的数据;Append 代表给指定目录下追加数据;Ignore 代表如果目录下已经有文件,那就什么都不执行;ErrorIfExists 代表如果保存目录下存在文件,那么抛出相应的异常。
需要注意的是,上述 path 参数指定的是保存文件夹,并不是最后的保存文件名。

Apache Spark DataFrames入门指南:操作DataFrame
文章目录
- 1 二、操作DataFrame
- 1.1 打印DataFrame里面的模式
- 1.2 对DataFrame里面的数据进行采样
- 1.3 查询DataFrame里面的列
- 1.4 根据条件过滤数据
- 1.5 对DataFrame里面的数据进行排序
- 1.6 对列进行重命名
- 1.7 将DataFrame看作是关系型数据表
- 1.8 对两个DataFrame进行Join操作
- 1.9 将DataFrame保存成文件
二、操作DataFrame
在前面的文章中,我们介绍了如何创建DataFrame。本文将介绍如何操作DataFrame里面的数据和打印出DataFrame里面数据的模式
打印DataFrame里面的模式
在创建完DataFrame之后,我们一般都会查看里面数据的模式,我们可以通过printSchema
函数来查看。它会打印出列的名称和类型:
students.printSchema root |-- id : string (nullable = true ) |-- studentName : string (nullable = true ) |-- phone : string (nullable = true ) |-- email : string (nullable = true ) |
如果采用的是load方式参见DataFrame的,students.printSchema
的输出则如下:
root |-- id|studentName|phone|email : string (nullable = true ) |
对DataFrame里面的数据进行采样
打印完模式之后,我们要做的第二件事就是看看加载进DataFrame里面的数据是否正确。从新创建的DataFrame里面采样数据的方法有很多种。我们来对其进行介绍。
最简单的就是使用show方法,show方法有四个版本:
(1)、第一个需要我们指定采样的行数def show(numRows: Int);
(2)、第二种不需要我们指定任何参数,这种情况下,show函数默认会加载出20行的数据def show();
(3)、第三种需要指定一个boolean值,这个值说明是否需要对超过20个字符的列进行截取def show(truncate: Boolean);
(4)、最后一种需要指定采样的行和是否需要对列进行截断def show(numRows: Int, truncate: Boolean)
。实际上,前三个函数都是调用这个函数实现的。
Show函数和其他函数不同的地方在于其不仅会显示需要打印的行,而且还会打印出头信息,并且会直接在默认的输出流打出(console)。来看看怎么使用吧:
students.show() //打印出20行 +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| | 17 | Blossom| 1 - 154 - 406 - 9596 |Nunc.commodo.auct...| | 18 | Guy| 1 - 869 - 521 - 3230 |senectus.et.netus...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 20 | Edward| 1 - 711 - 710 - 6552 |lectus @ aliquetlib...| +---+-----------+--------------+--------------------+ only showing top 20 rows students.show( 15 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| +---+-----------+--------------+--------------------+ only showing top 15 rows students.show( true ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| | 17 | Blossom| 1 - 154 - 406 - 9596 |Nunc.commodo.auct...| | 18 | Guy| 1 - 869 - 521 - 3230 |senectus.et.netus...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 20 | Edward| 1 - 711 - 710 - 6552 |lectus @ aliquetlib...| +---+-----------+--------------+--------------------+ only showing top 20 rows students.show( false ) +---+-----------+--------------+-----------------------------------------+ |id |studentName|phone |email | +---+-----------+--------------+-----------------------------------------+ | 1 |Burke | 1 - 300 - 746 - 8446 |ullamcorper.velit.in @ ametnullaDonec.co.uk| | 2 |Kamal | 1 - 668 - 571 - 5046 |pede.Suspendisse @ interdumenim.edu | | 3 |Olga | 1 - 956 - 311 - 1686 |Aenean.eget.metus @ dictumcursusNunc.edu | | 4 |Belle | 1 - 246 - 894 - 6340 |vitae.aliquet.nec @ neque.co.uk | | 5 |Trevor | 1 - 300 - 527 - 4967 |dapibus.id @ acturpisegestas.net | | 6 |Laurel | 1 - 691 - 379 - 9921 |adipiscing @ consectetueripsum.edu | | 7 |Sara | 1 - 608 - 140 - 1995 |Donec.nibh @ enimEtiamimperdiet.edu | | 8 |Kaseem | 1 - 881 - 586 - 2689 |cursus.et.magna @ euismod.org | | 9 |Lev | 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsumdolor.com | | 10 |Maya | 1 - 271 - 683 - 2698 |accumsan.convallis @ ornarelectusjusto.edu | | 11 |Emi | 1 - 467 - 270 - 1337 |est @ nunc.com | | 12 |Caleb | 1 - 683 - 212 - 0896 |Suspendisse @ Quisque.edu | | 13 |Florence | 1 - 603 - 575 - 2444 |sit.amet.dapibus @ lacusAliquamrutrum.ca | | 14 |Anika | 1 - 856 - 828 - 7883 |euismod @ ligulaelit.co.uk | | 15 |Tarik | 1 - 398 - 171 - 2268 |turpis @ felisorci.com | | 16 |Amena | 1 - 878 - 250 - 3129 |lorem.luctus.ut @ scelerisque.com | | 17 |Blossom | 1 - 154 - 406 - 9596 |Nunc.commodo.auctor @ eratSed.co.uk | | 18 |Guy | 1 - 869 - 521 - 3230 |senectus.et.netus @ lectusrutrum.com | | 19 |Malachi | 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ estarcu.net | | 20 |Edward | 1 - 711 - 710 - 6552 |lectus @ aliquetlibero.co.uk | +---+-----------+--------------+-----------------------------------------+ only showing top 20 rows students.show( 10 , false ) +---+-----------+--------------+-----------------------------------------+ |id |studentName|phone |email | +---+-----------+--------------+-----------------------------------------+ | 1 |Burke | 1 - 300 - 746 - 8446 |ullamcorper.velit.in @ ametnullaDonec.co.uk| | 2 |Kamal | 1 - 668 - 571 - 5046 |pede.Suspendisse @ interdumenim.edu | | 3 |Olga | 1 - 956 - 311 - 1686 |Aenean.eget.metus @ dictumcursusNunc.edu | | 4 |Belle | 1 - 246 - 894 - 6340 |vitae.aliquet.nec @ neque.co.uk | | 5 |Trevor | 1 - 300 - 527 - 4967 |dapibus.id @ acturpisegestas.net | | 6 |Laurel | 1 - 691 - 379 - 9921 |adipiscing @ consectetueripsum.edu | | 7 |Sara | 1 - 608 - 140 - 1995 |Donec.nibh @ enimEtiamimperdiet.edu | | 8 |Kaseem | 1 - 881 - 586 - 2689 |cursus.et.magna @ euismod.org | | 9 |Lev | 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsumdolor.com | | 10 |Maya | 1 - 271 - 683 - 2698 |accumsan.convallis @ ornarelectusjusto.edu | +---+-----------+--------------+-----------------------------------------+ only showing top 10 rows |
我们还可以使用head(n: Int)方法来采样数据,这个函数也需要输入一个参数标明需要采样的行数,而且这个函数返回的是Row数组,我们需要遍历打印。当然,我们也可以使用head()函数直接打印,这个函数只是返回数据的一行,类型也是Row。
students.head( 5 ).foreach(println) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] [ 2 ,Kamal, 1 - 668 - 571 - 5046 ,pede.Suspendisse @ interdumenim.edu] [ 3 ,Olga, 1 - 956 - 311 - 1686 ,Aenean.eget.metus @ dictumcursusNunc.edu] [ 4 ,Belle, 1 - 246 - 894 - 6340 ,vitae.aliquet.nec @ neque.co.uk] [ 5 ,Trevor, 1 - 300 - 527 - 4967 ,dapibus.id @ acturpisegestas.net] println(students.head()) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] |
除了show、head函数。我们还可以使用first和take函数,他们分别调用head()和head(n)
println(students.first()) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] students.take( 5 ).foreach(println) [ 1 ,Burke, 1 - 300 - 746 - 8446 ,ullamcorper.velit.in @ ametnullaDonec.co.uk] [ 2 ,Kamal, 1 - 668 - 571 - 5046 ,pede.Suspendisse @ interdumenim.edu] [ 3 ,Olga, 1 - 956 - 311 - 1686 ,Aenean.eget.metus @ dictumcursusNunc.edu] [ 4 ,Belle, 1 - 246 - 894 - 6340 ,vitae.aliquet.nec @ neque.co.uk] [ 5 ,Trevor, 1 - 300 - 527 - 4967 ,dapibus.id @ acturpisegestas.net] |
查询DataFrame里面的列
正如你所看到的,所有的DataFrame里面的列都是有名称的。Select函数可以帮助我们从DataFrame中选择需要的列,并且返回一个全新的DataFrame,下面我将此进行介绍。
1、只选择一列。假如我们只想从DataFrame中选择email这列,因为DataFrame是不可变的(immutable),所以这个操作会返回一个新的DataFrame:
val emailDataFrame : DataFrame = students.select( "email" ) |
现在我们有一个名叫emailDataFrame全新的DataFrame,而且其中只包含了email这列,让我们使用show来看看是否是这样的:
emailDataFrame.show( 3 ) +--------------------+ | email| +--------------------+ |ullamcorper.velit...| |pede.Suspendisse @ ...| |Aenean.eget.metus...| +--------------------+ only showing top 3 rows |
2、选择多列。其实select函数支持选择多列。
val studentEmailDF = students.select( "studentName" , "email" ) studentEmailDF.show( 5 ) +-----------+--------------------+ |studentName| email| +-----------+--------------------+ | Burke|ullamcorper.velit...| | Kamal|pede.Suspendisse @ ...| | Olga|Aenean.eget.metus...| | Belle|vitae.aliquet.nec...| | Trevor|dapibus.id @ acturp...| +-----------+--------------------+ only showing top 5 rows |
需要主要的是,我们select列的时候,需要保证select的列是有效的,换句话说,就是必须保证select的列是printSchema
打印出来的。如果列的名称是无效的,将会出现org.apache.spark.sql.AnalysisException
异常,如下:
val studentEmailDF = students.select( "studentName" , "iteblog" ) studentEmailDF.show( 5 ) Exception in thread "main" org.apache.spark.sql.AnalysisException : cannot resolve ''iteblog'' given input columns id, studentName, phone, email; |
根据条件过滤数据
现在我们已经知道如何在DataFrame中选择需要的列,让我们来看看如何根据条件来过滤DataFrame里面的数据。对应基于Row的数据,我们可以将DataFrame看作是普通的Scala集合,然后我们根据需要的条件进行相关的过滤,为了展示清楚,我在语句没后面都用show函数展示过滤的结果。
students.filter( "id > 5" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | Emi| 1 - 467 - 270 - 1337 | est @ nunc.com| | 12 | Caleb| 1 - 683 - 212 - 0896 |Suspendisse @ Quisq...| | 13 | Florence| 1 - 603 - 575 - 2444 |sit.amet.dapibus @ ...| | 14 | Anika| 1 - 856 - 828 - 7883 |euismod @ ligulaeli...| | 15 | Tarik| 1 - 398 - 171 - 2268 |turpis @ felisorci.com| +---+-----------+--------------+--------------------+ only showing top 10 rows students.filter( "studentName =''''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| +---+-----------+--------------+--------------------+ |
注意看第一个过滤语句,虽然id被解析成String了,但是程序依然正确地做出了比较。我们也可以对多个条件进行过滤:
students.filter( "studentName ='''' OR studentName = ''NULL''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 33 | NULL| 1 - 105 - 503 - 0141 |Donec @ Inmipede.co.uk| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| +---+-----------+--------------+--------------------+ |
我们还可以采用类SQL的语法对数据进行过滤:
students.filter( "SUBSTR(studentName,0,1) =''M''" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 19 | Malachi| 1 - 608 - 637 - 2772 |Proin.mi.Aliquam @ ...| | 24 | Marsden| 1 - 477 - 629 - 7528 |Donec.dignissim.m...| | 37 | Maggy| 1 - 910 - 887 - 6777 |facilisi.Sed.nequ...| | 61 | Maxine| 1 - 422 - 863 - 3041 |aliquet.molestie....| | 77 | Maggy| 1 - 613 - 147 - 4380 | pellentesque @ mi.net| | 97 | Maxwell| 1 - 607 - 205 - 1273 |metus.In @ musAenea...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
对DataFrame里面的数据进行排序
使用sort函数我们可以对DataFrame中指定的列进行排序:
students.sort(students( "studentName" ).desc).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 50 | Yasir| 1 - 282 - 511 - 4445 |eget.odio.Aliquam...| | 52 | Xena| 1 - 527 - 990 - 8606 |in.faucibus.orci @ ...| | 86 | Xandra| 1 - 677 - 708 - 5691 |libero @ arcuVestib...| | 43 | Wynter| 1 - 440 - 544 - 1851 |amet.risus.Donec @ ...| | 31 | Wallace| 1 - 144 - 220 - 8159 | lorem.lorem @ non.net| | 66 | Vance| 1 - 268 - 680 - 0857 |pellentesque @ netu...| | 41 | Tyrone| 1 - 907 - 383 - 5293 |non.bibendum.sed @ ...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
也可以对多列进行排序:
students.sort( "studentName" , "id" ).show( 10 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 21 | | 1 - 598 - 439 - 7549 |consectetuer.adip...| | 32 | | 1 - 184 - 895 - 9602 |accumsan.laoreet @ ...| | 45 | | 1 - 245 - 752 - 0481 |Suspendisse.eleif...| | 83 | | 1 - 858 - 810 - 2204 |sociis.natoque @ eu...| | 94 | | 1 - 443 - 410 - 7878 |Praesent.eu.nulla...| | 91 | Abel| 1 - 530 - 527 - 7467 | urna @ veliteu.edu| | 69 | Aiko| 1 - 682 - 230 - 7013 |turpis.vitae.puru...| | 47 | Alma| 1 - 747 - 382 - 6775 | nec.enim @ non.org| | 26 | Amela| 1 - 526 - 909 - 2605 | in @ vitaesodales.edu| | 16 | Amena| 1 - 878 - 250 - 3129 |lorem.luctus.ut @ s...| +---+-----------+--------------+--------------------+ only showing top 10 rows |
从上面的结果我们可以看出,默认是按照升序进行排序的。我们也可以将上面的语句写成下面的:
students.sort(students( "studentName" ).asc, students( "id" ).asc).show( 10 ) |
这两个语句运行的效果是一致的。
对列进行重命名
如果我们对DataFrame中默认的列名不感兴趣,我们可以在select的时候利用as对其进行重命名,下面的列子将studentName
重命名为name,而email这列名字不变:
students.select(students( "studentName" ).as( "name" ), students( "email" )).show( 10 ) +--------+--------------------+ | name| email| +--------+--------------------+ | Burke|ullamcorper.velit...| | Kamal|pede.Suspendisse @ ...| | Olga|Aenean.eget.metus...| | Belle|vitae.aliquet.nec...| | Trevor|dapibus.id @ acturp...| | Laurel|adipiscing @ consec...| | Sara|Donec.nibh @ enimEt...| | Kaseem|cursus.et.magna @ e...| | Lev|Vivamus.nisi @ ipsu...| | Maya|accumsan.convalli...| +--------+--------------------+ only showing top 10 rows |
将DataFrame看作是关系型数据表
DataFrame的一个强大之处就是我们可以将它看作是一个关系型数据表,然后在其上运行SQL查询语句,只要我们进行下面两步即可实现:
(1)、将DataFrame注册成一张名为students的表:
students.registerTempTable( "students" ) |
(2)、然后我们在其上用标准的SQL进行查询:
sqlContext.sql( "select * from students where studentName!='''' order by email desc" ).show( 7 ) +---+-----------+--------------+--------------------+ | id|studentName| phone| email| +---+-----------+--------------+--------------------+ | 87 | Selma| 1 - 601 - 330 - 4409 |vulputate.velit @ p...| | 96 | Channing| 1 - 984 - 118 - 7533 |viverra.Donec.tem...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 78 | Finn| 1 - 213 - 781 - 6969 |vestibulum.massa @ ...| | 53 | Kasper| 1 - 155 - 575 - 9346 |velit.eget @ pedeCu...| | 63 | Dylan| 1 - 417 - 943 - 8961 |vehicula.aliquet @ ...| | 35 | Cadman| 1 - 443 - 642 - 5919 |ut.lacus @ adipisci...| +---+-----------+--------------+--------------------+ only showing top 7 rows |
对两个DataFrame进行Join操作
前面我们已经知道如何将DataFrame注册成一张表,现在我们来看看如何使用普通的SQL对两个DataFrame进行Join操作。
1、内联:内联是默认的Join操作,它仅仅返回两个DataFrame都匹配到的结果,来看看下面的例子:
val students 1 = sqlContext.csvFile(filePath = "E:\\StudentPrep1.csv" , useHeader = true , delimiter = ''|'' ) val students 2 = sqlContext.csvFile(filePath = "E:\\StudentPrep2.csv" , useHeader = true , delimiter = ''|'' ) val studentsJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" )) studentsJoin.show(studentsJoin.count.toInt) +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 |BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 |KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 |BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 |LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| +---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+ |
2、右外联:在内连接的基础上,还包含右表中所有不符合条件的数据行,并在其中的左表列填写NULL ,来看看下面的实例:
val studentsRightOuterJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" ), "right_outer" ) studentsRightOuterJoin.show(studentsRightOuterJoin.count.toInt) +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 | BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 | KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 | BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 | LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | null | null | null | null | 999 |LevUniqueToSecondRDD| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| +----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+ |
3、左外联:在内连接的基础上,还包含左表中所有不符合条件的数据行,并在其中的右表列填写NULL ,同样我们来看看下面的实例:
val studentsLeftOuterJoin = students 1 .join(students 2 , students 1 ( "id" ) === students 2 ( "id" ), "left_outer" ) studentsLeftOuterJoin.show(studentsLeftOuterJoin.count.toInt) +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ | id|studentName| phone| email| id| studentName| phone| email| +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ | 1 | Burke| 1 - 300 - 746 - 8446 |ullamcorper.velit...| 1 |BurkeDifferentName| 1 - 300 - 746 - 8446 |ullamcorper.velit...| | 2 | Kamal| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| 2 |KamalDifferentName| 1 - 668 - 571 - 5046 |pede.Suspendisse @ ...| | 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| 3 | Olga| 1 - 956 - 311 - 1686 |Aenean.eget.metus...| | 4 | Belle| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| 4 |BelleDifferentName| 1 - 246 - 894 - 6340 |vitae.aliquet.nec...| | 5 | Trevor| 1 - 300 - 527 - 4967 |dapibus.id @ acturp...| 5 | Trevor| 1 - 300 - 527 - 4967 |dapibusDifferentE...| | 6 | Laurel| 1 - 691 - 379 - 9921 |adipiscing @ consec...| 6 |LaurelInvalidPhone| 000000000 |adipiscing @ consec...| | 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| 7 | Sara| 1 - 608 - 140 - 1995 |Donec.nibh @ enimEt...| | 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| 8 | Kaseem| 1 - 881 - 586 - 2689 |cursus.et.magna @ e...| | 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| 9 | Lev| 1 - 916 - 367 - 5608 |Vivamus.nisi @ ipsu...| | 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| 10 | Maya| 1 - 271 - 683 - 2698 |accumsan.convalli...| | 11 | iteblog| 999999 | iteblog @ iteblog.com| null | null | null | null | +---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+ |
将DataFrame保存成文件
下面我来介绍如何将DataFrame保存到一个文件里面。前面我们加载csv文件用到了load函数,与之对于的用于保存文件可以使用save函数。具体操作包括以下两步:
1、首先创建一个map对象,用于存储一些save函数需要用到的一些属性。这里我将制定保存文件的存放路径和csv的头信息。
val saveOptions = Map( "header" -> "true" , "path" -> "iteblog.csv" ) |
为了基于学习的态度,我们从DataFrame里面选择出studentName和email两列,并且将studentName的列名重定义为name。
val copyOfStudents = students.select(students( "studentName" ).as( "name" ), students( "email" )) |
2、下面我们调用save函数保存上面的DataFrame数据到iteblog.csv文件夹中
copyOfStudents.write.format( "com.databricks.spark.csv" ).mode(SaveMode.Overwrite).options(saveOptions).save() |
mode函数可以接收的参数有Overwrite、Append、Ignore和ErrorIfExists。从名字就可以很好的理解,Overwrite代表覆盖目录下之前存在的数据;Append代表给指定目录下追加数据;Ignore代表如果目录下已经有文件,那就什么都不执行;ErrorIfExists代表如果保存目录下存在文件,那么抛出相应的异常。
需要注意的是,上述path参数指定的是保存文件夹,并不是最后的保存文件名。

Apache Spark:如何从DataFrame创建矩阵?
我在Apache
Spark中有一个带有整数数组的DataFrame,其源是一组图像。我最终想对它进行PCA,但仅从阵列创建矩阵时遇到了麻烦。如何从RDD创建矩阵?
> imagerdd = traindf.map(lambda row: map(float,row.image))
> mat = DenseMatrix(numRows=206456,numCols=10,values=imagerdd)
Traceback (most recent call last):
File "<ipython-input-21-6fdaa8cde069>",line 2,in <module>
mat = DenseMatrix(numRows=206456,values=imagerdd)
File "/usr/local/spark/current/python/lib/pyspark.zip/pyspark/mllib/linalg.py",line 815,in __init__
values = self._convert_to_array(values,np.float64)
File "/usr/local/spark/current/python/lib/pyspark.zip/pyspark/mllib/linalg.py",line 806,in _convert_to_array
return np.asarray(array_like,dtype=dtype)
File "/usr/local/python/conda/lib/python2.7/site- packages/numpy/core/numeric.py",line 462,in asarray
return array(a,dtype,copy=False,order=order)
TypeError: float() argument must be a string or a number
我想到的所有可能的安排都出现了相同的错误:
imagerdd = traindf.map(lambda row: Vectors.dense(row.image))
imagerdd = traindf.map(lambda row: row.image)
imagerdd = traindf.map(lambda row: np.array(row.image))
如果我尝试
> imagedf = traindf.select("image")
> mat = DenseMatrix(numRows=206456,values=imagedf)
追溯(最近一次通话):
File "<ipython-input-26-a8cbdad10291>",values=imagedf)
File "/usr/local/spark/current/python/lib/pyspark.zip/pyspark/mllib/linalg.py",in __init__
values = self._convert_to_array(values,np.float64)
File "/usr/local/spark/current/python/lib/pyspark.zip/pyspark/mllib/linalg.py",dtype=dtype)
File "/usr/local/python/conda/lib/python2.7/site-packages/numpy/core/numeric.py",order=order)
ValueError: setting an array element with a sequence.

org.apache.hadoop.hbase.codec.prefixtree.timestamp.data.TestTimestampDataRepeats的实例源码
项目:ditb
文件:TestTimestampData.java
public Collection<Object[]> getAllAsObjectArray() {
List<Object[]> all = Lists.newArrayList();
all.add(new Object[] { new TestTimestampDataBasic() });
all.add(new Object[] { new TestTimestampDatanumbers() });
all.add(new Object[] { new TestTimestampDataRepeats() });
return all;
}
项目:pbase
文件:TestTimestampData.java
public Collection<Object[]> getAllAsObjectArray() {
List<Object[]> all = Lists.newArrayList();
all.add(new Object[] { new TestTimestampDataBasic() });
all.add(new Object[] { new TestTimestampDatanumbers() });
all.add(new Object[] { new TestTimestampDataRepeats() });
return all;
}
项目:HIndex
文件:TestTimestampData.java
public Collection<Object[]> getAllAsObjectArray() {
List<Object[]> all = Lists.newArrayList();
all.add(new Object[] { new TestTimestampDataBasic() });
all.add(new Object[] { new TestTimestampDatanumbers() });
all.add(new Object[] { new TestTimestampDataRepeats() });
return all;
}
项目:PyroDB
文件:TestTimestampData.java
public Collection<Object[]> getAllAsObjectArray() {
List<Object[]> all = Lists.newArrayList();
all.add(new Object[] { new TestTimestampDataBasic() });
all.add(new Object[] { new TestTimestampDatanumbers() });
all.add(new Object[] { new TestTimestampDataRepeats() });
return all;
}
项目:c5
文件:TestTimestampData.java
public Collection<Object[]> getAllAsObjectArray() {
List<Object[]> all = Lists.newArrayList();
all.add(new Object[] { new TestTimestampDataBasic() });
all.add(new Object[] { new TestTimestampDatanumbers() });
all.add(new Object[] { new TestTimestampDataRepeats() });
return all;
}
今天关于Spark DataFrame TimestampType-如何从字段获取年,月,日值?和spark获取当前时间的讲解已经结束,谢谢您的阅读,如果想了解更多关于Apache Spark DataFrames 入门指南:操作 DataFrame、Apache Spark DataFrames入门指南:操作DataFrame、Apache Spark:如何从DataFrame创建矩阵?、org.apache.hadoop.hbase.codec.prefixtree.timestamp.data.TestTimestampDataRepeats的实例源码的相关知识,请在本站搜索。