Channel: Read parquet files in Spark with pattern matching - Stack Overflow

Read parquet files in Spark with pattern matching


I'm running Spark 1.3.0 and want to read a number of Parquet files based on pattern matching. The Parquet files are the underlying files of a Hive database, and I want to read only some of them (across different folders). The folder structure is:

hdfs://myhost:8020/user/hive/warehouse/db/blogs/some/meta/files/
hdfs://myhost:8020/user/hive/warehouse/db/blogs/yymmdd=20160101/01/file1.parq
hdfs://myhost:8020/user/hive/warehouse/db/blogs/yymmdd=20160101/02/file2.parq
hdfs://myhost:8020/user/hive/warehouse/db/blogs/yymmdd=20160103/01/file3.parq

Something like

val v1 = sqlContext.parquetFile("hdfs://myhost:8020/user/hive/warehouse/db/blogs/yymmdd={[0-9]*}")

I want to ignore the meta files and load only the parquet files inside the date folders. Is this possible?
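
One possible way to approach this, offered as a sketch rather than a confirmed answer: resolve the pattern yourself with Hadoop's `FileSystem.globStatus`, which expands glob patterns such as `*` one path segment at a time, and then load only the files it returns. The object name `GlobParquetPaths` and the helper `matchingFiles` are hypothetical names chosen for illustration, and the code assumes the Hadoop client libraries are on the classpath (as they are in `spark-shell`). Whether `parquetFile` in Spark 1.3.0 accepts a glob directly is not something the question establishes, so this sketch avoids relying on it.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper (not part of Spark or Hadoop): expands a Hadoop glob
// pattern into the list of concrete paths that match it.
object GlobParquetPaths {
  def matchingFiles(pattern: String, conf: Configuration): Seq[String] = {
    val globPath = new Path(pattern)
    // Pick the file system (HDFS, local, ...) from the pattern's scheme.
    val fs = globPath.getFileSystem(conf)
    // globStatus may return null when nothing matches, so wrap it in Option.
    Option(fs.globStatus(globPath))
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .map(_.getPath.toString)
  }
}

// Example call, assuming `sc` is the SparkContext provided by spark-shell:
//   val files = GlobParquetPaths.matchingFiles(
//     "hdfs://myhost:8020/user/hive/warehouse/db/blogs/yymmdd=*/*/*.parq",
//     sc.hadoopConfiguration)
```

With the folder layout above, the pattern `yymmdd=*/*/*.parq` should select only files sitting two levels below a `yymmdd=...` folder, so the `some/meta/files/` directory is never matched. The resulting list can then be handed to whichever Parquet loading call your Spark version offers; the exact signature of that call differs between releases, so check it against the version you run.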

