Rows to Columns
Rows to Comma Separated String
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
FROM: +-----+-----+ | uid | tag | +-----+-----+ | 1 | 1 | | 1 | 2 | | 1 | 3 | | 2 | 1 | | 2 | 4 | | 3 | 5 | +-----+-----+ TO: +-----+----------+ | uid | tag_list | +-----+----------+ | 1 | 1,2,3 | | 2 | 1,4 | | 3 | 5 | +-----+----------+ |
MySQL
Use GROUP_CONCAT(). This function also works in Infobright.
GROUP_CONCAT([DISTINCT] expr [,expr …]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name …]]
[SEPARATOR str_val])
1 2 3 4 5 6 7 8 9 |
mysql> select uid, group_concat(DISTINCT tag SEPARATOR ',') AS tag_list from test group by uid; +-----+----------+ | uid | tag_list | +-----+----------+ | 1 | 1,2,3 | | 2 | 1,4 | | 3 | 5 | +-----+----------+ 3 rows in set (0.00 sec) |
Hive
For Hive, I use collect_set() + concat_ws() from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF.
But if you want to remove duplicated elements, write your own UDF should be the only choice till now.
1 2 3 4 5 6 7 8 9 10 |
hive> select uid, concat_ws(',', collect_set(tag)) from test group by uid; FAILED: SemanticException [Error 10016]: Line 1:27 Argument type mismatch 'tag': Argument 2 of function CONCAT_WS must be "string or array<string>", but "array<int>" was found. hive> select uid, concat_ws(',', collect_set(CAST(tag AS STRING))) from test group by uid; ... Job 0: Map: 3 Reduce: 1 Cumulative CPU: 8.43 sec HDFS Read: 890 HDFS Write: 18 SUCCESS Total MapReduce CPU Time Spent: 8 seconds 430 msec OK 1 2,1,3 2 1,4 3 5 |
Presto
Don’t know how. :(
Impala
Impala also has a group_concat() but different from mysql
group_concat(string s [, string sep])
Purpose: Returns a single string representing the argument value concatenated together for each row of the result set. If the optional separator string is specified, the separator is added between each pair of concatenated values.
Return type: stringUsage notes: concat() and concat_ws() are appropriate for concatenating the values of multiple columns within the same row, while group_concat() joins together values from different rows.
By default, returns a single string covering the whole result set. To include other columns or values in the result set, or to produce multiple concatenated strings for subsets of rows, include a GROUP BY clause in the query.
1 2 3 4 5 6 7 8 9 10 |
[hadoop4.xxx.com:21000] > select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid; Query: select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid +-----+----------+ | uid | tag_list | +-----+----------+ | 3 | 5 | | 2 | 1,4 | | 1 | 1,2,3 | +-----+----------+ Returned 3 row(s) in 0.68s |
Rows to Columns
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
FROM: +------+------+------+ | uid | tag | val | +------+------+------+ | 1 | 1 | 1 | | 1 | 2 | 0 | | 1 | 3 | 1 | | 2 | 1 | 1 | | 2 | 4 | 0 | | 3 | 5 | 1 | +------+------+------+ TO: +------+----------+----------+----------+----------+----------+ | uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val | +------+----------+----------+----------+----------+----------+ | 1 | 1 | 0 | 1 | 0 | 0 | | 2 | 1 | 0 | 0 | 0 | 0 | | 3 | 0 | 0 | 0 | 0 | 1 | +------+----------+----------+----------+----------+----------+ |
MySQL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
mysql> select -> uid, -> max(case when tag=1 then val else 0 end) as tag1_val, -> max(case when tag=2 then val else 0 end) as tag2_val, -> max(case when tag=3 then val else 0 end) as tag3_val, -> max(case when tag=4 then val else 0 end) as tag4_val, -> max(case when tag=5 then val else 0 end) as tag5_val -> from test1 -> group by uid; +------+----------+----------+----------+----------+----------+ | uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val | +------+----------+----------+----------+----------+----------+ | 1 | 1 | 0 | 1 | 0 | 0 | | 2 | 1 | 0 | 0 | 0 | 0 | | 3 | 0 | 0 | 0 | 0 | 1 | +------+----------+----------+----------+----------+----------+ |
Hive
Same as MySQL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
hive> > select > uid, > max(case when tag=1 then val else 0 end) as tag1_val, > max(case when tag=2 then val else 0 end) as tag2_val, > max(case when tag=3 then val else 0 end) as tag3_val, > max(case when tag=4 then val else 0 end) as tag4_val, > max(case when tag=5 then val else 0 end) as tag5_val > from test1 > group by uid; ... MapReduce Total cumulative CPU time: 6 seconds 140 msec Ended Job = job_201402181530_0028 MapReduce Jobs Launched: Job 0: Map: 2 Reduce: 1 Cumulative CPU: 6.14 sec HDFS Read: 873 HDFS Write: 36 SUCCESS Total MapReduce CPU Time Spent: 6 seconds 140 msec OK 1 1 0 1 0 0 2 1 0 0 0 0 3 0 0 0 0 1 Time taken: 16.068 seconds |
Presto
Same as MySQL & Hive.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
presto:temp_merge_db> select -> uid, -> max(case when tag=1 then val else 0 end) as tag1_val, -> max(case when tag=2 then val else 0 end) as tag2_val, -> max(case when tag=3 then val else 0 end) as tag3_val, -> max(case when tag=4 then val else 0 end) as tag4_val, -> max(case when tag=5 then val else 0 end) as tag5_val -> from test1 -> group by uid; uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val -----+----------+----------+----------+----------+---------- 3 | 0 | 0 | 0 | 0 | 1 2 | 1 | 0 | 0 | 0 | 0 1 | 1 | 0 | 1 | 0 | 0 |
Impala
Still the same
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
[hadoop4.xxx.com:21000] > select > uid, > max(case when tag=1 then val else 0 end) as tag1_val, > max(case when tag=2 then val else 0 end) as tag2_val, > max(case when tag=3 then val else 0 end) as tag3_val, > max(case when tag=4 then val else 0 end) as tag4_val, > max(case when tag=5 then val else 0 end) as tag5_val > from test2 > group by uid; Query: select uid, max(case when tag=1 then val else 0 end) as tag1_val, max(case when tag=2 then val else 0 end) as tag2_val, max(case when tag=3 then val else 0 end) as tag3_val, max(case when tag=4 then val else 0 end) as tag4_val, max(case when tag=5 then val else 0 end) as tag5_val from test2 group by uid +-----+----------+----------+----------+----------+----------+ | uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val | +-----+----------+----------+----------+----------+----------+ | 3 | 0 | 0 | 0 | 0 | 1 | | 2 | 1 | 0 | 0 | 0 | 0 | | 1 | 1 | 0 | 1 | 0 | 0 | +-----+----------+----------+----------+----------+----------+ Returned 3 row(s) in 0.99s |
Columns to Rows
Comma Separated String to Rows
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
FROM: +-----+----------+ | uid | tag_list | +-----+----------+ | 1 | 1,2,3 | | 2 | 1,4 | | 3 | 5 | +-----+----------+ TO: +-----+-----+ | uid | tag | +-----+-----+ | 1 | 1 | | 1 | 2 | | 1 | 3 | | 2 | 1 | | 2 | 4 | | 3 | 5 | +-----+-----+ |
UNION [ALL] SELECT seems to be a solution.
MySQL
And…A Stored Procedure or a UDF?
Hive
Lateral View is AWESOME!
I tried explode() which can split an array into rows and before that split() which split string into array.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
hive> select uid, tag from test4 LATERAL view explode(split(tag_list,',')) tag_table as tag; ... Job 0: Map: 1 Cumulative CPU: 1.69 sec HDFS Read: 293 HDFS Write: 24 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 690 msec OK 1 1 1 2 1 3 2 1 2 4 3 5 Time taken: 12.894 seconds hive> |
Presto
Not figured out.
Impala
Not figured out.
Columns to Rows
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
FROM: +------+----------+----------+----------+----------+----------+ | uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val | +------+----------+----------+----------+----------+----------+ | 1 | 1 | 0 | 1 | 0 | 0 | | 2 | 1 | 0 | 0 | 0 | 0 | | 3 | 0 | 0 | 0 | 0 | 1 | +------+----------+----------+----------+----------+----------+ TO: +------+------+------+ | uid | tag | val | +------+------+------+ | 1 | 1 | 1 | | 1 | 2 | 0 | | 1 | 3 | 1 | | 2 | 1 | 1 | | 2 | 4 | 0 | | 3 | 5 | 1 | +------+------+------+ |
UNION!
Incoming search terms:
- GROUP CONCAT IMPALA
- hive transpose
- IMPALA transpose ROWS to column
- impala sql for transpose
- transpose row to columns in impala
- impala concat
- hive group by column and concat column
- wrong75u
- impala group_concat
- impala groupconcat
- How to join row values to column names impala
- hive sql transpose
- allowcfp
- group_concatimpala
- groupy96
- fruit5qv
- familyc82
- eveningrl3
- concat impala
- completelyv53