pyspark-session-2021-10-25-RDD-join.txt
Inner Join Example
$ pyspark
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5-amzn-0
      /_/
Using Python version 3.7.10 (default, Jun 3 2021 00:02:01)
SparkContext available as 'sc'.
SparkSession available as 'spark'.
>>>
>>>
>>> x = sc.parallelize([("a", 1), ("a", 4), ("b", 7), ("b", 8), ("c", 89)])
>>> y = sc.parallelize([("a", 100), ("a", 400), ("b", 700), ("b", 800), ("b", 900), ("d", 890)])
>>> x.collect()
[
('a', 1), ('a', 4),
('b', 7), ('b', 8),
('c', 89)
]
>>> y.collect()
[
('a', 100), ('a', 400),
('b', 700), ('b', 800), ('b', 900),
('d', 890)
]
>>> joined = x.join(y)
>>> joined.collect()
[
('b', (7, 800)),
('b', (7, 900)),
('b', (7, 700)),
('b', (8, 800)),
('b', (8, 900)),
('b', (8, 700)),
('a', (1, 100)),
('a', (1, 400)),
('a', (4, 100)),
('a', (4, 400))
]
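The output above shows the inner-join semantics of RDD.join: for every key present in BOTH datasets, one pair is emitted per combination of values (a per-key Cartesian product), and keys appearing in only one dataset ('c' in x, 'd' in y) are dropped. A minimal plain-Python sketch of that behavior, needing no Spark (the helper name inner_join is my own, not a PySpark API):

```python
from collections import defaultdict

def inner_join(left, right):
    # Mimic RDD.join on lists of (key, value) pairs:
    # group the right side by key, then pair every left value
    # with every right value sharing that key.
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    # Keys missing from one side contribute nothing (inner join).
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key[k]]

# Same data as the session above.
x = [("a", 1), ("a", 4), ("b", 7), ("b", 8), ("c", 89)]
y = [("a", 100), ("a", 400), ("b", 700), ("b", 800), ("b", 900), ("d", 890)]

print(inner_join(x, y))  # 10 pairs: 2x2 for 'a', 2x3 for 'b'; 'c' and 'd' drop out
```

Note that Spark makes no ordering guarantee, which is why the session's collect() output is not sorted by key.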
>>> joined2 = y.join(x)
>>> joined2.collect()
[
('b', (700, 8)),
('b', (700, 7)),
('b', (800, 8)),
('b', (800, 7)),
('b', (900, 8)),
('b', (900, 7)),
('a', (100, 4)),
('a', (100, 1)),
('a', (400, 4)),
('a', (400, 1))
]
>>>
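Comparing the two results suggests that y.join(x) matches the same keys as x.join(y), with only the order inside each value tuple flipped (left dataset's value first). A quick check of that relationship using the two outputs copied verbatim from the session:

```python
# x.join(y) output, as collected above.
xy = [('b', (7, 800)), ('b', (7, 900)), ('b', (7, 700)),
      ('b', (8, 800)), ('b', (8, 900)), ('b', (8, 700)),
      ('a', (1, 100)), ('a', (1, 400)),
      ('a', (4, 100)), ('a', (4, 400))]

# y.join(x) output, as collected above.
yx = [('b', (700, 8)), ('b', (700, 7)), ('b', (800, 8)),
      ('b', (800, 7)), ('b', (900, 8)), ('b', (900, 7)),
      ('a', (100, 4)), ('a', (100, 1)),
      ('a', (400, 4)), ('a', (400, 1))]

# Reversing each value tuple of xy yields exactly the pairs in yx
# (up to ordering, which Spark does not guarantee).
assert sorted(yx) == sorted((k, (b, a)) for k, (a, b) in xy)
print("y.join(x) == x.join(y) with value tuples swapped")
```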