Wednesday, 16 September 2020

Introduction to SQL Joins (Theory)

In this post we will discuss about Spark SQL JOINS theoretically. We will have a couple of posts with in detail explanation with examples for each Join type.


Suppose there are 2 tables TableA and TableB with only 2 columns (Id, Data) and following data:

MyTable:
+----+---------+
| Id | Data |
+----+---------+
| 1 | MyData01|
| 1 | MyData10|
| 1 | MyData11|
| 2 | MyData02|
| 3 | MyData03|
+----+---------+
YourTable:
+----+---------+
| Id | Data |
+----+---------+
| 1 | UrData01|
| 2 | UrData02|
| 2 | UrData20|
| 2 | UrData22|
| 4 | UrData04|
+----+---------+
Inner Join on column Id will return columns from both the tables and only the matching records:
.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
'----'---------'----'---------'
Left Join (or Left Outer join) on column Id will return columns from both the tables and matching records with records from left table (Null values from right table):
.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
:----+---------+----+---------:
|  3 | DataA31 |    |         |
'----'---------'----'---------'
Right Join (or Right Outer join) on column Id will return columns from both the tables and matching records with records from right table (Null values from left table):
┌────┬─────────┬────┬─────────┐
 Id   Data    Id   Data   
├────┼─────────┼────┼─────────┤
  1  DataA11   1  DataB11 
  1  DataA12   1  DataB11 
  1  DataA13   1  DataB11 
  2  DataA21   2  DataB21 
  2  DataA21   2  DataB22 
  2  DataA21   2  DataB23 
               4  DataB41 
└────┴─────────┴────┴─────────┘
Full Outer Join on column Id will return columns from both the tables and matching records with records from left table (Null values from right table) and records from right table (Null values from left table):
╔════╦═════════╦════╦═════════╗
 Id   Data    Id   Data   
╠════╬═════════╬════╬═════════╣
  -                       
  1  DataA11   1  DataB11 
  1  DataA12   1  DataB11 
  1  DataA13   1  DataB11 
  2  DataA21   2  DataB21 
  2  DataA21   2  DataB22 
  2  DataA21   2  DataB23 
  3  DataA31              
               4  DataB41 
╚════╩═════════╩════╩═════════╝
Left Semi Join on column Id will return columns only from left table and matching records only from left table:
┌────┬─────────┐
 Id   Data   
├────┼─────────┤
  1  DataA11 
  1  DataA12 
  1  DataA13 
  2  DataA21 
└────┴─────────┘

No comments:

Post a Comment