SQL Joins and Window Functions – Codango® / Codango.Com

1. JOINS

Sometimes it is very difficult to look through data to get specific things. For example, you want the name of customers who made purchases as well as the name of the products they bought.
You have a sales table which only has customer_id and product_id, the product names are in a different table say the products table and the names of the customers are found in the customers table.
As a normal person you would open the sales table and note down the Product IDs and the Customer IDs then move to the customers table and get the names of the customers represented by the specific customer ID then finally move to the products table and do the same.
That’s so hectic!
Makes you wonder….what if there was a way to link all these tables and see whatever I want without having to open each table alone?

Well my friend, you are lucky!
I introduce to you the concept of JOINS.
Joins provide a way that enables us connect multiple tables so that you can SELECT the columns you want to see from those tables and display all of them simultaneously. It literally joins tables as long as there exists a common column in the tables being joined together. The most IMPORTANT thing to remember is that there must be a common column.
Lets see how joins work.

Inner Join

Inner Join displays something that is present in all tables. If say a 1 corresponds to cup (1 = cup) in table A and table B has only digit 1. Then inner join identifies that digit 1 in table B matches digit 1 in table A and they both can be said to correspond to cup.

We will use the tables below for examples.

create schema joins
set search_path to joins;

CREATE TABLE employees
(employee_id SERIAL PRIMARY KEY, 
name VARCHAR (100) NOT NULL, 
department_id INT, 
manager_id INT,
salary NUMERIC (8, 2)
) ;
INSERT INTO employees (name, department_id, manager_id, salary) 
VALUES 
('Alice', 1, NULL, 50000),
('Bob',2, 1, 45000),
('Charlie', 1, 1, 47000), 
('Diana', 3, NULL, 60000),
('Eve',NULL, NULL, 40000);
select * from employees;

CREATE TABLE departments 
( department_id INT PRIMARY KEY,
department_name VARCHAR (100));

INSERT INTO departments (department_id, department_name) 
VALUES 
(1, 'Engineering'),
(2,'Sales'),
(3,'Marketing'),
(4, 'Finance');

select * from departments;

CREATE TABLE projects 
( project_id SERIAL PRIMARY KEY, 
project_name VARCHAR (100),
 employee_id INT
) ;


INSERT INTO projects (project_name, employee_id) 
VALUES 
('Redesign Website', 1),
('Customer Survey',2),
('Market Analysis',NULL),
('Mobile App Dev', 3),
('Budget Planning', NULL);

select * from projects;

**Inner join example**

select name, project_name from employees e
inner join projects p
on e.employee_id = p.employee_id;

Explanation: The query joins the projects and employees tables and only selects the names of employees and projects assigned to them.
NOTE the common column here is employee_id on both tables. It establishes the relationship between the two tables.

Left Join

For left join and right join I need you to picture two tables.
Let us call one the primary table and the other a secondary table.
The primary table is our main point of reference meaning that this table remains constant and its contents are sort of static. You are the one to choose which table will be primary or secondary.

Look at the statement below and try to understand it.
select name, project_name from primary_table p
left join secondary_table s
on s.common_column = p.common_column;

Now below is an example of a real query

select name, project_name from employees e
left join projects p
on e.employee_id = p.employee_id;

Explanation: Look at the results the column on the left side shows names of employees selected from employees table. In this example the employees e is the PRIMARY table. Note that there are no NULLS in the primary table but on the project_name column we have two NULLS this is because this column comes from a secondary table called projects p
In the left join the primary table is the first table you selected your columns from in this case the ’employees e’ table.
The primary table displays all the things it contains based on column selected. The secondary table appears after the keyword join.
Note that that is for left joins.
Simply, the left table is the primary table

Right Joins

Right joins are the total opposites of left joins.
Look at this general syntax note the position of primary and secondary tables

select name, project_name from secondary_table s
right join primary_table p
on s.common_column = p.common_column;

select name, project_name from employees e
right join projects p
on e.employee_id = p.employee_id;

Look at where the NULLS went to. The secondary table.
So the primary table in this case in on our right hand.
Here NULL shows that there were projects not assigned to any employees.

Full Outer Join

It joins data from both left and right join here both tables are treated as equals

select name, project_name from employees e
full outer join projects p
on e.employee_id = p.employee_id;

See how NULLS appear on both sides. Shows projects not assigned to employees and employees not assigned any projects.

Natural Join

Natural join unites two tables using a common column without you having to use ON keyword.
It just checks for common columns between the two tables and uses them to establish a relationship
For example

select name, project_name from employees e
natural join projects p;

Note the results are similar to inner join. However, natural joins need to be used with caution. It is better not to use it if you are not sure about columns in the tables.

2. Window Functions

Window functions enable us to do calculations on rows. There is some sort of relationship between the rows which we will talk about at the end.
Some of the window functions include: Rank(), Dense_Rank(), lag(), lead(),row_number(), ntile()

Rank() and dense_rank()

They both assign a rank to each row. Just like in a race where we have the race leader, others in between and finally the last person.
So we could rank rows based on total amount spent, total quantity sold etc..
For example rank departments by the number of employees they contain

select department_name, count(*),
rank() over (order by count(*)desc) as rank_
from departments d
join employees e
on e.department_id = d.department_id
group by department_name;

The difference between the dense_rank and rank is that rank skips a number if there is a tie i.e 1,1,1,4 while dense_rank does not skip a number i.e 1,1,1,2

Lead() and lag()

Lead will show what the next value i.e the value in the row that follows.

For example, show the salaries for the next employee on the table.

select name, salary,
lead (salary) over (order by salary) as next_salary 
from employees e;

For lead, the last value is always a null and vice versa for lag

Example show the salary of the preceding employee

select name, salary,
lag (salary) over (order by salary) as next_salary 
from employees e;

Row_number()

Assigns a unique number to each row.

select name, employee_id,
row_number() over (order by employee_id) as row_no 
from employees;

NTILE()

Distributes data into specific segments/groups based on a certain criteria.

For example, distribute employees based on their salary into 3 different groups.

select name, salary,
ntile(3) over (order by salary desc) as tile 
from employees e;

From the above the ones earning highest are placed in tier 1 and lowest in tier 3.

Conclusion

Window functions enable us to carry our calculations per row. As seen above you do notice the relationship between rows that we said. For rank it compares the value in every row and the current row and assigns a number to that row in relation to other rows.
Lead and lag depend on the next row and the preceding row respectively to give an output.
Window functions are useful especially in live data when you want to get the customer who bought first or the customer with the highest purchases etc..

Now let’s practice!!!