Imp. Concepts of SQL with Examples. Ch-1
*Topics are:
1.COUNT( ) vs. COUNT(*)
2.GROUP BY vs. PARTITION BY
3.ROW_NUMBER vs. RANK vs. DENSE_RANK
---- > First, let us create a table in MySQL and visualize it.
----------------------------------------------------------------------------
CREATE TABLE EMPLOYEE ( employee_id int primary key,
employee_name text , dept text ,salary int, gender text);
INSERT INTO EMPLOYEE ( employee_id, employee_name, dept, salary, gender)
VALUES (0001, 'Clark', 'Accounting', 140000, 'm'),
(0002, 'Dave', 'Accounting', 230000, 'm'),
(0003, 'Eva', 'Marketing', null, 'f'),
(0004, 'Joy', 'Marketing', 160000, 'm'),
(0005, 'Danial', 'Accounting', 150000, 'm'),
(0006, 'Alexa', 'Accounting', 150000, 'f');
Select * From EMPLOYEE;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
employee_id | employee_name | dept | salary | gender |
---|---|---|---|---|
1 | Clark | Accounting | 140000 | m |
2 | Dave | Accounting | 230000 | m |
3 | Eva | Marketing | NULL | f |
4 | Joy | Marketing | 160000 | m |
5 | Danial | Accounting | 150000 | m |
6 | Alexa | Accounting | 150000 | f |
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
1.COUNT( ) vs. COUNT(*)
● COUNT( ) :
(1) It counts the data from the columns. The value which we put in the bracket, 'COUNT' will take it as a parameter(column_name) and return the data.
---------------------------
Select Count(employee_name) as total_employee
From EMPLOYEE ;
---------------------------
Total_employee |
---|
6 |
(2) It will not consider ‘null’ values while counting.
------------------------
Select Count(salary) as salary_count
From EMPLOYEE ;
---------------------------
salary_count |
---|
5 |
*Note: For employee_id = 3, Salary is not specified or null. So it wont be counted in total count.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
● COUNT(*) :
(1) In this, ' * ' is given as a parameter, which means 'All records' in table.
(2) It will count out each and every row in table.
---------------------------
Select Count(*) as all_rows
From EMPLOYEE
---------------------------
all_rows |
---|
6 |
*Note: generally count(*) will count all rows irrespective of any entry being 'null'.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2.GROUP BY vs. PARTITION BY
● GROUP BY:
(1) As it suggests, it will make groups for the same type of records in a column.
(2) Generally it helps in data fetching operations for aggregating functions like SUM, AVERAGE,COUNT. In result table, it will give only one row/record per group.
---------------------------
Select dept, sum(salary) as dept_wise_total_salary
from EMPLOYEE
group by dept;
---------------------------
dept | dept_wise_total_salary | |
---|---|---|
Accounting | 670000 | |
Marketing | 160000 |
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(3) In some cases, In select clause, apart from aggregated column value; if any other column is mentioned which is not part of Groupby clause, result will show error.
---------------------------
Select employee_name, dept , sum(salary) as dept_wise_total_salary
from EMPLOYEE
group by dept;
---------------------------
*Throwing an error* |
*Note: Here three column were asked to fetch. ‘dept’ column is part of GroupBy clause and ‘salary’ column is used with aggregator function SUM. Here, it will throw error because column 'employee_name' is not a part of any of these.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
● PARTITION BY:
(1) It gives aggregated columns with each record in the given table.
(2)It keeps original values while also allowing to produce aggregated values. It is used with OVER clause.
---------------------------
select dept,
Count(employee_name) Over (partition by dept) as dep_wise_employee
from EMPLOYEE;
---------------------------
dept | dep_wise_employee |
---|---|
Accounting | 4 |
Accounting | 4 |
Accounting | 4 |
Accounting | 4 |
Marketing | 2 |
Marketing | 2 |
*Note: Here partition by is with each dept and asked to count employee_name for same. So record is shown for every row.
(3) Apart from asked column and aggregated column if any other column is specified in Select query then it will also give it as an output.
---------------------------
Select employee_name,salary, gender,
Count(employee_name) Over (partition by gender) as gen_wise_employee
from EMPLOYEE;
---------------------------
employee_name | salary | gender | gen_wise_employee |
---|---|---|---|
Eva | NULL | f | 2 |
Alexa | 150000 | f | 2 |
Clark | 140000 | m | 4 |
Dave | 230000 | m | 4 |
Joy | 160000 | m | 4 |
Danial | 150000 | m | 4 |
*Note: Here column ‘employee_name’ is associated with count aggregator function and partition is happening for different gender. The ‘salary’ column which is not related to any of these can also be fetched as shown above.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. Row_number, Rank & Dense_Rank
● Row_number ():
(1)It gives the serial no. to every row.
(2)The ranking is based on given column partition with Over Clause.
---------------------------
select *,
row_number() over (partition by gender) as number
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | number |
---|---|---|---|---|---|
3 | Eva | Marketing | NULL | f | 1 |
6 | Alexa | Accounting | 150000 | f | 2 |
1 | Clark | Accounting | 140000 | m | 1 |
2 | Dave | Accounting | 230000 | m | 2 |
4 | Joy | Marketing | 160000 | m | 3 |
5 | Danial | Accounting | 150000 | m | 4 |
(3) When results are fetched in either ascending or descending order, the row number is keep increasing by 1 unit, keeping 1 as a start value. It does not care about duplicate values.
---------------------------
select *,
row_number() over (partition by dept order by salary) as number
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | number |
---|---|---|---|---|---|
1 | Clark | Accounting | 140000 | m | 1 |
5 | Danial | Accounting | 150000 | m | 2 |
6 | Alexa | Accounting | 150000 | f | 3 |
2 | Dave | Accounting | 230000 | m | 4 |
3 | Eva | Marketing | NULL | f | 1 |
4 | Joy | Marketing | 160000 | m | 2 |
*Note: Here column ‘dept' is used for partition and with order by column 'Salary' in ascending manner. Here salary is same for employee_id =5 and 6 , which is 15000. In such cases the duplication is ignored and given upcoming regular value to record.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
● Rank ():
(1)It is used to specify rank for each row in result set.
(2) In clause for Partitionby, if it find same values; it will give them all ‘1’ as a result.
---------------------------
select *,
rank() over (partition by gender order by salary) as rnk_1
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | rnk_1 |
---|---|---|---|---|---|
3 | Eva | Marketing | NULL | f | 1 |
6 | Alexa | Accounting | 150000 | f | 2 |
1 | Clark | Accounting | 140000 | m | 1 |
5 | Danial | Accounting | 150000 | m | 2 |
4 | Joy | Marketing | 160000 | m | 3 |
2 | Dave | Accounting | 230000 | m | 4 |
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(3)It gives the same rank (n) to duplicate records and does not give (n+1) rank to next record. Instead, it gives (n+2) rank to next record.
---------------------------
select *,
rank() over (partition by dept order by salary) as rank_no.
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | rank_no. |
---|---|---|---|---|---|
1 | Clark | Accounting | 140000 | m | 1 |
5 | Danial | Accounting | 150000 | m | 2 |
6 | Alexa | Accounting | 150000 | f | 2 |
2 | Dave | Accounting | 230000 | m | 4 |
3 | Eva | Marketing | NULL | f | 1 |
4 | Joy | Marketing | 160000 | m | 2 |
*Note : Here is data is fetched for partition with dept and then ascending order with salary, since employee_id = 5 and 6 has same salary being in same department , RANK=2 will be given to both of them. For the next record, it will skip the upcoming value and give next to next (n+2) [here 2+2 = 4] value.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
● Dense_rank() :
(1) It is also used to specify rank for each row in result set.
(2) Same as rank here also ranking is based on any Column Partition with Over Clause.
---------------------------
select *,
rank() over (partition by gender order by salary) as d_rank
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | d_rank |
---|---|---|---|---|---|
3 | Eva | Marketing | NULL | f | 1 |
6 | Alexa | Accounting | 150000 | f | 2 |
1 | Clark | Accounting | 140000 | m | 1 |
5 | Danial | Accounting | 150000 | m | 2 |
4 | Joy | Marketing | 160000 | m | 3 |
2 | Dave | Accounting | 230000 | m | 4 |
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(3)The difference is ,it gives the same rank (n) to duplicate records and does not skip next value as in case of RANK. it will follow regular order after finding duplicates.
---------------------------
select *,
dense_rank() over(partition by dept order by salary) as d_rank
from EMPLOYEE;
---------------------------
employee_id | employee_name | dept | salary | gender | d_rank |
---|---|---|---|---|---|
1 | Clark | Accounting | 140000 | m | 1 |
5 | Danial | Accounting | 150000 | m | 2 |
6 | Alexa | Accounting | 150000 | f | 2 |
2 | Dave | Accounting | 230000 | m | 3 |
3 | Eva | Marketing | NULL | f | 1 |
4 | Joy | Marketing | 160000 | m | 2 |
*Note : Here is data is fetched for partition with dept and then ascending order with salary, since employee_id = 5 and 6 has same salary being in same department , RANK will give them same value. For next record it will skip the upcoming value and give next to next (n+2) value to record.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
-------------------------- Few concepts discussed here and next will be covered in second part of this blog ---------------------------------
- Harsh Mehta
- Oct, 17 2022