Imp. Concepts of SQL with Examples. Ch-1
Data Analytics

Imp. Concepts of SQL with Examples. Ch-1

 *Topics are:   

 

 1.COUNT( ) vs. COUNT(*)

 2.GROUP BY vs. PARTITION BY

 3.ROW_NUMBER vs. RANK vs. DENSE_RANK

 

---- > First, let us create a table in MySQL and visualize it.

----------------------------------------------------------------------------

 CREATE TABLE EMPLOYEE ( employee_id int primary key,

employee_name text , dept text ,salary int, gender text);

 

INSERT INTO EMPLOYEE ( employee_id, employee_name, dept, salary, gender)

VALUES (0001, 'Clark', 'Accounting', 140000, 'm'),

(0002, 'Dave', 'Accounting', 230000, 'm'),

(0003, 'Eva', 'Marketing', null, 'f'),

 (0004, 'Joy', 'Marketing', 160000, 'm'),

(0005, 'Danial', 'Accounting', 150000, 'm'),

(0006, 'Alexa', 'Accounting', 150000, 'f');

 

Select * From EMPLOYEE; 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

employee_idemployee_namedeptsalarygender
1ClarkAccounting140000m
2DaveAccounting230000m
3EvaMarketingNULLf
4JoyMarketing160000m
5DanialAccounting150000m
6AlexaAccounting150000f

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

 1.COUNT( ) vs. COUNT(*)

 

● COUNT( ) :

(1) It counts the data from the columns. The value which we put in the bracket,  'COUNT' will take it as a parameter(column_name) and return the data.

---------------------------

Select Count(employee_name) as total_employee
From EMPLOYEE ;


---------------------------

Total_employee
6

 

(2) It will not consider ‘null’ values while counting.

------------------------

Select Count(salary) as salary_count
From EMPLOYEE ;


---------------------------

salary_count
5

*Note: For employee_id = 3, Salary is not specified or null. So it wont be counted in total count.

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

● COUNT(*)  : 

(1) In this,  ' * '  is given as a parameter, which means  'All records' in table. 
 

(2) It will count out each and every row in table. 

---------------------------

Select Count(*) as all_rows

From EMPLOYEE

---------------------------

all_rows
6

*Note: generally count(*) will count all rows irrespective of any entry being 'null'.

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

2.GROUP BY vs. PARTITION BY 

 

● GROUP BY: 

(1) As  it suggests, it will make groups for the same type of records in a column. 
 

(2) Generally it helps in data fetching operations for aggregating functions like SUM, AVERAGE,COUNT. In result table, it will give only one row/record per group.

---------------------------

Select dept, sum(salary) as dept_wise_total_salary
from EMPLOYEE
group by dept;

---------------------------

deptdept_wise_total_salary 
Accounting670000
Marketing160000

 

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

(3)  In some cases, In select clause,  apart from aggregated column value; if any other column is mentioned which is not part of Groupby clause, result will show error.

---------------------------


Select employee_name, dept , sum(salary) as dept_wise_total_salary
from EMPLOYEE
group by dept;

---------------------------

 

*Throwing an error*

*Note: Here three column were asked to fetch. ‘dept’ column is part of GroupBy clause and ‘salary’ column is used with aggregator  function SUM. Here, it will throw error because column 'employee_name' is not a part of any of these. 

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

● PARTITION BY:  

(1) It gives aggregated columns with each record in the given table.

 

(2)It keeps original values while also allowing to produce aggregated values. It is used with OVER clause.

 

---------------------------

select  dept, 

Count(employee_name)  Over (partition by dept)  as dep_wise_employee

from EMPLOYEE;
 

---------------------------

deptdep_wise_employee
Accounting4
Accounting4
Accounting4
Accounting4
Marketing2
Marketing2

*Note: Here partition by is with each dept and asked to count employee_name for same. So record is shown for every row. 

 

(3) Apart from asked column  and aggregated column if any other column is specified in Select query then it will also give it as an output.

---------------------------

Select  employee_name,salary, gender,

Count(employee_name) Over (partition by gender)  as gen_wise_employee

from EMPLOYEE;

---------------------------

 

employee_namesalarygendergen_wise_employee
EvaNULLf2
Alexa150000f2
Clark140000m4
Dave230000m4
Joy160000m4
Danial150000m4

*Note:  Here column ‘employee_name’  is associated with count aggregator function and partition is happening for different gender. The ‘salary’ column which is not related to any of these can also be fetched as shown above.

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←--------------------------------------------------→>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

 

3. Row_number, Rank  & Dense_Rank

 

● Row_number ():   

 

(1)It gives the serial no. to every row.

 

(2)The ranking is based on given column partition with Over Clause.

---------------------------

select *,
row_number() over (partition by gender) as number
from EMPLOYEE;

---------------------------

 

employee_idemployee_namedeptsalarygendernumber
3EvaMarketingNULLf1
6AlexaAccounting150000f2
1ClarkAccounting140000m1
2DaveAccounting230000m2
4JoyMarketing160000m3
5DanialAccounting150000m4

 

 

(3)  When results are fetched in either ascending or descending order, the row number is keep increasing by 1 unit, keeping 1 as a start value. It does not care about duplicate values. 
 

---------------------------

select *,
row_number() over (partition by dept order by salary) as number
from EMPLOYEE;

---------------------------

 

employee_idemployee_namedeptsalarygendernumber
1ClarkAccounting140000m1
5DanialAccounting150000m2
6AlexaAccounting150000f3
2DaveAccounting230000m4
3EvaMarketingNULLf1
4JoyMarketing160000m2

*Note:  Here column ‘dept'  is used for  partition and with  order by column 'Salary' in  ascending manner. Here salary is same for employee_id =5 and 6 , which is 15000. In such cases the duplication is ignored and given upcoming regular value to record.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

● Rank (): 

(1)It is used to specify rank for each row in result set. 

 

(2)  In clause for Partitionby, if it find same values; it will give them all ‘1’ as a result. 

---------------------------

select *,
rank() over (partition by gender order by salary) as rnk_1
from EMPLOYEE;


---------------------------

employee_idemployee_namedeptsalarygenderrnk_1
3EvaMarketingNULLf1
6AlexaAccounting150000f2
1ClarkAccounting140000m1
5DanialAccounting150000m2
4JoyMarketing160000m3
2DaveAccounting230000m4

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

(3)It gives the same rank (n) to  duplicate records and does not give (n+1) rank to next record. Instead, it gives (n+2)  rank to next record. 

 

---------------------------

select *,
rank() over (partition by dept order by salary) as rank_no.
from EMPLOYEE;


---------------------------

employee_idemployee_namedeptsalarygenderrank_no.
1ClarkAccounting140000m1
5DanialAccounting150000m2
6AlexaAccounting150000f2
2DaveAccounting230000m4
3EvaMarketingNULLf1
4JoyMarketing160000m2

*Note : Here is data is fetched for partition with dept and then ascending order with salary,  since employee_id = 5 and 6 has same salary being in same department , RANK=2 will be given to both of them. For the next record, it will skip the upcoming value and give next to next (n+2) [here 2+2 = 4] value.

 

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

● Dense_rank() :

(1) It is also used to specify rank for each row in result set. 

 

(2) Same as  rank here also ranking is based on any Column Partition with Over Clause. 

---------------------------

select *,
rank() over (partition by gender order by salary) as d_rank
from EMPLOYEE;


---------------------------

 

employee_idemployee_namedeptsalarygenderd_rank
3EvaMarketingNULLf1
6AlexaAccounting150000f2
1ClarkAccounting140000m1
5DanialAccounting150000m2
4JoyMarketing160000m3
2DaveAccounting230000m4

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<←------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

(3)The difference is ,it gives the same rank (n) to  duplicate records and does not skip next value as in case of RANK. it will follow regular order after finding duplicates. 

 

---------------------------

select *,

dense_rank() over(partition by dept order by salary) as d_rank

from EMPLOYEE;


---------------------------

employee_idemployee_namedeptsalarygenderd_rank
1ClarkAccounting140000m1
5DanialAccounting150000m2
6AlexaAccounting150000f2
2DaveAccounting230000m3
3EvaMarketingNULLf1
4JoyMarketing160000m2

*Note : Here is data is fetched for partition with dept and then ascending order with salary,  since employee_id = 5 and 6 has same salary being in same department , RANK will give them same value.  For next record it will skip the upcoming value and give next to next (n+2) value to record. 

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------------------------------------------>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

-------------------------- Few concepts discussed here and next will be covered in second part of this blog ---------------------------------

 

 

  • Harsh Mehta
  • Oct, 17 2022

Add New Comments

Please login in order to make a comment.

Recent Comments

Be the first to start engaging with the bis blog.