Overview of Python in Data Analytics
Introduction:
Python is one of the most widely used programming languages in data analytics today. Many data scientists prefer it for its simplicity and its helpful libraries, such as pandas, Matplotlib, scikit-learn, BeautifulSoup, and PyTorch. It helps in processing complex data in an era where data sets can reach petabytes in size.
Introduction to essential Python libraries:
NumPy:
Short for Numerical Python, NumPy is a foundation of scientific computing in Python. It provides an efficient multidimensional array object (ndarray) and linear algebra operations.
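As a minimal sketch of what this looks like in practice, the example below builds two ndarray objects and solves a small linear system. The matrix and vector values are made up for illustration only.

```python
import numpy as np

# A 2x2 coefficient matrix and a right-hand side, stored as ndarrays.
# The numbers are arbitrary illustration values.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Arithmetic is applied element-wise to the whole array at once.
doubled = A * 2

# Linear algebra: solve A @ x = b for x.
x = np.linalg.solve(A, b)

print(doubled)
print(x)
```

Working on whole arrays instead of looping over Python lists is what makes NumPy efficient for numerical work.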
Pandas:
The name pandas comes from "panel data". It makes working with structured data easier by providing data structures and functions designed especially for it.
Matplotlib:
Matplotlib is the best-known Python library for producing plots and other 2D data visualizations, including interactive plots.
IPython:
IPython provides a robust and productive environment for interactive and exploratory computing. It also provides a Mathematica-like notebook for connecting to IPython through a web browser, and an infrastructure for interactive parallel and distributed computing.
SciPy:
SciPy is a collection of packages addressing different problems in scientific computing.
Python can be used for web applications as well as desktop applications. It is well known for its simplicity and for its libraries. It is an object-oriented, high-level programming language with dynamic semantics.
Python in data analytics applications:
RFM:
Recency, frequency, and monetary value (RFM) analysis is important in the business analytics industry.
Recency indicates how recently the customer last made a purchase.
Frequency indicates how often the customer makes purchases.
Monetary value indicates how much the customer spends when purchasing from us.
RFM helps business analysts understand their customers better and thus increase profits. The customers are then ranked according to their RFM values.
Python helps in calculating RFM with its libraries (pandas, datetime, and NumPy).
pandas helps in reading the data, datetime helps in working with the purchase dates to determine recency and frequency, and NumPy helps in ranking the customers to determine the most and least loyal ones. Once these are identified, business analysts can decide how to gain more customers.
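The calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the `orders` table, its column names, the snapshot date, and the simple "sum of ranks" score are all hypothetical and are not taken from a real data set or a standard scoring scheme.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Hypothetical transaction log: one row per purchase.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2022-01-05", "2022-03-20", "2022-02-10",
        "2022-01-15", "2022-02-15", "2022-03-25",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 40.0, 60.0],
})

# Assumed reference date used to measure recency.
snapshot = datetime(2022, 3, 28)

# One row per customer: days since last purchase, number of
# purchases, and total amount spent.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)


def rank_ascending(values):
    """Return ranks 1..n, where the smallest value gets rank 1 (ties not handled)."""
    return np.argsort(np.argsort(values)) + 1


# Fewer days since the last purchase is better, so recency is negated.
r_score = rank_ascending(-rfm["recency"].to_numpy())
f_score = rank_ascending(rfm["frequency"].to_numpy())
m_score = rank_ascending(rfm["monetary"].to_numpy())

# Hypothetical combined score: the plain sum of the three ranks.
rfm["score"] = r_score + f_score + m_score
ranked = rfm.sort_values("score", ascending=False)
print(ranked)
```

In a real project the data would usually come from a file (for example with `pd.read_csv`), and the scores are often bucketed into quantiles instead of raw ranks.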
Web scraping:
Web scraping is the process of collecting raw data from the web using automated methods. Some websites forbid scraping, and they have good reasons to protect their data.
Python provides easy ways to make web scraping more powerful.
urllib is a Python standard library that handles URLs and helps in accessing the website we want to scrape, and BeautifulSoup helps in extracting the information from the page.
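A minimal sketch of these two libraries working together is shown below. The helper names (`is_allowed`, `fetch`, `extract_links`), the sample robots.txt rules, and the sample HTML are hypothetical. The example parses an inline HTML string so that it runs without network access; `fetch` shows how a page could be downloaded with urllib but is not called here.

```python
from urllib import robotparser
from urllib.request import urlopen

from bs4 import BeautifulSoup


def is_allowed(robots_txt, url, agent="*"):
    """Check a site's robots.txt rules before scraping a URL."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def fetch(url):
    """Download a page as text (not called in this offline sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html):
    """Return the href of every link found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical robots.txt content and page, for illustration only.
sample_robots = "User-agent: *\nDisallow: /private/\n"
sample_html = (
    "<html><body><h1>Shop</h1>"
    '<a href="/products">Products</a>'
    '<a href="/about">About</a>'
    "</body></html>"
)

title = BeautifulSoup(sample_html, "html.parser").h1.get_text()
links = extract_links(sample_html)
print(title, links)
print(is_allowed(sample_robots, "https://example.com/private/data"))
```

Checking robots.txt first is one way to respect sites that forbid scraping; a site's terms of use may impose further restrictions.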
Market basket analysis:
Market basket analysis is one of the most useful applications in the retail industry. Businesses need to mine and analyze their databases to understand the patterns in their data. Correlations among the data are very helpful in transactions, decision making, and recognizing customer behavior in a large data set.
It is used to determine, from the purchase history:
Products that are likely to be purchased together
Products that are likely to be purchased sequentially
Products that are purchased seasonally
It helps in choosing the best promotions, increasing revenue, and decreasing expenses.
First, we calculate the support of a pair of items by counting the transactions in which the two items appear together and dividing that count by the total number of transactions.
Next, we calculate the confidence of item 1 to item 2 by taking the number of transactions in which the two items appear together and dividing it by the number of transactions in which item 1 appears.
After that, we can calculate the lift by dividing the confidence of item 1 to item 2 by the support of item 2.
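The three measures can be sketched in plain Python as follows. The `transactions` list and the function names are hypothetical illustration values, and the definitions used are the usual ones: support counts transactions containing the items, confidence divides the joint support by the support of the first item, and lift divides that confidence by the support of the second item.

```python
# Hypothetical baskets: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"bread", "milk"}))       # 3 of 5 transactions
print(confidence({"bread"}, {"milk"}))  # about 0.75
print(lift({"bread"}, {"milk"}))        # below 1 for this sample
```

A lift above 1 suggests the two items are bought together more often than would be expected if they were independent, while a lift below 1 suggests the opposite.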
Limitations:
1- It takes a long time to implement and may require skills in regression, decision tree analysis, and other techniques.
2- It is sometimes hard to determine the product groupings.
3- Complexity grows exponentially with the number of items.
Association rule for market basket analysis:
Marketplaces use association rules to know which product is more likely to be purchased after another (antecedent, consequent).
An association rule has an associated population, which consists of instances.
MBA with Python:
We use pandas, NumPy, and apyori (an implementation of the Apriori algorithm, used through its API).
- Salma Allam
- Mar, 28 2022