Essential Mathematics for Data Science
Introduction
“A Data Scientist is a professional who uses scientific methods and algorithms to create meaning from raw data.”
Data Science is very interesting to learn and is a booming field, so why do people fail at Data Science and “QUIT”? Have you ever wondered? One of the most common reasons for quitting Data Science is a lack of knowledge of the “FUNDAMENTALS”.
It’s clear that if you want to excel in a field, you need a solid knowledge of the basics. Programming is one of the important basics of Data Science, but the most vital subject is Math. We can say that Math is the back-end of Data Science and Machine Learning: if you don’t know the math behind an algorithm, or how it arrives at the predicted values you are getting, how will you explain your results? So Math is one of the most important fundamentals of Data Science and Machine Learning.
Math in Data Science mainly comprises Statistics, Probability, Linear Algebra and Differential Calculus. Almost all the techniques of modern data science rest on deep mathematical concepts because, as I usually say, Math is the back-end of Machine Learning algorithms. So in this article we are going to study the essential math topics you need to excel at Data Science.
Importance of Mathematics
Always remember that applying for a Data Scientist position doesn’t just require you to know TensorFlow or some other machine learning framework; you also need to know the math behind an algorithm. You need to know how the cost function of a linear regression model is optimized, or what the decision function of a linear SVM classifier does, and the list goes on.
Different Mathematics topic distribution
When you look at the representation above, you can clearly see the distribution of Math topics needed for Data Science. Linear Algebra and Statistics + Probability are the most important branches, covering 35% and 25%. The other branches you need are Calculus and Algorithmic Complexity.
Statistics
Statistics is a must-know subject if you want to become a Data Scientist, because Statistics is the fuel of the Data Science process. Many people call Machine Learning “Statistical Learning” because of the scope of Statistics in the area. Statistics is vast, but if you approach it properly you can find it quite easy.
Some topics you should know are:
- Descriptive Statistics: measures of central tendency, variance, standard deviation, covariance, correlation.
- Basic idea of probability, conditional probability, Bayes’ Theorem.
- Probability distribution functions, including Uniform, Normal, Binomial and t-distribution, plus the central limit theorem, etc.
- Hypothesis testing
- A/B testing, p-values, error measurement
- ANOVA, t-test
- Least squares methods and regression.
You should know the above concepts because you are going to use them in your day-to-day data science activities. During interviews, you can easily impress your interviewer if you know the concepts of Statistics.
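To make a few of these topics concrete, here is a small sketch using only Python's standard library. The study-hours and exam-score numbers are made up for illustration, and the helper names `pearson_correlation` and `bayes_posterior` are my own, not from any library.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is expanded with the law of total probability.
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Made-up sample data (an assumption for this example).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

summary = {
    "mean": statistics.fmean(scores),
    "median": statistics.median(scores),
    "variance": statistics.pvariance(scores),  # population variance
    "std_dev": statistics.pstdev(scores),      # population standard deviation
    "correlation": pearson_correlation(hours, scores),
}

# Hypothetical test: 1% of people have a condition, the test detects it
# 95% of the time and gives a false alarm 5% of the time.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

print(summary)
print(f"P(condition | positive test) = {posterior:.3f}")
```

Notice that even with a fairly accurate test, the posterior probability stays low because the condition is rare. This is exactly the kind of result that is hard to explain without knowing conditional probability.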
Linear Algebra
Have you ever thought about how a recommendation system works? It works through Deep Learning concepts and Linear Algebra. So what is there in Linear Algebra? Basically, Linear Algebra consists of Matrix Algebra. It is one of the important branches of Mathematics, and it will help you understand how Machine Learning and Deep Learning algorithms work.
Some topics you should know are:
- Basic matrix operations: scalar multiplication, transpose, determinant, etc.
- Matrix multiplication and inverse of a matrix
- Different types of matrices
- Linear systems of equations
- Gaussian elimination and Gauss-Jordan elimination
- Vector operations
- Eigenvalues and eigenvectors
- Diagonalization
- Projection onto a line and a plane
- Singular Value Decomposition (SVD)
The SVD concept is used in dimensionality reduction and Principal Component Analysis (PCA). Most Deep Learning and Neural Network concepts use Linear Algebra in their algorithms.
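As a rough illustration of the idea behind SVD-based dimensionality reduction, the sketch below approximates only the largest singular value and its vectors with power iteration, then rebuilds a rank-one version of the matrix. This is a simplified stand-in I chose so the example needs no external packages; in practice you would normally call a library routine such as NumPy's `numpy.linalg.svd`. The function names and the example matrix are assumptions made for this example.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its left/right vectors.

    Uses power iteration on A^T A. Assumes a nonzero matrix and a start
    vector that is not orthogonal to the top right singular vector.
    """
    a_t = transpose(matrix)
    v = [1.0] * len(matrix[0])
    for _ in range(iterations):
        w = mat_vec(a_t, mat_vec(matrix, v))  # (A^T A) v
        length = norm(w)
        v = [x / length for x in w]
    av = mat_vec(matrix, v)
    sigma = norm(av)                # largest singular value
    u = [x / sigma for x in av]     # matching left singular vector
    return u, sigma, v


def rank_one_approximation(matrix):
    """Keep only the top singular component: sigma * u * v^T."""
    u, sigma, v = top_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]


# Made-up 3x2 matrix whose second column is twice the first, so one
# singular component is enough to describe it completely.
A = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]

u, sigma, v = top_singular_triplet(A)
approx = rank_one_approximation(A)
print(f"largest singular value: {sigma:.4f}")
print(approx)
```

Because the two columns of this matrix carry the same information, a single component reproduces it. Real data is noisier, and keeping only the largest few components is what dimensionality reduction with SVD amounts to.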
Calculus
“Ladies and Gentlemen, Calculus, the rebel of Mathematics, is here in the house.” Calculus is the topic most people have difficulty with, and the only reason why people hate Math.
But the truth is that Calculus is used in various areas of Machine Learning, and this is the reason you should learn it. There are many online resources, which we will discuss at the end of the blog.
So some of the essential topics are:
- Limits and continuity
- Mean value theorems
- L’Hôpital’s Rule
- Maxima and minima
- Product and chain rules
- Differential equations
- Beta and Gamma functions
- Partial derivatives
- Gradient, etc.
Have you wondered how logistic regression works, or how Gradient Descent finds the minimum of a loss function? To understand these, a grasp of calculus is important. There are also many other algorithms that use calculus.
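Here is a minimal sketch of Gradient Descent fitting a straight line by minimizing the mean squared error. The partial derivatives of the loss with respect to the slope `w` and the intercept `b` tell us which way to move each parameter. The data, the learning rate and the number of steps are values I picked for illustration, not recommended settings.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, loss = {mse(w, b, xs, ys):.2e}")
```

If the learning rate is too large the updates overshoot and the loss grows instead of shrinking, which is why understanding the gradient matters more than memorizing the loop.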
Discrete Mathematics
One of the easiest and coolest branches of Mathematics. Modern Data Science and computational systems have Discrete Math at their core. Discrete Math is also often used in analytics projects.
So let’s discuss some of the important topics in Discrete Math:
- Set theory: power sets, supersets, subsets, etc.
- Venn diagrams
- Counting functions
- Propositional logic
- Basic proofs: induction and contradiction
- Graph theory
- Basic data structures
- Recurrence relations, complexity concepts and many more
When you study any algorithm, you need to understand its time and space complexity, and Discrete Math is used for this purpose. There are also many other applications where you will find Discrete Math.
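A quick way to get a feel for time complexity is to count the steps two algorithms take on the same input. The sketch below compares linear search, which grows like O(n), with binary search, which grows like O(log n), on a sorted list. The function names and the list size are my own choices for this example.

```python
def linear_search_steps(items, target):
    """Count the comparisons a linear scan makes before it finds target."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the comparisons binary search makes on a sorted list."""
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return steps


data = list(range(1024))   # already sorted
worst_case_target = 1023   # last element: worst case for the linear scan

linear_steps = linear_search_steps(data, worst_case_target)
binary_steps = binary_search_steps(data, worst_case_target)
print(f"linear search: {linear_steps} steps, binary search: {binary_steps} steps")
```

Doubling the list size doubles the work for the linear scan but adds only one more step for binary search, and that difference is what complexity analysis lets you predict before you run anything.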
Some other topics
There are also some other topics you should know, because you will encounter them many times. They are:
- Logarithmic and exponential functions
- Rational numbers
- Basic geometric theorems
- Trigonometric identities
- Real and complex numbers
- Sequences and series
- Graphing and plotting
- Cartesian and polar co-ordinate systems
- Conic sections
- Linear and integer programming
Conclusion
So here we have discussed the essential topics you need to excel in Data Science and Machine Learning. There are many more topics, but these are some of the important ones that you should know.
But I must say one thing to my readers: please do not feel scared or worried after reading about these topics. One thing is sure, though: if you want to be successful in Data Science, you need the willpower and dedication to learn new things. You should be excited about learning. Data Science is indeed tough and vast, but if you show interest in it you can do wonders.
Consistency and hard work will make you successful. I can guarantee that by studying these topics your level of understanding of Data Science will change. Learning and applying Math takes time and is a lengthy process, but it will provide you with long-term results. And that is a big step towards becoming a successful Data Scientist.
- Jay Charole
- Mar, 11 2022