Complete Introduction to Machine Learning


In this article we will look at what machine learning is, starting from the very basics. We will see how ML came into existence and go over the key terms related to machine learning.

What is learning?

·        The acquisition of knowledge or skills through study, experience, or being taught. (Oxford)

·        Learning is a key process in human behavior.

·        Learning is the process of acquiring new, or modifying existing, knowledge, behaviors, skills, values, or preferences. (Wikipedia)

·        Learning has essentially been a human attribute. We perform tasks better when we learn!

·        Man has made machines which can solve problems using human-designed algorithms.

·        Now, we are creating algorithms that learn: algorithms that solve problems better as they learn!

Origin of Machine Learning

1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts modeled neurons using an electrical circuit, and the first neural network was born.

1950: Alan Turing created the Turing test. To pass it, a computer has to convince a human that it is a human and not a computer.

1952: Arthur Samuel created the first computer program that could learn as it ran: a program that played checkers.

Early definition of Machine Learning

Tom Mitchell, a professor at Carnegie Mellon University, defined machine learning in his book ‘Machine Learning’ as:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.


Examples:

1. Learning to better filter emails as spam or not spam

Task – Classifying emails as spam or not.

Performance Measure – The fraction of emails accurately classified as spam or not spam.

Experience – Observing you label emails as spam or not spam.


2. A checkers learning problem

Task – Playing checkers.

Performance Measure – Percent of games won against opponents.

Experience – Playing practice games against itself.


3. Handwriting Recognition Problem

Task – Recognizing and classifying handwritten words within images.

Performance Measure – Percent of words correctly classified.

Experience – A database of handwritten words with given classifications.
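The performance measure P in the spam example ("the fraction of emails accurately classified") is simple enough to write down directly. The sketch below assumes emails are labeled with the strings "spam" and "ham"; the function name `performance` and the label lists are hypothetical, made up purely for illustration.

```python
def performance(predicted, actual):
    """Performance measure P: fraction of emails whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled email")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for five emails (made up for illustration).
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]

print(performance(predicted, actual))  # 0.8
```

In Mitchell's terms, the program "learns" if this number tends to go up as it observes more labeled emails (more experience E).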


Introduction to Machine Learning

An ML algorithm tries to frame the data as a hypothetical target function (f) that maps inputs to an output.

• That is, given some input variables (input), f gives the predicted output variable (output).

• We represent it as:

Output = f(Input)   OR

OutputVariable = f(InputVariables)   OR

Y = f(X)

Example :

Precipitation = f(Windspeed, cloudcover%, temperature)

Car_price = f(make, model, engine, color,...)

Weight = f(Height) OR maybe Weight = f(Height, Age, Gender)

Visibility = f(Distance, FogDensity)
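To make "learning f from data" concrete, here is a minimal sketch for Weight = f(Height), assuming f is a straight line fitted by ordinary least squares. The height/weight numbers are invented toy data that happen to lie exactly on a line, and the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of b0 and b1 in y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = Sxy / Sxx, using deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical toy data: heights in cm, weights in kg (made up for illustration).
heights = [150, 160, 170, 180]
weights = [50, 58, 66, 74]

b0, b1 = fit_simple_linear_regression(heights, weights)


def f(height):
    """The learned approximation of Weight = f(Height)."""
    return b0 + b1 * height


print(round(b0, 6), round(b1, 6))  # -70.0 0.8
print(round(f(175), 6))            # 70.0
```

Real data would not fall exactly on a line, and a single input (Height) may not be enough, which is why the example above also suggests Weight = f(Height, Age, Gender).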


Modelling uses machine learning algorithms, in which the machine learns from data just as humans learn from experience.

Machine learning models can be classified into the following three types, based on the task performed and the nature of the output:


Regression: The output variable to be predicted is a continuous variable. For example, a student's score in a subject.

Classification: The output variable to be predicted is a categorical variable. For example, classifying incoming emails as spam or ham (not spam).

Clustering: No predefined labels are assigned to the clusters that are formed. For example, customer segmentation.

We can also classify machine learning models into two broad categories.

1. Supervised learning: Labeled past data (inputs paired with their known outputs) is used to build the model. Regression and classification algorithms fall under this category.

2. Unsupervised learning: No predefined labels are assigned to the past data. Clustering and association algorithms fall under this category.
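Clustering works on unlabeled data, so the algorithm has to find the groups itself. Below is a minimal sketch of k-means, one common clustering algorithm (chosen here for illustration; the article does not name a specific one), applied to a hypothetical one-dimensional "annual spend" feature for customer segmentation. Initialising the centroids with the first k values is a simplifying assumption; practical implementations use better initialisation.

```python
def kmeans_1d(values, k, max_iter=100):
    """Naive k-means for 1-D data; returns (labels, centroids)."""
    # Assumption: start with the first k values as centroids (simple, not robust).
    centroids = [float(v) for v in values[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Hypothetical annual spend of six customers; note that no labels are given.
spend = [10, 12, 11, 50, 52, 49]
labels, centroids = kmeans_1d(spend, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 carry no predefined meaning; it is up to the analyst to interpret them (for example, as "low spenders" and "high spenders").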


Based on the assumptions they make about the mapping function f, ML algorithms can also be divided into two types:

1) Parametric algorithms

Algorithms that simplify the function to a known form are called parametric machine learning algorithms.

A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model.

For example, linear regression assumes the output is a linear combination of the inputs. With two input variables, the model has the form:

Y = B0 + B1X1 + B2X2

where B0, B1 and B2 are the coefficients that control the intercept and slope, and X1 and X2 are the two input variables.
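The following sketch, assuming NumPy is available, fits a two-input linear regression model Y = B0 + B1*X1 + B2*X2 with least squares (`numpy.linalg.lstsq`). The toy data is invented and generated without noise from Y = 1 + 2*X1 - 3*X2, so the fit should recover those coefficients. The point it illustrates: however many rows of training data there are, the learned model is just three numbers.

```python
import numpy as np

# Hypothetical, noise-free toy data generated from Y = 1 + 2*X1 - 3*X2.
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
Y = 1 + 2 * X1 - 3 * X2

# Design matrix: a column of ones (for B0), then X1 and X2.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution; lstsq also returns residuals, rank and singular values.
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
B0, B1, B2 = coeffs


def predict(x1, x2):
    """Prediction uses only the three learned parameters, not the training data."""
    return B0 + B1 * x1 + B2 * x2


print(coeffs)
print(predict(5.0, 2.0))
```

Adding more training rows would change the estimated values of B0, B1 and B2, but never the number of parameters, which is what makes the model parametric.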

Examples of parametric algorithms:

·        Logistic Regression – a classification algorithm

·        Linear Discriminant Analysis – a classification and dimensionality reduction technique

·        Perceptron – an algorithm for supervised learning of binary classifiers


Benefits of parametric algorithms:

·        Simpler: These methods are easier to understand, and their results are easier to interpret.

·        Speed: Parametric models are very fast to learn from data.

·        Less Data: They do not require as much training data and can work well even if the fit to the data is not perfect.


Limitations of parametric algorithms:

·        Constrained: By choosing a functional form, these methods are highly constrained to the specified form.

·        Limited Complexity: The methods are more suited to simpler problems.

·        Poor Fit: In practice the methods are unlikely to match the underlying mapping function.


2) Non-parametric algorithms

Algorithms that do not make strong assumptions about the form of the mapping function are called non-parametric machine learning algorithms.

By not making assumptions, they are free to learn any functional form from the training data.

Example: the k-nearest neighbors (k-NN) algorithm makes predictions for a new data instance based on the k most similar training patterns. The method assumes nothing about the form of the mapping function other than that patterns that are close are likely to have a similar output variable.
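A minimal k-NN classifier sketch in plain Python, assuming Euclidean distance and a simple majority vote among the k nearest training points. The points, the class labels "A" and "B", and the function name `knn_predict` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Distance from the query to every stored training point.
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))  # A
print(knn_predict(points, labels, (7, 8)))  # B
```

Notice that there is no training step that produces a fixed set of coefficients: the "model" is the stored training data itself, so it grows as more data is added. This is also why k-NN can fit almost any shape of decision boundary, and why predictions get slower with large datasets.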

Examples of non-parametric algorithms:

·        k-Nearest Neighbors

·        Decision Trees like CART and C4.5

·        Support Vector Machines


Benefits of non-parametric algorithms:

·        Flexibility: Capable of fitting a large number of functional forms.

·        Power: No assumptions (or weak assumptions) about the underlying function.

·        Performance: Can result in higher performance models for prediction.


Limitations of non-parametric algorithms:

·        More Data: Require a lot more training data to estimate the mapping function.

·        Slower: A lot slower to train, as they often have far more parameters to train.

·        Overfitting: A greater risk of overfitting the training data, and it is harder to explain why specific predictions are made.

  • Jay Charole
  • Mar, 11 2022
