Data engineer vs Data analyst vs Data scientist
What is data science?
What is data? Data is information collected by any means, such as surveys or
questionnaires, that can be used to plan, make decisions or gain insight into
the subject being studied. As the volume and variety of collected data grew, so
did the demand for ways to store, process and manage it, and the concept of
data science emerged in response. Data science is applicable in almost every
industry, including medicine, business, finance and engineering, among many
others. Statistics is a crucial part of data science; many people would even
say that data science can't exist without statistics.
Data science lifecycle
Data science involves various processes, which are as follows:
1. Data wrangling or collection of data
2. Data cleaning
3. Data exploration or EDA (exploratory data analysis)
4. Data modelling
5. Interpretation and insight collection
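To make the cleaning and exploration steps above a little more concrete, here is a minimal sketch in plain Python. The `raw_ages` list and the `clean_ages` helper are made up for illustration; real projects would more likely use a library such as pandas and much larger datasets.

```python
from statistics import mean, median

# Hypothetical raw survey answers; None, "" and "abc" stand for bad entries.
raw_ages = ["34", " 29", None, "41", "", "29", "abc", "52"]


def clean_ages(values):
    """Data cleaning: drop missing or non-numeric entries and convert to int."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        text = value.strip()
        if text.isdigit():
            cleaned.append(int(text))
    return cleaned


ages = clean_ages(raw_ages)

# Data exploration: simple descriptive statistics on the cleaned values.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
}
print(summary)
```

Only the cleaning and exploration steps are shown here; modelling and interpretation build on the cleaned values in the same way.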
These processes require a range of skills and tools, such as Python, R, SQL, MS Excel, Tableau, AWS and so on; concepts such as data warehousing and data visualization are also important. This myriad of tasks is handled by different data professionals, namely:
1. Data engineer
2. Data analyst
3. Data scientist
Data engineer:
Data engineering involves collecting, storing and making high-quality data
accessible to the data analyst or the data scientist. Data engineers are
responsible for creating data pipelines that pull information from multiple
sources and integrate it. This data must be structured so that it is easy to
perform analytical processes on it. Data engineers need knowledge of
programming languages such as Python, C++, Scala, Java, R and SQL. It is also
important to have knowledge of data storage, data warehousing, relational
databases, non-relational databases and ETL (extract, transform and load)
systems.
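As a rough illustration of the ETL idea described above, the sketch below pulls records from two sources, fixes their types, and loads them into a relational database. Everything in it is assumed for the example: the two in-memory sources, the table and column names, and the use of SQLite. A production pipeline would read from real files or APIs and usually write to a data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources: a CSV export and records from some API.
csv_source = io.StringIO("customer_id,name\n1,Ada\n2,Grace\n")
api_source = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": "80.00"},
    {"customer_id": 1, "amount": "39.50"},
]


def extract():
    """Extract: read the raw rows from both sources."""
    customers = list(csv.DictReader(csv_source))
    return customers, api_source


def transform(customers, orders):
    """Transform: convert text fields so both sources use consistent types."""
    clean_customers = [(int(c["customer_id"]), c["name"]) for c in customers]
    clean_orders = [(o["customer_id"], float(o["amount"])) for o in orders]
    return clean_customers, clean_orders


def load(conn, customers, orders):
    """Load: write the structured rows into relational tables."""
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(order_count)
```

Keeping extract, transform and load as separate functions is one common way to organise such a pipeline, since each stage can then be tested or replaced on its own.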
Data analyst:
Data analysts are responsible for further cleaning of the dataset, performing
exploratory data analysis, producing visualizations and deriving useful
insights from the dataset. The data analyst uses the data prepared by the data
engineer to learn what is actually going on in it, so as to assist in making
data-driven decisions. This data can be quantitative, qualitative or
categorical. Some of the skills and tools required by a data analyst are as
follows:
1. Python
2. Pandas
3. SQL
4. MS Excel
5. Tableau
6. Power BI
7. R
8. SAS
Python, pandas and R are used to perform exploratory data analysis, which is
where most of the analytical work is done. SQL is also used for analysis, in
the form of queries. Queries are run on relational databases to retrieve
information from the data. Tableau, Power BI and seaborn are among the tools
used for data visualization, which is the process of using images and charts to
represent or explain what's going on in a dataset in visual form. The data
analyst performs most of the analytical processes and makes data-driven
conclusions or decisions based on the analysis performed.
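To show what running a query on a relational database can look like, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table and its figures are invented for the example; an analyst would normally query an existing database rather than create one.

```python
import sqlite3

# Hypothetical sales table, created in memory just for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("North", 100.0),
        ("South", 250.0),
        ("North", 50.0),
        ("South", 150.0),
        ("East", 75.0),
    ],
)

# The query: number of orders and total sales per region, largest total first.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()

for region, orders, total in rows:
    print(f"{region}: {orders} orders, total {total}")
```

The same grouped totals could then be fed into a charting tool to produce a bar chart per region.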
Data scientist:
The data scientist is often regarded as the most advanced role in the data
science lifecycle, since it involves the most advanced programming concepts. A
data scientist usually possesses the skills of a data analyst and a lot more. A
data scientist cleans the dataset, performs exploratory data analysis, produces
visualizations, builds predictive models and makes data-driven decisions. After
analyzing the dataset, the data scientist uses machine learning to build
predictive models which aid in making decisions. The type of machine learning
model used depends on the type of data in the dataset and the kind of problem
being solved. There are different kinds of models, such as classification
models, regression models and clustering models. These predictive models always
carry some error, but they can also be highly accurate.
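As a small illustration of a regression model and the error it carries, the sketch below fits a straight line by ordinary least squares in plain Python. The hours and scores data are made up, and the helper names `fit_line` and `predict` are only for this example; in practice a data scientist would more likely reach for a machine learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Hypothetical data: hours studied and the exam score obtained.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Mean absolute error: how far the predictions are from the real scores,
# on average, for the data the model was fitted on.
errors = [abs(predict(h, slope, intercept) - s) for h, s in zip(hours, scores)]
mae = sum(errors) / len(errors)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, mae={mae:.2f}")
print(f"predicted score for 6 hours: {predict(6, slope, intercept):.1f}")
```

The error is small here but not zero, which is the usual situation: the model captures the trend without matching every point exactly.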
- Odubajo Abdul qoyyum
- Mar 25, 2022