Data engineer vs Data analyst vs Data scientist


 

What is data science?

What is data? Data is information collected by any means, such as surveys or questionnaires, that can be used to plan, make decisions or gain insight into the subject being studied. As the volume and variety of collected data grew, so did the demand for ways to store, process and handle it, and from that demand the field of data science emerged. Data science is applicable in almost every industry, including medicine, business, finance and engineering, among many others. Statistics is a crucial part of data science; many people would even say that data science cannot exist without it.

 

Data science lifecycle

      Data science involves various processes which are as follows:

1. Data wrangling or collection of data

2. Data cleaning

3. Data exploration or EDA (exploratory data analysis)

4. Data modelling

5. Interpretation and insight collection
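
As a rough illustration of the cleaning and exploration steps, here is a minimal sketch that uses only the Python standard library. The survey values and the helper names `clean` and `explore` are made up for this example; a real project would more likely use a library such as pandas.

```python
import statistics

# Hypothetical survey responses (ages). None and "n/a" stand in for missing values.
raw_ages = [34, None, 29, "41", "n/a", 29, 52]

def clean(values):
    """Data cleaning: keep only entries that can be read as whole numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(int(value))
        except (TypeError, ValueError):
            continue  # skip missing or malformed entries
    return cleaned

def explore(values):
    """Data exploration: compute a few summary statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

ages = clean(raw_ages)
summary = explore(ages)
print(summary)
```

Modelling and interpretation would build on the cleaned values and the summary produced here.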

        These processes require a range of skills and tools, such as Python, R, SQL, MS Excel, Tableau and AWS, and concepts such as data warehousing and data visualization are also important. This myriad of tasks is handled by different data professionals, namely:


  1. Data engineer

  2. Data analyst

  3. Data scientist

 

Data engineer:

   Data engineering involves collecting, storing and making high-quality data accessible to the data analyst or the data scientist. Data engineers are responsible for creating data pipelines, which gather information from multiple sources and integrate it. This data must be structured so that analytical processes are easy to perform on it. Data engineers need knowledge of programming languages such as Python, C++, Scala, Java, R and SQL. It is also important to know about data storage, data warehousing, relational databases, non-relational databases and ETL (extract, transform and load) systems.
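
To make the idea of an ETL pipeline concrete, the sketch below pulls rows from two sources, cleans them and loads them into a relational table. Everything in it is an assumption made for illustration: the two CSV strings, the `sales` table and the function names `extract`, `transform` and `load` are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources, inlined as CSV text so the sketch is self-contained.
STORE_SALES = "order_id,amount\n1,20.50\n2,35.00\n"
ONLINE_SALES = "order_id,amount\n3,12.25\n4,\n"  # order 4 has a missing amount

def extract(csv_text, channel):
    """Extract: read rows from one source and tag them with their origin."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {"order_id": row["order_id"], "amount": row["amount"], "channel": channel}

def transform(rows):
    """Transform: convert types and drop rows that have no amount."""
    for row in rows:
        if row["amount"] == "":
            continue
        yield (int(row["order_id"]), float(row["amount"]), row["channel"])

def load(records, connection):
    """Load: write the structured records into a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, amount REAL, channel TEXT)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    connection.commit()

connection = sqlite3.connect(":memory:")
for text, channel in [(STORE_SALES, "store"), (ONLINE_SALES, "online")]:
    load(transform(extract(text, channel)), connection)

loaded = connection.execute(
    "SELECT order_id, amount, channel FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

Once loaded, the combined table holds rows from both sources in one consistent structure, which is what the analyst or scientist would then query.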

 

Data analyst:

   Data analysts are responsible for further cleaning the dataset, performing exploratory data analysis, producing visualizations and deriving useful insights from the dataset. The data analyst takes the data prepared by the data engineer and works through these processes to learn what is actually going on in the data, so as to support data-driven decisions. The data can be quantitative or qualitative (categorical). Some of the skills and tools required by a data analyst are as follows:

1. Python

2. Pandas

3. SQL

4. MS Excel

5. Tableau

6. Power BI

7. R

8. SAS

 

       Python, pandas and R are used to perform exploratory data analysis, which is where most of the analytical work is done. SQL is also used for analysis, in the form of queries. Queries are run on relational databases to extract information from the data. Tableau, Power BI and seaborn are among the tools used for data visualization, which is the process of using images and charts to represent what is going on in a dataset in visual form. The data analyst performs most of the analytical processes and draws data-driven conclusions or decisions from the analysis performed.
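
As an example of the kind of query described above, the sketch below summarizes sales per region with SQL. The `sales` table and its rows are invented for illustration, and an in-memory SQLite database (from the Python standard library) stands in for whatever relational database an analyst would actually connect to.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("south", 40.0), ("east", 50.0)],
)

# Aggregate per region: number of orders, total and average order value.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
report = connection.execute(query).fetchall()
for region, orders, total, average in report:
    print(f"{region:<6} orders={orders} total={total:.2f} average={average:.2f}")
```

A result like this is typically what gets turned into a bar chart in Tableau, Power BI or seaborn.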

 

Data scientist:

     The data scientist is the most advanced role in the data science lifecycle and involves the most advanced programming concepts. A data scientist usually possesses the skills of a data analyst and a lot more. A data scientist cleans the dataset, performs exploratory data analysis, produces visualizations, builds predictive models and makes data-driven decisions. After analyzing the dataset, the data scientist uses machine learning to build predictive models that aid decision-making. The type of machine learning model used depends on the type of data and the kind of problem being solved. There are different kinds of models, such as classification, regression and clustering models. These predictive models are always prone to some error, but they can also be highly accurate.
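
To show what a very small regression model looks like, the sketch below fits a straight line by ordinary least squares and measures how far its predictions are from the observed values. The advertising and sales figures are made up, and `fit_line` and `predict` are hypothetical helper names. In practice a data scientist would usually reach for a library such as scikit-learn and would evaluate the model on data held back from training.

```python
# Hypothetical data: advertising spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict y for a given x using the fitted line."""
    return slope * x + intercept

slope, intercept = fit_line(spend, sales)
predictions = [predict(x, slope, intercept) for x in spend]

# Mean absolute error: the average gap between predicted and observed sales.
mean_abs_error = sum(abs(p - y) for p, y in zip(predictions, sales)) / len(sales)
print(f"slope={slope:.2f} intercept={intercept:.2f} mean_abs_error={mean_abs_error:.2f}")
```

The error is small but not zero, which reflects the point above: a model can be highly accurate and still be wrong by a little on each prediction.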

 

 

  • Odubajo Abdul qoyyum
  • Mar 25, 2022
