Basics of data science with Python-SQL

Introduction to data science using Python & SQL

Python is a good choice for getting into data science because it has many libraries that support extracting, analyzing, and visualizing data.
What you need before getting into data science:
  • good Python knowledge
  • SQL experience

Basic examples of the required Python libraries are shown and explained next.

Python libraries needed:

  1. NumPy: a library for creating arrays, with helpful methods for working with them

  • first, run this command in your terminal to install the NumPy library -> pip install numpy
  • create a new Python project and add this line -> import numpy as np
  • example of creating a NumPy array named x -> x = np.array([5, 7, 8, 1, 0])
  • to insert elements into x -> x = np.insert(x, 2, [13, 20])
  • np.insert( ) takes its arguments in order: first x, the array we want to add to; then 2, the index before which the new values go; then [13, 20], the values to insert. It returns a new array, which we assign back to x, so x is now [5, 7, 13, 20, 8, 1, 0].
  • change the shape of a NumPy array -> x = x.reshape(7, 1)  'let the shape of x be 7x1: 7 rows & 1 column'. The new shape must hold exactly the same number of elements, so the 7-element x cannot be reshaped to 2x1.
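The NumPy steps above can be put together in one short script. This is a minimal sketch that assumes NumPy is already installed; the variable name column is only illustrative.

```python
import numpy as np

# Create a one-dimensional array with five elements.
x = np.array([5, 7, 8, 1, 0])

# Insert 13 and 20 before index 2; np.insert returns a new array.
x = np.insert(x, 2, [13, 20])
print(x)

# Reshape the 7 elements into 7 rows and 1 column.
# reshape also returns a new array and leaves x unchanged.
column = x.reshape(7, 1)
print(column.shape)
```

Because np.insert and reshape both return new arrays, the result has to be assigned to a name if it is needed later.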

  2. Pandas: a library for creating series and dataframes and working with them

  • first, run this command in your terminal to install the pandas library -> pip install pandas
  • add the import to the project -> import pandas as pd
  • example of creating a pandas series named groceries -> groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
  • a series is a bit like a dictionary (hash map): each index label acts as a key that maps to a value
  • to check whether the series has a key/index called 'bananas' -> print('bananas' in groceries)
  • to drop an element by its key and have the drop happen in place -> groceries.drop('eggs', inplace=True)
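The series operations above can be run as one script. This is a small sketch that assumes pandas is installed; the name has_bananas is introduced only for illustration.

```python
import pandas as pd

# A series maps index labels (keys) to values.
groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# The "in" operator checks the index labels, not the values.
has_bananas = 'bananas' in groceries
print(has_bananas)

# Look up a value by its label, like a dictionary key.
print(groceries['apples'])

# Drop the 'eggs' entry in place, so groceries itself is modified.
groceries.drop('eggs', inplace=True)
print(groceries.index.tolist())
```

Without inplace=True, drop returns a new series and leaves groceries as it was.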

  • A dataframe is a collection of series that share one row index. First we create a dictionary named items -> items = {'Bob' : pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch']),   'Alice' : pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants'])}
  • we turn items into a dataframe named shopping_cart -> shopping_cart = pd.DataFrame(items)
  • shape of shopping_cart as (rows, columns) -> print(shopping_cart.shape)
  • number of dimensions of the dataframe -> print(shopping_cart.ndim)
  • number of elements in the dataframe, missing values included -> print(shopping_cart.size)
  • row index of shopping_cart -> print(shopping_cart.index)
  • column index of shopping_cart -> print(shopping_cart.columns)
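To see what these attributes report, the dataframe steps can be combined as follows. This is a sketch that assumes pandas is installed. Since Bob and Alice bought different items, pandas fills the combinations that do not exist with NaN (missing values).

```python
import pandas as pd

# Two series with partly different index labels.
items = {
    'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
    'Alice': pd.Series(data=[40, 110, 500, 45],
                       index=['book', 'glasses', 'bike', 'pants']),
}

# Each dictionary key becomes a column; the row index is the union
# of the labels from both series.
shopping_cart = pd.DataFrame(items)
print(shopping_cart)

print(shopping_cart.shape)    # (rows, columns)
print(shopping_cart.ndim)     # number of dimensions
print(shopping_cart.size)     # rows * columns, NaN cells included
print(shopping_cart.index)    # row labels
print(shopping_cart.columns)  # column labels
```

The five distinct item labels and two people give a table of 5 rows and 2 columns, so size reports 10 even though some cells are NaN.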
That was a brief introduction to NumPy and pandas in Python, with examples.
Next comes an introduction to SQL and its main commands.

SQL: short for 'Structured Query Language' (often pronounced 'sequel'), it is used to write queries that work with database tables. SQL keywords are usually written in all caps so they are easy to recognize, but they are not case-sensitive, so typing them in lowercase will not cause errors.

SQL Commands:

1. CREATE DATABASE: to create a new database 


2. CREATE TABLE: to create a new table in the database

CREATE TABLE film (
title varchar(100),
category varchar(25),
duration float,
language varchar(25),
release_date date  );

We created a table called film with 5 columns.
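One way to try the film table from Python is the standard-library sqlite3 module. This is only a sketch: the choice of SQLite and of an in-memory database are assumptions, and other database systems are reached through their own drivers. SQLite has no CREATE DATABASE command; opening a connection creates the database.

```python
import sqlite3

# Assumption: a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Create the film table with the five columns listed above.
conn.execute("""
    CREATE TABLE film (
        title varchar(100),
        category varchar(25),
        duration float,
        language varchar(25),
        release_date date
    )
""")

# PRAGMA table_info is SQLite-specific; the second field of each
# returned row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(film)")]
print(columns)

conn.close()
```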

3. SELECT: a keyword used to return the data we are looking for from the database

SELECT title, category
FROM film;

Here, we retrieve the title and category of every row in the table film.
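The SELECT query can be run from Python in the same way. This is a sketch using the standard-library sqlite3 module with an in-memory database; the two sample films are made-up placeholder rows, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE film (
        title varchar(100),
        category varchar(25),
        duration float,
        language varchar(25),
        release_date date
    )
""")

# Hypothetical sample rows, used only to have something to select.
sample_films = [
    ("Film A", "Drama", 101.5, "English", "2020-01-15"),
    ("Film B", "Comedy", 95.0, "Arabic", "2021-06-30"),
]
conn.executemany("INSERT INTO film VALUES (?, ?, ?, ?, ?)", sample_films)

# Only the two requested columns come back, one tuple per row.
rows = conn.execute("SELECT title, category FROM film").fetchall()
for title, category in rows:
    print(title, category)

conn.close()
```

The ? placeholders let sqlite3 pass the values safely instead of building the SQL string by hand.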

4. DROP DATABASE: to remove a database


5. DROP TABLE: to remove a table from database with all its data
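As a hedged illustration of DROP TABLE, the sketch below creates a cut-down film table in an in-memory SQLite database and then removes it. The use of SQLite is an assumption, and sqlite_master is SQLite's own catalog table. SQLite has no DROP DATABASE command because a database is a single file (or lives in memory, as here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title varchar(100), category varchar(25))")

# List the tables before the drop.
before = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(before)

# Remove the table together with all its data.
conn.execute("DROP TABLE film")

# List the tables again; film is gone.
after = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(after)

conn.close()
```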


6. JOIN: to join two or more tables together

SELECT film.title, actor.name
FROM film JOIN actor
ON actor.movie_name = film.title;

Here, we retrieve the name of each movie together with the actor who played a role in it, matching rows where the movie names are equal (actor.name is assumed here to be the column that holds the actor's name).
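The join can be tried on a small example. This is a sketch under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database, the actor table has a hypothetical name column next to movie_name, and all rows are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title varchar(100), category varchar(25))")
# Assumption: actor has a "name" column in addition to movie_name.
conn.execute("CREATE TABLE actor (name varchar(100), movie_name varchar(100))")

conn.executemany("INSERT INTO film VALUES (?, ?)",
                 [("Film A", "Drama"), ("Film B", "Comedy")])
conn.executemany("INSERT INTO actor VALUES (?, ?)",
                 [("Actor One", "Film A"),
                  ("Actor Two", "Film B"),
                  ("Actor Three", "Film C")])  # no matching film

# Inner join: keep only the pairs where the movie names match.
rows = conn.execute("""
    SELECT film.title, actor.name
    FROM film JOIN actor
    ON actor.movie_name = film.title
""").fetchall()
for title, actor_name in rows:
    print(title, actor_name)

conn.close()
```

Actor Three does not appear in the result because no film has the title "Film C"; an inner join drops rows without a match.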

That was a quick, simple review of Python and SQL, two of the technologies used in data science.

  • Nadah Khaled
  • Mar, 25 2022
