Toxic Twitter? - Analyze It To know Better
Data Science


The takeover of Twitter by Tesla CEO Elon Musk has highlighted one of the more unforeseen consequences of social media in an increasingly polarized society. His professed "free speech absolutist" ideology has sparked fears that such a hands-off, uncontrolled attitude may lead to even more deception and mendacity than the platform already hosts. The negativity spewing from tweets by anonymous users has raised concerns about rising hatred and misinformation. So what can be done?

What is Social Media Analysis?

Apart from analyzing social media for scientific and research purposes, there is a dire need to analyze platforms, especially Twitter, for tweets that are distasteful or spread hatred among people and communities. This is where Artificial Intelligence sweeps in: it can analyze the most trending, real-time tweets on Twitter and categorize them as positive or negative, enabling further action.

Twitter Developer APIs

Now, the first and foremost question is: how do we access real-time tweets from Twitter? The answer lies in the Twitter Developer APIs.

Twitter Developer is an initiative by Twitter that enables businesses, researchers, and developers all over the world to gain valuable insights from real-time Twitter data and develop innovative projects.

How to Use Twitter Developer Platform? 

All a developer has to do is sign up on the platform, fill out an application form describing their need for the account, and wait for authentication from the company. Once you've received your account credentials, you can sign in and explore the plethora of options the Twitter Developer platform offers.

How to Use the Twitter Developer APIs?

In order to access real-time tweets from Twitter, we have to authenticate with the API using the consumer and access token keys generated for our app. These tokens have to be kept secret and can be regenerated as needed.

Creating a Twitter Tweet Analysis Model using AI and ML

Tweepy - is an open-source Python package that gives a very convenient way to access the Twitter API with Python. Tweepy includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles various implementation details such as data encoding and decoding. Read more in the Tweepy documentation.

TextBlob - is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Read more in the TextBlob documentation.

STEP 1: Importing all Necessary Libraries

The first and foremost step in any Python script is to import all the necessary libraries and modules.

#import all the necessary libraries
import tweepy
import json
from datetime import datetime, date, time, timedelta
from collections import Counter
import sys
import numpy as np
import pandas as pd
import re
from textblob import TextBlob

STEP 2: Setting Consumer Keys and APIs

Now, the keys and tokens generated from your developer account have to be used to authorize the Python notebook.

#The tokens are stored in the variables consumer_key, consumer_secret, access_token and access_token_secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
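Hard-coding credentials in a notebook is risky. One common pattern (a sketch, not part of the original tutorial; the environment-variable names are invented) is to read the tokens from the environment instead:

```python
import os

# Assumes the four tokens were exported beforehand, e.g.
#   export TWITTER_CONSUMER_KEY=...
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET")
```

This keeps the secrets out of the notebook file, so they cannot leak when the notebook is shared.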

STEP 3: Accessing Real-Time Tweets Based on Hashtags
#Function to extract tweets based on a hashtag
def get_tweets(hashtag):
    all_tweets = []
    #Using tweepy's Cursor to paginate through the search results
    #tweet_mode='extended' returns the full, untruncated text of each tweet
    #Up to 1200 items/tweets will be extracted
    for tweet in tweepy.Cursor(api.search_tweets, q=hashtag, lang='en', tweet_mode='extended').items(1200):
        all_tweets.append(tweet.full_text)
    return all_tweets

#Function to create a dataFrame of the tweets
def tweet_to_data_frame(tweets):
    df = pd.DataFrame(data=[tweet for tweet in tweets], columns=['Tweets'])
    return df

#Taking the hashtag from user as input
hashtag = input('Enter hashtag: ')
#Example input: #rape

hash = hashtag + ' -filter:retweets' #Exclude retweets from the search results
alltweets = get_tweets(hash)
data = tweet_to_data_frame(alltweets) #Converting the list to DataFrame

STEP 4: Applying Data Cleaning and Tokenisation
#Function that cleans the tweet text
def cleantext(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text) #remove mentions
    text = re.sub(r'#', '', text) #remove hashtag symbols
    text = re.sub(r'RT[\s]+', '', text) #remove retweet markers
    text = re.sub(r'http\S+', '', text) #remove links
    text = text.replace('|', ' ') #remove | symbols
    text = text.replace('\n', ' ') #remove line breaks
    text = text.replace('\\', ' ') #remove backslashes

    return text

#Applying text cleaning to the DataFrame
data['Tweets'] = data['Tweets'].apply(cleantext)
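To see what the cleaning step actually does, here is the same set of substitutions applied to an invented sample tweet (the handle and link below are made up for illustration):

```python
import re

def clean(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # strip mentions
    text = re.sub(r'#', '', text)              # strip the # symbol
    text = re.sub(r'RT[\s]+', '', text)        # strip retweet markers
    text = re.sub(r'http\S+', '', text)        # strip links
    return text.strip()

sample = "RT @someuser: #Toxic tweets everywhere http://example.com"
print(clean(sample))  # -> ": Toxic tweets everywhere"
```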

STEP 5: Calculating Subjectivity and Polarity

Subjectivity - quantifies the amount of personal opinion versus factual information in the text. It lies in the range [0, 1], where 0 is fully objective and 1 is fully subjective.

Polarity - lies in the range [-1, 1], where -1 indicates negative sentiment and 1 indicates positive sentiment.

#Function to calculate the subjectivity and polarity respectively
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(text).sentiment.polarity

#Calculating Subjectivity and Polarity for every row/tweet
data['Subjectivity'] = data['Tweets'].apply(getSubjectivity)
data['Polarity'] = data['Tweets'].apply(getPolarity)

STEP 6: Creating a Word Cloud
#Visualization using Matplotlib 
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

#Joining all tweets into a single string
allWords = ' '.join([twts for twts in data['Tweets']])

#Creating a list of all words
words = []
for tweet in data['Tweets']:
    wordList = re.sub(r"[^\w]", " ", tweet).split()
    words = words + wordList

#Using stopwords to remove common words like prepositions, filler words etc.
stopwords = STOPWORDS
wordCloud = WordCloud(stopwords=stopwords, width=800, height=500,
                      random_state=21, max_font_size=100).generate(allWords)

#Plotting the word cloud
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()

STEP 7: Analysing the Dataset Obtained
#Filtering out stopwords
filtered_words = [word for word in words if word not in stopwords]
counted_words = Counter(filtered_words)

#Collecting the ten most frequent words and their counts
common_words = []
counts = []
for word, count in counted_words.most_common(10):
    common_words.append(word)
    counts.append(count)
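For reference, `Counter.most_common` returns (word, count) pairs in descending order of frequency; a tiny made-up example:

```python
from collections import Counter

words = ["toxic", "tweet", "toxic", "hate", "tweet", "toxic"]
counts = Counter(words)
print(counts.most_common(2))  # [('toxic', 3), ('tweet', 2)]
```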

#Assigning each row a label of positive, negative or neutral based on its polarity score
def getAnalysis(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'

data['Analysis'] = data['Polarity'].apply(getAnalysis)
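Applied to a few illustrative polarity scores (the values here are invented), the thresholding above labels each tweet as follows:

```python
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

scores = [-0.6, 0.0, 0.35]
print([getAnalysis(s) for s in scores])  # ['Negative', 'Neutral', 'Positive']
```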

STEP 8: Visualizing the Analysis
#Percentage of Positive Tweets
ptweets = data[data.Analysis == 'Positive']
ptweets = ptweets['Tweets']
pos = round(ptweets.shape[0]/data.shape[0]*100, 2)

#Percentage of Negative Tweets
ntweets = data[data.Analysis == 'Negative']
ntweets = ntweets['Tweets']
neg = round(ntweets.shape[0]/data.shape[0]*100, 2)

#Percentage of Neutral Tweets
neutral_tweets = data[data.Analysis == 'Neutral']
neutral_tweets = neutral_tweets['Tweets']
neu = round(neutral_tweets.shape[0]/data.shape[0]*100, 2)

#List of Positive, Negative and Neutral values
values = [pos, neg, neu]
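The percentages are simply each category's count divided by the total, scaled to 100 and rounded to two decimals. For example, with invented counts (different variable names are used here to avoid shadowing the ones above):

```python
# Suppose 1200 tweets split into 420 positive, 540 negative, 240 neutral
total = 1200
pos_pct = round(420 / total * 100, 2)
neg_pct = round(540 / total * 100, 2)
neu_pct = round(240 / total * 100, 2)
print([pos_pct, neg_pct, neu_pct])  # [35.0, 45.0, 20.0]
```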

#Creating a pie chart
plt.pie(values, labels=['Positive', 'Negative', 'Neutral'], autopct='%1.0f%%')
plt.show()


A lot more can be done with the help of AI and NLP tools available today. What's stopping you from creating a model of your own and analysing the Twitter world yourself? Now you have the power to answer the question - How Toxic is Twitter?

  • Kashika Akhouri
  • May, 01 2022
