Toxic Twitter? - Analyze It To know Better
Data Science


The takeover of Twitter by Tesla CEO Elon Musk has highlighted one of the more unforeseen consequences of social media in an increasingly polarized society. His professed "free speech absolutist" ideology has sparked fears that such a hands-off, uncontrolled attitude may lead to even more deception and mendacity than the platform already hosts. The negativity spewing from tweets by anonymous users has raised concerns about rising hatred and misinformation. So what can be done?

What is Social Media Analysis?

Apart from analyzing social media for scientific and research purposes, there is a dire need to analyze platforms, especially Twitter, for tweets that are distasteful or spread hatred among people and communities. This is where Artificial Intelligence sweeps in: it can analyze the most trending, real-time tweets on Twitter and categorize them as positive or negative, enabling further action.

Twitter Developer APIs

Now, the first and foremost question is: how do we access real-time tweets from Twitter? The answer lies in the Twitter Developer APIs.

Twitter Developer is an initiative by Twitter that enables businesses, researchers, and developers all over the world to gain valuable insights from real-time Twitter data and develop innovative projects.

How to Use Twitter Developer Platform? 

All a developer has to do is sign up on the platform, fill out an application form describing their need for the account, and wait for authentication from the company. Once you've received your account credentials, you can sign in and explore the plethora of options the Twitter Developer platform offers.

How to Use the Twitter Developer APIs?

In order to access real-time tweets from Twitter, we have to authenticate with the API using the consumer and access token keys generated for our app. These tokens have to be kept secret and can be regenerated as needed.

Creating a Twitter Tweet Analysis Model using AI and ML

Tweepy - is an open-source Python package that gives a very convenient way to access the Twitter API with Python. Tweepy includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles various implementation details such as data encoding and decoding. Read more in the Tweepy documentation.

TextBlob - is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Read more in the TextBlob documentation.

STEP 1: Importing all Necessary Libraries

The first and foremost step in any Python script is to import all the necessary libraries and modules.

#import all the necessary libraries
import tweepy
import json
from datetime import datetime, date, time, timedelta
from collections import Counter
import sys
import numpy as np
import pandas as pd
import re
from textblob import TextBlob

STEP 2: Setting Consumer Keys and APIs

Now, the keys and tokens generated from your developer account have to be used to authorize the Python notebook.

#The tokens are stored in the variables consumer_key, consumer_secret, access_token and access_token_secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
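Hard-coding credentials in a notebook is risky. One common pattern (a sketch, not part of the original tutorial; the environment-variable names are invented) is to read the tokens from the environment instead:

```python
import os

# Assumes the four tokens were exported beforehand, e.g.
#   export TWITTER_CONSUMER_KEY=...
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET")
```

This keeps the secrets out of the notebook file, so they cannot leak when the notebook is shared.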

STEP 3: Accessing Real-Time Tweets Based on Hashtags
#Function to extract tweets based on a hashtag
def get_tweets(hashtag):
    all_tweets = []
    #Using tweepy's Cursor to paginate through the search results
    #tweet_mode='extended' returns the full, untruncated text of each tweet
    #Up to 1200 items/tweets will be extracted
    for tweet in tweepy.Cursor(api.search_tweets, q=hashtag, lang='en', tweet_mode='extended').items(1200):
        all_tweets.append(tweet.full_text)
    return all_tweets

#Function to create a dataFrame of the tweets
def tweet_to_data_frame(tweets):
    df = pd.DataFrame(data=[tweet for tweet in tweets], columns=['Tweets'])
    return df

#Taking the hashtag from user as input
hashtag = input('Enter hashtag: ')
#Example input: #rape

hash = hashtag + ' -filter:retweets' #Exclude retweets from the search results
alltweets = get_tweets(hash)
data = tweet_to_data_frame(alltweets) #Converting the list to DataFrame

STEP 4: Applying Data Cleaning and Tokenisation
#Function that cleans the tweet text
def cleantext(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text) #remove mentions
    text = re.sub(r'#', '', text) #remove hashtag symbols
    text = re.sub(r'RT[\s]+', '', text) #remove retweet markers
    text = re.sub(r'http\S+', '', text) #remove links
    text = text.replace('|', ' ') #remove | symbols
    text = text.replace('\n', ' ') #remove line breaks
    text = text.replace('\\', ' ') #remove backslashes

    return text

#Applying text cleaning to the DataFrame
data['Tweets'] = data['Tweets'].apply(cleantext)
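To see what the cleaning step actually does, here is the same set of substitutions applied to an invented sample tweet (the handle and link below are made up for illustration):

```python
import re

def clean(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # strip mentions
    text = re.sub(r'#', '', text)              # strip the # symbol
    text = re.sub(r'RT[\s]+', '', text)        # strip retweet markers
    text = re.sub(r'http\S+', '', text)        # strip links
    return text.strip()

sample = "RT @someuser: #Toxic tweets everywhere http://example.com"
print(clean(sample))  # -> ": Toxic tweets everywhere"
```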

STEP 5: Calculating Subjectivity and Polarity

Subjectivity - quantifies the amount of personal opinion versus factual information in the text. It lies in the range [0, 1], where 0 is fully objective and 1 is fully subjective.

Polarity - lies in the range [-1, 1], where -1 indicates negative sentiment and 1 indicates positive sentiment.

#Function to calculate the subjectivity and polarity respectively
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(text).sentiment.polarity

#Calculating Subjectivity and Polarity for every row/tweet
data['Subjectivity'] = data['Tweets'].apply(getSubjectivity)
data['Polarity'] = data['Tweets'].apply(getPolarity)

STEP 6: Creating a Word Cloud
#Visualization using Matplotlib 
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

#Joining all tweets into a single string
allWords = ' '.join([twts for twts in data['Tweets']])

#Creating a list of all words
words = []
for tweet in data['Tweets']:
    wordList = re.sub(r"[^\w]", " ", tweet).split()
    words = words + wordList

#Using stopwords to remove common words like prepositions, filler words etc.
stopwords = STOPWORDS
wordCloud = WordCloud(stopwords=stopwords, width=800, height=500,
                      random_state=21, max_font_size=100).generate(allWords)

#Plotting the word cloud
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()

STEP 7: Analysing the Dataset Obtained
#Filtering out stopwords
filtered_words = [word for word in words if word not in stopwords]
counted_words = Counter(filtered_words)

#Collecting the ten most frequent words and their counts
common_words = []
counts = []
for word, count in counted_words.most_common(10):
    common_words.append(word)
    counts.append(count)
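For reference, `Counter.most_common` returns (word, count) pairs in descending order of frequency; a tiny made-up example:

```python
from collections import Counter

words = ["toxic", "tweet", "toxic", "hate", "tweet", "toxic"]
counts = Counter(words)
print(counts.most_common(2))  # [('toxic', 3), ('tweet', 2)]
```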

#Assigning each row a label of positive, negative or neutral based on its polarity score
def getAnalysis(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'

data['Analysis'] = data['Polarity'].apply(getAnalysis)
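Applied to a few illustrative polarity scores (the values here are invented), the thresholding above labels each tweet as follows:

```python
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

scores = [-0.6, 0.0, 0.35]
print([getAnalysis(s) for s in scores])  # ['Negative', 'Neutral', 'Positive']
```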

STEP 8: Visualizing the Analysis
#Percentage of Positive Tweets
ptweets = data[data.Analysis == 'Positive']
ptweets = ptweets['Tweets']
pos = round(ptweets.shape[0]/data.shape[0]*100, 2)

#Percentage of Negative Tweets
ntweets = data[data.Analysis == 'Negative']
ntweets = ntweets['Tweets']
neg = round(ntweets.shape[0]/data.shape[0]*100, 2)

#Percentage of Neutral Tweets
neutral_tweets = data[data.Analysis == 'Neutral']
neutral_tweets = neutral_tweets['Tweets']
neu = round(neutral_tweets.shape[0]/data.shape[0]*100, 2)

#List of Positive, Negative and Neutral values
values = [pos, neg, neu]
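The percentages are simply each category's count divided by the total, scaled to 100 and rounded to two decimals. For example, with invented counts (different variable names are used here to avoid shadowing the ones above):

```python
# Suppose 1200 tweets split into 420 positive, 540 negative, 240 neutral
total = 1200
pos_pct = round(420 / total * 100, 2)
neg_pct = round(540 / total * 100, 2)
neu_pct = round(240 / total * 100, 2)
print([pos_pct, neg_pct, neu_pct])  # [35.0, 45.0, 20.0]
```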

#Creating a pie chart
plt.pie(values, labels=['Positive', 'Negative', 'Neutral'], autopct='%1.0f%%')
plt.show()


A lot more can be done with the help of AI and NLP tools available today. What's stopping you from creating a model of your own and analysing the Twitter world yourself? Now you have the power to answer the question - How Toxic is Twitter?

  • Kashika Akhouri
  • May, 01 2022
