I wanted to play around with sentiment analysis of tweets. Specifically, I wanted to try the Python TextBlob library, which has a built-in function that scores a string to determine whether its sentiment is positive or negative. After pondering a bit, I decided it would be fun to search for tweets created in and around Tuscaloosa, where I am currently attending school (the query below uses a 20 km radius around a point in the city). I wrote a script that scrapes Twitter with snscrape, returns tweets by geolocation, and then runs TextBlob on the results.
    # -*- coding: utf-8 -*-
    """
    Created on Wed Jul 6 15:58:58 2022

    @author: austin
    """

    import snscrape.modules.twitter as sntwitter  # Social Network Scraping Library
    import pandas as pd  # so I can make a dataframe of results
    from textblob import TextBlob
    import csv
    import time

    # Tuscaloosa = geocode:33.23726448661455,-87.58279011262114,20km
    query = "geocode:33.23726448661455,-87.58279011262114,20km"
    tweets = []
    combinedtweets = []
    limit = 10000000  # set a limit on how many results I want to pull

    for tweet in sntwitter.TwitterSearchScraper(query).get_items():
        if len(tweets) == limit:
            break
        else:
            # set sentiment
            text = tweet.content
            analysis = TextBlob(text)
            if analysis.sentiment.polarity >= 0:
                sentiment = 'positive'
            else:
                sentiment = 'negative'
            tweets.append([tweet.date, tweet.user.username, tweet.content, sentiment])

    df = pd.DataFrame(tweets, columns=['Date', 'User', 'Tweet', 'Sentiment'])
    df.to_csv('twitter_scrape_results.csv')  # save dataframe as csv
    print("\014")  # clear console
    time.sleep(10)
    print("CSV Successfully Created")
The results were pretty interesting (I uploaded the dataset to Kaggle if anyone is interested). Sentiment seems to stay roughly the same each year, hovering around 85% positive and 15% negative. I really would have thought negative sentiment would be much higher based on my personal observations of Twitter content. It makes me wonder whether Tuscaloosa is an unusually happy place, or whether my own Twitter observations are skewed by negativity bias.
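One thing worth checking before reading too much into the 85% figure: the script labels any tweet with polarity >= 0 as positive, so a tweet that TextBlob scores as exactly 0.0 is counted as positive too. Below is a minimal sketch of a three-way labeling that separates those out. The function name `label_polarity` and the `threshold` parameter are my own illustrative choices, not part of TextBlob, and I have not re-run the dataset with it.

```python
def label_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'negative', or 'neutral'.

    `threshold` is a hypothetical dead zone around zero: scores whose
    absolute value is at or below it are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Example scores standing in for TextBlob(text).sentiment.polarity
    for score in (0.5, 0.0, -0.2):
        print(score, label_polarity(score))
```

In the scraping loop, this would replace the if/else with something like `sentiment = label_polarity(analysis.sentiment.polarity)`. If a large share of tweets lands in the neutral bucket, that would explain at least part of the lopsided split.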
In any case, perhaps a more interesting bit of data is that the total number of tweets seems to decline quite a bit each year. This raises the question: why are Tuscaloosians tweeting less often? I put the results into this Tableau dashboard, which displays just how steady and steep the decline has been.
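For anyone who would rather check the yearly numbers without Tableau, here is a rough sketch that computes the tweet count and positive share per year using only the standard library. It assumes the CSV has the `Date` and `Sentiment` columns written by the script above, and that `Date` was saved as an ISO-style string such as `2022-07-06 15:58:58+00:00`. The function names `yearly_summary` and `summarize_csv` are hypothetical.

```python
import csv
from collections import Counter, defaultdict
from datetime import datetime


def yearly_summary(rows):
    """Return {year: (total_tweets, positive_share)} for dict-like rows.

    Each row needs a 'Date' key (ISO-style timestamp string, an assumption
    about how the CSV was written) and a 'Sentiment' key.
    """
    counts = defaultdict(Counter)
    for row in rows:
        year = datetime.fromisoformat(row["Date"]).year
        counts[year][row["Sentiment"]] += 1

    summary = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        summary[year] = (total, counts[year]["positive"] / total)
    return summary


def summarize_csv(path="twitter_scrape_results.csv"):
    """Read the scrape results and summarize them by year."""
    with open(path, newline="", encoding="utf-8") as f:
        return yearly_summary(csv.DictReader(f))


if __name__ == "__main__":
    sample = [
        {"Date": "2020-01-01 10:00:00+00:00", "Sentiment": "positive"},
        {"Date": "2020-06-15 08:30:00+00:00", "Sentiment": "negative"},
        {"Date": "2021-03-02 12:00:00+00:00", "Sentiment": "positive"},
    ]
    for year, (total, share) in yearly_summary(sample).items():
        print(year, total, f"{share:.0%}")
```

Keep in mind that the first and last years in the dataset are probably partial, so their totals are not directly comparable to full years.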
I decided to test a hypothesis: perhaps the high share of positive tweets is due to the fact that this is a college town, and many of the tweets were posted by official University of Alabama departments. I used OpenRefine to filter out official UA accounts, which was easy enough to do since their usernames seem to either begin with “UA_” or end with “_UA”. Surprisingly though, that didn’t change the sentiment percentages at all. I now suspect that even after removing all official UA Twitter accounts, you would still have to account for the fact that a large number of Tuscaloosians work for UA (45,000 employees). I know many of my professors post UA-related content from their personal Twitter accounts, and content like that will naturally slant positive.
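I did the filtering in OpenRefine, but the same check can be sketched in plain Python. This is not what I actually ran, just an equivalent under a couple of assumptions: rows have the `User` and `Sentiment` columns from the script above, and the username match is case-insensitive (my choice here; the “UA_” / “_UA” pattern itself is only a heuristic). The helper names are hypothetical.

```python
def is_official_ua(username):
    """Heuristic: treat usernames starting with 'UA_' or ending with '_UA' as official."""
    name = username.upper()  # case-insensitive match is an assumption
    return name.startswith("UA_") or name.endswith("_UA")


def positive_share(rows):
    """Fraction of rows labeled 'positive', or None if there are no rows."""
    rows = list(rows)
    if not rows:
        return None
    positives = sum(1 for row in rows if row["Sentiment"] == "positive")
    return positives / len(rows)


def compare_shares(rows):
    """Return (share for all accounts, share with official UA accounts removed)."""
    rows = list(rows)
    unofficial = [row for row in rows if not is_official_ua(row["User"])]
    return positive_share(rows), positive_share(unofficial)


if __name__ == "__main__":
    sample = [
        {"User": "UA_News", "Sentiment": "positive"},
        {"User": "Bama_UA", "Sentiment": "positive"},
        {"User": "alice", "Sentiment": "negative"},
        {"User": "bob", "Sentiment": "positive"},
    ]
    print(compare_shares(sample))
```

A filter like this cannot catch personal accounts that post UA-related content, which is the part I suspect matters more.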