Data Analysis of the MechanicalKeyboards Subreddit

I have been using classic buckling spring IBM Model M computer keyboards since I first began programming. These are great to type on and I still love them (kind of feels like typing on a typewriter), but I decided recently that I should upgrade to a compact keyboard that uses modern mechanical switches. There seems to be an endless sea of options to choose from though; the first step in my consumer journey is to narrow my options down to a few top brands, so what is an aspiring data scientist to do? I thought a good way to cut through the clutter would be to scrape the r/MechanicalKeyboards subreddit to see what brands are the most talked about currently. So I wrote this Python script that uses Reddit’s API to scrape the subreddit.

import praw
from praw.models import MoreComments
import datetime
import pandas as pd

# Lets use PRAW (a Python wrapper for the Reddit API)
reddit = praw.Reddit(client_id='', client_secret='', user_agent='')

# Scraping the posts
posts = reddit.subreddit('MechanicalKeyboards').hot(limit=None) # Sorted by hottest
 
posts_dict = {"Title": [], "Post Text": [], "Date":[],
               "Score": [], "ID": [],
              "Total Comments": [], "Post URL": []
              }

comments_dict = {"Title": [], "Comment": [], "Date":[],
              "Score": [], "ID": [], "Post URL": []
              }

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)
    
    # Date of each post
    dt = datetime.date.fromtimestamp(post.created_utc) # Convert UTC to DateTime
    posts_dict["Date"].append(dt)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
    
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)
    
    # Now we need to scrape the comments on the posts
    id = post.id
    submission = reddit.submission(id)
    submission.comments.replace_more(limit=0) # Use replace_more to remove all MoreComments
    
    # Use .list() method to also get the comments of the comments
    for comment in submission.comments.list(): 
        # Title of each post
        comments_dict["Title"].append(post.title)
        
        # The comment
        comments_dict["Comment"].append(comment.body)
        
        # Date of each comment
        dt = datetime.date.fromtimestamp(comment.created_utc) # Convert UTC to DateTime
        comments_dict["Date"].append(dt)
        
        # The score of a comment
        comments_dict["Score"].append(comment.score)
         
        # Unique ID of each post
        comments_dict["ID"].append(post.id)
         
        # URL of each post
        comments_dict["Post URL"].append(post.url)

# Saving the data in pandas dataframes
allPosts = pd.DataFrame(posts_dict)
allPosts

allComments = pd.DataFrame(comments_dict)
allComments

# Time to output everything to csv files
allPosts.to_csv("MechanicalKeyboards_Posts.csv", index=True)
allComments.to_csv("MechanicalKeyboards_Comments.csv", index=True)

Reddit limits API requests to 1000 posts, so the most current 1000 posts is my sample size. My code outputs two files: the last 1000 posts, and more importantly the comments on those 1000 posts, which ended up being 9042 rows of data. (I posted the files to Kaggle if anyone would like to play with them.) Then I imported my comments dataset into OpenRefine so I could run text filters to find brand names, and I recorded the number of mentions for each brand. Finally, using Tableau, I created a couple of Data Visualization charts to express my findings. Here are the most talked about keyboard brands on r/MechanicalKeyboards currently:

Loader Loading…
EAD Logo Taking too long?

Reload Reload document
| Open Open in new tab
Loader Loading…
EAD Logo Taking too long?

Reload Reload document
| Open Open in new tab