
Developers tend to take their keyboards seriously. I have been using classic buckling spring IBM Model M keyboards since I first began programming. They are great to type on, and I still love them (it kind of feels like typing on a typewriter), but I decided recently that I should upgrade to a compact keyboard that uses modern mechanical switches. This would give me more space on my desk and allow for some customization. There seems to be an endless sea of options to choose from, though, so the first step in my consumer journey was to narrow them down to a few top brands. What is a developer to do? I thought a good way to cut through the clutter would be to scrape the r/MechanicalKeyboards subreddit and see which brands are the most talked about right now. So I wrote this Python script, which uses Reddit’s API to scrape the subreddit:
import datetime

import pandas as pd
import praw

# Let's use PRAW (a Python wrapper for the Reddit API)
reddit = praw.Reddit(client_id='', client_secret='', user_agent='')

# Scraping the posts
posts = reddit.subreddit('MechanicalKeyboards').hot(limit=None)  # Sorted by hottest

posts_dict = {"Title": [], "Post Text": [], "Date": [],
              "Score": [], "ID": [],
              "Total Comments": [], "Post URL": []
              }
comments_dict = {"Title": [], "Comment": [], "Date": [],
                 "Score": [], "ID": [], "Post URL": []
                 }
for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)
    # Date of each post: convert the Unix timestamp to a UTC date
    dt = datetime.datetime.fromtimestamp(post.created_utc, tz=datetime.timezone.utc).date()
    posts_dict["Date"].append(dt)
    # The score of a post
    posts_dict["Score"].append(post.score)
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
    # URL of each post
    posts_dict["Post URL"].append(post.url)

    # Now we need to scrape the comments on the post.
    # replace_more(limit=0) removes all MoreComments placeholders,
    # so comments hidden behind "load more comments" are skipped.
    post.comments.replace_more(limit=0)
    # Use .list() method to also get the comments of the comments
    for comment in post.comments.list():
        # Title of the post the comment belongs to
        comments_dict["Title"].append(post.title)
        # The comment
        comments_dict["Comment"].append(comment.body)
        # Date of each comment: convert the Unix timestamp to a UTC date
        dt = datetime.datetime.fromtimestamp(comment.created_utc, tz=datetime.timezone.utc).date()
        comments_dict["Date"].append(dt)
        # The score of a comment
        comments_dict["Score"].append(comment.score)
        # ID of the post the comment belongs to
        comments_dict["ID"].append(post.id)
        # URL of the post the comment belongs to
        comments_dict["Post URL"].append(post.url)

# Saving the data in pandas dataframes
allPosts = pd.DataFrame(posts_dict)
allComments = pd.DataFrame(comments_dict)
# Quick look at the first few rows of each
print(allPosts.head())
print(allComments.head())
# Time to output everything to csv files
allPosts.to_csv("MechanicalKeyboards_Posts.csv", index=True)
allComments.to_csv("MechanicalKeyboards_Comments.csv", index=True)
Reddit caps a listing at about 1000 posts, so the 1000 hottest posts at the time I ran the script are my sample. My code outputs two files: those 1000 posts and, more importantly, the comments on them, which came to 9042 rows of data. (I posted the files to Kaggle if anyone would like to play with them.) Then I imported my comments dataset into OpenRefine so I could run text filters to find brand names, and I recorded the number of mentions for each brand. Finally, I created a couple of charts in Tableau to present my findings. Here are the most talked-about keyboard brands on r/MechanicalKeyboards right now:
Update:
I decided to go with the Keychron keyboard that my research found to be the most discussed (and I also added Glorious Panda Switches and HK Gaming PBT Keycaps). Couldn’t be happier; it’s a pleasure to type on.
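A footnote for anyone who would like to repeat the brand count without leaving Python: I did the counting in OpenRefine, but the same idea can be roughly sketched in pandas. This is only a sketch under a few assumptions. The brand list is a made-up example, the function name `count_brand_mentions` is my own invention, and the sample comments are invented stand-ins for the "Comment" column of the comments file. It counts comments that mention a brand at least once, which is not necessarily how every text filter would tally things.

```python
import re

import pandas as pd

# Hypothetical brand list for illustration; swap in the brands you care about.
BRANDS = ["Keychron", "Ducky", "Leopold"]


def count_brand_mentions(comments: pd.Series, brands: list[str]) -> pd.Series:
    """Count how many comments mention each brand at least once (case-insensitive)."""
    text = comments.fillna("").astype(str)
    counts = {}
    for brand in brands:
        # Word boundaries keep a brand name from matching inside a longer word
        pattern = rf"\b{re.escape(brand)}\b"
        counts[brand] = int(text.str.contains(pattern, case=False, regex=True).sum())
    return pd.Series(counts, name="Mentions").sort_values(ascending=False)


# Invented sample comments (not real scraped data). With the real file you
# would pass pd.read_csv("MechanicalKeyboards_Comments.csv")["Comment"] instead.
sample = pd.Series([
    "Just got a Keychron K2 and love it",
    "keychron or Ducky for a first board?",
    "Leopold has the best stock keycaps",
    None,
])

mentions = count_brand_mentions(sample, BRANDS)
print(mentions)
```

A plain substring match would be simpler, but short brand names can then match inside unrelated words, which is why the sketch uses word boundaries.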
