Data Analysis of the MechanicalKeyboards Subreddit

I have been using classic buckling spring IBM Model M computer keyboards since I first began programming. These are great to type on and I still love them (kind of feels like typing on a typewriter), but I decided recently that I should upgrade to a compact keyboard that uses modern mechanical switches. There seems to be an endless sea of options to choose from though; the first step in my consumer journey is to narrow my options down to a few top brands, so what is an aspiring data scientist to do? I thought a good way to cut through the clutter would be to scrape the r/MechanicalKeyboards subreddit to see what brands are the most talked about currently. So I wrote this Python script that uses Reddit’s API to scrape the subreddit.

import praw
import datetime
import pandas as pd

# Let's use PRAW (a Python wrapper for the Reddit API)
# Fill in the credentials from your own Reddit app before running
reddit = praw.Reddit(client_id='', client_secret='', user_agent='')

# Scraping the posts
posts = reddit.subreddit('MechanicalKeyboards').hot(limit=None) # Sorted by hottest
 
posts_dict = {"Title": [], "Post Text": [], "Date": [],
              "Score": [], "ID": [],
              "Total Comments": [], "Post URL": []}

comments_dict = {"Title": [], "Comment": [], "Date": [],
                 "Score": [], "ID": [], "Post URL": []}

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)
    
    # Date of each post
    dt = datetime.datetime.fromtimestamp(post.created_utc, tz=datetime.timezone.utc).date() # Convert Unix timestamp to a UTC date
    posts_dict["Date"].append(dt)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
    
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)
    
    # Now we need to scrape the comments on the posts
    # (post is already a Submission object, so there is no need to fetch it again by ID)
    post.comments.replace_more(limit=0) # Use replace_more to remove all MoreComments
    
    # Use .list() method to also get the comments of the comments
    for comment in post.comments.list():
        # Title of the parent post
        comments_dict["Title"].append(post.title)
        
        # The comment
        comments_dict["Comment"].append(comment.body)
        
        # Date of each comment
        dt = datetime.datetime.fromtimestamp(comment.created_utc, tz=datetime.timezone.utc).date() # Convert Unix timestamp to a UTC date
        comments_dict["Date"].append(dt)
        
        # The score of a comment
        comments_dict["Score"].append(comment.score)
         
        # ID of the parent post (links each comment back to its post)
        comments_dict["ID"].append(post.id)
         
        # URL of each post
        comments_dict["Post URL"].append(post.url)

# Saving the data in pandas dataframes
allPosts = pd.DataFrame(posts_dict)
allPosts

allComments = pd.DataFrame(comments_dict)
allComments

# Time to output everything to csv files
allPosts.to_csv("MechanicalKeyboards_Posts.csv", index=True)
allComments.to_csv("MechanicalKeyboards_Comments.csv", index=True)

Reddit’s API caps each listing at roughly 1,000 posts, so the 1,000 hottest posts at the time of the scrape (the script sorts by “hot”) are my sample. My code outputs two files: the posts themselves and, more importantly, the comments on those posts, which ended up being 9,042 rows of data. (I posted the files to Kaggle if anyone would like to play with them.) Then I imported my comments dataset into OpenRefine so I could run text filters to find brand names, and I recorded the number of mentions for each brand. Finally, using Tableau, I created a couple of charts to visualize my findings. Here are the most talked about keyboard brands on r/MechanicalKeyboards currently:

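For anyone who would rather stay in Python than switch to OpenRefine, the brand-counting step could be sketched roughly like this. It is only an illustration, not what I actually ran: the brand list is a hypothetical placeholder (swap in whatever names you want to search for), and the function name count_brand_mentions is my own invention. It counts how many comments mention each brand at least once, ignoring case and matching whole words only.

```python
import re
from collections import Counter

# Hypothetical brand list, for illustration only
BRANDS = ["Keychron", "Ducky", "Leopold"]

def count_brand_mentions(comments, brands=BRANDS):
    """Count how many comments mention each brand at least once (case-insensitive)."""
    # One whole-word pattern per brand, so "Ducky" does not match "Duckying"
    patterns = {
        brand: re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        for brand in brands
    }
    counts = Counter({brand: 0 for brand in brands})
    for text in comments:
        for brand, pattern in patterns.items():
            if pattern.search(text):
                counts[brand] += 1
    return counts

# Tiny made-up sample standing in for the "Comment" column of the comments file
sample = [
    "Just got a Keychron K2 and love it",
    "keychron or Ducky for a first board?",
    "My Model M still works fine",
]
mention_counts = count_brand_mentions(sample)
for brand, n in mention_counts.most_common():
    print(brand, n)
```

To run it on the real data, you would pass in the "Comment" column from MechanicalKeyboards_Comments.csv (for example via pd.read_csv, dropping empty rows first) instead of the made-up sample.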

My New Linux Server

I was going to use this cool cart I found as a Raspberry Pi station, but I found myself needing a decent Linux server for a project, so I decided to rethink my plans. Being on a budget, I repurposed a 2012 Mac Mini I bought off eBay by installing Ubuntu on it and fitting it into the cart. I was lucky to find a Quad-Core i7 Mac Mini that was already tricked-out with 16GB RAM, a 1 TB SSD, and an additional 1TB HDD for storage. This makes it a surprisingly swift little machine, despite its age. I had to find the right networking drivers to get the Wi-Fi working, but otherwise it was a pretty painless installation. I’m going to use this for running longer Python scripts, so I don’t have to use up my main computer’s resources.

Raspberry Pi Station

I wanted a dedicated space for tinkering on Raspberry Pi, so I set this up. I bought a Medical Teleconference cart from a government surplus auction for $50, which I repurposed to be a great adjustable standing desk. Plus, it has lots of built-in cable management and a perfect cubby for holding the Pi. Then I lucked out and found an awesome monitor at the thrift store for $30. The rest of the gear I had lying around. Now I have a really convenient place where I can play with Pi, and as a bonus my home lab has a computer. I plan on using this setup to do some experimenting with Machine Learning soon, so stay tuned.

AI Generated Advertising Content

Due to recent progress in the fields of Artificial Intelligence (AI) and Machine Learning, many of the creative tasks within advertising, such as writing ad copy or selecting ad images, are increasingly being performed by machines rather than by humans. The rise of AI-generated content stands to shake the advertising world, as some professional roles become obsolete. The ways that consumers and brands interact are also rapidly changing as a result. To understand this phenomenon, we must delve into the benefits and pitfalls of AI-generated content.

Some advertisers dream of a time when they can enjoy a three-hour work week, utilizing a myriad of AI tools to streamline and automate their workflows to extreme lengths. While this particular scenario isn’t very likely to happen, it’s not hard to understand the desire: This would be quite the leap from the current day-to-day slog that many advertisers find themselves struggling through. In the digital era, marketing departments must churn out dizzying numbers of variations of digital ads for the various social media platforms currently popular, each with slightly different imagery and calls to action. Wouldn’t it be nice to automate this process, and let robots handle the boring bits? Well, that might seem like some manner of science-fiction futurism, but it is actually a possibility today.

AI can be used to generate both the advertising copy and the visual imagery for an ad, and when combined with customer profile data, AI can even customize the ad to be more persuasive to a particular viewer. These ads do not exist until the target consumer is ready to view them; at that instant, an ad is automatically generated just for that viewer. The AI takes into account the viewer’s interests, behaviors, and demographics. The result is a highly tailored communication, which will likely be more effective than a traditional one-size-fits-all ad. It can also save the brand a fortune in advertising costs. All the man-hours that would traditionally have been spent crafting the ad, and then creating endless derivatives for every possible platform, are no longer needed (well, besides the initial setup of the AI campaign, that is).

Multiple AI technologies can be used in tandem for particularly creative results. Companies like DataGrid and Rosebud.ai have developed AI technology that allows advertisers to utilize completely artificial models that are almost indistinguishable from real, human models. These virtual models can be used as actors in commercials, or as fashion models for brands. You could even showcase fashion products on a generated model that looks identical to the viewer, letting the viewer know how those items would look on them specifically. The possibilities are almost endless. Albert.ai is another AI brand, one that autonomously plans and executes paid search and social media campaigns. Tech company OpenAI (co-founded by Elon Musk) launched the AI tool “GPT-3,” which can write copy so well that it’s hard to tell that the text wasn’t written by an actual human. Using tools like these, brands can save advertising costs, allowing smaller brands the ability to make advertisements that rival the quality of larger brands. They will also be able to experiment with more creative advertising, since the cost to experiment will be much lower than using traditional methods.

However, the technology isn’t all AI generated roses. For example, according to Google’s Search Advocate John Mueller, content automatically generated with AI writing tools is considered spam and against webmaster guidelines. It is possible platforms will begin banning AI generated content in the future, which would certainly dampen the technology’s potential. As of now though, no ban exists for this emerging technology, and platforms like Google are unable to detect if ads were AI generated or human created. Another concern is that AI generation tools might lead to a stall in the advertising job market, as many traditional roles are replaced with AI counterparts. However, it is also possible that freeing advertisers from the more tedious aspects of ad creation will have a positive effect, allowing them to spend more time and resources on more creative pursuits. This could lead to higher quality advertising for consumers, and more high-level positions for prospective advertisers. A more pressing fear is that AI generated content might lead to empty, uncreative, repetitive advertising. This could further crowd an already crowded market, and could erode brand trust with consumers, if the technology is used to generate low-quality content. 

Despite these potential pitfalls though, the future of AI generated content is going to be too lucrative to ignore. Advertisers that learn how to put these new tools to work for them will enjoy an advantage over brands who aren’t able to capitalize on the power of generated content. And as this technology matures, that will only increasingly be the case. Even as it stands now, if some parts of this paper were AI generated using the tools currently available, would you be able to tell?