
If you’re a huge anime fan like me, then you should check out this GitHub repository! AwesomeAnimeResearch, compiled by the user SerialLain3170, contains a massive list of published research papers for machine learning projects related to anime or manga.
The projects fall into popular categories such as image generation, image-to-image transformation, automatic line art coloration, automatic character lighting, automatic character editing, automatic sketch editing, talking head animation and much more.
Here are some cool projects in the repository that I found particularly interesting.
This project uses a deep neural network that takes an image of an anime character’s face…

The primary goal of a data visualization is to make a point in a clear, compelling way. As Data Scientists, we’re always aiming to create the coolest, most innovative visualizations, to the point where the actual message is lost. Why overcomplicate things? Sometimes a simple message only requires a simple visualization. In this blog, I’ll share a brief tutorial on how you can easily create a Venn diagram with Matplotlib and any kind of data.
In my recent project, Twitter Hate Speech Detection, I faced a major challenge with the class imbalance…

With Data Science, we need different tools to handle the diverse range of datasets. Before we dive into the different methods for sentiment analysis, it’s important to note that it’s a technique within Natural Language Processing. Often called NLP, it is the study of how computers can understand human language. And although this is a specialty that is popular among Data Scientists, it’s not exclusive to the industry.
Working with text data comes with a unique set of problems and solutions that other types of datasets don’t have. Often, text data requires more cleaning and preprocessing than other data types…

Have you ever tried to preview your Jupyter Notebook file on GitHub and received this pesky error message after 20+ seconds of waiting?

About 70% of the time, we see this message, as if it’s a gamble whether your .ipynb file will render or not.
Good news: this “Sorry, something went wrong. Reload?” message doesn’t have an impact on your actual commit. The issue is on GitHub’s end, which is unable to render a preview of the file.
After much research, it seems that no one on the internet is quite sure about why this problem is occurring…

Pytrends is an unofficial API for tracking Google Search trends, and we can use Matplotlib to visualize these trends over time. You can read more about the package in the project’s GitHub repository here.
Well, we all know what Google Search is. We all use it several times a day, sometimes without a second thought, to access millions of search results for a wide range of topics. Fun fact: the Google Search Toolbar was first released in 2000 for Internet Explorer 5!
By looking at a time series visualization of Google Search keywords over a decade, we can draw valuable…

Before I dive into the definition of the No Free Lunch theorem, let’s quickly discuss the context. The beauty of data science and machine learning is that no two datasets will ever be the same. The size, noise and content will always be different. Therefore, our approach to every problem must be different.
The No Free Lunch theorem states that there is no one model that works best for every problem. The assumptions of a great model for one problem may not hold for another problem. …
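To make this concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the two toy datasets are invented, and the two models (a single-threshold rule and a 1-nearest-neighbor classifier) are simply the smallest ones I could write by hand. It is a demonstration of the idea, not a proof of the theorem.

```python
# Toy illustration: neither model is best on both datasets.
# All data points are invented for this sketch.

def fit_threshold(train):
    """Learn 'predict 1 if x > t', choosing t with the fewest training errors."""
    xs = sorted(x for x, _ in train)
    # Candidate thresholds: midpoints between neighboring training points.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def training_errors(t):
        return sum(int(x > t) != y for x, y in train)

    best_t = min(candidates, key=training_errors)
    return lambda x: int(x > best_t)


def fit_nearest_neighbor(train):
    """Predict the label of the single closest training point."""
    def predict(x):
        _, label = min(train, key=lambda point: abs(point[0] - x))
        return label
    return predict


def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)


# Dataset A: one clean cut-off, plus one mislabeled training point at x = 0.3.
train_a = [(0.05, 0), (0.1, 0), (0.2, 0), (0.3, 1), (0.35, 0), (0.4, 0),
           (0.45, 0), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
test_a = [(0.12, 0), (0.28, 0), (0.32, 0), (0.48, 0), (0.58, 1), (0.72, 1)]

# Dataset B: the positive class sits in a band in the middle.
train_b = [(0.05, 0), (0.15, 0), (0.25, 0), (0.35, 1), (0.45, 1),
           (0.55, 1), (0.65, 1), (0.75, 0), (0.85, 0), (0.95, 0)]
test_b = [(0.12, 0), (0.22, 0), (0.38, 1), (0.52, 1), (0.62, 1),
          (0.78, 0), (0.92, 0)]

datasets = {"A": (train_a, test_a), "B": (train_b, test_b)}
fitters = {"threshold": fit_threshold, "nearest": fit_nearest_neighbor}

scores = {}
for data_name, (train, test) in datasets.items():
    scores[data_name] = {}
    for model_name, fit in fitters.items():
        scores[data_name][model_name] = accuracy(fit(train), test)
        print(f"dataset {data_name}, {model_name}: "
              f"{scores[data_name][model_name]:.2f}")
```

On dataset A, the threshold rule ignores the mislabeled point while the nearest-neighbor model copies it, so the simpler model scores higher. On dataset B, a single cut-off cannot describe a band, so the nearest-neighbor model scores higher. The assumption that helps on one dataset hurts on the other.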

With data science, the key to learning any new technology will always be practicing first-hand with a project. To learn Tableau, I performed an analysis of the survival rates of the Titanic. The full project can be found here, hosted on Tableau Public.
Anyone familiar with Kaggle, the data science and machine learning dataset resource, may already recognize the Titanic dataset. This dataset provides observations for each passenger on the Titanic and their survival outcome. For the purposes of this project, only 871 observations from the training set were used. Ultimately, out of the 2,435 total passengers on board, only…

A key skill for any Data Scientist is the ability to write production-quality code to create models and deploy them into cloud environments. Typically, working with cloud computing and data architectures falls under the Data Engineer job title. However, every data professional is expected to be a generalist who can adapt and scale their projects.
Here is an introduction to popular platforms that I have seen across dozens of job descriptions. This doesn’t mean that we have to become experts overnight, but it helps to understand the services that are out there. …

In general, when people think of Natural Language Processing (NLP), they tend to restrict it to English. This stems from the assumption that English is the only language NLP can be applied to. Because of this linguistic bias, I decided to investigate how to preprocess Chinese text data for NLP.
First, I would like to thank my cohort mate David Bruce for pointing out this disparity. In his blog post on Learning a New Language in a Word Cloud, he shared that Professor Emily M. …

Now that I’m ten weeks deep in a fifteen-week Data Science program, there’s still a subject that weighs heavily on my mind. From the very start of the program, the language and educational examples have almost always been binary. This makes sense, given that computers themselves are binary machines. But Data Science specifically deals with real-life human data and problems, so it should be able to adapt to the evolving identities of the people it’s about. To be clear, this isn’t a critique of the program that I’m in; this is a widespread problem across industries.
As a disclaimer, I’m writing this…

Data Scientist | Machine Learning | Digital Media Studies