If you’re a huge anime fan like me, then you should check out this GitHub repository!
AwesomeAnimeResearch, compiled by the user SerialLain3170, contains a massive list of published research papers on machine learning projects related to anime and manga.
The projects are organized into popular topics such as image generation, image-to-image translation, automatic line art colorization, automatic character lighting, automatic character editing, automatic sketch editing, talking head animation, and much more.
Here are some cool projects in the repository that I found particularly interesting.
This project uses a deep neural network that takes an image of an anime character’s face…
The primary goal of a data visualization is to represent a point in a clear, compelling way. As Data Scientists, we’re always aiming to create the coolest, most innovative data visualizations, sometimes to the point where the actual message is lost. Why overcomplicate things? A simple message often only requires a simple visualization to convey it. In this blog, I’ll be sharing a brief tutorial on how you can easily create a Venn diagram with Matplotlib and any kind of data.
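As a minimal sketch of the idea: a two-set Venn diagram only needs three numbers (items only in A, only in B, and in both), which is exactly the `subsets` tuple that the third-party matplotlib-venn package's `venn2()` function takes. The sets and names below are made-up sample data, and the plotting calls are commented out so the sketch runs without matplotlib-venn installed.

```python
# Sample data: which of two languages each (hypothetical) person uses.
python_fans = {"Ann", "Ben", "Cara", "Dev"}
r_fans = {"Cara", "Dev", "Elle"}

# The three subset sizes a two-circle Venn diagram displays.
only_python = len(python_fans - r_fans)   # in A but not B
only_r = len(r_fans - python_fans)        # in B but not A
both = len(python_fans & r_fans)          # in the overlap

print(only_python, only_r, both)  # -> 2 1 2

# With matplotlib-venn installed (pip install matplotlib-venn):
# from matplotlib_venn import venn2
# import matplotlib.pyplot as plt
# venn2(subsets=(only_python, only_r, both), set_labels=("Python", "R"))
# plt.show()
```

Any kind of data works because the diagram never sees the raw items, only the three overlap counts.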
With Data Science, we need different tools to handle the diverse range of datasets. Before we dive into the different methods for sentiment analysis, it’s important to note that it’s a technique within Natural Language Processing (NLP), the study of how computers can understand human language. And although NLP is popular among Data Scientists, it’s not exclusive to the industry.
Working with text data comes with a unique set of problems and solutions that other types of datasets don’t have. Often, text data requires more cleaning and preprocessing than other data types…
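To make the cleaning point concrete, here is a tiny, hedged sketch of a typical text-preprocessing step: lowercasing, stripping punctuation and digits, and dropping stopwords. The stopword list is a small illustrative sample, not the full list a real pipeline (e.g. from NLTK) would use.

```python
import re

# A tiny illustrative stopword list -- real pipelines use a much larger one.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to"}

def clean(text):
    """Lowercase, remove punctuation/digits, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and spaces
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(clean("The movie is GREAT!!! 10/10, loved it."))
# -> ['movie', 'great', 'loved']
```

Steps like these are exactly the extra preprocessing that tabular datasets rarely need.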
Have you ever tried to preview your Jupyter Notebook file and received this pesky error message after about 20+ seconds of waiting?
About 70% of the time, we see this message, as if it’s a gamble whether your .ipynb file will render or not.
Good news: this “Sorry, something went wrong. Reload?” message doesn’t have any impact on your actual commit. It’s just an issue on GitHub’s end, which is unable to render a preview of the file.
After much research, it seems that no one on the internet is quite sure about why this problem is occurring…
Pytrends is an unofficial API for tracking Google Search trends, and we can use
matplotlib to visualize these trends over time. You can read more about the package in the project’s GitHub repository here.
Well, we all know what Google Search is. We all use it several times a day, sometimes without a second thought, to access millions of search results on a wide range of topics. Fun fact: the Google Toolbar was first released in 2000 for Internet Explorer 5!
By looking at a time series visualization of Google Search keywords over a decade, we can draw valuable…
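As a self-contained sketch of the idea: fetching real data requires the third-party pytrends package (shown in the comments via its `TrendReq`, `build_payload`, and `interest_over_time` calls), so toy monthly values stand in here for the 0–100 interest scores it returns, smoothed with a simple moving average to expose the long-term trend.

```python
# Real data would come from pytrends, roughly:
#   from pytrends.request import TrendReq
#   pytrends = TrendReq()
#   pytrends.build_payload(["python"], timeframe="today 5-y")
#   df = pytrends.interest_over_time()
# Toy stand-in values on Google Trends' 0-100 interest scale:
interest = [40, 45, 43, 50, 55, 60, 58, 65, 70, 68, 75, 80]

def moving_average(values, window):
    """Smooth short-term noise so the long-term trend stands out."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

smoothed = moving_average(interest, window=3)
print(smoothed[0], smoothed[-1])  # first and last smoothed points
```

Plotting `smoothed` with matplotlib (e.g. `plt.plot(smoothed)`) then gives the kind of decade-long trend line discussed above.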
Before I dive into the definition of the No Free Lunch theorem, let’s quickly discuss the context. The beauty of data science and machine learning is that no two datasets will ever be the same. The size, noise, and content will always differ, so our approach to every problem must differ too.
The No Free Lunch theorem states that there is no one model that works best for every problem. The assumptions of a great model for one problem may not hold for another problem. …
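A toy illustration of that claim, using two made-up one-step-ahead forecasters on two made-up series: a “predict the running mean” model and a “predict the last value” (persistence) model. Each wins on one dataset and loses on the other, so neither is best for every problem.

```python
def mse(preds, actual):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)

def mean_model(history):
    """Predict the running mean of everything seen so far."""
    return sum(history) / len(history)

def persistence_model(history):
    """Predict that the next value equals the most recent one."""
    return history[-1]

def evaluate(model, series):
    """One-step-ahead MSE, predicting each point from its prefix."""
    preds = [model(series[:i]) for i in range(1, len(series))]
    return mse(preds, series[1:])

noisy_flat = [5, 4, 6, 5, 4, 6, 5, 4, 6, 5]      # hovers around a mean
steady_trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # keeps climbing

# The mean model wins on the flat series; persistence wins on the trend.
print(evaluate(mean_model, noisy_flat), evaluate(persistence_model, noisy_flat))
print(evaluate(mean_model, steady_trend), evaluate(persistence_model, steady_trend))
```

The mean model’s assumption (values cluster around a stable center) holds for the first series and fails for the second, which is the theorem’s point in miniature.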
With data science, the key to learning any new technology will always be practicing first-hand with a project. To learn Tableau, I performed an analysis of the survival rates of the Titanic. The full project can be found here, hosted on Tableau Public.
Anyone familiar with Kaggle, the data science and machine learning dataset resource, may already recognize the Titanic dataset. This dataset provides observations for each passenger on the Titanic and their survival outcome. For the purposes of this project, only the 891 observations from the training set were used. Ultimately, out of the roughly 2,224 total passengers and crew on board, only…
Last week, I shared a tutorial about creating a spam filter to classify an email. You can find it linked here. In that blog, I walked through the theory behind a Naive Bayes algorithm. And as promised, this blog will be about implementing all of that code.
We can test out the model by feeding in real-life data. A popular dataset that is commonly used for spam filter testing is the SpamAssassin public corpus. We’ll be looking at the files prefixed with
Creating a spam filter isn’t a new concept, but it’s important to understand the underlying theory that drives these predictions. Furthermore, understanding the theory behind machine learning algorithms in general is crucial for a Data Scientist to effectively implement them on real-life data.
Naive Bayes classifiers are a popular statistical technique for email filtering. These algorithms typically use bag-of-words features to identify spam emails. This baseline technique can tailor itself to the email needs of individual users and gives a low false positive rate, which is generally acceptable to users.
The key to the Naive Bayes algorithm…
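A minimal, self-contained sketch of a bag-of-words Naive Bayes spam classifier, illustrating the idea rather than reproducing the original tutorial’s code: count word frequencies per class, then compare log prior plus log likelihood (with add-one smoothing) for spam versus ham. The training messages are made-up examples.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(messages):
    """messages: list of (text, is_spam) pairs -> model tuple."""
    spam_words, ham_words = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        if is_spam:
            spam_words.update(tokenize(text)); n_spam += 1
        else:
            ham_words.update(tokenize(text)); n_ham += 1
    vocab = set(spam_words) | set(ham_words)
    return spam_words, ham_words, n_spam, n_ham, vocab

def predict_spam(model, text):
    spam_words, ham_words, n_spam, n_ham, vocab = model
    # Start from the log priors P(spam) and P(ham).
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    spam_total, ham_total = sum(spam_words.values()), sum(ham_words.values())
    for word in tokenize(text):
        if word not in vocab:
            continue  # ignore words never seen in training
        # Add-one (Laplace) smoothing avoids zero probabilities.
        log_spam += math.log((spam_words[word] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_words[word] + 1) / (ham_total + len(vocab)))
    return log_spam > log_ham

training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for tomorrow", False),
    ("lunch plans this week", False),
]
model = train(training)
print(predict_spam(model, "claim your free money"))     # -> True
print(predict_spam(model, "agenda for lunch meeting"))  # -> False
```

Working in log space keeps the many small probabilities from underflowing, which matters once real vocabularies grow into the tens of thousands of words.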
A key skill for any Data Scientist is the ability to write production-quality code to create models and deploy them into cloud environments. Typically, working with cloud computing and data architectures falls under the Data Engineer role. However, every data professional is expected to be a generalist who can adapt and scale their projects.
Here is an introduction to popular platforms that I have seen across dozens of job descriptions. This doesn’t mean that we have to become experts overnight, but it helps to understand the services that are out there. …
Data Scientist | Machine Learning | Digital Media Studies