Machine learning is a subset of artificial intelligence that focuses on teaching machines to learn. It is based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. We sat down with two students on the Master in Applied Economics (MAE) program to talk about the topics they chose for their Machine Learning Techniques term projects: an anime movies recommender system and customer churn.
In a broad sense, recommender systems are algorithms that suggest relevant content to users, be it movies to watch, music to listen to, products to buy or anything else, depending on the industry. They are critical in certain industries, where they can generate substantial revenue when they work well.
Customer churn (also known as customer attrition) occurs when an existing customer, user, player, subscriber or any kind of return client cancels a subscription, ends a membership, closes an account or decides to buy his or her groceries in another store. Companies can measure their customer churn rate and, by applying suitable algorithms, identify customers who are likely to leave before they are lost.
Before enrolling in the MAE program, David Pacon worked as a financial analyst for T-Mobile and had zero experience with R programming. Elshana Mammadli, the youngest member of the MAE program, obtained her BA degree in Business Administration at ADA University in Azerbaijan. She also had no previous programming experience.
What theme did you choose for your project and why?
Elshana: I had always wondered how recommendation systems work, so I chose anime movies and built a recommender for them. I obtained the data from Kaggle.com, an online platform that hosts open datasets.
David:
Subject: I analyzed customer churn. The reason is simple: I worked at T-Mobile Czech Republic as a financial analyst, and I was also responsible for analyzing customer churn on a monthly basis. So this topic is very close to me and I have previous experience with it.
Algorithm: I decided to use two algorithms which I learned during the Machine Learning class. The first one is the Naive Bayes classifier, which is one of the simplest Bayesian network models and assumes strong independence between the features. The second one is the kNN (k-nearest neighbors) algorithm, which classifies an observation according to the observations closest to it by some distance measure.
Data: I also obtained my data from Kaggle.com. It was a standardized dataset suitable for customer churn prediction.
Outcome of the algorithms: Based on the output of my algorithms, I can more easily predict whether a customer will leave the company or not, because we are essentially searching for the patterns that recur in the behavior of customers who leave. This may help companies to predict customer churn more precisely.
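To make the two classifiers David names more concrete, here is a minimal sketch in Python (he worked in R; the language is swapped here purely for illustration). It is not his code. The toy data, the feature names and the function names `naive_bayes_predict` and `knn_predict` are all hypothetical, and the Naive Bayes variant shown is a simple categorical one with Laplace smoothing, which is an assumption rather than something stated in the interview.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (contract type, frequent support caller?) -> outcome
train_X = [
    ("monthly", "yes"), ("monthly", "yes"), ("monthly", "no"),
    ("yearly", "no"), ("yearly", "no"), ("yearly", "yes"),
]
train_y = ["churn", "churn", "stay", "stay", "stay", "stay"]


def naive_bayes_predict(X, y, row, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing (alpha)."""
    class_counts = Counter(y)
    # value_counts[(feature index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for features, label in zip(X, y):
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
    # Number of distinct values per feature, needed for the smoothing term
    n_values = [len({features[i] for features in X}) for i in range(len(row))]

    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior plus a sum of per-feature log likelihoods:
        # summing them separately is the "strong independence" assumption.
        score = math.log(count / len(y))
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            score += math.log((seen + alpha) / (count + alpha * n_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical numeric features: (monthly bill, support calls per month).
# In practice the features would usually be scaled before measuring distance.
knn_X = [(20.0, 0), (25.0, 1), (30.0, 0), (80.0, 5), (90.0, 4), (85.0, 6)]
knn_y = ["stay", "stay", "stay", "churn", "churn", "churn"]


def knn_predict(X, y, row, k=3):
    """Majority vote among the k training points closest to `row`."""
    distances = sorted((math.dist(point, row), label) for point, label in zip(X, y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


print(naive_bayes_predict(train_X, train_y, ("monthly", "yes")))  # churn
print(knn_predict(knn_X, knn_y, (22.0, 1)))                       # stay
```

Both functions answer the same question, churn or stay, but in different ways: Naive Bayes compares class probabilities built from feature frequencies, while kNN looks only at which labelled customers lie nearest to the new one.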
What do you need to start programming a recommender system/customer churn?
E: First, I needed proper data with all the details about each anime: producers, genres, user ID and user rating. Those were the main components I focused on for the recommendation. Then I downloaded the data, cleaned it and prepared it for the recommendation process. Next, I chose the kNN (k-nearest neighbors) algorithm. With this algorithm, you choose one user and make a recommendation based on the characteristics of the anime he or she has watched. The characteristics that I focused on were the genres, producers and ratings of anime movies.
D: I programmed everything in R, which is a very useful tool for data science. Its biggest advantage is that it is completely free. It is also well supported: if you are struggling with something, you can simply google it and you will most likely find an answer or a possible solution to your problem.
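The approach Elshana describes can be sketched roughly as follows. This is a minimal illustration in Python rather than the R code she wrote. The titles, traits and ratings are hypothetical, and Jaccard similarity over sets of genres and producers is an assumed choice of distance that the interview does not specify.

```python
# Hypothetical catalogue: title -> set of traits (genres and producers)
catalogue = {
    "Title A": {"action", "adventure", "Studio X"},
    "Title B": {"action", "fantasy", "Studio X"},
    "Title C": {"romance", "drama", "Studio Y"},
    "Title D": {"action", "adventure", "Studio Z"},
    "Title E": {"romance", "comedy", "Studio Y"},
}

# Hypothetical ratings given by one user (scale 1 to 10)
user_ratings = {"Title A": 9, "Title C": 3}


def jaccard(a, b):
    """Share of traits two titles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(catalogue, user_ratings, k=2, top_n=2):
    """Rank unseen titles by the ratings of their k most similar seen titles."""
    scores = {}
    for title, traits in catalogue.items():
        if title in user_ratings:
            continue  # already watched
        # The k rated titles most similar to this candidate
        neighbours = sorted(
            ((jaccard(traits, catalogue[seen]), rating)
             for seen, rating in user_ratings.items()),
            reverse=True,
        )[:k]
        total_sim = sum(sim for sim, _ in neighbours)
        # Similarity-weighted average of the neighbours' ratings
        scores[title] = (
            sum(sim * rating for sim, rating in neighbours) / total_sim
            if total_sim else 0.0
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(catalogue, user_ratings))
```

Here the user liked an action title and disliked a romance title, so the two unseen action titles score above the unseen romance one.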
What was the most challenging part of the project?
E: My data was too big, with millions of rows, which made my code very slow, so I focused on the most frequent genres and producers when making the recommendation. That solved the problem, and my code ran fast.
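One way to picture this kind of reduction is the short Python sketch below. It is an illustration only, not Elshana's R code: the rows, the helper name `keep_most_frequent` and the cut-off `top_n` are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (anime id, genre)
rows = [
    (1, "action"), (2, "action"), (3, "comedy"), (4, "action"),
    (5, "drama"), (6, "comedy"), (7, "horror"), (8, "action"),
]


def keep_most_frequent(rows, top_n=2):
    """Keep only the rows whose genre is among the top_n most common genres."""
    counts = Counter(genre for _, genre in rows)
    frequent = {genre for genre, _ in counts.most_common(top_n)}
    return [row for row in rows if row[1] in frequent]


reduced = keep_most_frequent(rows, top_n=2)
print(len(rows), "->", len(reduced))  # 8 -> 6
```

The trade-off is that rare genres drop out of the recommendations in exchange for a smaller dataset to search.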
D: During the course we covered 9 different algorithms and it was completely up to me to choose which algorithm I wanted to apply and which data I would use. This part was pretty challenging for me because I had no experience with those algorithms or with programming in general before. Of course, the final presentation was a huge challenge as well. In the committee there were three people including the director of our program, so you wanted to simply make as good an impression as possible. However, during the course, we had highly professional support from our TA and lecturer; they were both always willing to answer our questions or arrange a Skype meeting and discuss our projects.
Do you think programming in R helped you improve your memory and/or your learning process?
E: Machine learning requires a good knowledge of R, so you need to have a solid background in R to be successful.
D: In my opinion, it surely did because if you study a problem in a theoretical way and you are just reading about the problem, you never know whether you understand the problem properly or you just pretend to understand. But, if you have to write your own program or apply your own algorithm to the real data, you will see how the algorithm indeed works and what the pros and cons are.