Projects
Prediction of Telecom Customer Churn
Mar. 2022 - May 2022
• Applied a Random Forest machine learning model in Python to a telecom churn dataset from Kaggle to predict which customers are at high risk of churning (switching telecom service providers).
• Analysed the data and used the model to identify the key factors causing customer churn.
• Check out this project here.
Frequent Itemset Mining
Mar. 2022 - May 2022
• Developed a parallel MapReduce implementation of the Apriori algorithm for frequent itemset mining.
• Analyzed H&M transaction data from Kaggle to find products that were frequently bought together.
• Ran the frequent itemset algorithm on a cluster of machines in Amazon EMR, with the data stored in AWS S3.
• Check out this project here.
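The map/reduce structure of Apriori described above can be sketched in plain Python. This is a minimal single-machine illustration, not the project's actual EMR code: the function names (`map_phase`, `reduce_phase`, `apriori`) and the sample baskets are hypothetical, and a real MapReduce job would distribute the mapper over partitions of the transaction data.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, candidates):
    """Mapper: emit (itemset, 1) for every candidate contained in a transaction."""
    for basket in transactions:
        for itemset in candidates:
            if itemset <= basket:
                yield itemset, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def apriori(transactions, min_support):
    """Run one map/reduce pass per itemset size until no candidates remain."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for basket in transactions for item in basket}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while candidates:
        level = reduce_phase(map_phase(transactions, candidates), min_support)
        frequent.update(level)
        size += 1
        # Join step: union pairs of frequent itemsets to form larger candidates.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical baskets for illustration only (not the H&M data).
baskets = [
    {"jeans", "t-shirt", "socks"},
    {"jeans", "t-shirt"},
    {"jeans", "socks"},
    {"t-shirt", "socks"},
    {"jeans", "t-shirt", "belt"},
]
result = apriori(baskets, min_support=3)
```

With these sample baskets and a minimum support of 3, the only frequent pair is jeans with t-shirt; the other pairs appear together in only two baskets each.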
Insider Trading Analysis
Oct. 2021 - Dec. 2021
• Scraped insider filing data from Form 4 filings on the U.S. Securities and Exchange Commission (SEC) website and stock price data from the Yahoo Finance API, using Python.
• Analysed aspects of insider trading in R, such as how the trading behaviour of insiders changed over time or by position, and whether there was a correlation between insider trading and the company's stock price.
• Discovered that purchases made by insiders have a positive correlation with the stock prices of their company.
• Check out this project here.
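As a rough illustration of the correlation check mentioned above, the following sketch computes a Pearson correlation coefficient in plain Python. The original analysis was done in R, so this is only an assumed equivalent; the function name `pearson` and the sample numbers are hypothetical and are not taken from the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: insider purchases per period and the stock's
# percentage price change over the same period.
insider_purchases = [0, 2, 1, 4, 3]
price_change_pct = [-1.0, 0.5, 0.2, 2.0, 1.1]
r = pearson(insider_purchases, price_change_pct)
```

A coefficient close to 1 indicates that periods with more insider purchases tended to coincide with larger price increases; it does not by itself show causation.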
Real-Time Facemask Detection and Analytics
Dec. 2020 - Apr. 2021
• Built a prototype system, inspired by the ongoing pandemic, to identify whether a person is wearing a face mask properly.
• Detected people’s faces in Python using a deep-learning technique called Single Shot Detector, and scanned the detected faces for a face mask using another deep-learning technique called MobileNet.
• Stored the data obtained and visualized it for insights such as the number of people who wore a face mask on a given day.
• Developed the project further into a research paper by adding person recognition using QR codes and improving the visualizations with Tableau; the paper was presented at the 2021 ACMI 4.0 International Conference and published on IEEE Xplore.
• Check out my paper here.
Analysis of Tweets
Dec. 2021
• Analysed, in R, the most common words in Donald Trump's tweets for each year from 2015 to 2020.
• Calculated the most document-defining words (the words most unique to each year) and visualised them.
• Applied machine learning techniques such as sparse regression to find the words with the strongest relationship to the number of retweets a tweet gets.
• Determined that the words fnn, quarantine and rocky had the strongest relationship with the number of retweets the respective tweets got.
• Check out this project here.
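One common way to score "document-defining" words is TF-IDF, treating each year's tweets as one document. The sketch below assumes that approach; the source does not name the exact measure used, the original work was done in R rather than Python, and the function names and toy word lists here are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Map each document label to a {word: TF-IDF score} dictionary.

    docs: mapping of label (e.g. a year) to a list of word tokens.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))
    scores = {}
    for label, words in docs.items():
        counts = Counter(words)
        total = len(words)
        scores[label] = {
            word: (count / total) * log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def top_words(scores, k=1):
    """Return the k highest-scoring words for each document label."""
    return {
        label: sorted(word_scores, key=word_scores.get, reverse=True)[:k]
        for label, word_scores in scores.items()
    }


# Hypothetical toy documents, one per year (not real tweet data).
docs = {
    "2015": "great wall wall deal".split(),
    "2020": "great quarantine quarantine election".split(),
}
scores = tf_idf(docs)
best = top_words(scores)
```

A word that appears in every year, such as "great" in this toy data, scores zero, while words concentrated in a single year rise to the top.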
Prediction of Survivors Aboard the Titanic
Oct. 2019
• Applied the Random Forest machine learning model in Python on the Titanic dataset from Kaggle to predict whether a person aboard the Titanic survived or not.
• Determined that a person’s survival aboard the Titanic depended largely on factors such as their gender, age, and ticket class.
• Check out this project here.