This Project is a part of Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The initial dataset contains pre-labelled tweet and messages from real-life disasters. The aim of this project is to build a Natural Language Processing tool that categorize messages.
A deployed machine learning model that has the capability to automatically classify the incoming disaster messages into related 36 categories. Project developed as a part of Udacity's Data Science Nanodegree program.
Collaborative team machine learning project classifying reviews scraped from the IMDB website as either positive or negative using sentiment classification. Tools used: BeautifulSoup and Splinter to scrape reviews, Pyspark, SQLAlchemy and Heroku.
Big data application of Machine Learning concepts for sentiment classification of US Airlines tweets. The focus is on the usage of pyspark libraries (ml-lib) on big data to solve a problem using Machine Learning algorithms and not about the choice of algorithm used in the ML model creation. It also involves data pre-processing using NLP techniques, cross-validation and parameter-grid builder.