Transformed data using Spark RRD operations on 5000K reviews. Built a product recommendation system using ALS Collaborative Filtering obtaining an RMSE value of 0.91. The data set contains data for 287,209 products with 5,074,160 reviews and ratings by 1, 57,386 users.
Performed Apache Spark operations and found out the list of customers who are active on Amazon. This would be helpful to Amazon as it is conducting an A/B testing experiment on potential target users and want to know if customers list which they have are ACTIVE users or not.
Developed a mapper reducer function which can perform the same functionalities of a mapper reducer used in big data applications. Implemented Multi-threading operations for faster functionality of Mapper-Reducer function by parallel processing of data.
Analyzed Twitter Sentiment by implementing Logistic Regression from scratch. Used twitter dataset having 1 million tweets and applied logistic regression on it to find the sentiment associated with each tweet which would help in classifying of tweets.