Built a predictive model to estimate the risk of diabetes in patients from an insurance claims dataset. Featurized the claims data, then trained and optimized an SVC model to predict each patient's probability of diabetes.
Transformed data using Spark RDD operations on about 5 million reviews. Built a product recommendation system using ALS collaborative filtering, obtaining an RMSE of 0.91. The dataset contains 5,074,160 reviews and ratings of 287,209 products by 157,386 users.
Created an interactive screening tool in R Shiny that predicts the risk of chronic kidney disease using a logistic regression model attaining 97% recall. Doctors can use the tool to estimate the probability of chronic kidney disease in patients.