Developed an automated data cleaning app that performs 15 common data cleaning tasks, including handling missing values, correlation tests, statistical tests, and balancing datasets with sampling techniques. It reduces overall data cleaning time by roughly 30-40%.
Transformed data using Spark RDD operations on about 5 million reviews. Built a product recommendation system using ALS collaborative filtering, obtaining an RMSE of 0.91. The dataset covers 287,209 products with 5,074,160 reviews and ratings from 157,386 users.
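For context on the RMSE figure, the metric can be sketched in plain Python as below. This is an illustrative sketch only, not the project's Spark evaluation code; the function name `rmse` and the sample ratings are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists.

    Hypothetical helper for illustration; not taken from the project.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences, then the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale, used only to exercise the function.
    print(rmse([4, 3, 5], [3, 3, 4]))
```

On a 1-5 rating scale, a lower RMSE means predicted ratings sit closer to the ratings users actually gave.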
The objective of this assignment is to extract the sections mentioned below from SEC / EDGAR financial reports and perform text analysis to compute the variables explained below. Links to the SEC / EDGAR financial reports are given in the Excel spreadsheet “cik_list.xlsx”.
Built an image captioning system using a CNN encoder and an LSTM decoder with an attention mechanism to generate relevant captions for any input image. Evaluated the results using BLEU-3 and BLEU-4 metrics, and deployed the model for real-time inference using Amazon SageMaker.
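BLEU scores are built on clipped (modified) n-gram precision between a generated caption and a reference caption. The sketch below shows only that building block for a single reference, under the assumption of whitespace tokenization. It is not the full BLEU score (which combines several n-gram orders and a brevity penalty) and not the project's evaluation code; the names `ngram_counts` and `modified_precision` are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference (the "clipping" step).
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


if __name__ == "__main__":
    # Made-up captions, used only to exercise the functions.
    candidate = "a dog runs on the grass".split()
    reference = "a dog runs across the grass".split()
    for n in (1, 2, 3, 4):
        print(n, modified_precision(candidate, reference, n))
```

Higher-order precisions such as the 3-gram and 4-gram values drop quickly when word order differs, which is why BLEU-3 and BLEU-4 are stricter than unigram overlap.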
Developed an English-to-Telugu news translation application using LSTM. Wrapped the application in Flask, containerized it, and deployed it live on an AWS EC2 Linux instance.
Used Apache Spark operations to identify the list of customers who are active on Amazon. This is helpful to Amazon because it is conducting an A/B testing experiment on potential target users and wants to know whether the customers on its list are active users.
Developed a mapper-reducer function that replicates the functionality of the mapper-reducer pattern used in big data applications. Implemented multi-threading so the mapper-reducer function processes data in parallel and runs faster.
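A minimal sketch of a multi-threaded mapper-reducer, assuming a word-count workload, could look like the following. It is an illustration of the general pattern rather than the project's implementation; `map_reduce`, `word_count_mapper`, `word_count_reducer`, and the `workers` parameter are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(records, mapper, reducer, workers=4):
    """Run mapper over records in a thread pool, group by key, then reduce."""
    # Map phase: each record is mapped to a list of (key, value) pairs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, records))

    # Shuffle phase: collect all values that share a key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["big data big ideas", "data moves fast"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

In CPython, threads mainly speed up mappers that wait on I/O; for CPU-bound mappers, a process pool is a common alternative.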