DE

All in One Data Cleaning App

Developed an Automated Data Cleaning App which performs 15 most important data cleaning features i.e. handling missing values, correlation tests, statistical tests, balancing dataset using sampling techniques etc. It reduces the overall data cleaning time by nearly 30-40%.

Amazon Product Recommendation System

Transformed data using Spark RRD operations on 5000K reviews. Built a product recommendation system using ALS Collaborative Filtering obtaining an RMSE value of 0.91. The data set contains data for 287,209 products with 5,074,160 reviews and ratings by 1, 57,386 users.

Automated ETL pipeline to Analyze US-Security Financial Documents

Objective of this assignment is to extract some sections (which are mentioned below) from SEC / EDGAR financial reports and perform text analysis to compute variables those are explained below. Link to SEC / EDGAR financial reports are given in excel spreadsheet “cik_list.xlsx”.

Automated Image Captioning System

Built an Image Captioning system using Encoder CNN and Decoder LSTM with an attention mechanism to generate relevant captions for any input image. Analyzed the results using BLEU 3, 4 metrics, and deployed the model in real-time using Amazon SageMaker.

English to Telugu Language Translator Application

Developed a English news to Telugu language translator application using LSTM. Encased the application in Flask and orchestrated it on AWS by pushing the containerized Flask application to AWS Linux EC2 instance deploying the application in-live.

Find Amazons most potential customers using Spark Operations

Performed Apache Spark operations and found out the list of customers who are active on Amazon. This would be helpful to Amazon as it is conducting an A/B testing experiment on potential target users and want to know if customers list which they have are ACTIVE users or not.

Mapper Reducer Implementation from Scratch

Developed a mapper reducer function which can perform the same functionalities of a mapper reducer used in big data applications. Implemented Multi-threading operations for faster functionality of Mapper-Reducer function by parallel processing of data.