Experience

ML Research Assistant

University of Illinois at Chicago

Jun 2020 – Present Chicago

Prediction of Diabetes from Lumiata Claims dataset

Streamlined ETL operations on a semi-structured claims dataset and analyzed key variables, i.e., LOINC, ICD-10, and CPT codes. Balanced the data using stratified sampling and predicted the probability of diabetes for each patient with an SVC model, achieving a recall of 89%.

Identification of Leader words from any local meaning

Optimized a Python script that transforms unstructured data from the Merriam-Webster dictionary into a structured format using strong rule-based associations. Modeled a universal leader-word identifier with a bidirectional LSTM that takes GloVe embeddings and POS tags as input, attaining a recall of 96%.

Data Science Intern

AutomizeApps

May 2020 – Present Chicago

Sentiment Analysis of Product Reviews

Implemented a rule-based classifier and a deep learning LSTM model for sentiment analysis in four languages: English, Spanish, German, and French. The rule-based model outperformed the LSTM model, achieving an accuracy of 92%.

Twitter Topic Modeling Application

Developed an application that applies topic modeling to tweets on a given subject using unsupervised LDA and semi-supervised CorEx. Obtained 10 distinct latent topics and tracked their sentiment over 14 weeks. Deployed the model to production using Azure ML.

Business Analyst

CGI

Jun 2016 – Jun 2019 India

Performed statistical analysis of FI and SD data and suggested tangible solutions to stakeholders by visualizing the KPI metrics in Tableau.
Streamlined workflows and built applications in SAP business modules, deploying them with the IBM TWS tool.
Optimized complex SQL queries for faster data processing, improving the speed and efficiency of applications by 28%.
Key achievement: Received the “Pat on the Back Award” for leading the whole team and managing ad hoc requests during a challenging period.

Projects

All in One Data Cleaning App

Developed an automated data cleaning app that performs the 15 most important data cleaning tasks, e.g., handling missing values, running correlation and statistical tests, and balancing datasets using sampling techniques. It reduces overall data cleaning time by roughly 30-40%.

Amazon Product Recommendation System

Transformed data using Spark RDD operations on about 5 million reviews. Built a product recommendation system using ALS collaborative filtering, obtaining an RMSE of 0.91. The dataset contains 5,074,160 reviews and ratings of 287,209 products by 157,386 users.
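The RMSE reported above measures the typical gap between predicted and actual star ratings, with lower values meaning better predictions. A minimal sketch of the metric, assuming predictions and held-out ratings are aligned lists; the function name `rmse` and the sample ratings are illustrative, not taken from the original project:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    # Square each prediction error, average them, then take the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions on a 1-5 star scale vs. held-out ratings
print(rmse([4.0, 3.5, 5.0], [5.0, 3.0, 4.0]))
```

Because errors are squared before averaging, a few badly mispredicted ratings raise RMSE more than many small misses would.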

Automated ETL pipeline to Analyze US-Security Financial Documents

Extracted selected sections from SEC/EDGAR financial reports and performed text analysis on them to compute a set of variables. Links to the reports were provided in the Excel spreadsheet “cik_list.xlsx”.

Automated Image Captioning System

Built an image captioning system using a CNN encoder and an LSTM decoder with an attention mechanism to generate relevant captions for any input image. Evaluated the results using BLEU-3 and BLEU-4 scores and deployed the model for real-time inference using Amazon SageMaker.

Document Similarity Identifier

Calculated the similarity between the ‘UIC’ text document and the ‘UIUC’, ‘MIT’, ‘UIS’, ‘Tesla’, and ‘Stanford’ documents by applying Jaccard similarity and cosine similarity functions.
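Jaccard similarity compares the sets of distinct words in two documents, while cosine similarity compares their word-frequency vectors. A minimal sketch of both, assuming simple lowercase word tokenization and raw term counts (the original project may have used different preprocessing or TF-IDF weighting); the short text snippets are hypothetical stand-ins for the real documents:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(doc_a, doc_b):
    """|A intersect B| / |A union B| over the sets of distinct tokens."""
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between raw term-frequency vectors."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short snippets standing in for the full text documents
uic = "public research university in chicago illinois"
docs = {
    "UIUC": "public research university in illinois",
    "Tesla": "electric vehicle and clean energy company",
}
for name, text in docs.items():
    print(name, round(jaccard_similarity(uic, text), 3), round(cosine_similarity(uic, text), 3))
```

Jaccard ignores how often a word appears, whereas cosine weights repeated words more heavily, so the two scores can rank documents differently on longer texts.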

English to Telugu Language Translator Application

Developed an application that translates English news into Telugu using an LSTM model. Wrapped the application in Flask, containerized it, and deployed it live on an AWS EC2 Linux instance.

Finding Amazon's Most Promising Customers using Spark Operations

Used Apache Spark operations to identify the customers who are active on Amazon. This would help Amazon, which is conducting an A/B testing experiment on potential target users and wants to know whether the customers on its list are active users.

Image Classification System using CIFAR10 Dataset

Built an image classification system on the CIFAR-10 dataset by implementing four neural networks: Softmax, TwoLayerNN, ConvNet, and a custom model, MyModel. The custom “MyModel” achieved the best performance, with 92% accuracy.

Mapper Reducer Implementation from Scratch

Developed mapper and reducer functions that replicate the functionality of the MapReduce model used in big data applications. Implemented multithreading to speed up the mapper-reducer pipeline by processing data in parallel.
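The map, shuffle, and reduce phases described above can be sketched with a word-count job, the classic MapReduce example. This is a minimal illustration, not the original implementation: the word-count task, the function names, and the use of `concurrent.futures.ThreadPoolExecutor` for parallelism are all assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def map_reduce(lines, max_workers=4):
    """Run mappers and reducers on a thread pool and return {key: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(mapper, lines))  # mappers run in parallel
        groups = shuffle(mapped)                # single-threaded grouping step
        reduced = pool.map(lambda kv: reducer(*kv), groups.items())
        return dict(reduced)


print(map_reduce(["the cat sat", "the dog sat", "a cat"]))
```

One design note: in CPython, threads mainly speed up I/O-bound mappers because of the global interpreter lock; for CPU-bound work, `concurrent.futures.ProcessPoolExecutor` is the usual drop-in alternative, though it requires picklable top-level functions instead of the lambda used here.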

Visualizing the Impact of Mobility on New COVID-19 Cases in Tableau

Created an interactive dashboard that visualized the impact of mobility in different areas on new COVID-19 cases. Supported the findings both visually in Tableau and statistically with a linear regression model, which showed that mobility in parks was associated with a high number of new cases.

Prediction of Diabetes using Insurance Claims Dataset

Built a predictive model to estimate the risk of diabetes in patients using an insurance claims dataset. Featurized the claims data, trained and optimized an SVC model, and estimated the probability of diabetes for each patient.

Prediction of EMR Usage using Hints Survey Form Data

Screening Tool for Chronic Kidney Disease Prediction

Created an interactive screening tool in R Shiny that predicts the risk of chronic kidney disease using a logistic regression model, attaining 97% recall. Doctors can use the tool to estimate the probability of chronic kidney disease in patients.

SkyWest Airlines Competitor Visualization in Tableau

Created interactive dashboards to visualize KPIs for the performance of SkyWest Airlines in the United States. Analyzed metrics such as airline delays and their causes, and compared SkyWest's performance with that of the other US airlines.

Twitter Sentiment Analysis using Logistic Regression

Analyzed Twitter sentiment by implementing logistic regression from scratch. Applied the model to a dataset of 1 million tweets to classify the sentiment of each tweet.
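Implementing logistic regression from scratch comes down to a sigmoid function and gradient descent on the log loss. A minimal pure-Python sketch, assuming each tweet has already been turned into a small numeric feature vector; the two hypothetical features used here (counts of positive and negative words) and the toy data are illustrations, not the original project's features:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [positive-word count, negative-word count]; label 1 = positive
X = [[3, 0], [2, 1], [0, 3], [1, 2]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [3, 0]), predict_proba(w, b, [0, 3]))
```

A tweet is then labeled positive when its predicted probability exceeds a threshold, commonly 0.5.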

Walmart Sales Forecasting using Time Series Analysis

Predicted department-wise sales for 45 Walmart stores, modeling the effects of holiday markdowns with ARIMA and Holt-Winters models. The historical sales data covers all 45 stores, each with multiple departments.
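Holt-Winters (triple exponential smoothing) forecasts by tracking a level, a trend, and a repeating seasonal offset, each updated with its own smoothing parameter. A minimal sketch of the additive variant, assuming additive seasonality and a simple initialization from the first two seasons; the original project may have used the multiplicative form or a library implementation (for example, statsmodels' `ExponentialSmoothing`), and the parameter values below are illustrative:

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past the series."""
    if len(series) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    # Initial level: mean of the first season.
    level = sum(series[:season_length]) / season_length
    # Initial trend: average per-step change between the first two seasons.
    trend = sum(series[season_length + i] - series[i]
                for i in range(season_length)) / season_length ** 2
    # Initial seasonal offsets: deviation of each first-season value from the level.
    seasonals = [series[i] - level for i in range(season_length)]

    for t, value in enumerate(series):
        s = seasonals[t % season_length]
        previous_level = level
        # Level: deseasonalized observation blended with the previous projection.
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        # Trend: recent change in level blended with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        # Seasonal: observation minus level, blended with the same season last cycle.
        seasonals[t % season_length] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + seasonals[(n + h) % season_length]
            for h in range(horizon)]


# Hypothetical weekly-style pattern repeating every 3 periods, no trend
history = [10, 20, 30] * 4
print(holt_winters_additive(history, season_length=3, alpha=0.5, beta=0.1, gamma=0.1, horizon=3))
```

For a perfectly repeating pattern like this toy series, the forecast simply continues the cycle; on real sales data, alpha, beta, and gamma are usually tuned to minimize error on a validation window.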

Word Embedding using GloVe

Generated word embeddings using GloVe on the Wikipedia article about ‘UIC’ and created a scatter plot of their projections to inspect the resulting clusters. These embeddings are a type of word representation in which words with similar meanings have similar representations.

Ankit B V S

Data Science Intern

University of Illinois at Chicago
