Transformed data using Spark RDD operations on roughly 5 million reviews. Built a product recommendation system using ALS collaborative filtering, obtaining an RMSE of 0.91. The data set contains 5,074,160 reviews and ratings of 287,209 products by 157,386 users.
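For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between actual and predicted ratings. The helper below is a minimal plain-Python sketch of that formula; the function name `rmse` and its arguments are illustrative and are not taken from the project's Spark code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each per-rating error, average the squares, then take the root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.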
The objective of this assignment is to extract certain sections (mentioned below) from SEC/EDGAR financial reports and perform text analysis to compute the variables explained below. Links to the SEC/EDGAR financial reports are given in the Excel spreadsheet “cik_list.xlsx”.
Built an image captioning system using a CNN encoder and an LSTM decoder with an attention mechanism to generate relevant captions for any input image. Evaluated the results using the BLEU-3 and BLEU-4 metrics, and deployed the model for real-time inference using Amazon SageMaker.
Calculated the similarity between the 'UIC' text document and the 'UIUC', 'MIT', 'UIS', 'Tesla', and 'Stanford' text documents. Applied Jaccard similarity and cosine similarity functions to obtain the similarity between UIC and each of the other documents.
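The two measures can be sketched in a few lines of plain Python. This is an illustrative version only: it assumes simple lowercase word tokenization and raw term counts for the cosine measure, and the project itself may have used different preprocessing or weighting (for example TF-IDF). The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(tokens_a, tokens_b):
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the two term-count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Jaccard ignores how often a word occurs and only compares vocabularies, while cosine similarity takes word frequency into account, so the two can rank the same documents differently.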
Analyzed Twitter sentiment by implementing logistic regression from scratch. Applied it to a Twitter dataset of 1 million tweets to predict the sentiment of each tweet, which helps in classifying tweets.
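A from-scratch logistic regression can be sketched as batch gradient descent on the log loss. The version below is a minimal illustration, not the project's implementation: the feature layout (counts of positive and negative words per tweet), the toy data, and the hyperparameters are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=1000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            # Prediction error for this sample: sigmoid(w . x + b) - y
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (positive) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical toy features: [positive_word_count, negative_word_count]
toy_features = [[2, 0], [1, 0], [0, 1], [0, 2]]
toy_labels = [1, 1, 0, 0]
toy_weights, toy_bias = train_logistic_regression(toy_features, toy_labels)
```

On a dataset of a million tweets, a vectorized implementation (for example with NumPy) or mini-batch updates would be far faster than these pure-Python loops; the loops are kept here only to make each step visible.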
Generated word embeddings using GloVe on the Wikipedia document for 'UIC' and created a scatter plot to see the clusters of the projections. These embeddings are a type of word representation that allows words with similar meanings to have similar representations.