Truncated SVD (The Dimensionality Reduction Algorithm) on Amazon Fine Food Reviews Analysis

 

Truncated SVD can be used as a dimensionality reduction algorithm, much like PCA (Principal Component Analysis).
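
To make the comparison with PCA concrete, here is a minimal sketch using scikit-learn on made-up random data (the array `X` and its column scaling are illustrative assumptions, not part of the review dataset). PCA centers the data before factorizing it, while `TruncatedSVD` does not; once the data is centered by hand, the two projections should agree up to the sign of each component.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Toy dense data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

# PCA subtracts the column means internally before factorizing.
pca_proj = PCA(n_components=2).fit_transform(X)

# TruncatedSVD factorizes the matrix as given, so center it manually here.
X_centered = X - X.mean(axis=0)
svd = TruncatedSVD(n_components=2, algorithm="arpack", random_state=0)
svd_proj = svd.fit_transform(X_centered)

# Component signs are arbitrary, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(pca_proj), np.abs(svd_proj), atol=1e-6)
print("PCA and centered Truncated SVD agree up to sign:", same_up_to_sign)
```

Because `TruncatedSVD` skips the centering step, it can be applied directly to large sparse matrices such as TF-IDF output, where centering would destroy sparsity.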

Truncated SVD on Amazon Fine Food Reviews Analysis

Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews

EDA: https://nycdatascience.com/blog/student-works/amazon-fine-foods-visualization/

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:

  1. Id
  2. ProductId - unique identifier for the product
  3. UserId - unique identifier for the user
  4. ProfileName
  5. HelpfulnessNumerator - number of users who found the review helpful
  6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
  7. Score - rating between 1 and 5
  8. Time - timestamp for the review
  9. Summary - brief summary of the review
  10. Text - text of the review

 

Objective:

  1. Apply Truncated-SVD on only this feature set:
    • SET 2: Review text, preprocessed and converted into vectors using TF-IDF

    • Procedure:
      • Take the top 2000 or 3000 features from the TF-IDF vectorizer, ranked by idf_ score.
      • Compute the co-occurrence matrix for the selected features. (Note: X.X^T doesn't give the co-occurrence matrix; it returns the covariance matrix. Check these blogs, blog-1 and blog-2, for more information.)
      • Choose n_components for Truncated SVD so that it captures the maximum explained variance. Search for how to choose it and implement that. (Hint: plot the cumulative explained variance ratio.)
      • Once Truncated SVD is done, apply K-Means clustering and choose the best number of clusters using the elbow method.
      • Print word clouds for each cluster, as in the previous assignment.
      • Write a function that takes a word and returns the most similar words, using cosine similarity between the vectors (vector: a row in the matrix after Truncated SVD).
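
The procedure above can be sketched end to end on a toy corpus. This is only an illustration under stated assumptions, not the notebook's actual solution: the `reviews` list, the helper function names, the context window of 2 tokens, the 0.95 variance threshold, and reading "top features" as "highest idf_" are all choices made here for the example. The word cloud step is left out, and the cumulative variance and inertia values are returned rather than plotted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the preprocessed review text (assumption, not real data).
reviews = [
    "great coffee strong flavor",
    "coffee flavor is great",
    "dog food my dog loves",
    "my dog loves this food",
    "tea flavor is mild",
    "strong tea great taste",
]

def top_idf_words(docs, k):
    """Return the k vocabulary words with the highest idf_ score."""
    vec = TfidfVectorizer().fit(docs)
    vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)  # word per column
    order = np.argsort(vec.idf_)[::-1][:k]
    return [vocab[i] for i in order]

def cooccurrence_matrix(docs, words, window=2):
    """Count how often two selected words occur within `window` tokens."""
    index = {w: i for i, w in enumerate(words)}
    mat = np.zeros((len(words), len(words)))
    for doc in docs:
        tokens = doc.split()
        for pos, tok in enumerate(tokens):
            if tok not in index:
                continue
            lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
            for other in tokens[lo:pos] + tokens[pos + 1:hi]:
                if other in index:
                    mat[index[tok], index[other]] += 1
    return mat

def choose_n_components(mat, threshold=0.95):
    """Smallest n whose cumulative explained variance ratio reaches threshold."""
    max_n = min(mat.shape) - 1
    svd = TruncatedSVD(n_components=max_n, random_state=0).fit(mat)
    cumulative = np.cumsum(svd.explained_variance_ratio_)  # the curve to plot
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, max_n), cumulative

def kmeans_inertias(vectors, ks):
    """Inertia for each candidate k; the bend in this curve is the elbow."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors).inertia_
        for k in ks
    ]

def most_similar(word, words, vectors, top_n=3):
    """Words whose row vectors have the highest cosine similarity to `word`."""
    idx = words.index(word)
    sims = cosine_similarity(vectors[idx:idx + 1], vectors)[0]
    ranked = [i for i in np.argsort(sims)[::-1] if i != idx]
    return [(words[i], float(sims[i])) for i in ranked[:top_n]]

words = top_idf_words(reviews, k=12)
cooc = cooccurrence_matrix(reviews, words, window=2)
n_components, cumulative = choose_n_components(cooc, threshold=0.95)
vectors = TruncatedSVD(n_components=n_components, random_state=0).fit_transform(cooc)
inertias = kmeans_inertias(vectors, ks=range(1, 5))
neighbours = most_similar(words[0], words, vectors)
print(n_components, neighbours)
```

On the real dataset, `k` would be 2000 or 3000 as the assignment states, and both `cumulative` and `inertias` would normally be plotted to pick `n_components` and the number of clusters by eye.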

SOLUTION:

<iframe src="https://www.kaggle.com/embed/pradipdharam/a11-truncated-svd-amazonfinefoodreviews?kernelSessionId=17002106" height="800" style="margin: 0 auto; width: 100%; max-width: 950px;" frameborder="0" scrolling="auto" title="A11. Truncated SVD - AmazonFineFoodReviews"></iframe>
