According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers share feedback about products they have purchased through many channels, including review websites, vendor websites, sales calls, and social media. As the volume of reviews across these channels grows, it becomes difficult for companies to process the data and derive meaningful insights using traditional methods. Machine learning (ML) can analyze large volumes of product reviews and identify patterns, sentiments, and topics discussed. With this information, companies can better understand customer preferences, pain points, and satisfaction levels, and use that understanding to improve products and services, identify trends, and take strategic actions that drive business growth. However, implementing ML can be a challenge for companies that lack resources such as ML practitioners, data scientists, or artificial intelligence (AI) developers. With the new Amazon SageMaker Canvas features, business analysts can now use ML to derive insights from product reviews.
SageMaker Canvas is designed for business analysts who want to use AWS no-code ML for ad hoc analysis of tabular data. It is a visual, point-and-click service that allows business analysts to generate accurate ML predictions without writing a single line of code or requiring ML expertise. You can use models to make predictions interactively and for batch scoring on bulk datasets. SageMaker Canvas offers fully managed ready-to-use AI models and custom model solutions. For common ML use cases, you can use a ready-to-use AI model to generate predictions with your data without any model training. For ML use cases specific to your business domain, you can train an ML model with your own data for custom predictions.
In this post, we demonstrate how to use the ready-to-use sentiment analysis model and a custom text analysis model to derive insights from product reviews. In this use case, we have a set of synthesized product reviews. We want to determine the sentiment of each review and categorize the reviews by product type, so that patterns and trends are easier to spot and business stakeholders can make better-informed decisions. First, we describe the steps to determine the sentiment of the reviews using the ready-to-use sentiment analysis model. Then, we walk you through the process to train a text analysis model to categorize the reviews by product type. Next, we explain how to review the performance of the trained model. Finally, we explain how to use the trained model to make predictions.
Sentiment analysis is a ready-to-use natural language processing (NLP) model that analyzes text for sentiment. You can run sentiment analysis for single-line or batch predictions. The predicted sentiment for each line of text is positive, negative, mixed, or neutral.
Text analysis allows you to classify text into two or more categories using custom models. In this post, we want to classify product reviews based on product type. To train a text analysis custom model, you simply provide a dataset consisting of the text and the associated categories in a CSV file. The dataset requires a minimum of two categories and 125 rows of text per category. After the model is trained, you can review the model’s performance and retrain the model if needed, before using it for predictions.
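Although the walkthrough itself requires no code, you can optionally check a training file against the minimums described above before you upload it. The following is a minimal sketch, not part of SageMaker Canvas. It assumes the category column is named product_category (the target column used later in this post), and the helper names check_training_rows and check_training_csv are hypothetical.

```python
import csv
from collections import Counter

MIN_CATEGORIES = 2           # minimum number of categories stated in this post
MIN_ROWS_PER_CATEGORY = 125  # minimum rows of text per category stated in this post


def check_training_rows(rows, category_column="product_category"):
    """Return a list of problems; an empty list means the minimums are met.

    `rows` is any iterable of dicts, such as the rows from csv.DictReader.
    """
    counts = Counter((row.get(category_column) or "").strip() for row in rows)
    counts.pop("", None)  # ignore rows with a missing or blank category

    problems = []
    if len(counts) < MIN_CATEGORIES:
        problems.append(
            f"found {len(counts)} categories, need at least {MIN_CATEGORIES}"
        )
    for category, n in sorted(counts.items()):
        if n < MIN_ROWS_PER_CATEGORY:
            problems.append(
                f"category '{category}' has {n} rows, "
                f"need at least {MIN_ROWS_PER_CATEGORY}"
            )
    return problems


def check_training_csv(path, category_column="product_category"):
    """Run the same check on a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_training_rows(csv.DictReader(handle), category_column)
```

For example, check_training_csv("sample_product_reviews_training.csv") returns an empty list if every category in the file has at least 125 rows.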
Prerequisites
Complete the following prerequisites:
Have an AWS account.
Set up SageMaker Canvas.
Download the sample product reviews datasets:
sample_product_reviews.csv – Contains 2,000 synthesized product reviews and is used for sentiment analysis and text analysis predictions.
sample_product_reviews_training.csv – Contains 600 synthesized product reviews and three product categories, and is used for text analysis model training.
Sentiment analysis
First, you use sentiment analysis to determine the sentiments of the product reviews by completing the following steps.
On the SageMaker console, click Canvas in the navigation pane, then click Open Canvas to open the SageMaker Canvas application.
Click Ready-to-use models in the navigation pane, then click Sentiment analysis.
Click Batch prediction, then click Create dataset.
Provide a Dataset name and click Create.
Click Select files from your computer to import the sample_product_reviews.csv dataset.
Click Create dataset and review the data. The first column contains the reviews and is used for sentiment analysis. The second column contains the review ID and is used for reference only.
Click Create dataset to complete the data upload process.
In the Select dataset for predictions view, select sample_product_reviews.csv and then click Generate predictions.
When the batch prediction is complete, click View to view the predictions.
The Sentiment and Confidence columns provide the sentiment and confidence score, respectively. The confidence score is a value between 0% and 100% that indicates the probability that the sentiment was predicted correctly.
Click Download CSV to download the results to your computer.
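If you want to summarize the downloaded results outside of Canvas, a short script can count the reviews per sentiment and flag low-confidence predictions for manual review. The following is a sketch under stated assumptions: the CSV headers are assumed to match the Sentiment and Confidence column names shown in the results view, the confidence value may be written as a fraction or a percentage, and the 0.6 threshold is an arbitrary illustrative value. The function names are hypothetical.

```python
import csv
from collections import Counter


def parse_confidence(text):
    """Convert '0.93', '93', or '93%' to a fraction between 0 and 1.

    The exact format in the downloaded file is an assumption, so this
    accepts the common variants.
    """
    text = text.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100
    value = float(text)
    return value / 100 if value > 1 else value


def summarize_sentiments(rows, threshold=0.6,
                         sentiment_column="Sentiment",
                         confidence_column="Confidence"):
    """Count each sentiment and collect rows below the confidence threshold."""
    counts = Counter()
    low_confidence = []
    for row in rows:
        counts[row[sentiment_column].strip().lower()] += 1
        if parse_confidence(row[confidence_column]) < threshold:
            low_confidence.append(row)
    return counts, low_confidence


def summarize_sentiment_csv(path, threshold=0.6):
    """Run the summary on a downloaded results file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_sentiments(csv.DictReader(handle), threshold)
```

The returned counter gives the number of reviews per sentiment, and the second value lists the rows that may be worth reading by hand.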
Text analysis
In this section, we go through the steps to perform text analysis with a custom model: importing the data, training the model, and then making predictions.
Import the data
First, import the training dataset. Complete the following steps:
On the Ready-to-use models page, click Create a custom model.
For Model name, enter a name (for example, Product Reviews Analysis). Click Text analysis, then click Create.
On the Select tab, click Create dataset to import the sample_product_reviews_training.csv dataset.
Provide a Dataset name and click Create.
Click Create dataset and review the data. The training dataset contains a third column, the product category, which is the target column and takes one of three values: books, video, or music.
Click Create dataset to complete the data upload process.
On the Select dataset page, select sample_product_reviews_training.csv and click Select dataset.
Train the model
Next, you configure the model to begin the training process.
On the Build tab, in the Target column drop-down menu, select product_category as the training target.
Select product_review as the source.
Click Quick build to start the model training.
For more information about the differences between Quick build and Standard build, refer to Build a custom model.
When the model training is complete, you can review the performance of the model before you use it for prediction.
The Analyze tab displays the model’s confidence score, which indicates how certain the model is that its predictions are correct. On the Overview tab, review the performance for each category.
Click Scoring to review the model accuracy insights.
Click Advanced metrics to review the confusion matrix and F1 score.
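To help interpret these metrics, the following sketch shows how a confusion matrix and per-category F1 scores are conventionally computed from true and predicted labels. It uses the textbook definitions and is not a description of how SageMaker Canvas calculates its scores internally. The function names and the six example labels are made up for illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a matrix where matrix[i][j] counts items whose true label is
    labels[i] and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def f1_per_label(matrix, labels):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                                # correctly predicted
        fp = sum(row[i] for row in matrix) - tp          # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                         # true label, but missed
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


# Illustrative labels only; these are not results from the sample dataset.
labels = ["books", "video", "music"]
actual = ["books", "books", "video", "video", "music", "music"]
predicted = ["books", "video", "video", "video", "music", "books"]

matrix = confusion_matrix(actual, predicted, labels)
scores = f1_per_label(matrix, labels)
```

Each row of the matrix corresponds to a true category and each column to a predicted category, so values on the diagonal are correct predictions and off-diagonal values show which categories the model confuses.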
Make predictions
To make a prediction with your custom model, complete the following steps:
On the Predict tab, click Batch prediction, then click Manual.
Select the same dataset, sample_product_reviews.csv, that you used previously for the sentiment analysis, then click Generate predictions.
When the batch prediction is complete, click View to view the predictions.
The first prediction with a custom model takes some time because SageMaker Canvas must deploy the model for initial use. To save costs, SageMaker Canvas automatically de-provisions the model after it has been idle for 15 minutes.
The Prediction (Category) and Confidence columns provide the predicted product categories and confidence scores, respectively.
Highlight the completed job, select the three dots and click Download to download the results to your computer.
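With both result files downloaded, you can optionally combine them to see how sentiment is distributed within each predicted product category. The following is a sketch under several assumptions: both files are assumed to share an identifier column (the name review_id is hypothetical), and the sentiment and category headers are assumed to match the Sentiment and Prediction (Category) column names shown in Canvas. Adjust the column arguments to match your actual files.

```python
from collections import Counter, defaultdict


def sentiment_by_category(sentiment_rows, category_rows,
                          id_column="review_id",
                          sentiment_column="Sentiment",
                          category_column="Prediction (Category)"):
    """Join the two result sets on the ID column and count sentiments
    per predicted category. Rows without a matching ID are skipped."""
    category_of = {row[id_column]: row[category_column] for row in category_rows}

    table = defaultdict(Counter)
    for row in sentiment_rows:
        category = category_of.get(row[id_column])
        if category is not None:
            table[category][row[sentiment_column].strip().lower()] += 1

    # Convert to plain dicts so the result is easy to print or serialize.
    return {category: dict(counts) for category, counts in table.items()}
```

Both arguments are iterables of dicts, so you can pass the rows from csv.DictReader for each downloaded file.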
Clean up
Click Log out in the navigation pane to log out of the SageMaker Canvas application. Logging out stops the consumption of Canvas session hours and releases all resources.
Conclusion
In this post, we demonstrated how you can use Amazon SageMaker Canvas to derive insights from product reviews without ML expertise. First, you used a ready-to-use sentiment analysis model to determine the sentiments of the product reviews. Next, you used text analysis to train a custom model with the Quick build process. Finally, you used the trained model to categorize the product reviews into product categories, all without writing a single line of code. We recommend that you repeat the text analysis process with the Standard build process to compare the model results and prediction confidence.
About the Authors
Gavin Satur is a Principal Solutions Architect at Amazon Web Services. He works with enterprise customers to build strategic, well-architected solutions and is passionate about automation. Outside work, he enjoys family time, tennis, cooking and traveling.
Les Chan is a Sr. Solutions Architect at Amazon Web Services, based in Irvine, California. Les is passionate about working with enterprise customers on adopting and implementing technology solutions with the sole focus of driving customer business outcomes. His expertise spans application architecture, DevOps, serverless, and machine learning.
Aaqib Bickiya is a Solutions Architect at Amazon Web Services based in Southern California. He helps enterprise customers in the retail space accelerate projects and implement new technologies. Aaqib’s focus areas include machine learning, serverless, analytics, and communication services.