By Jérémy Lambert, Data Consultant
In this article, we compare the sentiment extraction performance of sentiment analysis engines and custom text classification engines. The idea is to show the pros and cons of these two types of engines on a concrete dataset.
Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative or neutral. Sentiment analysis is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback, and understand customer needs.
Text classification is a machine learning technique that assigns predefined categories to texts. Text classifiers can be used to organize, structure, and categorize almost any kind of text. A classifier has to be trained on a set of labeled texts.
Sentiment analysis engines are Trained engines, whereas custom text classification engines are “AutoML” (Automated Machine Learning) engines.
It is very important to distinguish between Trained APIs and AutoML APIs:
- Trained APIs are based on models already trained by providers on their own data. These models usually cover common use cases: sentiment analysis, named entity recognition, translation, etc. It is always worth trying these APIs before building custom models, since they are increasingly competitive and efficient.
- For specific use cases where very high precision is needed, it may be better to use AutoML APIs. These APIs are offered by multiple providers such as Google Cloud Platform, Amazon Web Services, Microsoft Azure, IBM Watson, and many others. AutoML APIs allow users to build their own custom model, trained on the user’s data. The underlying models are pre-trained by providers on multiple datasets beforehand.
This article compares already Trained sentiment APIs and custom text classification APIs. The aim is to help you choose between them based on price, performance, integration, etc.
During our study, we used several sentiment analysis and custom text classification engines. To access these engines easily, we used Eden AI, which centralizes multiple NLP engines from different providers.
For sentiment analysis, we used the Google Cloud (GCP), AWS and Microsoft Azure engines.
For custom text classification, we used the Google Cloud (GCP) and AWS engines.
Sentiment analysis is used in many fields, for a wide variety of use cases. In this article, we chose a very common one:
You are a company that wants to collect tweets about your support and products. You want to extract sentiment from these tweets in order to analyze negative comments and improve your services.
To illustrate this use case, the comparison was carried out on this Kaggle dataset: https://www.kaggle.com/sureshmecad/identify-the-sentiments-analytics-vidhya?select=train.csv
We keep the last 1,000 rows of the training dataset as a test dataset to compare predictions from the sentiment analysis and custom text classification engines. The rest of the dataset is used to train the custom text classification engines.
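The hold-out split described above can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: the helper names `load_rows` and `split_train_test` are hypothetical, and the sketch assumes the rows are kept in their original file order.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts, one dict per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def split_train_test(rows, test_size=1000):
    """Hold out the last `test_size` rows as the test set.

    Returns (train_rows, test_rows); the original order is preserved.
    """
    if test_size <= 0 or test_size >= len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]
```

With the Kaggle file downloaded locally, a call such as `train_rows, test_rows = split_train_test(load_rows("train.csv"))` would give the two subsets.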
Custom text classification
First, we trained custom text classification models with the Google Cloud and AWS engines. We used the Eden AI platform directly, which allows us to train both GCP and AWS models on a single platform:
Eden AI: create Custom text classification project
Eden AI: import data for Custom text classification
The creation is very simple: we just need to select the language and the type of classification, then import our dataset. Once the project is created, we can train both the GCP and AWS engines:
Eden AI: Custom text classification models trained
Once the models are trained, you can generate predictions for the test dataset directly from the platform:
Eden AI: Custom text classification prediction
Sentiment Analysis APIs
For predictions with the trained sentiment analysis APIs, we use the Eden AI Python SDK. It allows us to use a single script to generate predictions with the GCP, AWS and Azure engines:
Eden AI: Python SDK for Sentiment Analysis API
The code is the same for the AWS and Azure engines; we just had to change the “provider” parameter to “amazon” and “microsoft”.
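The idea of one script with a swappable provider can be sketched as follows. This is not the Eden AI SDK: `analyze` is a hypothetical callable that the caller supplies (in practice it would wrap the real API call), `fake_analyze` is a stand-in used only to make the sketch runnable, and the identifier "google" is an assumption (only "amazon" and "microsoft" are named in the article).

```python
# Provider identifiers; "google" is assumed, the other two come from the article.
PROVIDERS = ["google", "amazon", "microsoft"]


def predict_all(texts, providers, analyze):
    """Run the same texts through every provider.

    `analyze(text, provider)` is a caller-supplied function returning a label.
    Returns a dict mapping each provider to its list of predicted labels.
    """
    return {p: [analyze(t, p) for t in texts] for p in providers}


def fake_analyze(text, provider):
    """Stand-in for a real API call: a toy keyword rule, ignores `provider`."""
    return "negative" if "hate" in text.lower() else "positive"


results = predict_all(["I hate this phone", "Great support!"], PROVIDERS, fake_analyze)
```

Passing the provider as a plain parameter keeps the prediction loop identical for every engine, which is the property the article relies on.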
Here is the accuracy reported after training; it is only an indicative metric:
Eden AI: Custom text classification models metric
Now, we generate predictions on our test dataset (1,000 predictions) and calculate the accuracy.
After measuring performance for custom text classification, we repeat the same operation for the sentiment analysis engines.
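For reference, accuracy here is simply the share of test examples whose predicted label matches the true label. A minimal sketch, with a hypothetical helper name:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

On a 1,000-row test set, each correct prediction therefore contributes 0.1 percentage points.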
The AWS and Azure engines return percentage scores for positive, negative and neutral (equivalent to mixed for AWS). We therefore show the results both when keeping “neutral” and “mixed” predictions and when excluding them:
Accuracy (over a batch of 1,000 predictions):
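One way to turn per-class scores into a label and compare the two scoring variants is sketched below. Several things are assumptions, since the article does not spell them out: the score dictionaries use the keys "positive", "negative", "neutral" or "mixed"; the true labels are "positive" or "negative"; and a "neutral" or "mixed" prediction counts as an error when it is kept.

```python
NON_POLAR = {"neutral", "mixed"}  # labels that express no clear polarity


def top_label(scores):
    """Return the label with the highest score, e.g. {'positive': 0.8, ...}."""
    return max(scores, key=scores.get)


def accuracy_with_and_without_neutral(y_true, score_dicts):
    """Return (accuracy keeping all predictions, accuracy on polar ones only).

    The second value is None when every prediction is neutral or mixed.
    """
    labels = [top_label(s) for s in score_dicts]
    kept_all = sum(t == p for t, p in zip(y_true, labels)) / len(labels)
    polar = [(t, p) for t, p in zip(y_true, labels) if p not in NON_POLAR]
    polar_only = sum(t == p for t, p in polar) / len(polar) if polar else None
    return kept_all, polar_only
```

Excluding non-polar predictions usually raises the reported accuracy, but it also shrinks the evaluated set, so both figures are worth reading together.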
Custom text classification and Sentiment analysis performance
Pricing for sentiment analysis APIs and custom text classification is quite different. For sentiment analysis APIs, you only pay for the inferences you run (priced per character), whereas for custom text classification you pay for training the model, deploying the model and running inferences.
Here is the pricing:
Sentiment analysis pricing for GCP, AWS and Azure
These are prices for the lowest consumption limit. With higher volumes, you can get better prices.
Custom text classification
Custom text classification pricing for GCP and AWS
The price for inference is $1 per million characters for sentiment analysis and $5 per million characters for custom text classification.
For inference alone, the sentiment analysis API is 5 times cheaper than custom text classification, before even considering training and deployment costs.
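To make the per-character pricing concrete, here is a small arithmetic sketch using the two inference prices quoted above. The workload (1,000 tweets of 140 characters each) is a made-up example, and training and deployment costs are deliberately left out.

```python
SENTIMENT_PRICE_PER_MILLION = 1.0  # USD per million characters, from the article
CUSTOM_PRICE_PER_MILLION = 5.0     # USD per million characters, from the article


def inference_cost(num_chars, price_per_million_usd):
    """Inference cost in USD for a given number of characters."""
    return num_chars / 1_000_000 * price_per_million_usd


# Hypothetical workload: 1,000 tweets of 140 characters each.
total_chars = 1000 * 140
sentiment_cost = inference_cost(total_chars, SENTIMENT_PRICE_PER_MILLION)
custom_cost = inference_cost(total_chars, CUSTOM_PRICE_PER_MILLION)
```

For this workload the costs come to roughly $0.14 and $0.70 respectively, so the 5x ratio holds at any volume as long as both prices stay in their lowest tier.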
Both alternatives are viable. The choice between a sentiment analysis API and custom text classification depends on the expected performance and the allocated budget. You can reach better performance with custom text classification, but sentiment analysis performance remains acceptable. As shown in the article, sentiment analysis is much cheaper than custom text classification.
To conclude, we advise trying sentiment analysis first and moving to custom text classification if you need better accuracy.
Bio: Jérémy Lambert is a Data Consultant at DataGenius.
Original. Reposted with permission.