By Jérémy Lambert, Data Consultant



In this article, we compare how well Sentiment Analysis engines and Custom Text classification engines extract sentiment. The idea is to show the pros and cons of these two types of engines on a concrete dataset.

 

Definitions:

 
 
Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative or neutral. Sentiment analysis is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback, and understand customer needs.

Text classification is a machine learning technique that assigns each text in a dataset to one or more predefined categories. Text classifiers can be used to organize, structure, and categorize pretty much any kind of text. A text classifier has to be trained on a set of labeled texts.

Sentiment analysis engines are Trained engines whereas custom text classification engines are “AutoML” (Automated Machine Learning) engines.

It is very important to distinguish between Trained APIs and AutoML APIs:

  • Trained APIs are based on models already trained by providers on their own data. These models usually handle common use cases: sentiment analysis, named entity recognition, translation, etc. It is always worth trying these APIs before building custom models, since they are increasingly competitive and efficient.
  • For specific use cases where very high precision is needed, it may be better to use AutoML APIs. These APIs are offered by multiple providers such as Google Cloud Platform, Amazon Web Services, Microsoft Azure, IBM Watson, and many others. AutoML APIs allow users to build their own custom model, trained on the user’s data. The underlying models are pre-trained by the providers on multiple datasets beforehand.

This article compares already Trained Sentiment APIs and Custom text classification APIs. The aim is to give you insight into which to choose depending on price, performance, integration, etc.

 

Providers:

 
 
During our study, we used several sentiment analysis and custom text classification engines. To access these engines easily, we used Eden AI, which centralizes multiple NLP engines from different providers.

For sentiment analysis, we used the Google Cloud, AWS and Microsoft Azure engines.

For custom text classification, we used the Google Cloud and AWS engines.

This is the pool of provider APIs we tested. Note that many other proprietary and open source solutions exist, such as MonkeyLearn, Twinword, Connexun, etc.

 

Use cases:

 
 
Sentiment analysis is used in hundreds of fields, for a wide variety of use cases. In this article, we chose a very common one:

You are a company that wants to collect tweets about your support and products. You want to extract the sentiment of these tweets in order to analyze negative comments and improve your services.

To illustrate this use case, the comparison was carried out on this Kaggle dataset: https://www.kaggle.com/sureshmecad/identify-the-sentiments-analytics-vidhya?select=train.csv

We keep the last 1,000 rows of the training dataset as a test set, which we use to compare predictions from the sentiment analysis and custom text classification engines. The rest of the dataset is used to train the custom text classification engines.
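The split described above can be sketched in a few lines of Python. This is only an illustration, not the exact script we ran: the helper name `split_train_test` is hypothetical, and the column names `id`, `label` and `tweet` in the small in-memory sample are assumptions about the CSV layout.

```python
import csv
import io


def split_train_test(rows, test_size=1000):
    """Hold out the last `test_size` rows as the test set; the rest is for training."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]


# Tiny in-memory stand-in for train.csv (column names are assumed).
sample_csv = io.StringIO(
    "id,label,tweet\n"
    "1,0,great phone\n"
    "2,1,screen broke again\n"
    "3,0,love the camera\n"
)
rows = list(csv.DictReader(sample_csv))

# With the real file, test_size would stay at its default of 1000.
train_rows, test_rows = split_train_test(rows, test_size=1)
print(len(train_rows), len(test_rows))
```

Taking the last rows rather than a random sample keeps the split reproducible without a random seed, at the cost of assuming the file is not ordered in a way that biases the test set.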

 

Tests:

 
 

Custom text classification

 

First, we trained custom text classification models with the Google Cloud and AWS engines. We used the Eden AI platform directly, which allows us to train both GCP and AWS models from a single platform:



Eden AI: create Custom text classification project

 


Eden AI: import data for Custom text classification

 

Creating the project is very simple: we just need to select the language and the type of classification, then import our dataset. Once the project is created, we can train both the GCP and AWS engines:



Eden AI: Custom text classification models trained

 

Once the models are trained, we can generate predictions on our test dataset directly from the platform:



Eden AI: Custom text classification prediction

 

Sentiment Analysis APIs

 
For predictions with the trained sentiment analysis APIs, we use the Eden AI Python SDK. It allows us to use a single script to generate predictions with the GCP, AWS and Azure engines:



Eden AI: Python SDK for Sentiment Analysis API

 

The code is the same for the AWS and Azure engines: we just had to change the “provider” parameter to “amazon” and “microsoft”.
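To make the "one script, several providers" idea concrete, here is a minimal sketch. It does not reproduce the Eden AI SDK: `predict_all` and `fake_analyze` are hypothetical names, `fake_analyze` is a keyword-based stand-in for the real API call, and the provider name "google" is an assumption (only "amazon" and "microsoft" are given above).

```python
# Provider identifiers; "google" is assumed, the other two come from the text.
PROVIDERS = ["google", "amazon", "microsoft"]


def predict_all(texts, analyze, providers=PROVIDERS):
    """Run the same texts through every provider using one shared function.

    `analyze(text, provider)` is any callable that returns a sentiment label.
    """
    return {provider: [analyze(text, provider) for text in texts]
            for provider in providers}


def fake_analyze(text, provider):
    """Stand-in for a real API call, so that the sketch runs offline."""
    return "negative" if "broken" in text.lower() else "positive"


predictions = predict_all(["My phone is broken", "Love the new update"], fake_analyze)
for provider, labels in predictions.items():
    print(provider, labels)
```

In a real script, `fake_analyze` would be replaced by a function that calls the sentiment analysis endpoint with the given provider.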

 

Performances:

 
 
Here is the accuracy reported after training. It is only an indicative metric:



Eden AI: Custom text classification models metric

 

Now, we generate predictions on our test dataset (1,000 predictions) and calculate the accuracy.

After measuring the performance of custom text classification, we repeat the same operation for the sentiment analysis engines.
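Accuracy here is simply the share of test tweets whose predicted label matches the true label. A minimal sketch, where the function name `accuracy` and the label strings are illustrative choices rather than anything imposed by the engines:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty batch")
    correct = sum(true == pred for true, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Four made-up test tweets: three of the four predictions are right.
y_true = ["positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(accuracy(y_true, y_pred))  # 0.75
```

The same function applies to both families of engines, as long as their outputs are mapped to the same label set as the test data.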

The AWS and Azure engines return percentages for positive, negative and neutral (equivalent to mixed for AWS), whereas our test labels only distinguish positive from negative. We therefore show two sets of results: one that keeps the “neutral” and “mixed” predictions and one that leaves them out:
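One way to compute both variants is sketched below. The scoring convention is our assumption for illustration: each prediction takes the label with the highest score, “neutral” or “mixed” predictions count as errors when kept, and they are dropped from the batch when not kept. The function names and sample scores are hypothetical.

```python
def top_label(scores):
    """Return the label with the highest score, e.g. {'positive': 0.8, ...}."""
    return max(scores, key=scores.get)


def accuracy_with_and_without_neutral(y_true, y_pred, undecided=("neutral", "mixed")):
    """Return (accuracy on all rows, accuracy on rows with a decided prediction)."""
    pairs = list(zip(y_true, y_pred))
    kept = [(true, pred) for true, pred in pairs if pred not in undecided]
    acc_all = sum(true == pred for true, pred in pairs) / len(pairs)
    acc_kept = sum(true == pred for true, pred in kept) / len(kept) if kept else None
    return acc_all, acc_kept


# Made-up engine outputs for four test tweets.
raw_scores = [
    {"positive": 0.8, "negative": 0.1, "neutral": 0.1},
    {"positive": 0.1, "negative": 0.3, "neutral": 0.6},
    {"positive": 0.2, "negative": 0.7, "neutral": 0.1},
    {"positive": 0.3, "negative": 0.6, "neutral": 0.1},
]
y_true = ["positive", "negative", "negative", "positive"]
y_pred = [top_label(scores) for scores in raw_scores]

acc_all, acc_kept = accuracy_with_and_without_neutral(y_true, y_pred)
print(acc_all, acc_kept)
```

In this toy batch, one prediction is neutral, so the second accuracy is computed on three rows instead of four. Dropping undecided predictions usually raises the score, but it also means the engine gives no usable answer for part of the data.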

Accuracy (over a batch of 1,000 predictions):



Custom text classification and Sentiment analysis performance

 

Pricing:

 
 
Pricing for sentiment analysis APIs and for custom text classification is quite different. For sentiment analysis APIs, you only pay for the inferences you run (priced per character), whereas for custom text classification you need to pay for training the model, deploying the model and running inferences.

Here is the pricing:

 

Sentiment Analysis

 



Sentiment analysis pricing for GCP, AWS and Azure

 

These are the prices for the lowest consumption tier. With higher volumes, you can get better prices.

 

Custom text classification

 



Custom text classification pricing for GCP and AWS

 

The price for inference is $1 per million characters for sentiment analysis and $5 per million characters for custom text classification.

For inference alone, a sentiment analysis API is therefore 5 times cheaper than custom text classification, before even counting the training and deployment costs of a custom model.
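As a quick worked example, the inference cost for a given volume of text follows directly from the two per-character rates above. This sketch uses only those rates; it ignores volume discounts, minimum billing units, and the training and deployment costs of custom models. The function and constant names are illustrative.

```python
# Inference rates quoted above, in USD per million characters.
SENTIMENT_USD_PER_MILLION_CHARS = 1.0
CUSTOM_USD_PER_MILLION_CHARS = 5.0


def inference_cost(num_chars, usd_per_million_chars):
    """Inference cost in USD for `num_chars` characters at a flat rate."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * usd_per_million_chars


# Example volume: 2 million characters of tweets.
volume = 2_000_000
sentiment_cost = inference_cost(volume, SENTIMENT_USD_PER_MILLION_CHARS)
custom_cost = inference_cost(volume, CUSTOM_USD_PER_MILLION_CHARS)
print(sentiment_cost, custom_cost)  # 2.0 10.0
```

The ratio stays at 5 whatever the volume, so the gap in absolute cost grows linearly with the amount of text processed.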

 

Conclusion:

 
 
Both alternatives are viable. The choice between a Sentiment Analysis API and Custom text classification depends on the performance you expect and the budget you can allocate. You can definitely reach better performance with custom text classification, but sentiment analysis performance remains acceptable. As shown in this article, sentiment analysis is also much cheaper than custom text classification.

To conclude, we advise trying sentiment analysis first and moving to custom text classification if you need better accuracy.

 
Bio: Jérémy Lambert is a Data Consultant at DataGenius.

Original. Reposted with permission.
