Module 4

Text Classification Application

This Streamlit app is an interactive tool designed for real-time text classification. Users can input text, and the app utilizes a machine learning model to predict the category of the input text, displaying both the predicted class and confidence scores.

Assessment - MODULE 4

Choose the appropriate option

Which of the following approaches is used for text classification?

Rule-based approach
Machine learning-based approach
Hybrid approach
All of the above

The target variable for the classification problem should be a

Quantitative data
Qualitative data
No target variable
None of the above

Logistic regression uses ________________ function

Softmax function
Logistic Function
Relu Activation Function
All of the mentioned

The output value of logistic regression lies between?

-∞ to +∞
0 to 1
-1 to +1
Both A & C

Support vector machines use ____________________ to construct margins

Support Vectors
Hyper Plane
Regression line
None of the above

Fill in the spaces with appropriate answers

True Negative Rate (specificity) is the ratio of negatives correctly predicted out of all actual negative labels.
Precision is the proportion of true positives out of all predicted positives.
F1 Score is the harmonic mean of precision and recall.
What is accuracy? Accuracy is defined as the total number of correct classifications divided by the total number of classifications.
What is sensitivity? Sensitivity (recall) is the proportion of all actual positives that the model correctly predicted as positive.
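
The definitions above can be checked with a small worked example. The sketch below computes each metric from the four confusion-matrix counts. The function name `classification_metrics` and the counts passed to it are made up for illustration and do not come from the course material.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    total = tp + fp + tn + fn
    # Accuracy: correct classifications over all classifications.
    accuracy = (tp + tn) / total
    # Precision: true positives over all predicted positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): true positives over all actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (true negative rate): true negatives over all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only (not taken from any real dataset).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 85/100 and precision is 40/50, while sensitivity is 40/45, which shows how the metrics can disagree on the same predictions.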

True or False

Logistic regression is used for regression tasks.
1. True
2. False (used for classification tasks, not regression tasks)

Accuracy is not the best metric for classification problem statements.
1. True
2. False

We need to perform data cleaning before we pass the text data into the machine learning algorithm.
1. True
2. False

ROC helps us to choose the best model amongst the models for which we have plotted the ROC curves.
1. True
2. False

The kernel function in an SVM lets us compute in a higher-dimensional space without calculating the new coordinates in that higher dimension.
1. True
2. False
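
The kernel statement above can be illustrated with a small numeric check. The sketch below assumes a degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is short enough to write out. The function names are hypothetical.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def explicit_features(x):
    """Explicit 3-D feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)

# Kernel route: stays in the original 2-D space.
kernel_value = poly_kernel(x, y)

# Explicit route: map both points to 3-D first, then take the dot product.
explicit_value = dot(explicit_features(x), explicit_features(y))

print(kernel_value, explicit_value)
```

Both routes give the same number up to floating-point rounding, but the kernel route never builds the higher-dimensional coordinates.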

Programming Assignment

Use the data from the URL below.

https://www.kaggle.com/balaka18/email-spam-classification-dataset-csv?select=emails.csv

By referring to the code used in the tasks, perform the following steps on the above dataset.

Text cleaning

Feature Extraction using TF-IDF

Build the Machine Learning model

Evaluate the performance on the Test set

Pickle the Model

Create a UI for the Model on Streamlit
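
As a rough orientation for the cleaning, TF-IDF, and pickling steps listed above, here is a minimal from-scratch sketch using only the Python standard library. It is not the course's reference code. The stopword list, the sample sentences, and every function name are assumptions made for illustration. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would likely replace the hand-written functions, and the model building, evaluation, and Streamlit steps are not covered here.

```python
import math
import pickle
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "you"}


def clean_text(text):
    """Lowercase, replace non-letters with spaces, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def fit_idf(docs):
    """Learn a smoothed inverse document frequency for each token."""
    n_docs = len(docs)
    # Count in how many documents each token appears at least once.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1
        for tok, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Weight raw term counts by IDF; tokens unseen during fitting are skipped."""
    counts = Counter(tokens)
    return {tok: n * idf[tok] for tok, n in counts.items() if tok in idf}


# Made-up sample messages, not rows from the Kaggle dataset.
raw_docs = [
    "Win a FREE prize now!!!",
    "Meeting agenda for Monday",
    "Free entry: win cash now",
]
docs = [clean_text(d) for d in raw_docs]
idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]

# Pickle the fitted IDF table so the same weights can be reused later,
# for example inside a UI script. Bytes are kept in memory here.
blob = pickle.dumps(idf)
restored_idf = pickle.loads(blob)

print(docs[0])
print(vectors[0])
```

Tokens that appear in fewer documents receive a larger IDF weight, so "prize" outweighs "win" in the first message.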

This is a chatbot project that helps students in a Natural Language Processing class by answering questions about the class details. It can also be used to practice quiz questions.

You can find her by clicking the button below! ⬇️