Multi-label User Utterance Classification in Natural Language | Shubhamggaur | Oct 2024

In an era where data drives decision-making, extracting actionable insights from textual data has become crucial. This article details my approach to multi-label classification of user utterances, focusing on movie descriptions, and how various machine learning techniques were used to predict the multiple attributes associated with each movie.

Introduction

The primary objective of this project was to develop a supervised multi-label classification model that accurately predicts various attributes from textual descriptions of movies. Each movie can be linked to multiple attributes, including genres, actors, directors, and more. The task is not only to identify all relevant labels for a given movie description but also to handle the complexity inherent in textual data.

Dataset Overview

The dataset comprises two main files: a training dataset (`hw1_train.csv`) containing 2,312 samples and a test dataset (`hw1_test.csv`) with 981 samples. Each sample in the training data includes a textual description of the movie (column: UTTERANCES) and associated attributes (column: CORE RELATIONS). The attributes are represented as a binary vector, indicating the presence of each attribute for the movie.
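
To make the setup concrete, here is a minimal sketch of how the labels can be turned into a multi-hot target matrix with scikit-learn's MultiLabelBinarizer. It assumes the CORE RELATIONS column stores space-separated label names, so the split may need adjusting for the actual file format.

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Load the training data and split each CORE RELATIONS entry into its labels
df = pd.read_csv('hw1_train.csv')
label_lists = df['CORE RELATIONS'].str.split().tolist()

# Build a binary indicator matrix: one column per attribute, 1 if present
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_lists)
class_names = list(mlb.classes_)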

Methodology

Data Preprocessing

I utilized `CountVectorizer` for feature extraction, focusing on character-level n-grams to robustly handle variations in movie descriptions. By analyzing sequences of characters, I aimed to capture nuances in the text that could influence classification.

from sklearn.feature_extraction.text import CountVectorizer

# Character-level n-grams (1 to 5 characters), bounded within word edges
vectorizer = CountVectorizer(analyzer='char_wb', ngram_range=(1, 5))
input_vectorized_df = vectorizer.fit_transform(df['UTTERANCES'])
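
Because the MLP below is implemented in PyTorch, the sparse matrix returned by fit_transform has to be converted into dense tensors. One straightforward way, assuming the feature matrix fits in memory:

import torch

X = torch.tensor(input_vectorized_df.toarray(), dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32)  # multi-hot label matrix from above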

Model Architecture

The classification model was implemented using a Multi-Layer Perceptron (MLP) architecture, which consists of several hidden layers equipped with dropout layers to mitigate overfitting — a critical concern given the complexity of the dataset.

import torch
import torch.nn as nn

class MLPModel(nn.Module):
    def __init__(self, input_dim, output_dim, dropout_rate=0.2):
        super(MLPModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.dropout1 = nn.Dropout(dropout_rate)

        self.fc2 = nn.Linear(512, 256)
        self.bn2 = nn.BatchNorm1d(256)
        self.dropout2 = nn.Dropout(dropout_rate)

        self.fc3 = nn.Linear(256, 128)
        self.bn3 = nn.BatchNorm1d(128)
        self.dropout3 = nn.Dropout(dropout_rate)

        self.fc4 = nn.Linear(128, output_dim)  # Output layer for multi-label classification

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        x = self.dropout1(x)
        x = torch.relu(self.bn2(self.fc2(x)))
        x = self.dropout2(x)
        x = torch.relu(self.bn3(self.fc3(x)))
        x = self.dropout3(x)
        x = torch.sigmoid(self.fc4(x))  # Sigmoid for multi-label output
        return x
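
As a quick usage sketch, the model can be instantiated from the vectorizer's feature count and the number of label classes (the variable names follow the earlier snippets):

input_dim = X.shape[1]           # number of character n-gram features
output_dim = y_tensor.shape[1]   # number of attribute labels
model = MLPModel(input_dim, output_dim, dropout_rate=0.2)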

Training and Evaluation

The model was trained using Binary Cross-Entropy Loss, which is suitable for multi-label classification. The Adam optimizer was used, and several hyperparameters were tuned, including the learning rate, dropout rate, and number of hidden units.

import torch.optim as optim

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005)
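
For completeness, a minimal training loop along these lines might look as follows; the batch size, epoch count, and DataLoader setup here are illustrative assumptions rather than the exact configuration used.

from torch.utils.data import DataLoader, TensorDataset

train_loader = DataLoader(TensorDataset(X, y_tensor), batch_size=32, shuffle=True)

for epoch in range(20):
    model.train()
    epoch_loss = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()
        outputs = model(xb)            # sigmoid probabilities per label
        loss = criterion(outputs, yb)  # BCE averaged over all labels
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch + 1}: loss = {epoch_loss / len(train_loader):.4f}")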

To evaluate the model’s performance, precision-recall curves were used to determine optimal thresholds for classification, ensuring a balanced trade-off between precision and recall.

Classification Report and Performance Metrics

After training the model, I generated a classification report to evaluate its performance across various metrics like precision, recall, and F1-score.

from sklearn.metrics import classification_report

# Binarize the validation-set probabilities at the chosen threshold
last_val_predictions = (last_val_outputs > average_threshold).astype(int)
report = classification_report(y_val.numpy(), last_val_predictions, target_names=class_names)
print(report)

The classification report provides insights into how well the model performs for each class, helping identify areas for improvement.

Precision-Recall Optimization

I calculated an optimal decision threshold for each class from its precision-recall curve. For each class, I extracted precision, recall, and candidate thresholds with the precision_recall_curve function, computed the F1 score (the harmonic mean of precision and recall) at each threshold, and kept the threshold that maximizes F1. This balances false positives and false negatives per class and improved performance on the validation set; the chosen thresholds are stored for later evaluation and prediction.
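
A sketch of this per-class threshold search, assuming last_val_outputs is the NumPy array of validation-set sigmoid probabilities (one column per class) and y_val is the corresponding true label tensor:

import numpy as np
from sklearn.metrics import precision_recall_curve

val_true = y_val.numpy()

optimal_thresholds = []
for i in range(val_true.shape[1]):
    # Precision/recall at every candidate threshold for this class
    precision, recall, thresholds = precision_recall_curve(val_true[:, i], last_val_outputs[:, i])
    # F1 at each threshold (the final precision/recall point has no threshold)
    f1_scores = 2 * precision * recall / (precision + recall + 1e-8)
    best_idx = np.argmax(f1_scores[:-1])
    optimal_thresholds.append(thresholds[best_idx])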
