The final code is in the Final_Gen folder and is also linked below.
This repository contains the code and necessary instructions to perform sentiment analysis on Twitter data using a Convolutional Neural Network (CNN). The code is implemented in Python and utilizes various libraries such as nltk, emoji, pandas, and torch.
Before running the code, ensure that you have the following prerequisites:
- Google Colab: This code is designed to be run on Google Colab, a cloud-based Jupyter notebook environment.
- Dataset: Prepare your training and validation datasets in CSV format, containing at least the following columns: 'sentiment' and 'tweet'.
- GPU: A GPU is highly recommended for faster training times.
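Before uploading a dataset, it can help to confirm that the CSV header has the two required columns. The following is a minimal sketch using only Python's standard `csv` module rather than the notebook's pandas code; the helper name `missing_columns` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Column names required by the prerequisites above.
REQUIRED_COLUMNS = {"sentiment", "tweet"}


def missing_columns(csv_file):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)


# Tiny in-memory stand-in for a real dataset file (made-up rows).
sample = io.StringIO("sentiment,tweet\nPositive,great day\nNegative,so tired\n")
print(missing_columns(sample))  # []
```

For a real file, pass an open file handle instead, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your own CSV.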
Follow these steps to get started with the sentiment analysis code:
- Open the provided Google Colab link: Twitter Sentiment Analysis Colab.
- Make a copy of the Colab notebook: Go to "File" > "Save a copy in Drive" to make your own copy.
- Upload your datasets: In the Colab notebook, replace the file paths for `train_path` and `val_path` with the paths to your training and validation datasets.
The provided code performs the following steps:
- Data Preprocessing:
  - Remove NaN values and empty rows.
  - Encode sentiment labels as integers (0: Negative, 1: Neutral, 2: Positive).
  - Clean and preprocess tweets by removing URLs, mentions, hashtags, emojis, etc.
- Model Building and Training:
  - Define a CNN architecture for sentiment analysis.
  - Train the model using the training dataset.
  - Tune hyperparameters (learning rate, dropout, optimizer) using hyperopt.
- Model Evaluation:
  - Evaluate the trained model on the validation dataset.
  - Display the confusion matrix and accuracy for each sentiment class.
- Predicting on Unseen Data:
  - Preprocess and clean unseen data.
  - Use the trained model to predict sentiment labels for the unseen data.
  - Display the confusion matrix and accuracy for the unseen data.
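The preprocessing and evaluation steps above can be sketched in plain Python. This is an illustrative approximation, not the notebook's implementation: the function names are hypothetical, the string keys in `LABEL_TO_ID` assume the 'sentiment' column is spelled that way, emoji are removed with a crude non-ASCII regex instead of the `emoji` library, and "accuracy for each sentiment class" is interpreted as the fraction of each true class that was predicted correctly.

```python
import re

# Label encoding from the list above; the string keys are an assumption
# about how the 'sentiment' column is spelled in your CSV.
LABEL_TO_ID = {"Negative": 0, "Neutral": 1, "Positive": 2}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
# Crude emoji removal: drops every non-ASCII character, including accented letters.
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")


def clean_tweet(text):
    """Strip URLs, mentions, hashtags, and non-ASCII characters, then lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    return " ".join(text.lower().split())


def confusion_matrix(y_true, y_pred, num_classes=3):
    """Count predictions; rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_id, pred_id in zip(y_true, y_pred):
        matrix[true_id][pred_id] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class predicted correctly (None if the class is absent)."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]


print(clean_tweet("Loving it 😍 @bob #happy https://t.co/x"))  # loving it
matrix = confusion_matrix([0, 1, 2, 2], [0, 2, 2, 2])
print(matrix)  # [[1, 0, 0], [0, 0, 1], [0, 0, 2]]
print(per_class_accuracy(matrix))  # [1.0, 0.0, 1.0]
```

The same `clean_tweet` function would be applied to unseen data before prediction, so that training and inference inputs go through identical cleaning.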
Feel free to customize the code for your specific needs:
- Adjust hyperparameters: Modify hyperparameters such as learning rate, dropout rate, optimizer, and number of epochs to experiment with different settings.
- Model architecture: You can modify the CNN architecture by changing the number of convolutional layers, filters, and fully connected layers.
- Data preprocessing: Customize the data preprocessing steps to better fit your dataset's characteristics.
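To show where such changes would go, here is a minimal sketch of a text CNN in PyTorch. It is not the notebook's architecture: the class name `TweetCNN`, the default values, and the choice of parallel convolutions with max-over-time pooling are assumptions made for illustration. The constructor arguments expose the knobs mentioned above (number of filters, convolution branches, dropout), and the optimizer line shows where the learning rate and optimizer type are set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TweetCNN(nn.Module):
    """Hypothetical text CNN; not the notebook's actual model."""

    def __init__(self, vocab_size, embed_dim=64, num_filters=32,
                 kernel_sizes=(3, 4, 5), dropout=0.5, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One convolution per kernel size; edit kernel_sizes or num_filters
        # to change the number of convolutional branches and filters.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor of vocabulary indices.
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-over-time pooling collapses each feature map to one value.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)  # (batch, num_classes) logits


model = TweetCNN(vocab_size=1000, num_filters=32, dropout=0.5)
# Swap the optimizer class or lr here to try different settings.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randint(1, 1000, (8, 20))  # 8 made-up tweets of 20 token ids each
logits = model(batch)
print(logits.shape)  # torch.Size([8, 3])
```

Input sequences must be at least as long as the largest kernel size (5 in this sketch), so short tweets would need padding.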
This readme provides an overview of the Twitter sentiment analysis code available in the provided Colab notebook. Follow the steps to upload your datasets, run the code, and customize it as needed for your sentiment analysis tasks. If you have any questions or need further assistance, feel free to reach out.