Text Classification
This tutorial shows how to train a sentiment analysis model using Grid and Lightning Flash.

Goal

This example covers a text classification deep learning task:
    1. Create a Datastore
    2. Start a new Run
    3. Visualize Metrics
    4. Download Artifacts
Tutorial time: 5 minutes

Overview

Text classification is a deep learning task that covers problems such as sentiment analysis (predicting the sentiment of tweets, movie reviews, or customer feedback about a product) and classifying email as spam or not.
This example uses the bert-base-uncased Transformer model from the Hugging Face hub, adapted to PyTorch Lightning and Lightning Flash in https://github.com/gridai/grid-text-classification.

IMDB Dataset

The IMDB dataset contains movie reviews and is widely used for natural language processing and text analytics tasks, such as classifying a review's sentiment as positive or negative based on its text. The dataset was originally created for this paper.

Model (BERT)

BERT (Bidirectional Encoder Representations from Transformers) is a popular architecture for NLP tasks. It applies bidirectional training of the Transformer, a popular attention-based model, to language modeling.
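The tutorial's train.py handles fine-tuning through Lightning Flash, but purely as an illustration of what a BERT sequence classifier looks like, here is a minimal sketch using the Hugging Face transformers library directly (this is not the repo's code, and the classification head is randomly initialized, so its predictions are only meaningful after fine-tuning on labeled reviews):
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the pre-trained BERT encoder with a fresh 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize a single review and run it through the model.
inputs = tokenizer("This movie was wonderful!", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# argmax over the two logits gives the predicted class (0 or 1).
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)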

Step 1: Create Datastore

It is fastest to upload zipped datasets from the Web UI. Grid supports uploading files in .zip, .tar, and .tar.gz formats.
The archive is extracted when the datastore is created, and its contents are made available to Sessions and Runs.
https://pl-flash-data.s3.amazonaws.com/imdb.zip
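If you would like to peek at the data locally before creating the datastore, here is a quick sketch; the imdb/train.csv path and the column layout are assumptions based on the flags used later in this tutorial, not verified against the archive:
import io
import urllib.request
import zipfile

import pandas as pd

# Download the archive and extract it into ./data/
url = "https://pl-flash-data.s3.amazonaws.com/imdb.zip"
with urllib.request.urlopen(url) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))
archive.extractall("data/")
print(archive.namelist()[:5])

# Inspect the training split (path and column names assumed).
df = pd.read_csv("data/imdb/train.csv")
print(df.head())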

Step 2: Start a new Run

If you are curious about the model, take a look at the training script: https://github.com/gridai/grid-text-classification/blob/main/train.py
Paste the link to the file in the New Run page.
Make sure to select the datastore created above and note the directory where it will be mounted; the script flags below reference that mount path.
Add the following flags to the script, then Run
--gpus 1 \
--train_file /datastores/imdb-ds/imdb/train.csv \
--valid_file /datastores/imdb-ds/imdb/valid.csv \
--test_file /datastores/imdb-ds/imdb/test.csv \
--max_epochs 1
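Inside a training script, flags like these are typically consumed with argparse. Here is a rough sketch of how that can look; the actual argument handling in train.py may differ:
import argparse

# Sketch of a parser for the flags above; train.py's real parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--gpus", type=int, default=0)
parser.add_argument("--train_file", type=str, required=True)
parser.add_argument("--valid_file", type=str, required=True)
parser.add_argument("--test_file", type=str, required=True)
parser.add_argument("--max_epochs", type=int, default=1)
args = parser.parse_args()

# The /datastores/imdb-ds/... paths are where Grid mounts the datastore at run time.
print(args.train_file)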

Step 3: Visualize Metrics

As the model trains, metrics appear in the Metrics section; select an experiment to see its metrics.
TensorBoard is also accessible.
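The curves you see (for example a train_loss curve) come from values logged inside the LightningModule. A minimal sketch of how such logging looks in PyTorch Lightning, purely illustrative and not the tutorial repo's code:
import torch
import pytorch_lightning as pl

class TinyClassifier(pl.LightningModule):
    """Toy LightningModule showing how logged values become metric curves."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        # Anything passed to self.log() shows up in the Metrics UI and TensorBoard.
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)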

Step 4: Download Artifacts

Artifacts are available for download as well. You can choose to train for more epochs and create multiple checkpoints.
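Once downloaded, a checkpoint can be reloaded for inference. A short sketch, assuming the model is Lightning Flash's TextClassifier and using a hypothetical checkpoint filename; the exact class and file name depend on your run and Flash version:
from flash.text import TextClassifier

# "epoch=0-step=1000.ckpt" is a hypothetical filename; use the artifact you downloaded.
model = TextClassifier.load_from_checkpoint("epoch=0-step=1000.ckpt")
model.eval()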

Bonus: Run in CLI

If you prefer the CLI, use the commands below.
git clone https://github.com/gridai/grid-text-classification.git
cd grid-text-classification
grid run \
--gpus 1 \
--instance_type 1_v100_16gb \
--datastore_name imdb-ds \
train.py \
--gpus 1 \
--train_file /datastores/imdb-ds/imdb/train.csv \
--valid_file /datastores/imdb-ds/imdb/valid.csv \
--test_file /datastores/imdb-ds/imdb/test.csv \
--max_epochs 1