From 436aebe1d8fba0c3e8d8fee8126077b591406eec Mon Sep 17 00:00:00 2001
From: Aryan Utkarsh <106980972+Aryanutkarsh@users.noreply.github.com>
Date: Fri, 14 Apr 2023 23:31:21 +0530
Subject: [PATCH] Make README.md more readable
---
README.md | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index c6770388..8798d11a 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,17 @@
-ALBERT
-======
+
ALBERT
-***************New March 28, 2020 ***************
+
+
+
+New March 28, 2020
Add a colab [tutorial](https://github.com/google-research/albert/blob/master/albert_glue_fine_tuning_tutorial.ipynb) to run fine-tuning for GLUE datasets.
-***************New January 7, 2020 ***************
+New January 7, 2020
+v2 TF-Hub models should be working now with TF 1.15, as we removed the native Einsum op from the graph. See updated TF-Hub links below.
-v2 TF-Hub models should be working now with TF 1.15, as we removed the
-native Einsum op from the graph. See updated TF-Hub links below.
+New December 30, 2019
-***************New December 30, 2019 ***************
+
Chinese models are released. We would like to thank [CLUE team ](https://github.com/CLUEbenchmark/CLUE) for providing the training data.
@@ -29,6 +30,7 @@ Version 2 of ALBERT models is released.
In this version, we apply 'no dropout', 'additional training data', and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and other models for 3M steps.
The comparison with the v1 models is as follows:
+
| | Average | SQuAD1.1 | SQuAD2.0 | MNLI | SST-2 | RACE |
|----------------|----------|----------|----------|----------|----------|----------|
@@ -43,6 +45,8 @@ The comparison with the v1 models is as follows:
|ALBERT-xlarge |85.5 |92.5/86.1 | 86.1/83.1|86.4 |92.4 | 74.8 |
|ALBERT-xxlarge |91.0 |94.8/89.3 | 90.2/87.4|90.8 |96.9 | 86.5 |
+
+
The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge is slightly worse than v1, for two reasons: 1) training for an additional 1.5M steps (the only difference between these two models is training for 1.5M versus 3M steps) did not lead to significant performance improvement; 2) for v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet, while for v2 we simply adopt the parameters from v1 except for RACE, where we use a learning rate of 1e-5 and 0 [ALBERT DR](https://arxiv.org/pdf/1909.11942.pdf) (dropout rate for ALBERT in fine-tuning). The original (v1) RACE hyperparameters cause model divergence for v2 models. Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.
ALBERT is "A Lite" version of BERT, a popular unsupervised language
@@ -66,6 +70,7 @@ Results
Performance of ALBERT on GLUE benchmark results using a single-model setup on
dev:
+
| Models | MNLI | QNLI | QQP | RTE | SST | MRPC | CoLA | STS |
|-------------------|----------|----------|----------|----------|----------|----------|----------|----------|
@@ -74,9 +79,11 @@ dev:
| RoBERTa-large | 90.2 | 94.7 | **92.2** | 86.6 | 96.4 | **90.9** | 68.0 | 92.4 |
| ALBERT (1M) | 90.4 | 95.2 | 92.0 | 88.1 | 96.8 | 90.2 | 68.7 | 92.7 |
| ALBERT (1.5M) | **90.8** | **95.3** | **92.2** | **89.2** | **96.9** | **90.9** | **71.4** | **93.0** |
+
Performance of ALBERT-xxl on SQuAD and RACE benchmarks using a single-model
setup:
+
|Models | SQuAD1.1 dev | SQuAD2.0 dev | SQuAD2.0 test | RACE test (Middle/High) |
|--------------------------|---------------|---------------|---------------|-------------------------|
@@ -87,7 +94,7 @@ setup:
|XLNet + SG-Net Verifier++ | - | - | 90.1/87.2 | - |
|ALBERT (1M) | 94.8/89.2 | 89.9/87.2 | - | 86.0 (88.2/85.1) |
|ALBERT (1.5M) | **94.8/89.3** | **90.2/87.4** | **90.9/88.1** | **86.5 (89.0/85.5)** |
-
+
Pre-trained Models
==================