7 changes: 3 additions & 4 deletions README.md
@@ -52,10 +52,9 @@ Use `run_pretraining.py` to pre-train an ELECTRA model. It has the following arg
If training is halted, re-running the `run_pretraining.py` with the same arguments will continue the training where it left off.

You can continue pre-training from the released ELECTRA checkpoints by
-1. Setting the model-name to point to a downloaded model (e.g., `--model-name electra_small` if you downloaded weights to `$DATA_DIR/electra_small`).
-2. Setting `num_train_steps` by (for example) adding `"num_train_steps": 4010000` to the `--hparams`. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).
-3. Increase the learning rate to account for the linear learning rate decay. For example, to start with a learning rate of 2e-4 you should set the `learning_rate` hparam to 2e-4 * (4e6 + 10000) / 10000.
-4. For ELECTRA-Small, you also need to specifiy `"generator_hidden_size": 1.0` in the `hparams` because we did not use a small generator for that model.
+1. Setting the model-name to point to a downloaded model (e.g., `--model-name electra_small`).
+2. Specifying an initial checkpoint, usually a released ELECTRA checkpoint (e.g., `--init_checkpoint pretrained_models_dir/electra_small`).
+3. For ELECTRA-Small, you also need to specify `"generator_hidden_size": 1.0` in the `hparams` because we did not use a small generator for that model.
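Put together, the steps above can be sketched with a minimal stand-in for the config class in `configure_pretraining.py` (the class body and default values here are illustrative, not the repo's actual code):

```python
import json

class PretrainingConfig:
    """Illustrative stand-in: defaults are set first, then overrides win."""
    def __init__(self, model_name, data_dir, **kwargs):
        self.model_name = model_name
        self.data_dir = data_dir
        self.init_checkpoint = None        # new option added by this change
        self.generator_hidden_size = 0.25  # illustrative default
        self.__dict__.update(kwargs)       # --hparams JSON overrides defaults

# e.g. --hparams '{"generator_hidden_size": 1.0}' for ELECTRA-Small
hparams = json.loads('{"generator_hidden_size": 1.0}')
config = PretrainingConfig(
    "electra_small", "data",
    init_checkpoint="pretrained_models_dir/electra_small", **hparams)
print(config.generator_hidden_size)  # 1.0
```

The key behavior is precedence: anything passed through `--hparams` or as a keyword argument replaces the default, so `init_checkpoint` stays `None` (no checkpoint loading) unless you set it.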

## Quickstart: Pre-train a small ELECTRA model.
These instructions pre-train a small ELECTRA model (12 layers, 256 hidden size). Unfortunately, the data we used in the paper is not publicly available, so we will use the [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/) released by Aaron Gokaslan and Vanya Cohen instead. The fully-trained model (~4 days on a v100 GPU) should perform roughly in between [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) and BERT-Base in terms of GLUE performance. By default the model is trained on length-128 sequences, so it is not suitable for running on question answering. See the "expected results" section below for more details on model performance.
1 change: 1 addition & 0 deletions configure_pretraining.py
@@ -44,6 +44,7 @@ def __init__(self, model_name, data_dir, **kwargs):
self.num_warmup_steps = 10000

# training settings
self.init_checkpoint = None
self.iterations_per_loop = 200
self.save_checkpoints_steps = 1000
self.num_train_steps = 1000000
31 changes: 31 additions & 0 deletions run_pretraining.py
@@ -272,6 +272,35 @@ def model_fn(features, labels, mode, params):
model = PretrainingModel(config, features,
mode == tf.estimator.ModeKeys.TRAIN)
utils.log("Model is built!")

# Load pre-trained weights from a checkpoint, if one was given
init_checkpoint = None
if config.init_checkpoint:
  init_checkpoint = tf.train.latest_checkpoint(config.init_checkpoint)
  utils.log("Using checkpoint", init_checkpoint)

tvars = tf.trainable_variables()
initialized_variable_names = {}
scaffold_fn = None
if init_checkpoint:
  assignment_map, initialized_variable_names = (
      modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint))
  if config.use_tpu:
    def tpu_scaffold():
      tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
      return tf.train.Scaffold()
    scaffold_fn = tpu_scaffold
  else:
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

utils.log("**** Trainable Variables ****")
for var in tvars:
  init_string = ""
  if var.name in initialized_variable_names:
    init_string = ", *INIT_FROM_CKPT*"
  utils.log("  name = %s, shape = %s%s" %
            (var.name, var.shape, init_string))

if mode == tf.estimator.ModeKeys.TRAIN:
train_op = optimization.create_optimizer(
model.total_loss, config.learning_rate, config.num_train_steps,
@@ -284,6 +313,7 @@ def model_fn(features, labels, mode, params):
mode=mode,
loss=model.total_loss,
train_op=train_op,
scaffold_fn=scaffold_fn,
training_hooks=[training_utils.ETAHook(
{} if config.use_tpu else dict(loss=model.total_loss),
config.num_train_steps, config.iterations_per_loop,
@@ -293,6 +323,7 @@ def model_fn(features, labels, mode, params):
output_spec = tf.estimator.tpu.TPUEstimatorSpec(
mode=mode,
loss=model.total_loss,
scaffold_fn=scaffold_fn,
eval_metrics=model.eval_metrics,
evaluation_hooks=[training_utils.ETAHook(
{} if config.use_tpu else dict(loss=model.total_loss),
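The variable-matching step above delegates to `modeling.get_assignment_map_from_checkpoint`, which conceptually pairs each trainable variable with a same-named checkpoint tensor. A simplified pure-Python sketch of that name matching (a hypothetical helper, not the repo's actual implementation):

```python
import re

def build_assignment_map(trainable_var_names, checkpoint_var_names):
    """Map checkpoint tensor names to graph variable names.

    A graph variable like 'electra/embeddings/word_embeddings:0' is
    initialized from a checkpoint tensor named without the ':0' suffix.
    """
    assignment_map = {}
    initialized_variable_names = set()
    checkpoint_names = set(checkpoint_var_names)
    for var_name in trainable_var_names:
        name = re.sub(r":\d+$", "", var_name)  # drop the ':0' output index
        if name in checkpoint_names:
            assignment_map[name] = name
            initialized_variable_names.add(var_name)
    return assignment_map, initialized_variable_names
```

Variables with no match in the checkpoint are simply left to their random initializers, which is what lets continued pre-training work even when the graph contains tensors the released checkpoint never saw.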