How to get the test embeddings from output of fine-tuned model (tutorial)

**Is there a way to easily generate the embeddings of the test data from a fine-tuned model?**

Here's what I've tried:

I followed the MRPC tutorial with these flags (all defaults except `--do_predict=True` and `--export_dir`):

```
os.environ['TFHUB_CACHE_DIR'] = OUTPUT_DIR
!python -m albert.run_classifier \
  --data_dir="glue_data/" \
  --output_dir=$OUTPUT_DIR \
  --albert_hub_module_handle=$ALBERT_MODEL_HUB \
  --spm_model_file="from_tf_hub" \
  --do_train=True \
  --do_eval=True \
  --do_predict=True \
  --max_seq_length=512 \
  --optimizer=adamw \
  --task_name=$TASK \
  --warmup_step=200 \
  --learning_rate=2e-5 \
  --train_step=800 \
  --save_checkpoints_steps=100 \
  --train_batch_size=32 \
  --tpu_name=$TPU_ADDRESS \
  --use_tpu=True \
  --export_dir="$OUTPUT_DIR/saved_models/"
```

This produced a `saved_model.pb` file, which I wanted to load to generate embeddings for the test data for error analysis.

I tried running something similar to this code:

```
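import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers ops used by the preprocessing model)

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/albert_en_preprocess/3")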
encoder_inputs = preprocessor(text_input)
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/albert_en_base/3",
    trainable=True)
outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]      # [batch_size, 768].
sequence_output = outputs["sequence_output"]  # [batch_size, seq_length, 768].

embedding_model = tf.keras.Model(text_input, pooled_output)
sentences = tf.constant(["hello", "hello"])
print(embedding_model(sentences))
```

This worked with the base model from TensorFlow Hub, but when I replaced the URL with the path to my saved model folder (which also contains `assets/` and `variables/`), I got the following error:

```
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-38-6c11f4769dd0> in <module>()
      5     signature='tokens',
      6     signature_outputs_as_dict=True)
----> 7 encoder_inputs = preprocessor(text_input)
      8 encoder = hub.KerasLayer(
      9     "https://tfhub.dev/tensorflow/albert_en_base/3",

1 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    690       except Exception as e:  # pylint:disable=broad-except
    691         if hasattr(e, 'ag_error_metadata'):
--> 692           raise e.ag_error_metadata.to_exception(e)
    693         else:
    694           raise

TypeError: Exception encountered when calling layer "keras_layer_7" (type KerasLayer).

in user code:

    File "/usr/local/lib/python3.7/dist-packages/tensorflow_hub/keras_layer.py", line 229, in call  *
        result = f()

    TypeError: pruned(input_ids, input_mask, segment_ids) takes 0 positional arguments, got 1.


Call arguments received:
  • inputs=tf.Tensor(shape=(None,), dtype=string)
  • training=False
```


This may come down to my limited knowledge of TensorFlow, but the ALBERT code produces a `saved_model` that seems to be in a different format from the other SavedModels I've used. Can the saved model exported by the ALBERT classifier be used this way?
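
For context on the error: it names three keyword inputs (`input_ids`, `input_mask`, `segment_ids`), which suggests the exported function expects already-tokenized, padded integer features rather than raw strings. Below is a minimal pure-Python sketch of how such features are usually packed for a sentence pair in BERT-style classifiers. The helper names `truncate_pair` and `pack_pair` are hypothetical, and the default special-token IDs (`cls_id=2`, `sep_id=3`, `pad_id=0`) are assumptions that should be checked against the SentencePiece vocabulary actually in use.

```python
from typing import Dict, List, Tuple


def truncate_pair(ids_a: List[int], ids_b: List[int],
                  max_tokens: int) -> Tuple[List[int], List[int]]:
    """Trim the longer sequence one token at a time until both fit."""
    ids_a, ids_b = list(ids_a), list(ids_b)
    while len(ids_a) + len(ids_b) > max_tokens:
        longer = ids_a if len(ids_a) > len(ids_b) else ids_b
        longer.pop()
    return ids_a, ids_b


def pack_pair(ids_a: List[int], ids_b: List[int], max_seq_length: int,
              cls_id: int = 2, sep_id: int = 3,
              pad_id: int = 0) -> Dict[str, List[int]]:
    """Build [CLS] A [SEP] B [SEP] features padded to max_seq_length.

    The special-token IDs are assumed defaults, not read from a vocabulary.
    """
    # Reserve three positions for [CLS], [SEP], [SEP].
    ids_a, ids_b = truncate_pair(ids_a, ids_b, max_seq_length - 3)
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment 0 covers [CLS] + A + [SEP]; segment 1 covers B + [SEP].
    segment_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    # The mask is 1 for real tokens and 0 for padding.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * padding,
        "input_mask": input_mask + [0] * padding,
        "segment_ids": segment_ids + [0] * padding,
    }


# Toy token IDs standing in for real SentencePiece output.
features = pack_pair([10, 11, 12], [20, 21], max_seq_length=10)
print(features)
```

If the exported signature really does take these three inputs, they would presumably be passed by keyword as batched integer tensors instead of calling the layer on a string tensor. I have not verified that against this particular export.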
