Error when deserializing tfrecord's in TF 2.x: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices #178

Open
dgoldenberg-audiomack opened this issue Dec 28, 2020 · 0 comments


The library doesn't appear to work with TensorFlow 2.x.

My environment:

  • AWS emr-5.31.0
  • TF=2.1.0, Spark=3.0.1
  • built the library with the following:
 mvn versions:set -DnewVersion=1.15.0
 mvn clean install -Dspark.version=3.0.1

The records written by the connector cannot be read back as expected. After persisting to AWS S3, the serialization and deserialization appear to be mismatched; perhaps the library writes a TF 1.x-compatible format rather than a TF 2.x one?

To reproduce:

  • execute in Spark, with --jars s3://<your jars location>/spark-tensorflow-connector_2.12-1.15.0.jar
  • use a Bootstrap action in EMR to get boto3 installed on the cluster; this worked for me:
#!/bin/bash
pip3 install --user boto3
  • the Python test program is attached
  • I used the small movielens dataset (see the attached movies.csv and ratings.csv)

Output

>> Ratings: <class 'tensorflow.python.data.ops.readers.TFRecordDatasetV2'>; size=100004
>> Movies: <class 'tensorflow.python.data.ops.readers.TFRecordDatasetV2'>; size=9125

(Deserialization doesn't seem to be working)

********************************************************************************
(b'\ns\n\x11\n\x08movie_id\x12\x05\x1a\x03\n\x01\x01\n#\n\x0bmovie_title'
 b'\x12\x14\n\x12\n\x10Toy Story (1995)\n9\n\x06genres\x12/\n-\n+Adventure|Anim'
 b'ation|Children|Comedy|Fantasy')
********************************************************************************

(b'\n|\n\x10\n\x07user_id\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08movie_id'
 b'\x12\x05\x1a\x03\n\x01\x1f\n)\n\x0bmovie_title\x12\x1a\n\x18\n\x16Dangerou'
 b's Minds (1995)\n\x12\n\x06rating\x12\x08\x12\x06\n\x04\x00\x00 @\n\x16\n\tti'
 b'mestamp\x12\t\x1a\x07\n\x05\xe8\xd0\x96\xd9\x04')
********************************************************************************
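
The dumps above have the shape of serialized tf.train.Example protos, so it may help to check by hand that they are well formed. The following is a minimal pure-Python sketch that decodes the first movie record using only the protobuf wire format. It assumes the standard Example layout (Example.features = 1, Features.feature = 1 as a string-to-Feature map, and Feature kinds bytes_list = 1, float_list = 2, int64_list = 3). The helper names are illustrative and are not part of the attached program or the connector.

```python
import struct

# First movie record, copied from the dump above.
MOVIE_RECORD = (
    b'\ns\n\x11\n\x08movie_id\x12\x05\x1a\x03\n\x01\x01\n#\n\x0bmovie_title'
    b'\x12\x14\n\x12\n\x10Toy Story (1995)\n9\n\x06genres\x12/\n-\n+Adventure|Anim'
    b'ation|Children|Comedy|Fantasy'
)


def read_varint(buf, pos):
    """Decode one base-128 varint at buf[pos:]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def to_signed64(value):
    """Reinterpret an unsigned 64-bit varint as a signed int64."""
    return value - (1 << 64) if value >= (1 << 63) else value


def read_fields(buf):
    """Yield (field_number, wire_type, payload) for each field of a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:  # varint
            payload, pos = read_varint(buf, pos)
        elif wire == 2:  # length-delimited: bytes, sub-message, or packed list
            size, pos = read_varint(buf, pos)
            payload, pos = buf[pos:pos + size], pos + size
        elif wire == 5:  # fixed 32-bit, used by unpacked floats
            payload, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unexpected wire type {wire}")
        yield field, wire, payload


def decode_feature(buf):
    """Decode a Feature message into a plain Python list of values."""
    values = []
    for kind, _, list_msg in read_fields(buf):
        for _, wire, payload in read_fields(list_msg):
            if kind == 1:  # bytes_list
                values.append(payload)
            elif kind == 2 and wire == 2:  # float_list, packed
                values.extend(struct.unpack(f"<{len(payload) // 4}f", payload))
            elif kind == 2:  # float_list, single unpacked float
                values.extend(struct.unpack("<f", payload))
            elif wire == 2:  # int64_list, packed varints
                pos = 0
                while pos < len(payload):
                    number, pos = read_varint(payload, pos)
                    values.append(to_signed64(number))
            else:  # int64_list, single unpacked varint
                values.append(to_signed64(payload))
    return values


def decode_example(record):
    """Decode Example -> Features -> map<string, Feature> into a dict."""
    result = {}
    for _, _, features in read_fields(record):
        for _, _, entry in read_fields(features):
            name, values = None, []
            for field, _, payload in read_fields(entry):  # key = 1, value = 2
                if field == 1:
                    name = payload.decode("utf-8")
                else:
                    values = decode_feature(payload)
            result[name] = values
    return result


print(decode_example(MOVIE_RECORD))
```

Under those assumptions the record decodes to movie_id 1, movie_title "Toy Story (1995)" and the genres string, which suggests the bytes on S3 are intact Example protos and that what the dataset yields is the raw serialized form.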

The error when running in the cluster:

Traceback (most recent call last):
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 300, in <module>
    main(sys.argv)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 50, in main
    movies, test, train, unique_movie_titles, unique_user_ids = prepare_data(movies, ratings)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in prepare_data
    ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1695, in map
    return MapDataset(self, map_func, preserve_cardinality=True)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4045, in __init__
    use_legacy_function=use_legacy_function)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
    self._function = wrapper_fn.get_concrete_function()
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2939, in get_concrete_function
    *args, **kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in <lambda>
    ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 986, in _slice_helper
    _check_index(s)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 865, in _check_index
    raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices, got 'movie_title'
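
One possibility worth ruling out: the error reports that a tensor was indexed with the string 'movie_title', which is what would happen if each dataset element were still a scalar string tensor (one serialized record) rather than a dict of features. Below is a minimal sketch of parsing the ratings records before the map call. It assumes the records are tf.train.Example protos with the feature names and types visible in the dumps above. `RATING_SPEC`, `parse_rating` and `load_ratings` are hypothetical names, not taken from the attached program.

```python
import tensorflow as tf

# Assumed schema, read off the ratings record dump in this issue:
# each feature holds exactly one value per record.
RATING_SPEC = {
    "user_id": tf.io.FixedLenFeature([], tf.int64),
    "movie_id": tf.io.FixedLenFeature([], tf.int64),
    "movie_title": tf.io.FixedLenFeature([], tf.string),
    "rating": tf.io.FixedLenFeature([], tf.float32),
    "timestamp": tf.io.FixedLenFeature([], tf.int64),
}


def parse_rating(serialized):
    """Turn one serialized Example (scalar string tensor) into a dict of tensors."""
    return tf.io.parse_single_example(serialized, RATING_SPEC)


def load_ratings(filenames):
    """Read TFRecord files and parse every record with the assumed schema."""
    return tf.data.TFRecordDataset(filenames).map(parse_rating)
```

With a parsed dataset, a call such as `ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})` receives a dict for `x`, so the string keys are valid. If the error persists after parsing, that would point more strongly at the connector itself.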

spark-tf-connector-serde-issue.zip
