BytesList with length 0 or 1 is inferred to have StringType instead of ArrayType

If a `BytesList` feature in TFRecords always has a length of 0 or 1, the feature is inferred as `StringType` instead of `ArrayType`. Is there a reason for this behavior? Because of it, you can write a DataFrame as TFRecords, but you can't read those TFRecords back into a DataFrame. A zero-length `BytesList` is valid in TensorFlow.

Below is the implementation of `parseBytesList` from
https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-connector/src/main/scala/org/tensorflow/spark/datasources/tfrecords/TensorFlowInferSchema.scala#L144:

```
  private def parseBytesList(feature: Feature): DataType = {
    val length = feature.getBytesList.getValueCount

    if (length == 0) {
      null
    }
    else if (length > 1) {
      ArrayType(StringType)
    }
    else {
      StringType
    }
  }
```
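
To make the round-trip problem concrete, here is a minimal, dependency-free sketch of how length-based inference could play out across several records. Everything in it is hypothetical: `SketchType`, `SketchString`, `SketchArray`, and `BytesListInferenceSketch` are stand-ins for the Spark types, and the `merge` rule (an array type wins over a scalar, an empty list contributes nothing) is an assumption made for illustration, not the connector's actual merging logic.

```scala
// Hypothetical stand-ins for Spark's StringType and ArrayType(StringType).
sealed trait SketchType
case object SketchString extends SketchType
final case class SketchArray(element: SketchType) extends SketchType

object BytesListInferenceSketch {
  // Mirrors the length-based branching quoted above; None stands in for null.
  def inferFromLength(length: Int): Option[SketchType] =
    if (length == 0) None
    else if (length > 1) Some(SketchArray(SketchString))
    else Some(SketchString)

  // ASSUMED merge rule: an array type wins over a scalar, None contributes nothing.
  def merge(a: Option[SketchType], b: Option[SketchType]): Option[SketchType] =
    (a, b) match {
      case (None, other)             => other
      case (other, None)             => other
      case (Some(x: SketchArray), _) => Some(x)
      case (_, Some(y: SketchArray)) => Some(y)
      case (x, _)                    => x
    }

  // Infers one column's type from the BytesList lengths seen in each record.
  def inferColumn(lengths: Seq[Int]): Option[SketchType] =
    lengths.map(inferFromLength).foldLeft(Option.empty[SketchType])(merge)
}
```

Under these assumptions, a column that was written from an array of strings but whose arrays happen to hold at most one element each (for example, lengths `0, 1, 1`) would come back as a scalar string type, while a single record with two or more values is enough to make it an array again.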

BytesList with length 0 or 1 is inferred to have StringType instead of ArrayType #159