[Bug] [Transform-V2] XxxModel#dimension returns the number of vectors, not the number of dimensions #9643

@loupipalien

Search before asking

  • I have searched the issues and found no similar issues.

What happened

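DoubaoModel#dimension returns the number of vectors in the embedding response instead of the number of dimensions of a single vector, so the vector field is created with an unexpected dimension. With the config below, the reported dimension is 1, and Milvus rejects the collection (see the error below).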
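To illustrate the suspected mix-up, here is a minimal sketch. It is not the actual SeaTunnel code: the class `EmbeddingDimensionSketch` and its methods are hypothetical, and it assumes the parsed response is a list of embedding vectors, which is what `$.data[*].embedding` appears to select. A 4-element vector stands in for a real embedding.

```java
import java.util.List;

/** Illustrative sketch only; not the actual DoubaoModel implementation. */
final class EmbeddingDimensionSketch {

    // Suspected current behavior: counts how many vectors came back.
    static int vectorCount(List<List<Float>> vectors) {
        return vectors.size();
    }

    // Expected behavior: the length of a single embedding vector.
    static int dimension(List<List<Float>> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("no embedding vectors returned");
        }
        return vectors.get(0).size();
    }

    public static void main(String[] args) {
        // One input text yields one embedding vector (shortened here to 4 values).
        List<List<Float>> vectors = List.of(List.of(0.1f, 0.2f, 0.3f, 0.4f));
        System.out.println("vector count = " + vectorCount(vectors));
        System.out.println("dimension    = " + dimension(vectors));
    }
}
```

For a single input the vector count is 1 while the dimension is 4 in this sketch, which matches the `invalid dimension: 1` message reported by Milvus.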

SeaTunnel Version

dev/2.3.12-SNAPSHOT

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  S3File {
    path = "/seatunnel/test_csv_data.csv"
    bucket = "s3a://lthen"
    fs.s3a.endpoint="tos-s3-cn-beijing.volces.com"
    fs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    file_format_type = "csv"
    access_key="xxx"
    secret_key="xxx",
    csv_use_header_line = true,
    field_delimiter = ","
    schema={
        fields {
            id = int
            code = int
            data = string
            success = boolean
        },
        primaryKey {
            name = "id"
            columnNames = ["id"]
        }
    }
  }
}

transform {
  Embedding {
    model_provider = "DOUBAO"
    model = "doubao-embedding-text-240715"
    api_key = "xxx"
    secret_key = "xxx"
    vectorization_fields {
        data_vector = data
    },
    custom_config={
      custom_response_parse = "$.data[*].embedding"
      custom_request_headers = {
          "Content-Type"= "application/json"
          "Authorization"= "Bearer ${api_key}"
      }
      custom_request_body ={
          model = "${model}"
          input = ["${input}"]
      }
    }
  }
}

sink {
  Milvus {
    url = "http://127.0.0.1:19530"
    token = "root:xxx"
    batch_size = 1000
    enable_auto_id=true,
    schema_save_mode=RECREATE_SCHEMA
  }
}

Running Command

bin/seatunnel.sh -c jobs/s32milvus_embedding.conf -m local

Error Exception

Caused by: org.apache.seatunnel.engine.common.exception.JobException: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[API-09], ErrorDescription:[Handle save mode failed]
	at org.apache.seatunnel.engine.server.master.JobMaster.handleSaveMode(JobMaster.java:573)
	at org.apache.seatunnel.engine.server.master.JobMaster.lambda$init$1(JobMaster.java:273)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.apache.seatunnel.engine.server.master.JobMaster.init(JobMaster.java:267)
	at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$6(CoordinatorService.java:656)
	at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.seatunnel.connectors.seatunnel.milvus.exception.MilvusConnectorException: ErrorCode:[MILVUS-14], ErrorDescription:[Create collection error]
	at org.apache.seatunnel.connectors.seatunnel.milvus.catalog.MilvusCatalog.createTableInternal(MilvusCatalog.java:308)
	at org.apache.seatunnel.connectors.seatunnel.milvus.catalog.MilvusCatalog.createTable(MilvusCatalog.java:200)
	at org.apache.seatunnel.api.sink.DefaultSaveModeHandler.createTable(DefaultSaveModeHandler.java:213)
	at org.apache.seatunnel.api.sink.DefaultSaveModeHandler.recreateSchema(DefaultSaveModeHandler.java:134)
	at org.apache.seatunnel.api.sink.DefaultSaveModeHandler.handleSchemaSaveMode(DefaultSaveModeHandler.java:85)
	at org.apache.seatunnel.api.sink.SaveModeHandler.handleSaveMode(SaveModeHandler.java:42)
	at org.apache.seatunnel.api.sink.SaveModeExecuteWrapper.execute(SaveModeExecuteWrapper.java:36)
	at org.apache.seatunnel.engine.server.master.JobMaster.handleSaveMode(JobMaster.java:568)
	... 20 more
Caused by: org.apache.seatunnel.connectors.seatunnel.milvus.exception.MilvusConnectorException: ErrorCode:[MILVUS-14], ErrorDescription:[Create collection error] - invalid dimension: 1. should be in range 2 ~ 32768
	at org.apache.seatunnel.connectors.seatunnel.milvus.catalog.MilvusCatalog.createTableInternal(MilvusCatalog.java:299)
	... 27 more

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
