Open
Labels
dj:core (issues/PRs about the core functions of Data-Juicer), dj:dataset (issues/PRs about the dj-dataset), question (Further information is requested)
Description
Before Asking
- I have pulled the latest code from the main branch, run it again, and the problem still exists.
Search before asking
Question
- When processing data, can HDFS be used as both the input (data source) path and the output path? For example:
dataset_path: hdfs://mnt/dst/the-pile-philpaper-refine-result.jsonl
export_path: hdfs:/mnt/dst/processed_demo/
- If the data is stored in Iceberg, how can Data-Juicer be used to clean it?
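Whether Data-Juicer accepts `hdfs://` paths directly is not settled here. Independently of that, it helps to see how a standard URI parser reads the two example paths: with `hdfs://`, the segment right after `//` is the authority (the NameNode host), so `mnt` would be taken as a host name rather than a directory, while `hdfs:/` has no authority at all. A minimal sketch using Python's standard `urllib.parse`; the `describe_hdfs_uri` helper and the `namenode:8020` address are hypothetical.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri: str) -> dict:
    """Hypothetical helper: split an HDFS-style URI into scheme, authority and path."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # After "hdfs://", everything up to the next "/" is the authority,
        # i.e. it is read as the NameNode host[:port], not as a directory.
        "namenode": parts.netloc,
        "path": parts.path,
    }


for uri in (
    "hdfs://mnt/dst/the-pile-philpaper-refine-result.jsonl",
    "hdfs:/mnt/dst/processed_demo/",
    "hdfs://namenode:8020/mnt/dst/processed_demo/",  # hypothetical host and port
):
    print(uri, "->", describe_hdfs_uri(uri))
```

For the first path this yields `namenode='mnt'` and `path='/dst/the-pile-philpaper-refine-result.jsonl'`, which is probably not what was intended; spelling out the NameNode explicitly, as in the third URI, avoids the ambiguity.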
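If Iceberg is not supported as a data source, one possible workaround, assuming PyIceberg and pyarrow are installed and a catalog is configured (e.g. in `~/.pyiceberg.yaml`), is to export the table to a local JSONL file and point `dataset_path` at it. This is a sketch, not a Data-Juicer feature: `write_jsonl`, `iceberg_table_to_jsonl`, and the table name are hypothetical, and the export loads the whole table into memory.

```python
import json
import os
import tempfile
from typing import Any, Iterable, Mapping


def write_jsonl(rows: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # default=str turns values json cannot encode natively
            # (dates, decimals, ...) into their string form.
            f.write(json.dumps(dict(row), ensure_ascii=False, default=str) + "\n")
            count += 1
    return count


def iceberg_table_to_jsonl(catalog_name: str, table_name: str, path: str) -> int:
    """Hypothetical helper: export an Iceberg table to JSONL via PyIceberg."""
    # Imported lazily so the rest of this sketch runs without PyIceberg.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name)
    table = catalog.load_table(table_name)  # e.g. "db.raw_corpus" (hypothetical)
    arrow_table = table.scan().to_arrow()   # materialises the full table in memory
    return write_jsonl(arrow_table.to_pylist(), path)


if __name__ == "__main__":
    # Stand-in rows instead of a real Iceberg table.
    out = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
    print(write_jsonl([{"text": "hello"}, {"text": "数据"}], out), "rows ->", out)
```

Processed output written to `export_path` could be loaded back into Iceberg the same way in reverse; for large tables a batched scan would be preferable to a single `to_arrow()` call.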
Additional
No response