Commit f8854ef

update readme
1 parent ca5de7c commit f8854ef

2 files changed, +4 -3 lines changed


README.md

+2 -2

@@ -305,8 +305,8 @@ python tools/process_data.py --config ./demos/process_video_on_ray/configs/demo.
 - To run data processing across multiple machines, it is necessary to ensure that all distributed nodes can access the corresponding data paths (for example, by mounting the respective data paths on a file-sharing system such as NAS).
 - The deduplicator operators for RAY mode are different from the single-machine version, and all those operators are prefixed with `ray`, e.g. `ray_video_deduplicator` and `ray_document_deduplicator`. Those operators also rely on a [Redis](https://redis.io/) instance. So in addition to starting the RAY cluster, you also need to setup your Redis instance in advance and provide `host` and `port` of your Redis instance in configuration.
 
-> Users can also opt not to use RAY and instead split the dataset to run on a cluster with [Slurm](https://slurm.schedmd.com/) / [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc). In this case, please use the default Data-Juicer without RAY.
-
+> Users can also opt not to use RAY and instead split the dataset to run on a cluster with [Slurm](https://slurm.schedmd.com/). In this case, please use the default Data-Juicer without RAY.
+> [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc) supports the RAY framework, Slurm framework, etc. Users can directly create RAY jobs and Slurm jobs on the DLC cluster.
 
 ### Data Analysis
 - Run `analyze_data.py` tool or `dj-analyze` command line tool with your config as the argument to analyze your dataset.
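
The context lines in this hunk note that the `ray`-prefixed deduplicators need a Redis instance that is already running, with its `host` and `port` supplied in the configuration. As an illustration only, the sketch below checks that something is listening at a given host and port before a job is launched. The function name `redis_reachable` and the example address are hypothetical and not part of Data-Juicer. The check opens a plain TCP connection, so it does not speak the Redis protocol or verify credentials.

```python
import socket


def redis_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only confirms that a listener exists at the address; it does not
    verify that the listener is Redis or that authentication would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical values: substitute the host and port from your own config.
    host, port = "127.0.0.1", 6379
    status = "reachable" if redis_reachable(host, port) else "NOT reachable"
    print(f"Redis at {host}:{port} is {status}")
```

Running a check like this from each node before starting the RAY job can surface a missing or firewalled Redis instance early, rather than partway through deduplication.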

README_ZH.md

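+2 -1

@@ -282,7 +282,8 @@ python tools/process_data.py --config ./demos/process_video_on_ray/configs/demo.
 - To run data processing with RAY across multiple machines, make sure that every node can access the corresponding data paths, i.e. mount those paths on a shared file system such as NAS.
 - The deduplication operators in RAY mode differ from the single-machine versions, and all of them are prefixed with `ray`, e.g. `ray_video_deduplicator` and `ray_document_deduplicator`. These operators depend on a [Redis](https://redis.io/) instance, so in addition to starting the RAY cluster you must also start a Redis instance beforehand and fill in its `host` and `port` in the corresponding configuration file.
 
-> Users can also choose not to use RAY and instead split the dataset and run it on a cluster with [Slurm](https://slurm.schedmd.com/) / [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc). In this case, the original Data-Juicer without RAY is sufficient.
+> Users can also choose not to use RAY and instead split the dataset and run it on a cluster with [Slurm](https://slurm.schedmd.com/). In this case, the original Data-Juicer without RAY is sufficient.
+> [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc) supports the RAY framework, the Slurm framework, and others. Users can create RAY jobs and Slurm jobs directly on the DLC cluster.
 
 ### Data Analysis
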
0 commit comments
