14 changes: 7 additions & 7 deletions docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
@@ -83,9 +83,9 @@ ds = ds.map(preprocess, remove_columns=ds.column_names)
5. Preprocess and tokenize into format the model uses.
</Callouts>

### (Optional) Upload Dataset into S3 Storage
### (Optional) Upload Dataset into S3-Compatible Object Storage

If you wish to upload datasets into S3, you can run those codes in `JupyterLab`.
If you want to upload datasets into S3-compatible object storage, you can run the following code in `JupyterLab`. Alauda AI supports S3-compatible storage access, and in typical product deployments the storage implementation is Ceph object storage, so you can use the standard `boto3` client.

```python
import os
@@ -104,7 +104,7 @@ config = TransferConfig(

for root, dirs, files in os.walk(local_folder):
for filename in files:
local_path = os path.join(root, filename)
local_path = os.path.join(root, filename)
relative_path = os.path.relpath(local_path, local_folder)
s3_key = f"ultrachat_200k/{relative_path.replace(os.sep, '/')}"
s3.upload_file(local_path, bucket_name, s3_key, Config=config)
@@ -116,9 +116,9 @@ for root, dirs, files in os.walk(local_folder):
2. Configure multipart upload with 100 MB chunks and a maximum of 10 concurrent threads.
</Callouts>
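The key-mapping step inside the upload loop can be sanity-checked locally without any S3 access. The sketch below extracts that logic into a helper; the function name and sample paths are illustrative and not part of the original script:

```python
import os


def to_s3_key(local_path: str, local_folder: str, prefix: str = "ultrachat_200k") -> str:
    """Mirror the loop's key construction: take the path relative to the dataset
    folder and normalize OS-specific separators to '/' for S3 object keys."""
    relative_path = os.path.relpath(local_path, local_folder)
    return f"{prefix}/{relative_path.replace(os.sep, '/')}"


key = to_s3_key("/data/ultrachat_200k/train/part-0.parquet", "/data/ultrachat_200k")
print(key)
```

Because `os.sep` is replaced explicitly, the same helper produces valid S3 keys whether the walk runs on a POSIX or Windows filesystem.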

### (Optional) Use Dataset in S3 Storage
### (Optional) Use Dataset from S3-Compatible Object Storage

If you wish to use datasets from S3, you can first install the `s3fs` tool and then modify the dataset loading section in the example by following the code below.
If you want to use datasets stored in S3-compatible object storage, first install the `s3fs` tool and then modify the dataset loading section in the example as shown below. In Alauda AI environments, this S3-compatible storage is typically backed by Ceph object storage.

```bash
pip install s3fs -i https://pypi.tuna.tsinghua.edu.cn/simple
@@ -135,7 +135,7 @@ storage_options = {
"key": "07Apples@",
"secret": "O7Apples@",
"client_kwargs": {
"endpoint_url": "http://minio.minio-system.svc.cluster.local:80" #[!code callout]
"endpoint_url": "https://ceph-obj.example.com" #[!code callout]
}
}

@@ -149,7 +149,7 @@ ds = load_dataset(

<Callouts>
1. Set environment variables (as a backup, some underlying components will use them).
2. Define storage configuration; you must explicitly specify the endpoint_url to connect to MinIO.
2. Define storage configuration; you must explicitly specify the `endpoint_url` for your S3-compatible object storage service, such as a Ceph object storage endpoint.
3. If the dataset is split, this is equivalent to `split="train_sft"` in the example.
</Callouts>
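The environment variables from callout 1 and the explicit `storage_options` from callout 2 can be kept in sync by deriving one from the other. A minimal sketch, assuming standard AWS-style variable names and a placeholder endpoint (not the values from the example above):

```python
import os

# Placeholder credentials and endpoint, for illustration only.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "example-key")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "example-secret")
os.environ.setdefault("AWS_ENDPOINT_URL", "https://ceph-obj.example.com")

# s3fs understands "key", "secret", and "client_kwargs";
# client_kwargs is passed through to the underlying botocore client.
storage_options = {
    "key": os.environ["AWS_ACCESS_KEY_ID"],
    "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
    "client_kwargs": {"endpoint_url": os.environ["AWS_ENDPOINT_URL"]},
}
print(storage_options["client_kwargs"]["endpoint_url"])
```

This way the endpoint and credentials are defined once, and both the environment-variable fallback and the explicit `storage_options` dictionary stay consistent.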

17 changes: 9 additions & 8 deletions docs/en/workbench/how_to/create_workbench.mdx
@@ -82,7 +82,7 @@ The following images are available out of the box:
#### Multi-architecture images (`x86_64` and `arm64`)

| Image name | Description | Main packages |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Minimal Python**<br />[alauda-workbench-jupyter-minimal-cpu-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-minimal-cpu-py312-ubi9) | Use this image if you want a lightweight Jupyter workbench and plan to install project-specific packages yourself. | `Python 3.12`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`JupyterLab Git 0.52.0`<br />`nbdime 4.0.4`<br />`nbgitpuller 1.2.2` |
| **Standard Data Science**<br />[alauda-workbench-jupyter-datascience-cpu-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-datascience-cpu-py312-ubi9) | Use this image for general data science work that does not require a framework-specific GPU image. | `Python 3.12`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`NumPy 2.4.3`<br />`pandas 2.3.3`<br />`SciPy 1.16.3`<br />`scikit-learn 1.8.0`<br />`Matplotlib 3.10.8`<br />`Plotly 6.5.2`<br />`KFP 2.15.2`<br />`Kubeflow Training 1.9.3`<br />`Feast 0.60.0`<br />`CodeFlare SDK 0.35.0`<br />`ODH Elyra 4.3.2` |
| **code-server**<br />[alauda-workbench-codeserver-datascience-cpu-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-codeserver-datascience-cpu-py312-ubi9) | Use this image if you prefer a VS Code-like IDE for data science development. Elyra-based pipelines are not available with this image. | `Python 3.12`<br />`code-server 4.106.3`<br />`Python extension 2026.0.0`<br />`Jupyter extension 2025.9.1`<br />`ipykernel 7.2.0`<br />`debugpy 1.8.20`<br />`NumPy 2.4.3`<br />`pandas 2.3.3`<br />`scikit-learn 1.8.0`<br />`SciPy 1.16.3`<br />`KFP 2.15.2`<br />`Feast 0.60.0`<br />`virtualenv 21.1.0`<br />`ripgrep 15.0.0` |
@@ -96,7 +96,7 @@ The following images are available on Docker Hub but are **not built into the pl
These images are intended for `x86_64` nodes with NVIDIA GPU support.

| Image name | Description | Main packages |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **TensorFlow**<br />[alaudadockerhub/odh-workbench-jupyter-tensorflow-cuda-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/odh-workbench-jupyter-tensorflow-cuda-py312-ubi9) | Use this image for TensorFlow model development and training on NVIDIA GPUs. | `Python 3.12`<br />`CUDA base image 12.9`<br />`TensorFlow 2.20.0+redhat`<br />`TensorBoard 2.20.0`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`NumPy 2.4.3`<br />`pandas 2.3.3` |
| **PyTorch LLM Compressor**<br />[alaudadockerhub/odh-workbench-jupyter-pytorch-llmcompressor-cuda-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/odh-workbench-jupyter-pytorch-llmcompressor-cuda-py312-ubi9) | Use this image for PyTorch-based LLM compression and optimization on NVIDIA GPUs. | `Python 3.12`<br />`CUDA base image 12.9`<br />`PyTorch 2.9.1`<br />`torchvision 0.24.1`<br />`TensorBoard 2.20.0`<br />`llmcompressor 0.9.0.2`<br />`transformers 4.57.3`<br />`datasets 4.4.1`<br />`accelerate 1.12.0`<br />`compressed-tensors 0.13.0`<br />`nvidia-ml-py 13.590.44`<br />`lm-eval 0.4.11` |
| **PyTorch**<br />[alaudadockerhub/odh-workbench-jupyter-pytorch-cuda-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/odh-workbench-jupyter-pytorch-cuda-py312-ubi9) | Use this image for PyTorch model development and training on NVIDIA GPUs. | `Python 3.12`<br />`CUDA base image 12.9`<br />`PyTorch 2.9.1`<br />`torchvision 0.24.1`<br />`TensorBoard 2.20.0`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`onnxscript 0.6.2` |
@@ -106,17 +106,18 @@ These images are intended for `x86_64` nodes with NVIDIA GPU support.

These images are intended for `arm64` nodes with Ascend NPU support.

| Image name | Description | Main packages |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **CANN Minimal Python**<br />[alauda-workbench-jupyter-minimal-cann-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-minimal-cann-py312-ubi9) | Use this image if you need a lightweight Jupyter base image with Ascend CANN support. | `Python 3.12`<br />`CANN 8.5.0`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`JupyterLab Git 0.51.4`<br />`nbdime 4.0.4`<br />`nbgitpuller 1.2.2` |
| **PyTorch CANN**<br />[alauda-workbench-jupyter-pytorch-cann-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-pytorch-cann-py312-ubi9) | Use this image for PyTorch model development and training on Ascend NPUs. | `Python 3.12`<br />`CANN 8.5.0`<br />`PyTorch 2.9.0`<br />`torch_npu 2.9.0` (Ascend release `7.3.0`)<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`TensorBoard 2.20.0`<br />`Ray 2.54.0`<br />`onnxscript 0.6.2`<br />`NumPy 2.4.3`<br />`pandas 2.3.3`<br />`scikit-learn 1.8.0`<br />`SciPy 1.16.3`<br />`KFP 2.15.2`<br />`Feast 0.60.0` |
| Image name | Description | Main packages |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **CANN Minimal Python**<br />[alauda-workbench-jupyter-minimal-cann-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-minimal-cann-py312-ubi9) | Use this image if you need a lightweight Jupyter base image with Ascend CANN support. | `Python 3.12`<br />`CANN 8.5.0`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`JupyterLab Git 0.51.4`<br />`nbdime 4.0.4`<br />`nbgitpuller 1.2.2` |
| **PyTorch CANN**<br />[alauda-workbench-jupyter-pytorch-cann-py312-ubi9](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-pytorch-cann-py312-ubi9) | Use this image for PyTorch model development and training on Ascend NPUs. | `Python 3.12`<br />`CANN 8.5.0`<br />`PyTorch 2.9.0`<br />`torch_npu 2.9.0` (Ascend release `7.3.0`)<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`TensorBoard 2.20.0`<br />`Ray 2.54.0`<br />`onnxscript 0.6.2`<br />`NumPy 2.4.3`<br />`pandas 2.3.3`<br />`scikit-learn 1.8.0`<br />`SciPy 1.16.3`<br />`KFP 2.15.2`<br />`Feast 0.60.0` |
| **MindSpore CANN**<br />[docker.io/alaudadockerhub/alauda-workbench-jupyter-mindspore-cann-py312-ubi9:v0.1.7](https://hub.docker.com/r/alaudadockerhub/alauda-workbench-jupyter-mindspore-cann-py312-ubi9) | Use this image for MindSpore model development, checkpoint conversion, and training on Ascend NPUs. | `Python 3.12`<br />`CANN 8.5.0`<br />`MindSpore 2.8.0`<br />`JupyterLab 4.5.6`<br />`Jupyter Server 2.17.0`<br />`TensorBoard 2.20.0`<br />`ODH Elyra 4.3.2`<br />`onnxscript 0.6.2`<br />`KFP 2.15.2`<br />`Kubeflow Training 1.9.3`<br />`pandas 2.3.3`<br />`scikit-learn 1.8.0`<br />`SciPy 1.16.3` |

To use an additional image, first synchronize it to your own image registry. You can do this with a tool such as `skopeo`, or by using the script described in the next section.
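As a concrete sketch of the `skopeo` route, the following mirrors one of the additional images above. The target registry and project are placeholders, and the commented-out `skopeo copy` line is the actual transfer step:

```shell
# Source image from the table above; target registry/project are placeholders.
SRC_IMAGE="docker.io/alaudadockerhub/odh-workbench-jupyter-tensorflow-cuda-py312-ubi9:latest"
TARGET_IMAGE="registry.example.com/workbench-images/${SRC_IMAGE##*/}"

# Actual transfer (requires skopeo and credentials for the target registry):
#   skopeo copy --dest-creds "user:pass" "docker://${SRC_IMAGE}" "docker://${TARGET_IMAGE}"
echo "${TARGET_IMAGE}"
```

`skopeo copy` transfers the image directly between registries without a local Docker daemon, which is convenient for smaller images; for very large images, the relay script described next is more resilient.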

## Docker Hub Image Synchronization Script Guide

[sync-from-dockerhub.sh](/sync-from-dockerhub.sh) is an automated tool for synchronizing selected Docker Hub images, especially very large images, to a private image registry such as Harbor.
Large images are more likely to encounter Out-Of-Memory (OOM) or timeout failures during direct transfer because of network fluctuations. To improve reliability, the script uses a relay workflow: **pull locally -> export as a tar archive -> push the tar archive to the target registry**. It also cleans up temporary files automatically when the task completes or exits unexpectedly.
Large images are more likely to encounter Out-Of-Memory (OOM) or timeout failures during direct transfer because of network fluctuations. To improve reliability, the script uses a relay workflow: **pull locally → export as a tar archive → push the tar archive to the target registry**. It also cleans up temporary files automatically when the task completes or exits unexpectedly.
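The relay workflow can be sketched as a dry run that only prints the commands the script would execute. The image name and registry values here are examples, not the script's defaults:

```shell
IMAGE="docker.io/alaudadockerhub/odh-workbench-jupyter-pytorch-cuda-py312-ubi9:latest"
TARGET_REGISTRY="build-harbor.alauda.cn"
TARGET_PROJECT="mlops/workbench-images"

NAME_TAG="${IMAGE##*/}"                    # image name plus tag
TAR_FILE="${NAME_TAG%%:*}.tar"             # temporary archive on local disk
TARGET_IMAGE="${TARGET_REGISTRY}/${TARGET_PROJECT}/${NAME_TAG}"

# Dry run: print each relay step instead of executing it.
echo "docker pull ${IMAGE}"
echo "docker save -o ${TAR_FILE} ${IMAGE}"
echo "docker load -i ${TAR_FILE}"
echo "docker tag ${IMAGE} ${TARGET_IMAGE}"
echo "docker push ${TARGET_IMAGE}"
echo "rm -f ${TAR_FILE}"                   # cleanup step mirrors the script
```

Staging the image as a local tar archive means a flaky network connection only forces a retry of one step rather than the whole transfer.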

### Script Prerequisites

@@ -133,7 +134,7 @@ The script executes synchronization by reading environment variables, providing
### Required Parameters (Target Private Registry Configuration)

| Environment Variable | Description | Example Value |
| :------------------- | :-------------------------------------------------------------------- | :----------------------- |
|:---------------------|:----------------------------------------------------------------------|:-------------------------|
| `TARGET_REGISTRY` | Address of the target private image registry | `build-harbor.alauda.cn` |
| `TARGET_PROJECT` | Specific project/namespace in the target registry to store the images | `mlops/workbench-images` |
| `TARGET_USER` | Username for logging into the target registry | `admin` |