Two errors to watch for in this section: a `RuntimeError` in `ptx_get_version()` means Step 7 (the Triton patch) was skipped, and `nvcc: command not found` means the CUDA toolkit is missing.
Open your WSL2 terminal and confirm the GPU and CUDA toolkit are visible:
```bash
# Must show your GPU + driver version
nvidia-smi
# Should show: release 12.8
nvcc --version
# Confirm Python 3.10
python3.10 --version
```
If `nvcc` is missing, install the toolkit:
```bash
sudo apt install -y nvidia-cuda-toolkit
```
Or use NVIDIA's official installer at developer.nvidia.com/cuda-downloads (Linux → x86_64 → Ubuntu → 22.04 → WSL-Ubuntu).
```bash
sudo apt install -y python3.10 python3.10-venv python3.10-dev
```
Install Git LFS before cloning; without it, the files under `demo_data/` arrive as LFS pointer stubs instead of actual data.
```bash
sudo apt-get update
sudo apt-get install -y git git-lfs ffmpeg curl build-essential
# Activate LFS hooks globally
git lfs install
```
GR00T uses git submodules for external dependencies. The --recurse-submodules flag fetches everything in one shot:
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
# If you already cloned without the flag:
git submodule update --init --recursive
```
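Two optional sanity checks after cloning: `git submodule status` prefixes uninitialised submodules with `-`, and Git LFS pointer stubs are tiny text files rather than real binary assets:

```bash
# Every submodule line should start with a commit hash, not '-'
git submodule status
# Entries marked '-' instead of '*' are still pointer stubs
git lfs ls-files | head
git lfs pull   # re-fetches real content if stubs slipped through
```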
If `uv` later fails with:
```
error: failed to create directory .venv: Permission denied (os error 13)
```
Check ownership and fix if needed:
```bash
# Check who owns the directory
ls -la ..
# If owned by root, take ownership
sudo chown -R $USER:$USER .
```
Expected ownership:
```
drwxr-xr-x user user Isaac-GR00T   ← correct
```
GR00T uses uv for fast, reproducible dependency management:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
# Create venv and install all dependencies
uv sync --python 3.10
```
Make sure `CUDA_HOME` is set; deepspeed needs it to compile CUDA extensions at runtime.
```bash
export CUDA_HOME=/usr/local/cuda
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
```
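A quick sanity check that the toolkit is where `CUDA_HOME` points (assuming the standard `/usr/local/cuda` symlink):

```bash
echo "$CUDA_HOME"
"$CUDA_HOME/bin/nvcc" --version   # should report release 12.8
```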
Then run the GPU dependency installer:
```bash
uv run bash scripts/deployment/dgpu/install_deps.sh
```
Without this patch, `torch.compile` fails with a `RuntimeError` in Triton's `ptx_get_version()`: Triton 3.3.1 (pinned by PyTorch 2.7) does not recognise the Blackwell GPU architecture `sm_120`.
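Before patching, you can confirm what PyTorch reports for the card; the RTX 5090 (Blackwell) shows compute capability `(12, 0)`, i.e. `sm_120`:

```bash
uv run python -c "import torch; print(torch.cuda.get_device_capability(0))"
```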
Run the patch:
```bash
uv run bash scripts/patch_triton_cuda13.sh
```
Expected output:
```
Uninstalled 1 package in 2ms · Installed 1 package in 8ms
Patched .../triton/backends/nvidia/compiler.py to support CUDA 13.x
Installed triton_cuda13_patch.pth (runtime monkey-patch, survives uv reinstalls)
```
If `torch.compile` still misbehaves after the patch, you can disable compilation by setting `TORCH_COMPILE=0` in your environment.
Verify the install:
```bash
uv run python -c "import gr00t; print('GR00T installed successfully')"
```
Optionally run zero-shot inference on the included demo dataset (downloads ~6GB base model on first run):
```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```
Replace the dataset path and modality config with your own. The demo below uses the included 5-episode SO100 dataset as a smoke test:
```bash
CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus 1 \
    --output-dir /tmp/test_finetune \
    --max-steps 5000 \
    --global-batch-size 8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
```
If you hit OOM, halve the per-step batch: `--global-batch-size 4 --gradient-accumulation-steps 8`. Optionally install bitsandbytes for 8-bit Adam: `pip install bitsandbytes`.
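To see how close a run sits to the 32GB VRAM ceiling, watch memory usage from a second terminal while training:

```bash
watch -n 1 nvidia-smi
```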
For your own robot, write a modality config and point `--modality-config-path` at it. See getting_started/finetune_new_embodiment.md for the full data format spec.
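One way to start is to copy the SO100 example and adapt it; the `MY_EMBODIMENT` directory and `my_config.py` filename below are just illustrative placeholders:

```bash
mkdir -p examples/MY_EMBODIMENT
cp examples/SO100/so100_config.py examples/MY_EMBODIMENT/my_config.py
# edit camera keys, state/action dimensions, etc. to match your robot's data
```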
| Path | Purpose |
|---|---|
| `gr00t/experiment/launch_finetune.py` | Main fine-tuning entry point |
| `examples/SO100/so100_config.py` | Example modality config for a custom embodiment |
| `getting_started/finetune_new_embodiment.md` | Full new-embodiment data prep tutorial |
| `scripts/patch_triton_cuda13.sh` | Triton `sm_` version patch, required for RTX 5090 |
| `scripts/deployment/dgpu/install_deps.sh` | Sets CUDA_HOME and GPU deps |
| `gr00t/eval/open_loop_eval.py` | Checkpoint validation |
| `demo_data/cube_to_bowl_5/` | 5-episode SO100 demo dataset |
| `gr00t/configs/finetune_config.py` | Hyperparameters & `state_dropout_prob` |
- `--global-batch-size 8` with `--gradient-accumulation-steps 4` gives an effective batch of 32, safe on 32GB VRAM
- `--num-shards-per-epoch 10` and `--save-only-model` help if hitting OOM; these are the key memory flags available in the launcher
- `--state_dropout_prob` defaults to 0.2; lower it if your task depends heavily on proprioceptive state
- Add `--use-wandb` to enable Weights & Biases logging during training
- If `torch.compile` still fails post-patch, set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` before running

What lives where in the Docker setup:

- Host: GPU driver · Docker
- Image: GR00T N1.7 · `/workspace/.venv` · port 5555
- Volumes: repo · datasets · checkpoints
Everything stateful is mounted in with `-v`. The Docker image contains only the installed Python environment; all code, datasets, and checkpoints stay on your host disk and survive container restarts.
The venv lives at `/workspace/.venv` inside the image. If you mount a volume over `/workspace`, the venv gets hidden. Mount your repo to `/data` instead and the venv stays intact.
```bash
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU is visible to Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```
The host also needs Git LFS for the clone:
```bash
sudo apt install -y git-lfs && git lfs install
```
Building the GR00T image needs BuildKit; without it the build fails with `the --mount option requires BuildKit`. Install the buildx plugin:
```bash
# Install buildx plugin
mkdir -p ~/.docker/cli-plugins
curl -SL https://github.com/docker/buildx/releases/download/v0.17.1/buildx-v0.17.1.linux-amd64 \
  -o ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

# Enable BuildKit permanently
mkdir -p ~/.docker
cat > ~/.docker/config.json <<'EOF'
{ "features": { "buildkit": "true" } }
EOF

# Verify
docker buildx version
```
Alternatively, prefix each build with `DOCKER_BUILDKIT=1 docker build ...`.
Clone on the host. The repo will be mounted into the container at /data so edits are live without rebuilding.
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
```
Run from the repo root. The existing docker/Dockerfile in the GR00T repo is used directly — no custom Dockerfile needed.
```bash
cd Isaac-GR00T
DOCKER_BUILDKIT=1 docker build \
  -f docker/Dockerfile \
  -t grootn17:latest \
  .
```
The build starts from `nvidia/cuda:12.8.0-devel-ubuntu22.04`, installs system deps (ffmpeg, git-lfs), installs uv, syncs all Python dependencies into `/workspace/.venv`, and installs the gr00t package itself. It takes 15–25 minutes on first run.
`torch.compile` via Triton will fail on Blackwell inside the container the same way as on WSL2. Use `--inference-mode pytorch` for inference; fine-tuning OOM can be managed with the batch size and gradient accumulation flags.
```bash
docker run -it --gpus all \
  --ipc=host \
  --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/your-datasets:/workspace/datasets \
  -v /path/to/your-models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
```
- `-v /path/to/Isaac-GR00T:/data`: your cloned repo, live-editable
- `-v /path/to/datasets:/workspace/datasets`: training datasets
- `-v /path/to/models:/workspace/models`: fine-tuned checkpoint output

The venv lives at `/workspace/.venv` inside the image, untouched by these mounts.
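To see the `/workspace` pitfall concretely, here is a throwaway check (assumes the image is already built and tagged `grootn17:latest`):

```bash
# Mounting over /workspace shadows the baked-in venv...
docker run --rm -v "$PWD:/workspace" grootn17:latest ls /workspace/.venv   # No such file or directory
# ...while mounting to /data leaves it intact
docker run --rm -v "$PWD:/data" grootn17:latest ls /workspace/.venv        # lists the venv
```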
If a host path contains spaces, wrap the whole `-v` argument in quotes: `-v "/path/with spaces/Isaac-GR00T:/data"`.
The multi-line command relies on trailing `\` continuations. If you get `command not found` errors on the flag lines, run it as a single line instead.
```bash
# Activate venv
source /workspace/.venv/bin/activate
# Make it auto-activate on every shell entry
echo "source /workspace/.venv/bin/activate" >> ~/.bashrc
```
On later sessions, just `docker start -ai grootn17`; the venv activates automatically after the bashrc line above.
```bash
# GPU check
python -c "import torch; print(torch.cuda.get_device_name(0)); print('CUDA:', torch.version.cuda)"
# GR00T check
python -c "import gr00t; print('GR00T OK')"
# Network / HuggingFace reachable
python -c "import requests; print(requests.get('https://huggingface.co').status_code)"
```
Expected output:
```
NVIDIA GeForce RTX 5090
CUDA: 12.8
GR00T OK
200
```
The base model (~6 GB) downloads automatically from HuggingFace on first run:
```bash
cd /data
python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```
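If you prefer to pre-fetch the ~6 GB model rather than download it on first run, a one-liner via `huggingface_hub` (assumed present in the venv as a GR00T dependency) works:

```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download('nvidia/GR00T-N1.7-3B')"
```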
To serve the model on port 5555 instead of running one-off inference:
```bash
python gr00t/eval/run_gr00t_server.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```
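To confirm the server is listening, a plain socket check from a second terminal (`docker exec -it grootn17 bash`) is enough:

```bash
python -c "import socket; socket.create_connection(('localhost', 5555), timeout=5); print('port 5555 open')"
```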
With your dataset mounted at /workspace/datasets and output going to /workspace/models (both on host disk), fine-tuning checkpoints persist across container restarts.
```bash
cd /data

# Copy your modality config into the repo
mkdir -p examples/MY_EMBODIMENT

# Run fine-tuning
CUDA_VISIBLE_DEVICES=0 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path /workspace/datasets/your-dataset \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/MY_EMBODIMENT/my_config.py \
    --num-gpus 1 \
    --output-dir /workspace/models/grootn17_finetuned \
    --max-steps 5000 \
    --global-batch-size 8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
```
The VLM backbone is frozen. Only the diffusion action head and projector are trained — fits on 32GB VRAM with the settings above.
If you hit OOM, drop to `--global-batch-size 4 --gradient-accumulation-steps 8`, or add `--no-tune-projector` to reduce trainable params to ~1.09B.
After training, start the server with your checkpoint:
```bash
python gr00t/eval/run_gr00t_server.py \
    --model-path /workspace/models/grootn17_finetuned \
    --embodiment-tag NEW_EMBODIMENT \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```
```bash
# Commit running container state → image
docker commit grootn17 grootn17:latest

# Save image to disk (gzip compressed)
mkdir -p /path/to/backups
docker save grootn17:latest | gzip > /path/to/backups/grootn17.tar.gz

# Restore later
docker load < /path/to/backups/grootn17.tar.gz
```
To resume work in the same container: `docker start -ai grootn17`.
To move to an updated image, `docker rm` the old container, then `docker run` from the new image with the same volume mounts and flags:
```bash
# 1. Commit current state
docker commit grootn17 grootn17:latest

# 2. Remove old container
docker rm grootn17

# 3. Recreate from updated image
docker run -it --gpus all \
  --ipc=host --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/datasets:/workspace/datasets \
  -v /path/to/models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
```
| Task | Command |
|---|---|
| Start container | `docker start -ai grootn17` |
| Second terminal in running container | `docker exec -it grootn17 bash` |
| Copy file host → container | `docker cp file.py grootn17:/data/file.py` |
| Copy between containers (via host) | `docker cp c1:/path /tmp/ && docker cp /tmp/file c2:/path` |
| List images with sizes | `docker images` |
| Commit container to image | `docker commit grootn17 grootn17:latest` |
| Save image to file | `docker save grootn17:latest \| gzip > grootn17.tar.gz` |
| Load image from file | `docker load < grootn17.tar.gz` |
| Connect container to network | `docker network connect my_net grootn17` |
- Mount your repo to `/data`, not `/workspace`; mounting over `/workspace` hides the installed venv
- Keep `--ipc=host --shm-size=16g`; required for multi-worker dataloaders during fine-tuning
- Quote `-v` arguments containing spaces: `-v "/my path/repo:/data"`
- Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce fragmentation OOM errors during fine-tuning