Two errors to watch for in this section: a `RuntimeError` in `ptx_get_version()` means Step 7 (the Triton patch) was skipped, and `nvcc: command not found` means the CUDA toolkit is missing.
Open your WSL2 terminal and confirm the GPU and CUDA toolkit are visible:
```bash
# Must show your GPU + driver version
nvidia-smi
# Should show: release 12.8
nvcc --version
# Confirm Python 3.10
python3.10 --version
```
If `nvcc` is missing, install the toolkit:
```bash
sudo apt install -y nvidia-cuda-toolkit
```
Or use NVIDIA's official installer at developer.nvidia.com/cuda-downloads (Linux → x86_64 → Ubuntu → 22.04 → WSL-Ubuntu).
```bash
sudo apt install -y python3.10 python3.10-venv python3.10-dev
```
Install Git LFS before cloning; without it, the files under `demo_data/` arrive as LFS pointer stubs instead of actual data.
```bash
sudo apt-get update
sudo apt-get install -y git git-lfs ffmpeg curl build-essential
# Activate LFS hooks globally
git lfs install
```
GR00T uses git submodules for external dependencies. The --recurse-submodules flag fetches everything in one shot:
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
# If you already cloned without the flag:
git submodule update --init --recursive
```
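Two optional sanity checks after cloning: `git submodule status` prefixes uninitialised submodules with `-`, and Git LFS pointer stubs are tiny text files rather than real binary assets:

```bash
# Every submodule line should start with a commit hash, not '-'
git submodule status
# Entries marked '-' instead of '*' are still pointer stubs
git lfs ls-files | head
git lfs pull   # re-fetches real content if stubs slipped through
```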
If `uv` later fails with:
```
error: failed to create directory .venv: Permission denied (os error 13)
```
Check ownership and fix if needed:
```bash
# Check who owns the directory
ls -la ..
# If owned by root, take ownership
sudo chown -R $USER:$USER .
```
Expected ownership:
```
drwxr-xr-x user user Isaac-GR00T   ← correct
```
GR00T uses uv for fast, reproducible dependency management:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
# Create venv and install all dependencies
uv sync --python 3.10
```
Make sure `CUDA_HOME` is set; deepspeed needs it to compile CUDA extensions at runtime.
```bash
export CUDA_HOME=/usr/local/cuda
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
```
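A quick sanity check that the toolkit is where `CUDA_HOME` points (assuming the standard `/usr/local/cuda` symlink):

```bash
echo "$CUDA_HOME"
"$CUDA_HOME/bin/nvcc" --version   # should report release 12.8
```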
Then run the GPU dependency installer:
```bash
uv run bash scripts/deployment/dgpu/install_deps.sh
```
Without this patch, `torch.compile` fails with a `RuntimeError` in Triton's `ptx_get_version()`: Triton 3.3.1 (pinned by PyTorch 2.7) does not recognise the Blackwell GPU architecture `sm_120`.
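Before patching, you can confirm what PyTorch reports for the card; the RTX 5090 (Blackwell) shows compute capability `(12, 0)`, i.e. `sm_120`:

```bash
uv run python -c "import torch; print(torch.cuda.get_device_capability(0))"
```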
Run the patch:
```bash
uv run bash scripts/patch_triton_cuda13.sh
```
Expected output:
```
Uninstalled 1 package in 2ms · Installed 1 package in 8ms
Patched .../triton/backends/nvidia/compiler.py to support CUDA 13.x
Installed triton_cuda13_patch.pth (runtime monkey-patch, survives uv reinstalls)
```
If `torch.compile` still misbehaves after the patch, you can disable compilation by setting `TORCH_COMPILE=0` in your environment.
Verify the install:
```bash
uv run python -c "import gr00t; print('GR00T installed successfully')"
```
Optionally run zero-shot inference on the included demo dataset (downloads ~6GB base model on first run):
```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```
Replace the dataset path and modality config with your own. The demo below uses the included 5-episode SO100 dataset as a smoke test:
```bash
CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus 1 \
    --output-dir /tmp/test_finetune \
    --max-steps 5000 \
    --global-batch-size 8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
```
If you hit OOM, halve the per-step batch: `--global-batch-size 4 --gradient-accumulation-steps 8`. Optionally install bitsandbytes for 8-bit Adam: `pip install bitsandbytes`.
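To see how close a run sits to the 32GB VRAM ceiling, watch memory usage from a second terminal while training:

```bash
watch -n 1 nvidia-smi
```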
For your own robot, write a modality config and point `--modality-config-path` at it. See getting_started/finetune_new_embodiment.md for the full data format spec.
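One way to start is to copy the SO100 example and adapt it; the `MY_EMBODIMENT` directory and `my_config.py` filename below are just illustrative placeholders:

```bash
mkdir -p examples/MY_EMBODIMENT
cp examples/SO100/so100_config.py examples/MY_EMBODIMENT/my_config.py
# edit camera keys, state/action dimensions, etc. to match your robot's data
```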
| Path | Purpose |
|---|---|
| `gr00t/experiment/launch_finetune.py` | Main fine-tuning entry point |
| `examples/SO100/so100_config.py` | Example modality config for a custom embodiment |
| `getting_started/finetune_new_embodiment.md` | Full new-embodiment data prep tutorial |
| `scripts/patch_triton_cuda13.sh` | Triton `sm_` version patch, required for RTX 5090 |
| `scripts/deployment/dgpu/install_deps.sh` | Sets CUDA_HOME and GPU deps |
| `gr00t/eval/open_loop_eval.py` | Checkpoint validation |
| `demo_data/cube_to_bowl_5/` | 5-episode SO100 demo dataset |
| `gr00t/configs/finetune_config.py` | Hyperparameters & `state_dropout_prob` |
- `--global-batch-size 8` with `--gradient-accumulation-steps 4` gives an effective batch of 32, safe on 32GB VRAM
- `--num-shards-per-epoch 10` and `--save-only-model` help if hitting OOM; these are the key memory flags available in the launcher
- `--state_dropout_prob` defaults to 0.2; lower it if your task depends heavily on proprioceptive state
- Add `--use-wandb` to enable Weights & Biases logging during training
- If `torch.compile` still fails post-patch, set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` before running

What lives where in the Docker setup:

- Host: GPU driver · Docker
- Image: GR00T N1.7 · `/workspace/.venv` · port 5555
- Volumes: repo · datasets · checkpoints
Everything stateful is mounted in with `-v`. The Docker image contains only the installed Python environment; all code, datasets, and checkpoints stay on your host disk and survive container restarts.
The venv lives at `/workspace/.venv` inside the image. If you mount a volume over `/workspace`, the venv gets hidden. Mount your repo to `/data` instead and the venv stays intact.
```bash
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU is visible to Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```
The host also needs Git LFS for the clone:
```bash
sudo apt install -y git-lfs && git lfs install
```
Building the GR00T image needs BuildKit; without it the build fails with `the --mount option requires BuildKit`. Install the buildx plugin:
```bash
# Install buildx plugin
mkdir -p ~/.docker/cli-plugins
curl -SL https://github.com/docker/buildx/releases/download/v0.17.1/buildx-v0.17.1.linux-amd64 \
  -o ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

# Enable BuildKit permanently
mkdir -p ~/.docker
cat > ~/.docker/config.json <<'EOF'
{ "features": { "buildkit": "true" } }
EOF

# Verify
docker buildx version
```
Alternatively, prefix each build with `DOCKER_BUILDKIT=1 docker build ...`.
Clone on the host. The repo will be mounted into the container at /data so edits are live without rebuilding.
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
```
Run from the repo root. The existing docker/Dockerfile in the GR00T repo is used directly — no custom Dockerfile needed.
```bash
cd Isaac-GR00T
DOCKER_BUILDKIT=1 docker build \
  -f docker/Dockerfile \
  -t grootn17:latest \
  .
```
The build starts from `nvidia/cuda:12.8.0-devel-ubuntu22.04`, installs system deps (ffmpeg, git-lfs), installs uv, syncs all Python dependencies into `/workspace/.venv`, and installs the gr00t package itself. It takes 15–25 minutes on first run.
`torch.compile` via Triton will fail on Blackwell inside the container the same way as on WSL2. Use `--inference-mode pytorch` for inference; fine-tuning OOM can be managed with the batch size and gradient accumulation flags.
```bash
docker run -it --gpus all \
  --ipc=host \
  --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/your-datasets:/workspace/datasets \
  -v /path/to/your-models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
```
- `-v /path/to/Isaac-GR00T:/data`: your cloned repo, live-editable
- `-v /path/to/datasets:/workspace/datasets`: training datasets
- `-v /path/to/models:/workspace/models`: fine-tuned checkpoint output

The venv lives at `/workspace/.venv` inside the image, untouched by these mounts.
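To see the `/workspace` pitfall concretely, here is a throwaway check (assumes the image is already built and tagged `grootn17:latest`):

```bash
# Mounting over /workspace shadows the baked-in venv...
docker run --rm -v "$PWD:/workspace" grootn17:latest ls /workspace/.venv   # No such file or directory
# ...while mounting to /data leaves it intact
docker run --rm -v "$PWD:/data" grootn17:latest ls /workspace/.venv        # lists the venv
```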
If a host path contains spaces, wrap the whole `-v` argument in quotes: `-v "/path/with spaces/Isaac-GR00T:/data"`.
The multi-line command relies on trailing `\` continuations. If you get `command not found` errors on the flag lines, run it as a single line instead.
```bash
# Activate venv
source /workspace/.venv/bin/activate
# Make it auto-activate on every shell entry
echo "source /workspace/.venv/bin/activate" >> ~/.bashrc
```
On later sessions, just `docker start -ai grootn17`; the venv activates automatically after the bashrc line above.
```bash
# GPU check
python -c "import torch; print(torch.cuda.get_device_name(0)); print('CUDA:', torch.version.cuda)"
# GR00T check
python -c "import gr00t; print('GR00T OK')"
# Network / HuggingFace reachable
python -c "import requests; print(requests.get('https://huggingface.co').status_code)"
```
Expected output:
```
NVIDIA GeForce RTX 5090
CUDA: 12.8
GR00T OK
200
```
The base model (~6 GB) downloads automatically from HuggingFace on first run:
```bash
cd /data
python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --dataset-path demo_data/droid_sample \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```
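If you prefer to pre-fetch the ~6 GB model rather than download it on first run, a one-liner via `huggingface_hub` (assumed present in the venv as a GR00T dependency) works:

```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download('nvidia/GR00T-N1.7-3B')"
```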
To serve the model on port 5555 instead of running one-off inference:
```bash
python gr00t/eval/run_gr00t_server.py \
    --model-path nvidia/GR00T-N1.7-3B \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```
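To confirm the server is listening, a plain socket check from a second terminal (`docker exec -it grootn17 bash`) is enough:

```bash
python -c "import socket; socket.create_connection(('localhost', 5555), timeout=5); print('port 5555 open')"
```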
With your dataset mounted at /workspace/datasets and output going to /workspace/models (both on host disk), fine-tuning checkpoints persist across container restarts.
```bash
cd /data

# Copy your modality config into the repo
mkdir -p examples/MY_EMBODIMENT

# Run fine-tuning
CUDA_VISIBLE_DEVICES=0 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path /workspace/datasets/your-dataset \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/MY_EMBODIMENT/my_config.py \
    --num-gpus 1 \
    --output-dir /workspace/models/grootn17_finetuned \
    --max-steps 5000 \
    --global-batch-size 8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
```
The VLM backbone is frozen. Only the diffusion action head and projector are trained — fits on 32GB VRAM with the settings above.
If you hit OOM, drop to `--global-batch-size 4 --gradient-accumulation-steps 8`, or add `--no-tune-projector` to reduce trainable params to ~1.09B.
After training, start the server with your checkpoint:
```bash
python gr00t/eval/run_gr00t_server.py \
    --model-path /workspace/models/grootn17_finetuned \
    --embodiment-tag NEW_EMBODIMENT \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```
```bash
# Commit running container state → image
docker commit grootn17 grootn17:latest

# Save image to disk (gzip compressed)
mkdir -p /path/to/backups
docker save grootn17:latest | gzip > /path/to/backups/grootn17.tar.gz

# Restore later
docker load < /path/to/backups/grootn17.tar.gz
```
To resume work in the same container: `docker start -ai grootn17`.
To move to an updated image, `docker rm` the old container, then `docker run` from the new image with the same volume mounts and flags:
```bash
# 1. Commit current state
docker commit grootn17 grootn17:latest

# 2. Remove old container
docker rm grootn17

# 3. Recreate from updated image
docker run -it --gpus all \
  --ipc=host --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/datasets:/workspace/datasets \
  -v /path/to/models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
```
| Task | Command |
|---|---|
| Start container | `docker start -ai grootn17` |
| Second terminal in running container | `docker exec -it grootn17 bash` |
| Copy file host → container | `docker cp file.py grootn17:/data/file.py` |
| Copy between containers (via host) | `docker cp c1:/path /tmp/ && docker cp /tmp/file c2:/path` |
| List images with sizes | `docker images` |
| Commit container to image | `docker commit grootn17 grootn17:latest` |
| Save image to file | `docker save grootn17:latest \| gzip > grootn17.tar.gz` |
| Load image from file | `docker load < grootn17.tar.gz` |
| Connect container to network | `docker network connect my_net grootn17` |
- Mount your repo to `/data`, not `/workspace`; mounting over `/workspace` hides the installed venv
- Keep `--ipc=host --shm-size=16g`; required for multi-worker dataloaders during fine-tuning
- Quote `-v` arguments containing spaces: `-v "/my path/repo:/data"`
- Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce fragmentation OOM errors during fine-tuning