Complete Install Guide

GR00T N1.7 · WSL2 · Linux · Docker

Battle-tested installation guides for NVIDIA Isaac GR00T N1.7. Choose your platform: WSL2 on Windows, or Docker on Linux. Both cover fine-tuning setup with real fixes for RTX 5090 Blackwell hardware.


GR00T N1.7 · CUDA 12.8 · Python 3.10 · RTX 5090 (Blackwell sm_120) · Docker · WSL2 · Fine-tuning
Before You Begin
⚠ RTX 5090 (Blackwell / sm_120) — Triton Patch Required Triton 3.3.1 pinned by PyTorch 2.7 does NOT support sm_120. Fine-tuning will crash with RuntimeError in ptx_get_version() without Step 7.
CUDA Drivers on Windows WSL2 shares the Windows GPU driver automatically. Do NOT install a separate CUDA driver inside WSL — only the CUDA toolkit if nvcc is missing.
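A quick way to confirm this split (a minimal check, assuming a standard WSL2 Ubuntu setup):

$bash
# The driver is provided by Windows; this should work with nothing installed in WSL
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader

# The toolkit is separate; install it only if nvcc is missing
command -v nvcc || echo "nvcc missing: install the CUDA toolkit (Step 1)"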
Installation Steps
01
Verify CUDA & GPU Inside WSL2 nvidia-smi · nvcc · python3.10

Open your WSL2 terminal and confirm the GPU and CUDA toolkit are visible:

$bash
# Must show your GPU + driver version
nvidia-smi

# Should show: release 12.8
nvcc --version

# Confirm Python 3.10
python3.10 --version
nvcc not found? Install the CUDA toolkit: sudo apt install -y nvidia-cuda-toolkit
Or use NVIDIA's official installer at developer.nvidia.com/cuda-downloads (Linux → x86_64 → Ubuntu → 22.04 → WSL-Ubuntu).
python3.10 not found? sudo apt install -y python3.10 python3.10-venv python3.10-dev
02
Install System Dependencies git-lfs · ffmpeg · build-essential
Install git-lfs BEFORE cloning. Skipping this causes parquet files in demo_data/ to arrive as corrupted LFS pointer stubs instead of actual data.
$bash
sudo apt-get update
sudo apt-get install -y git git-lfs ffmpeg curl build-essential

# Activate LFS hooks globally
git lfs install
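If you cloned before installing git-lfs, you can spot corrupted files quickly: real parquet files start with the magic bytes PAR1, while LFS pointer stubs are small text files. A minimal diagnostic sketch (paths assume the demo datasets shipped in this repo):

$bash
# Flag any parquet file that is an LFS pointer stub rather than real data
find demo_data -name '*.parquet' -exec sh -c \
  'head -c 4 "$1" | grep -q PAR1 || echo "LFS stub: $1"' _ {} \;

# Fetch the real files if any stubs were found
git lfs pull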
03
Clone the Repository --recurse-submodules

GR00T uses git submodules for external dependencies. The --recurse-submodules flag fetches everything in one shot:

$bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
Already cloned without submodules? git submodule update --init --recursive
04
Fix Directory Permissions ⚠ Common issue — root-owned clone
If cloned as root, uv fails immediately with: error: failed to create directory .venv: Permission denied (os error 13)

Check ownership and fix if needed:

$bash
# Check who owns the directory
ls -la ..

# If owned by root, take ownership
sudo chown -R $USER:$USER .
what to look for
drwxr-xr-x root root Isaac-GR00T ← needs fixing
drwxr-xr-x user user Isaac-GR00T ← correct
05
Install uv and Sync Environment 137 packages · ~19 min on first run

GR00T uses uv for fast, reproducible dependency management:

$bash — from inside Isaac-GR00T/
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Create venv and install all dependencies
uv sync --python 3.10
What gets installed: torch 2.7.1+cu128, flash-attn 2.7.4, TensorRT 10.15, deepspeed, transformers, torchcodec, and 130+ more packages. First run takes ~19 minutes.
flash-attn re-validation on every uv run? Normal — uv re-checks the cached wheel URL each time (~2–3s). It is not rebuilding from source.
06
Set CUDA_HOME Required for fine-tuning · deepspeed compilation
Fine-tuning will fail with: CUDA_HOME is unset — deepspeed needs this to compile CUDA extensions at runtime.
$bash
export CUDA_HOME=/usr/local/cuda
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
Alternatively, run the provided script: uv run bash scripts/deployment/dgpu/install_deps.sh
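To confirm the toolkit is where deepspeed expects it, a minimal check (PyTorch resolves CUDA_HOME via torch.utils.cpp_extension):

$bash
# Should print /usr/local/cuda, and nvcc should exist under it
uv run python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
ls $CUDA_HOME/bin/nvcc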
07
Patch Triton for RTX 5090 (Blackwell) ⚠ Critical — sm_120 not supported by Triton 3.3.1
Without this patch, fine-tuning crashes with: RuntimeError in Triton's ptx_get_version()
Triton 3.3.1 (pinned by PyTorch 2.7) does not recognise Blackwell GPU architecture sm_120.
$bash
uv run bash scripts/patch_triton_cuda13.sh
expected output
$ uv run bash scripts/patch_triton_cuda13.sh
Uninstalled 1 package in 2ms · Installed 1 package in 8ms
Patched .../triton/backends/nvidia/compiler.py to support CUDA 13.x
Installed triton_cuda13_patch.pth (runtime monkey-patch, survives uv reinstalls)
Still hitting torch.compile errors after the patch? Set TORCH_COMPILE=0 in your environment.
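To confirm the patch landed, a small sketch based on the script output above (the .pth filename is taken from that output):

$bash
# The .pth monkey-patch should sit in the venv's site-packages
uv run python -c "import sysconfig, os; p = os.path.join(sysconfig.get_paths()['purelib'], 'triton_cuda13_patch.pth'); print('patch present:', os.path.exists(p))"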
08
Verify Installation import gr00t · smoke test
$bash
uv run python -c "import gr00t; print('GR00T installed successfully')"

Optionally run zero-shot inference on the included demo dataset (downloads ~6GB base model on first run):

$bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path      nvidia/GR00T-N1.7-3B \
    --dataset-path    demo_data/droid_sample \
    --embodiment-tag  OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode  pytorch \
    --action-horizon  8
09
Run Fine-tuning on Your Custom Data NEW_EMBODIMENT · single GPU

Replace the dataset path and modality config with your own. The demo below uses the included 5-episode SO100 dataset as a smoke test:

$bash — single GPU
CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path      nvidia/GR00T-N1.7-3B \
    --dataset-path         demo_data/cube_to_bowl_5 \
    --embodiment-tag       NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus             1 \
    --output-dir           /tmp/test_finetune \
    --max-steps            5000 \
    --global-batch-size    8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
OOM on 32GB VRAM? Use --global-batch-size 4 --gradient-accumulation-steps 8. For 8-bit Adam, install bitsandbytes into the uv environment: uv pip install bitsandbytes
Using your own data? Replace the dataset path with your dataset and provide your own modality config via --modality-config-path. See getting_started/finetune_new_embodiment.md for the full data format spec.
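While tuning batch size, it helps to watch VRAM from a second terminal (a generic monitoring one-liner, not GR00T-specific):

$bash
# Refresh every 2s: used/total VRAM and GPU utilisation
watch -n 2 nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv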
10
Zero-Shot Inference Results RTX 5090 · DROID demo · 2 trajectories
✓ Zero-shot inference working on RTX 5090 — no fine-tuning required for these results. Base model nvidia/GR00T-N1.7-3B evaluated on the included DROID demo dataset, 2 trajectories.
Trajectory 1 — MSE 0.0033 · MAE 0.0376 · 266 timesteps · 25 inference steps
Trajectory 2 — MSE 0.0369 · MAE 0.1168 · 411 timesteps · 25 inference steps
Averages — MSE 0.0201 · MAE 0.0772 · step time 169 ms · control rate ~5.9 Hz
Key Files Reference
Path — Purpose
gr00t/experiment/launch_finetune.py — Main fine-tuning entry point
examples/SO100/so100_config.py — Example modality config for a custom embodiment
getting_started/finetune_new_embodiment.md — Full new-embodiment data prep tutorial
scripts/patch_triton_cuda13.sh — Triton sm_ version patch, required for RTX 5090
scripts/deployment/dgpu/install_deps.sh — Sets CUDA_HOME and GPU deps
gr00t/eval/open_loop_eval.py — Checkpoint validation
demo_data/cube_to_bowl_5/ — 5-episode SO100 demo dataset
gr00t/configs/finetune_config.py — Hyperparameters & state_dropout_prob
Fine-tuning Tips for RTX 5090
Start with --global-batch-size 8 with --gradient-accumulation-steps 4 — effective batch 32, safe on 32GB VRAM
Use --num-shards-per-epoch 10 and --save-only-model if hitting OOM — these are the key memory flags available in the launcher
Train 2,000–5,000 steps for new embodiments; monitor open-loop eval MSE to judge convergence
--state_dropout_prob defaults to 0.2; lower it if your task depends heavily on proprioceptive state
Add --use-wandb to enable Weights & Biases logging during training
If torch.compile still fails after the Triton patch, set TORCH_COMPILE=0; use PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation OOM (both sketched below)
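A minimal environment block combining the flags from these tips (values come straight from this guide; adjust to taste):

$bash
# Disable torch.compile if it still errors after the Triton patch
export TORCH_COMPILE=0

# Reduce fragmentation-related OOM during fine-tuning
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True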
Architecture Overview
Host Linux machine (GPU driver · Docker)
    ↓ --gpus all
grootn17 container (GR00T N1.7 · /workspace/.venv · port 5555)
    ↑ -v host_path:/data
Host filesystem (repo · datasets · checkpoints)
🐳 Key Design Principle Mount your repo and data from the host using -v. The Docker image contains only the installed Python environment — all code, datasets, and checkpoints stay on your host disk and survive container restarts.
⚠ Volume Mount Caveat The GR00T Dockerfile installs everything into /workspace/.venv inside the image. If you mount a volume over /workspace, the venv gets hidden. Mount your repo to /data instead and the venv stays intact.
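A quick way to confirm the baked-in venv is intact alongside your mounts (assumes the grootn17:latest image built in the steps below, and that the image's entrypoint passes commands through — otherwise add --entrypoint):

🐳bash — host
# Should print the interpreter path, proving /workspace/.venv was not hidden by a mount
docker run --rm -v /path/to/Isaac-GR00T:/data grootn17:latest \
  ls /workspace/.venv/bin/python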
Docker Installation Steps
01
Install Docker & NVIDIA Container Toolkit nvidia-container-toolkit · GPU passthrough
🐳bash — host
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU is visible to Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
Also install git-lfs on the host before cloning — required for demo parquet files:
sudo apt install -y git-lfs && git lfs install
02
Enable Docker BuildKit Required for --mount=type=cache in Dockerfile
Without BuildKit, the build fails with: the --mount option requires BuildKit
🐳bash — host
# Install buildx plugin
mkdir -p ~/.docker/cli-plugins
curl -SL https://github.com/docker/buildx/releases/download/v0.17.1/buildx-v0.17.1.linux-amd64 \
  -o ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

# Enable BuildKit permanently
mkdir -p ~/.docker
cat > ~/.docker/config.json <<'EOF'
{
  "features": {
    "buildkit": "true"
  }
}
EOF

# Verify
docker buildx version
Quick one-shot alternative — prefix any build with:
DOCKER_BUILDKIT=1 docker build ...
03
Clone the Repository On host — mounted into container at /data

Clone on the host. The repo will be mounted into the container at /data so edits are live without rebuilding.

$bash — host
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
04
Build the Docker Image nvidia/cuda:12.8.0-devel · ~15–25 min

Run from the repo root. The existing docker/Dockerfile in the GR00T repo is used directly — no custom Dockerfile needed.

🐳bash — from Isaac-GR00T/ root
cd Isaac-GR00T

DOCKER_BUILDKIT=1 docker build \
  -f docker/Dockerfile \
  -t grootn17:latest \
  .
What the build does: Pulls nvidia/cuda:12.8.0-devel-ubuntu22.04, installs system deps (ffmpeg, git-lfs), installs uv, syncs all Python dependencies into /workspace/.venv, and installs the gr00t package itself. Takes 15–25 minutes on first run.
RTX 5090 (sm_120) note: torch.compile via Triton will fail on Blackwell inside the container the same way as WSL2. Use --inference-mode pytorch for inference. Fine-tuning OOM can be managed with batch size and gradient accumulation flags.
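After the build finishes, a quick sanity check (standard Docker, nothing GR00T-specific; if the image overrides the entrypoint, adjust with --entrypoint):

🐳bash — host
# Confirm the image exists and note its size
docker images grootn17

# Optional: verify the GPU is visible inside the freshly built image
docker run --rm --gpus all grootn17:latest nvidia-smi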
05
Run the Container --gpus all · mount repo at /data · port 5555
🐳bash — host
docker run -it --gpus all \
  --ipc=host \
  --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/your-datasets:/workspace/datasets \
  -v /path/to/your-models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
Volume mount guide:
-v /path/to/Isaac-GR00T:/data — your cloned repo, live-editable
-v /path/to/datasets:/workspace/datasets — training datasets
-v /path/to/models:/workspace/models — fine-tuned checkpoints output
The venv lives at /workspace/.venv inside the image — untouched by mounts.
Path with spaces? Wrap the entire -v argument in quotes:
-v "/path/with spaces/Isaac-GR00T:/data"
Multi-line shell commands with backslash? Make sure there is NO space after each \. If you get command not found errors on the flag lines, run it as a single line instead.
06
Activate the Virtual Environment /workspace/.venv — all packages live here
The container does NOT auto-activate the venv. The system Python has no packages. Always activate first.
🐳bash — inside container
# Activate venv
source /workspace/.venv/bin/activate

# Make it auto-activate on every shell entry
echo "source /workspace/.venv/bin/activate" >> ~/.bashrc
Re-entering the container in future sessions:
docker start -ai grootn17 — the venv activates automatically after the bashrc line above.
07
Verify Installation GPU · CUDA · GR00T import
🐳bash — inside container (venv active)
# GPU check
python -c "import torch; print(torch.cuda.get_device_name(0)); print('CUDA:', torch.version.cuda)"

# GR00T check
python -c "import gr00t; print('GR00T OK')"

# Network / HuggingFace reachable
python -c "import requests; print(requests.get('https://huggingface.co').status_code)"
expected output
# combined output of the three checks above
NVIDIA GeForce RTX 5090
CUDA: 12.8
GR00T OK
200
08
Run Zero-Shot Inference Base model · DROID demo · no fine-tuning needed

The base model (~6 GB) downloads automatically from HuggingFace on first run:

🐳bash — inside container
cd /data

python scripts/deployment/standalone_inference_script.py \
    --model-path      nvidia/GR00T-N1.7-3B \
    --dataset-path    demo_data/droid_sample \
    --embodiment-tag  OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --traj-ids 1 2 \
    --inference-mode  pytorch \
    --action-horizon  8
Policy server mode — run the model as a server for external clients over ZMQ:
python gr00t/eval/run_gr00t_server.py \
    --model-path     nvidia/GR00T-N1.7-3B \
    --embodiment-tag OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT \
    --device         cuda:0 \
    --host           0.0.0.0 \
    --port           5555
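Since the transport is ZMQ rather than HTTP, a plain TCP probe is the simplest way to confirm the server is listening (run from the host; assumes port 5555 is published as in Step 5):

🐳bash — host
# Exit code 0 means something is accepting connections on 5555
nc -zv localhost 5555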
09
Fine-tune on Your Custom Data NEW_EMBODIMENT · custom modality config

With your dataset mounted at /workspace/datasets and output going to /workspace/models (both on host disk), fine-tuning checkpoints persist across container restarts.

🐳bash — inside container
cd /data

# Copy your modality config into the repo
mkdir -p examples/MY_EMBODIMENT

# Run fine-tuning
CUDA_VISIBLE_DEVICES=0 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python gr00t/experiment/launch_finetune.py \
    --base-model-path      nvidia/GR00T-N1.7-3B \
    --dataset-path         /workspace/datasets/your-dataset \
    --embodiment-tag       NEW_EMBODIMENT \
    --modality-config-path examples/MY_EMBODIMENT/my_config.py \
    --num-gpus             1 \
    --output-dir           /workspace/models/grootn17_finetuned \
    --max-steps            5000 \
    --global-batch-size    8 \
    --gradient-accumulation-steps 4 \
    --num-shards-per-epoch 10 \
    --save-only-model \
    --dataloader-num-workers 2
Trainable parameters: 1.62B / 3.14B (51.54%)
The VLM backbone is frozen. Only the diffusion action head and projector are trained — fits on 32GB VRAM with the settings above.
Still hitting OOM? Drop to --global-batch-size 4 --gradient-accumulation-steps 8. Or add --no-tune-projector to reduce trainable params to ~1.09B.

After training, start the server with your checkpoint:

🐳bash — inside container
python gr00t/eval/run_gr00t_server.py \
    --model-path     /workspace/models/grootn17_finetuned \
    --embodiment-tag NEW_EMBODIMENT \
    --device         cuda:0 \
    --host           0.0.0.0 \
    --port           5555
10
Save & Restore Docker Images Commit containers · backup to disk
Best practice: commit your running containers to images before stopping them, then save images to disk. Containers reference images by ID — use clean names for clarity.
$bash — host
# Commit running container state → image
docker commit grootn17 grootn17:latest

# Save image to disk (gzip compressed)
mkdir -p /path/to/backups
docker save grootn17:latest | gzip > /path/to/backups/grootn17.tar.gz

# Restore later
docker load < /path/to/backups/grootn17.tar.gz
Re-entering your container after a reboot:
docker start -ai grootn17
Container shows old image ID? Docker cannot relink a container to a new image after creation. Solution: commit container → new image → docker rm old container → docker run from new image with the same volume mounts and flags.
$bash — full container recreation example
# 1. Commit current state
docker commit grootn17 grootn17:latest

# 2. Remove old container
docker rm grootn17

# 3. Recreate from updated image
docker run -it --gpus all \
  --ipc=host --shm-size=16g \
  -v /path/to/Isaac-GR00T:/data \
  -v /path/to/datasets:/workspace/datasets \
  -v /path/to/models:/workspace/models \
  -p 5555:5555 \
  --name grootn17 \
  grootn17:latest
Docker Quick Reference
Task — Command
Start container — docker start -ai grootn17
Second terminal in running container — docker exec -it grootn17 bash
Copy file host → container — docker cp file.py grootn17:/data/file.py
Copy between containers (via host) — docker cp c1:/path /tmp/ && docker cp /tmp/file c2:/path
List images with sizes — docker images
Commit container to image — docker commit grootn17 grootn17:latest
Save image to file — docker save grootn17:latest | gzip > grootn17.tar.gz
Load image from file — docker load < grootn17.tar.gz
Connect container to network — docker network connect my_net grootn17
Docker Tips for RTX 5090
Always mount your repo to /data, not /workspace — mounting over /workspace hides the installed venv
Use --ipc=host --shm-size=16g — required for multi-worker dataloaders during fine-tuning
Multiple containers on the same machine share the GPU — only run one fine-tuning job at a time
For multi-container setups (policy server + client), put all containers on the same Docker network so they resolve each other by name (see the sketch below the list)
Paths with spaces must be quoted in -v arguments: -v "/my path/repo:/data"
Use PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation OOM errors during fine-tuning
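A minimal sketch of the multi-container setup from the tips above (the network name groot_net and client name groot_client are placeholders):

🐳bash — host
# Create a shared network and attach the running policy-server container
docker network create groot_net
docker network connect groot_net grootn17

# A client container on the same network resolves the server by name
docker run -it --rm --network groot_net --name groot_client grootn17:latest bash
# ...inside the client, point your ZMQ client at tcp://grootn17:5555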