Introduction
Over the past few weeks, I’ve been diving into AI model training and inference using open-source GPT-style models. I wanted a setup that could take advantage of my NVIDIA RTX 5070 Ti for faster experimentation, but still run inside WSL (Windows Subsystem for Linux) for maximum compatibility with Linux-based tools.
After a bit of trial and error — and a couple of GPU compatibility hurdles — I now have a fully working environment that runs Hugging Face models directly on my GPU. Here’s exactly how I did it.
1. Installing WSL and Preparing Ubuntu
I started by making sure WSL2 was installed and running an Ubuntu distribution:
wsl --install -d Ubuntu
wsl --set-default-version 2
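If WSL was already set up on the machine, it’s worth confirming the distribution is actually running under WSL2. From PowerShell:
wsl -l -v
The VERSION column should read 2.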
Then I launched Ubuntu from the Start Menu, created my user, and updated everything:
sudo apt update && sudo apt -y upgrade
2. Enabling GPU Support in WSL
Since I wanted GPU acceleration, I installed the latest NVIDIA Game Ready/Studio Driver for Windows. This is important because WSL uses the Windows driver to expose the GPU inside Linux; there’s no need to install a separate Linux GPU driver inside the distribution.
Inside WSL, I checked GPU visibility:
nvidia-smi
If you see your GPU listed, you’re good to go.
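Under the hood, the Windows driver’s user-mode libraries are mounted into the distro; they show up under /usr/lib/wsl/lib, which is where nvidia-smi and libcuda.so come from:
ls /usr/lib/wsl/lib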
3. Installing Micromamba for Environment Management
I like to keep my AI experiments isolated in separate environments, so I use micromamba (a lightweight conda alternative).
First, I installed bzip2 (needed for extracting micromamba):
sudo apt install -y bzip2
Then downloaded and initialized micromamba:
cd ~
curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj
./bin/micromamba shell init -s bash -r ~/micromamba
exec $SHELL
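Once the new shell loads, a quick version check confirms micromamba is on the PATH:
micromamba --version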
4. Creating a Python Environment
I created an environment named llm with Python 3.11:
micromamba create -y -n llm python=3.11
micromamba activate llm
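At this point, python should resolve to the environment’s interpreter (under ~/micromamba/envs/llm, given the root prefix chosen earlier):
which python
python --version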
5. Installing PyTorch with RTX 5070 Ti Support
Here’s where I hit my first big roadblock. The stable PyTorch builds didn’t yet support the RTX 5070 Ti’s compute capability 12.0 (Blackwell architecture). The fix was to install the nightly cu128 build of PyTorch, which does include sm_120 support:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
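Before moving on, a quick one-liner confirms the nightly build actually sees the card; on this GPU, get_device_capability should report (12, 0):
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_capability(0))"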
6. Installing AI Libraries
With PyTorch sorted, I installed the Hugging Face ecosystem and related tools:
pip install transformers datasets accelerate peft bitsandbytes trl sentencepiece evaluate
Here’s what each of these libraries does:
Transformers – Hugging Face’s main library for working with pre-trained models (like GPT, BERT, etc.), including easy APIs for loading, running, and fine-tuning them.
Datasets – A fast, memory-efficient library for loading, processing, and sharing large datasets used in machine learning.
Accelerate – A tool from Hugging Face that makes it simple to run training across CPUs, GPUs, or multiple devices with minimal code changes.
PEFT (Parameter-Efficient Fine-Tuning) – A library for applying lightweight fine-tuning methods like LoRA so you can adapt large models without retraining all parameters.
Bitsandbytes – A library for quantizing models (e.g., 8-bit, 4-bit) to save memory and speed up inference/training, especially on GPUs.
TRL (Transformer Reinforcement Learning) – Hugging Face’s library for training transformer models with reinforcement learning techniques like RLHF (Reinforcement Learning from Human Feedback).
SentencePiece – A tokenizer library that helps split text into subword units, especially useful for multilingual and large-vocabulary models.
Evaluate – A library to easily compute machine learning metrics (like accuracy, BLEU, ROUGE, etc.) in a standardized way.
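A quick import check is an easy way to catch anything that didn’t install cleanly:
python -c "import transformers, datasets, accelerate, peft, trl; print(transformers.__version__)"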
7. Testing GPU Inference
To confirm everything worked, I ran a small model on GPU:
from transformers import AutoTokenizer, pipeline
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets accelerate place the model on the GPU automatically
pipe = pipeline("text-generation", model=model_id, tokenizer=tok, device_map="auto")
print(pipe("Explain LoRA in one sentence.", max_new_tokens=50)[0]["generated_text"])
The output came back quickly, and my GPU usage spiked in nvidia-smi, a great sign that everything was working.
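One easy way to watch that in real time is to keep a second WSL terminal open while the script runs:
watch -n 1 nvidia-smi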
8. Conclusion
With this setup, I can run and fine-tune open-source GPT models entirely on my RTX 5070 Ti inside WSL. It’s a clean, isolated environment that avoids Windows-specific headaches and keeps me close to the Linux ecosystem most AI tooling is built for.
If you’re working with a newer NVIDIA GPU, don’t be surprised if you need to grab nightly builds until stable releases catch up. Once you do, you’ll be able to enjoy the full speed of your hardware without leaving the comfort of Windows.