
// 02 / LOCAL AI

Local LLMs on
Repurposed Hardware

OS: Ubuntu 24.04.4 LTS
Runtime: Docker + Ollama
Models: Mistral 7B · Qwen2.5
GPU: GTX 1050 Ti (CUDA-accelerated)

// the idea

Your old laptop as a private AI server

There's a common assumption that running large language models locally requires expensive, top-of-the-line hardware. That's not quite right. A decent machine from a few years ago — with a CUDA-capable GPU — can run models like Mistral 7B and Qwen2.5 with genuinely useful performance.

This walkthrough documents exactly what was done to take an HP Omen gaming laptop, replace Windows with Ubuntu, configure the NVIDIA drivers and CUDA toolkit, deploy Docker and Ollama, and pull down and run multiple LLMs — completely locally: no API keys, no cloud costs, and full control over your data.

Every command is included. Every gotcha encountered along the way is noted.

Machine
HP Omen (laptop)
CPU
Intel Core i5-8300H × 8
RAM
32GB
GPU
NVIDIA GeForce GTX 1050 Ti
Original OS
Windows (replaced)
New OS
Ubuntu 24.04.4 LTS
Container Runtime
Docker CE
LLM Manager
Ollama (Docker image)
Models Running
Mistral 7B, Qwen2.5 7B
Inference
GPU-accelerated (CUDA)
Note

This guide assumes you're comfortable with basic terminal usage and understand that installing Ubuntu will wipe your existing Windows installation. Back up anything you need first.

// step by step

The full walkthrough

STEP 01

Install Ubuntu 24.04 LTS

Download the Ubuntu 24.04 LTS ISO from ubuntu.com. Use a tool like Rufus (on another Windows machine) or balenaEtcher to create a bootable USB drive from the ISO.

Boot from the USB (usually F12 or F2 on startup to get to the boot menu — this varies by manufacturer). Select "Install Ubuntu" and follow the installer. Choose to erase the disk and install Ubuntu fresh. This replaces Windows entirely.

During installation, create a username and password. Make a note of both — you'll need them constantly.

After Installation

You should boot into a working Ubuntu 24.04 desktop. Connect to your Wi-Fi network before proceeding.

STEP 02

Update the system

Open a terminal (Ctrl+Alt+T) and run a full system update before installing anything else.

terminalbash
$ sudo apt update && sudo apt upgrade -y

This will take a few minutes. Let it complete fully before moving on.

STEP 03

Install NVIDIA drivers

Ubuntu can detect and recommend the right NVIDIA driver automatically. This is the safest approach — it picks the correct proprietary driver version for your card.

First, check what Ubuntu recommends:

terminalbash
$ ubuntu-drivers devices
== /sys/bus/pci/devices/0000:01:00.0 ==
vendor   : NVIDIA Corporation
model    : GP107M [GeForce GTX 1050 Ti Mobile]
driver   : nvidia-driver-535 - distro non-free recommended

Install the recommended driver:

terminalbash
$ sudo ubuntu-drivers autoinstall
$ sudo reboot
Reboot required

The driver won't be active until after a reboot. After restarting, open a terminal and continue from here.

After rebooting, verify the driver is loaded:

terminalbash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx    Driver Version: 535.xx    CUDA Version: 12.x          |
+-----------------------------------------------------------------------------+
| GeForce GTX 1050 Ti   ... 4096MiB  |
+-----------------------------------------------------------------------------+

If nvidia-smi returns a table showing your GPU, driver version, and CUDA version — the driver is working. If you get "command not found" or an error, the driver installation didn't complete cleanly; try running sudo apt install nvidia-driver-535 explicitly, then reboot again.

STEP 04

Install the NVIDIA Container Toolkit

This is the bridge that lets Docker containers see and use your NVIDIA GPU. Without it, Ollama running inside Docker will fall back to CPU inference. This step is what makes the setup genuinely fast.

One ordering note: the toolkit configures Docker itself, so Docker (Step 05) must already be installed before the configure and verify commands below will work. If you're following top to bottom, complete Step 05 first, then return here.

Add the NVIDIA package repository:

terminalbash
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install the toolkit:

terminalbash
$ sudo apt update
$ sudo apt install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA runtime:

terminalbash
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

Verify Docker can see the GPU:

terminalbash
$ docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+-----------------------------------------------------------------------------+
| GeForce GTX 1050 Ti ... CUDA 12.x  ✓                                       |
+-----------------------------------------------------------------------------+
GPU access confirmed ✓

If nvidia-smi runs successfully inside the Docker container and shows your GTX 1050 Ti, the toolkit is wired up correctly. Ollama will use your GPU for inference automatically from this point.

STEP 05

Install Docker

Ubuntu's default repositories include Docker, but the version can be outdated. The cleanest approach is to install Docker CE (Community Edition) directly from Docker's official repository.

First, install prerequisites:

terminalbash
$ sudo apt install -y ca-certificates curl gnupg lsb-release

Add Docker's GPG key:

terminalbash
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg

Add the Docker repository:

terminalbash
$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker:

terminalbash
$ sudo apt update
$ sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Add your user to the docker group so you don't need sudo for every docker command. The newgrp command applies the new group to the current shell only; log out and back in to apply it everywhere:

terminalbash
$ sudo usermod -aG docker $USER
$ newgrp docker

Verify Docker is working:

terminalbash
$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
Docker is working ✓

If you see the "Hello from Docker!" message, you're good to proceed.

STEP 06

Deploy Ollama via Docker — with GPU support

Ollama is the layer that manages downloading, running, and interacting with LLMs. Running it inside Docker keeps everything contained and easy to update or remove. The --gpus all flag passes your GTX 1050 Ti through to the container for CUDA-accelerated inference.

Pull and run the Ollama container:

terminalbash
$ docker run -d \
  --name ollama \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --restart unless-stopped \
  ollama/ollama

Breaking this down:

-d — run in the background (detached)
--name ollama — give the container a name so we can reference it easily
--gpus all — pass all available NVIDIA GPUs into the container (requires the NVIDIA Container Toolkit from Step 04)
-v ollama:/root/.ollama — persist model files using a Docker volume (models survive container restarts)
-p 11434:11434 — expose Ollama's API on port 11434
--restart unless-stopped — auto-restart on reboot
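
The same container can also be declared in a Compose file, which is easier to version and reproduce. A minimal sketch, assuming Docker Compose v2 and the NVIDIA Container Toolkit from Step 04 (the GPU reservation syntax follows the Compose specification; treat this as a starting point, not a tested drop-in):

```yaml
# compose.yaml — declarative equivalent of the docker run command above
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"          # expose Ollama's API
    volumes:
      - ollama:/root/.ollama   # persist model files across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all       # pass all NVIDIA GPUs through
              capabilities: [gpu]

volumes:
  ollama:
```

Start it with docker compose up -d from the directory containing the file.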

Verify it's running and can see the GPU:

terminalbash
$ docker ps
CONTAINER ID   IMAGE           STATUS          NAMES
a1b2c3d4e5f6   ollama/ollama   Up 2 minutes    ollama

$ docker exec -it ollama nvidia-smi
GeForce GTX 1050 Ti   ✓

STEP 07

Pull your first model — Mistral 7B

Now tell Ollama to download Mistral. Models are pulled on first request and stored in the Docker volume, so they persist across restarts.

terminalbash
$ docker exec -it ollama ollama pull mistral
pulling manifest
pulling 2af3b81862c6... ████████████████ 4.1 GB
success
Download size

Mistral 7B is approximately 4.1GB. This will take a while depending on your connection. Don't interrupt it.

Test it immediately:

terminalbash
$ docker exec -it ollama ollama run mistral
>>> Send a message (/? for help)
>>> Hello! Can you confirm you're running locally?
I'm running locally on your machine via Ollama. No internet connection
required for inference — I'm entirely self-contained on your hardware.

Press Ctrl+D or type /bye to exit the interactive session.

STEP 08

Pull Qwen2.5

Qwen2.5 is a strong alternative — particularly good at reasoning and code tasks. Pull it the same way:

terminalbash
$ docker exec -it ollama ollama pull qwen2.5
pulling manifest
pulling 8b5... ████████████████ 4.7 GB
success
terminalbash
$ docker exec -it ollama ollama run qwen2.5

STEP 09

Check what's installed and manage models

terminalbash
$ docker exec -it ollama ollama list
NAME              ID              SIZE    MODIFIED
mistral:latest    2ae6f6dd7a3d    4.1 GB  2 minutes ago
qwen2.5:latest    845dbda0ea48    4.7 GB  1 minute ago

To remove a model you don't want (to free up disk space):

terminalbash
$ docker exec -it ollama ollama rm mistral

STEP 10

Use the REST API (optional — for integrations)

Ollama exposes a simple HTTP API on port 11434. You can call it from any other application on your machine — or from another machine on your local network if you open the port.

terminalbash
$ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is Docker in one sentence?",
  "stream": false
}'

This returns a JSON response with the model's reply. Useful if you want to build tools that talk to your local models.
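
As a sketch of such an integration, here is a minimal Python client using only the standard library. The function names build_payload and generate are illustrative, not part of Ollama; the endpoint and JSON fields match the curl call above:

```python
import json
import urllib.request

def build_payload(prompt, model="mistral", stream=False):
    """Build the JSON body expected by Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

def generate(prompt, model="mistral", host="http://localhost:11434"):
    """Send a prompt to the local Ollama instance and return its reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt, model).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage, with the Ollama container from Step 06 running:
#   print(generate("What is Docker in one sentence?"))
```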

// the models

What's running and why

Mistral 7B
~4.1GB download

Excellent general-purpose model. Strong at conversation, summarisation, coding help, and writing. A great first choice for local deployment — fast enough even on CPU for practical use.

Qwen2.5 7B
~4.7GB download

Alibaba's model. Particularly strong at reasoning tasks, code generation, and structured outputs. Worth running alongside Mistral for comparison — different strengths.

Both run with CUDA acceleration via the GTX 1050 Ti. Response speed is comfortably practical — fast enough for real daily use across writing, coding, and reasoning tasks.

// what this demonstrates

Why this matters

The point isn't just that it works — it's what it represents. You don't need a subscription, an API key, or cloud infrastructure to run capable AI models. A gaming laptop that's a few years old, an afternoon, and some terminal commands is enough.

The GTX 1050 Ti is not a cutting-edge GPU — it's a mobile card from 2017 with 4GB of VRAM. Running 7B parameter models on it via CUDA is genuinely fast enough for practical use: conversational responses, code assistance, summarisation, and more. The combination of 32GB system RAM and GPU acceleration means the models stay loaded and respond without constant reloading.
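
A rough back-of-envelope calculation shows why a 4GB card can hold most of a 7B model. Assuming the roughly 4-bit quantization Ollama defaults to for these models, the weights alone come to about 3.3 GiB; the KV cache and activations add overhead on top, which can spill to system RAM:

```python
# Approximate weight memory for a 7B-parameter model quantized to ~4 bits per weight.
params = 7_000_000_000
bits_per_weight = 4          # roughly Q4-class quantization (an assumption, not measured)
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1024**3:.2f} GiB")  # prints "3.26 GiB"
```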

For privacy, this means your prompts and responses never leave your machine. For cost, it means zero ongoing charges for inference. For understanding, it means you get a real sense of how these systems are deployed, not just consumed as a black-box service.

As a next step, a lightweight web UI like Open WebUI can be deployed as another Docker container to give you a full chat interface that connects directly to your local Ollama instance — making it feel very close to using a cloud product, running entirely on your own hardware.
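
A sketch of what that could look like, written as a Compose file. The image name, port, and OLLAMA_BASE_URL variable follow Open WebUI's documented defaults, and host.docker.internal is mapped so the UI can reach the Ollama container already publishing port 11434 on the host; verify against the current Open WebUI docs before relying on this:

```yaml
# compose.yaml — Open WebUI pointing at the existing local Ollama (a sketch)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"          # UI served at http://localhost:3000
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings

volumes:
  open-webui:
```

After docker compose up -d, the chat interface would be available at http://localhost:3000, backed entirely by the models running on your own hardware.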