
// 02 / LOCAL AI

Local LLMs on
Repurposed Hardware

OS: Ubuntu 24.04.4 LTS
Runtime: Docker + Ollama
Models: Mistral 7B · Qwen2.5
GPU: GTX 1050 Ti (CUDA-accelerated)

// the idea

Your old laptop as a private AI server

There's a common assumption that running large language models locally requires expensive, top-of-the-line hardware. That's not quite right. A decent machine from a few years ago — with a CUDA-capable GPU — can run models like Mistral 7B and Qwen2.5 with genuinely useful performance.

This walkthrough documents exactly what was done to take an HP Omen gaming laptop, replace Windows with Ubuntu, configure the NVIDIA drivers and CUDA toolkit, deploy Docker and Ollama, and pull down and run multiple LLMs — completely locally: no API keys, no cloud costs, and full control over your data.

Every command is included. Every gotcha encountered along the way is noted.

Machine
HP Omen (laptop)
CPU
Intel Core i5-8300H × 8
RAM
32GB
GPU
NVIDIA GeForce GTX 1050 Ti
Original OS
Windows (replaced)
New OS
Ubuntu 24.04.4 LTS
Container Runtime
Docker CE
LLM Manager
Ollama (Docker image)
Models Running
Mistral 7B, Qwen2.5 7B
Inference
GPU-accelerated (CUDA)
Note

This guide assumes you're comfortable with basic terminal usage and understand that installing Ubuntu will wipe your existing Windows installation. Back up anything you need first.

// step by step

The full walkthrough

STEP 01

Install Ubuntu 24.04 LTS

Download the Ubuntu 24.04 LTS ISO from ubuntu.com. Use a tool like Rufus (on another Windows machine) or balenaEtcher to create a bootable USB drive from the ISO.

Boot from the USB (usually F12 or F2 on startup to get to the boot menu — this varies by manufacturer). Select "Install Ubuntu" and follow the installer. Choose to erase the disk and install Ubuntu fresh. This replaces Windows entirely.

During installation, create a username and password. Make a note of both — you'll need them constantly.

After Installation

You should boot into a working Ubuntu 24.04 desktop. Connect to your Wi-Fi network before proceeding.

STEP 02

Update the system

Open a terminal (Ctrl+Alt+T) and run a full system update before installing anything else.

terminalbash
$ sudo apt update && sudo apt upgrade -y

This will take a few minutes. Let it complete fully before moving on.

STEP 03

Install NVIDIA drivers

Ubuntu can detect and recommend the right NVIDIA driver automatically. This is the safest approach — it picks the correct proprietary driver version for your card.

First, check what Ubuntu recommends:

terminalbash
$ ubuntu-drivers devices
== /sys/bus/pci/devices/0000:01:00.0 ==
vendor   : NVIDIA Corporation
model    : GP107M [GeForce GTX 1050 Ti Mobile]
driver   : nvidia-driver-535 - distro non-free recommended

Install the recommended driver:

terminalbash
$ sudo ubuntu-drivers autoinstall
$ sudo reboot
Reboot required

The driver won't be active until after a reboot. After restarting, open a terminal and continue from here.

After rebooting, verify the driver is loaded:

terminalbash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx    Driver Version: 535.xx    CUDA Version: 12.x          |
+-----------------------------------------------------------------------------+
| GeForce GTX 1050 Ti   ... 4096MiB  |
+-----------------------------------------------------------------------------+

If nvidia-smi returns a table showing your GPU, driver version, and CUDA version — the driver is working. If you get "command not found" or an error, the driver installation didn't complete cleanly; try running sudo apt install nvidia-driver-535 explicitly, then reboot again.

STEP 04

Install the NVIDIA Container Toolkit

This is the bridge that lets Docker containers see and use your NVIDIA GPU. Without it, Ollama running inside Docker will fall back to CPU inference. This step is what makes the setup genuinely fast.

One ordering note: the toolkit configures Docker itself, so Docker (Step 05) must already be installed before the configure and verify commands below will work. If you're following top to bottom, complete Step 05 first, then return here.

Add the NVIDIA package repository:

terminalbash
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install the toolkit:

terminalbash
$ sudo apt update
$ sudo apt install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA runtime:

terminalbash
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

Verify Docker can see the GPU:

terminalbash
$ docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+-----------------------------------------------------------------------------+
| GeForce GTX 1050 Ti ... CUDA 12.x  ✓                                       |
+-----------------------------------------------------------------------------+
GPU access confirmed ✓

If nvidia-smi runs successfully inside the Docker container and shows your GTX 1050 Ti, the toolkit is wired up correctly. Ollama will use your GPU for inference automatically from this point.

STEP 05

Install Docker

Ubuntu's default repositories include Docker, but the version can be outdated. The cleanest approach is to install Docker CE (Community Edition) directly from Docker's official repository.

First, install prerequisites:

terminalbash
$ sudo apt install -y ca-certificates curl gnupg lsb-release

Add Docker's GPG key:

terminalbash
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg

Add the Docker repository:

terminalbash
$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker:

terminalbash
$ sudo apt update
$ sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Add your user to the docker group so you don't need sudo for every docker command. The newgrp command applies the new group to the current shell only; log out and back in to apply it everywhere:

terminalbash
$ sudo usermod -aG docker $USER
$ newgrp docker

Verify Docker is working:

terminalbash
$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
Docker is working ✓

If you see the "Hello from Docker!" message, you're good to proceed.

STEP 06

Deploy Ollama via Docker — with GPU support

Ollama is the layer that manages downloading, running, and interacting with LLMs. Running it inside Docker keeps everything contained and easy to update or remove. The --gpus all flag passes your GTX 1050 Ti through to the container for CUDA-accelerated inference.

Pull and run the Ollama container:

terminalbash
$ docker run -d \
  --name ollama \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --restart unless-stopped \
  ollama/ollama

Breaking this down:

-d — run in the background (detached)
--name ollama — give the container a name so we can reference it easily
--gpus all — pass all available NVIDIA GPUs into the container (requires the NVIDIA Container Toolkit from Step 04)
-v ollama:/root/.ollama — persist model files using a Docker volume (models survive container restarts)
-p 11434:11434 — expose Ollama's API on port 11434
--restart unless-stopped — auto-restart on reboot
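
The same container can also be declared in a Compose file, which is easier to version and reproduce. A minimal sketch, assuming Docker Compose v2 and the NVIDIA Container Toolkit from Step 04 (the GPU reservation syntax follows the Compose specification; treat this as a starting point, not a tested drop-in):

```yaml
# compose.yaml — declarative equivalent of the docker run command above
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"          # expose Ollama's API
    volumes:
      - ollama:/root/.ollama   # persist model files across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all       # pass all NVIDIA GPUs through
              capabilities: [gpu]

volumes:
  ollama:
```

Start it with docker compose up -d from the directory containing the file.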

Verify it's running and can see the GPU:

terminalbash
$ docker ps
CONTAINER ID   IMAGE           STATUS          NAMES
a1b2c3d4e5f6   ollama/ollama   Up 2 minutes    ollama

$ docker exec -it ollama nvidia-smi
GeForce GTX 1050 Ti   ✓

STEP 07

Pull your first model — Mistral 7B

Now tell Ollama to download Mistral. Models are pulled on first request and stored in the Docker volume, so they persist across restarts.

terminalbash
$ docker exec -it ollama ollama pull mistral
pulling manifest
pulling 2af3b81862c6... ████████████████ 4.1 GB
success
Download size

Mistral 7B is approximately 4.1GB. This will take a while depending on your connection. Don't interrupt it.

Test it immediately:

terminalbash
$ docker exec -it ollama ollama run mistral
>>> Send a message (/? for help)
>>> Hello! Can you confirm you're running locally?
I'm running locally on your machine via Ollama. No internet connection
required for inference — I'm entirely self-contained on your hardware.

Press Ctrl+D or type /bye to exit the interactive session.

STEP 08

Pull Qwen2.5

Qwen2.5 is a strong alternative — particularly good at reasoning and code tasks. Pull it the same way:

terminalbash
$ docker exec -it ollama ollama pull qwen2.5
pulling manifest
pulling 8b5... ████████████████ 4.7 GB
success
terminalbash
$ docker exec -it ollama ollama run qwen2.5

STEP 09

Check what's installed and manage models

terminalbash
$ docker exec -it ollama ollama list
NAME              ID              SIZE    MODIFIED
mistral:latest    2ae6f6dd7a3d    4.1 GB  2 minutes ago
qwen2.5:latest    845dbda0ea48    4.7 GB  1 minute ago

To remove a model you don't want (to free up disk space):

terminalbash
$ docker exec -it ollama ollama rm mistral

STEP 10

Use the REST API (optional — for integrations)

Ollama exposes a simple HTTP API on port 11434. You can call it from any other application on your machine — or from another machine on your local network if you open the port.

terminalbash
$ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is Docker in one sentence?",
  "stream": false
}'

This returns a JSON response with the model's reply. Useful if you want to build tools that talk to your local models.
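
As a sketch of such an integration, here is a minimal Python client using only the standard library. The function names build_payload and generate are illustrative, not part of Ollama; the endpoint and JSON fields match the curl call above:

```python
import json
import urllib.request

def build_payload(prompt, model="mistral", stream=False):
    """Build the JSON body expected by Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

def generate(prompt, model="mistral", host="http://localhost:11434"):
    """Send a prompt to the local Ollama instance and return its reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt, model).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage, with the Ollama container from Step 06 running:
#   print(generate("What is Docker in one sentence?"))
```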

// the models

What's running and why

Mistral 7B
~4.1GB download

Excellent general-purpose model. Strong at conversation, summarisation, coding help, and writing. A great first choice for local deployment — fast enough even on CPU for practical use.

Qwen2.5 7B
~4.7GB download

Alibaba's model. Particularly strong at reasoning tasks, code generation, and structured outputs. Worth running alongside Mistral for comparison — different strengths.

Both run with CUDA acceleration via the GTX 1050 Ti. Response speed is comfortably practical — fast enough for real daily use across writing, coding, and reasoning tasks.

// what this demonstrates

Why this matters

The point isn't just that it works — it's what it represents. You don't need a subscription, an API key, or cloud infrastructure to run capable AI models. A gaming laptop that's a few years old, an afternoon, and some terminal commands is enough.

The GTX 1050 Ti is not a cutting-edge GPU — it's a mobile card from 2017 with 4GB of VRAM. Running 7B parameter models on it via CUDA is genuinely fast enough for practical use: conversational responses, code assistance, summarisation, and more. The combination of 32GB system RAM and GPU acceleration means the models stay loaded and respond without constant reloading.
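
A rough back-of-envelope calculation shows why a 4GB card can hold most of a 7B model. Assuming the roughly 4-bit quantization Ollama defaults to for these models, the weights alone come to about 3.3 GiB; the KV cache and activations add overhead on top, which can spill to system RAM:

```python
# Approximate weight memory for a 7B-parameter model quantized to ~4 bits per weight.
params = 7_000_000_000
bits_per_weight = 4          # roughly Q4-class quantization (an assumption, not measured)
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1024**3:.2f} GiB")  # prints "3.26 GiB"
```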

For privacy, this means your prompts and responses never leave your machine. For cost, it means zero ongoing charges for inference. For understanding, it means you get a real sense of how these systems are deployed, not just consumed as a black-box service.

As a next step, a lightweight web UI like Open WebUI can be deployed as another Docker container to give you a full chat interface that connects directly to your local Ollama instance — making it feel very close to using a cloud product, running entirely on your own hardware.
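
A sketch of what that could look like, written as a Compose file. The image name, port, and OLLAMA_BASE_URL variable follow Open WebUI's documented defaults, and host.docker.internal is mapped so the UI can reach the Ollama container already publishing port 11434 on the host; verify against the current Open WebUI docs before relying on this:

```yaml
# compose.yaml — Open WebUI pointing at the existing local Ollama (a sketch)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"          # UI served at http://localhost:3000
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings

volumes:
  open-webui:
```

After docker compose up -d, the chat interface would be available at http://localhost:3000, backed entirely by the models running on your own hardware.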