Running Large Language Models Locally with Ollama
Ollama is an open-source solution that enables users to run large language models (LLMs) directly on their personal computers. It supports a variety of open-source LLMs, such as Llama 3, DeepSeek R1, Mistral, Phi-4, and Gemma 2, allowing them to operate without an internet connection. This approach improves security, safeguards privacy, and offers complete control over model customization and performance optimization.
Key Features of Ollama
Ollama comes with a built-in model repository, making it easy to find, download, and run LLMs locally. It also works with Open WebUI, which provides a user-friendly graphical interface for those who prefer not to use the command line. The platform is compatible with Linux, Windows, and macOS and runs models without relying on cloud-based APIs.
Setting Up Ollama and Running LLMs
This guide outlines the steps required to install Ollama and configure large language models (LLMs) with all necessary dependencies on a local workstation.
Downloading and Installing Ollama
Ollama is designed to run on Linux, macOS, and Windows, allowing users to install it seamlessly using the official release package or script. Follow the steps below to install the latest version of Ollama on your system.
Step 1: Open a Terminal
Begin by launching a new terminal session on your system.
Step 2: Install Ollama on Linux
To download and install Ollama on Linux, execute the following command:
$ curl -fsSL https://ollama.com/install.sh | sh
Step 3: Verify Installation
After installation, confirm that Ollama has been successfully installed by checking the version:
$ ollama -v
Expected output:
ollama version is 0.5.12
Step 4: List Available Models
To see all models available on your local machine, run:
$ ollama list
Managing Ollama as a System Service on Linux
When installed on Linux, Ollama creates a system service called ollama.service to manage its operation. Follow these steps to check the service status and configure it to start at boot.
Check Ollama Service Status
To verify that Ollama is running, use:
$ sudo systemctl status ollama
Expected output:
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-02-26 13:33:41 UTC; 5min ago
Main PID: 27138 (ollama)
Tasks: 6 (limit: 2269)
Memory: 32.2M (peak: 32.7M)
CPU: 63ms
CGroup: /system.slice/ollama.service
└─27138 /usr/local/bin/ollama serve
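If the service is not active, its logs usually explain why. You can follow them with journalctl (assuming systemd's journal, which is standard on most modern distributions); press Ctrl+C to stop:
$ journalctl -u ollama -f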
Enable Ollama to Start at Boot
To configure Ollama to start automatically when your system boots up, execute:
$ sudo systemctl enable ollama
Restart Ollama Service
If necessary, restart the Ollama service using:
$ sudo systemctl restart ollama
Optional: Install AMD GPU ROCm Drivers for Ollama
For systems with AMD GPUs, also download and extract the additional ROCm package so Ollama can use the GPU:
$ curl -L https://ollama.com/download/ollama-linux-amd64-rocm.tgz -o ollama-linux-amd64-rocm.tgz
$ sudo tar -C /usr/ -xzf ollama-linux-amd64-rocm.tgz
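After extracting the ROCm package, restart the service and check the startup logs to confirm that the GPU was detected. The exact log wording varies between releases, so treat the grep pattern below as a rough filter rather than a definitive check:
$ sudo systemctl restart ollama
$ journalctl -u ollama | grep -iE 'rocm|gpu'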
Installing Ollama on macOS
To install Ollama on macOS:
- Visit the official Ollama website.
- Click “Download” and select the latest macOS package.
- Extract the downloaded .zip file.
- Move Ollama.app to the Applications folder.
Verify Installation
To confirm that Ollama is installed correctly, open a terminal and run:
$ ollama -v
$ ollama list
$ ollama serve
Installing Ollama on Windows
To install Ollama on Windows:
- Visit the official Ollama website.
- Download the latest .exe file.
- Run the installer and click “Install” to complete the setup.
Verify Installation
After installation, open Windows PowerShell and run:
> ollama -v
> ollama list
> ollama serve
Downloading Large Language Models (LLMs) with Ollama
Ollama allows users to fetch models using the ollama pull command. Follow these steps to download and run models locally.
Step 1: Pull a Model
Use the command below to fetch a model from the Ollama repository:
$ ollama pull [model]
Example: Download Mistral
To fetch the Mistral model, run:
$ ollama pull mistral
Example: Download DeepSeek-R1 with 1.5B Parameters
To retrieve the DeepSeek-R1-Distill-Qwen model, execute:
$ ollama pull deepseek-r1:1.5b
Example: Download Llama 3.3
Llama 3.3 is a large model (~40GB). Ensure sufficient storage before proceeding:
$ ollama pull llama3.3
Step 2: Verify Downloaded Models
To check which models have been downloaded, use:
$ ollama list
Example output:
NAME                ID              SIZE      MODIFIED
llama3.3:latest     a6eb4748fd29    42 GB     21 seconds ago
deepseek-r1:1.5b    a42b25d8c10a    1.1 GB    4 minutes ago
mistral:latest      f974a74358d6    4.1 GB    26 minutes ago
Using Ollama to Run AI Models
Ollama allows users to execute, pull, and initialize large language models directly from its repository or from locally stored models. Before running a model, ensure that your system meets the required hardware specifications. Follow the steps below to test models and analyze their performance on your workstation.
Step 1: List Available Models
Check all models currently installed on your machine by running the following command:
$ ollama list
Step 2: Run a Model
To execute a model, use the ollama run command. For example, to run the Qwen 2.5 instruct model with 1.5B parameters, use:
$ ollama run qwen2.5:1.5b
Step 3: Provide a Prompt
Once the model is running, enter a prompt.
The model will generate a response in the terminal.
Step 4: Exit the Model
To exit the interactive session, enter the following command:
>>> /bye
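The interactive session also supports other slash commands, such as /show and /set; to list them, type:
>>> /?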
Step 5: Run a Different Model
You can also execute a model that is already available on your workstation. For instance, if you previously downloaded the DeepSeek R1 model, you can run it with:
$ ollama run deepseek-r1:1.5b
Step 6: Enter a Prompt
For a test case, enter a mathematical challenge, such as:
Prompt: Generate a recursive fractal pattern description using only mathematical notations and symbolic logic.
Step 7: Observe the Model’s Response
The AI will process the request and output a structured mathematical description.
Okay, so I need to create a recursive fractal pattern using only mathematical notation and symbols. Hmm, that's an interesting challenge. Let me think about what I know about fractals and how they can be represented mathematically.
...................................
Step 8: Exit Ollama
To close the session, type:
>>> /bye
Why Use Ollama to Run AI Models?
Different models have unique strengths and are designed for different tasks. Running them locally with Ollama lets you benchmark their efficiency and effectiveness on your own hardware. The ollama run command executes models that are already installed, while the ollama pull command fetches the latest versions from the official Ollama repository.
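The ollama run command also accepts a prompt directly as an argument, which is handy for quick, non-interactive comparisons between models. A minimal sketch, assuming the qwen2.5:1.5b model pulled earlier is still installed:
$ ollama run qwen2.5:1.5b "Summarize the benefits of running LLMs locally in two sentences."
The model prints its answer and returns to the shell without opening an interactive session.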
Managing Models with Ollama
Handling multiple large language models (LLMs) on your system requires effective management. Ollama provides commands to list, view details, stop, and remove models. Follow the steps below to manage models on your workstation.
Step 1: List Available Models
To see all models currently stored on your system, run:
$ ollama list
Step 2: Display Model Details
To view detailed information about a specific model, such as Llama 3.3, use:
$ ollama show llama3.3
Expected output:
Model
    architecture        llama
    parameters          70.6B
    context length      131072
    embedding length    8192
    quantization        Q4_K_M

Parameters
    stop    "<|start_header_id|>"
    stop    "<|end_header_id|>"
    stop    "<|eot_id|>"

License
    LLAMA 3.3 COMMUNITY LICENSE AGREEMENT
    Llama 3.3 Version Release Date: December 6, 2024
Step 3: Stop a Running Model
If a model is actively running, stop it using the following command:
$ ollama stop [model-name]
For example, to stop the DeepSeek R1 model:
$ ollama stop deepseek-r1:1.5b
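If you are not sure which models are currently loaded, list them together with their memory usage first:
$ ollama ps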
Step 4: Remove an Unused Model
To delete a model from your system, run:
$ ollama rm mistral
Expected output:
deleted 'mistral'
Setting Ollama Environment Variables
Ollama provides environment variables to fine-tune its behavior and optimize performance. Below are some commonly used variables:
- OLLAMA_HOST: Sets the address and port the Ollama server binds to.
- OLLAMA_GPU_OVERHEAD: Reserves additional VRAM per GPU (in bytes).
- OLLAMA_MODELS: Sets a custom directory for storing models.
- OLLAMA_KEEP_ALIVE: Determines how long models stay loaded in memory after a request.
- OLLAMA_DEBUG: Enables verbose debug logging.
- OLLAMA_FLASH_ATTENTION: Enables flash attention to improve inference performance.
- OLLAMA_NOHISTORY: Disables prompt history in interactive sessions.
- OLLAMA_NOPRUNE: Prevents pruning of unused model blobs when the server starts.
- OLLAMA_ORIGINS: Sets the allowed origins for cross-origin (CORS) requests.
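A variable can also be applied to a single run by prefixing the ollama serve command; here /data/ollama-models is only an example path used for illustration:
$ OLLAMA_DEBUG=1 OLLAMA_MODELS=/data/ollama-models ollama serve
The sections below show how to make these settings persistent on each platform.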
Configuring Ollama Variables on Linux
To set environment variables for Ollama on Linux:
Step 1: Open the Service File
$ sudo vim /etc/systemd/system/ollama.service
Step 2: Add Environment Variables
Insert the following lines under the [Service] section:
[Service]
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"
Step 3: Apply Changes
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
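With OLLAMA_HOST set to 0.0.0.0:11434, the API accepts connections from other machines on the network. As a hedged example of a remote request, where <server-ip> is a placeholder for your workstation's address and the mistral model must already be pulled:
$ curl http://<server-ip>:11434/api/generate -d '{"model": "mistral", "prompt": "Hello from a remote client", "stream": false}'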
Setting Ollama Variables on macOS
To configure environment variables on macOS:
$ launchctl setenv OLLAMA_HOST "0.0.0.0"
$ ollama serve
Setting Ollama Variables on Windows
To configure variables on Windows:
- Open the Windows search menu and search for “Environment Variables”.
- Select “Edit the system environment variables”.
- Click “Environment Variables”.
- Click “New” to create a new entry.
- Enter the variable name and value.
- Click “OK” to save the settings.
- Click “Apply” to finalize the changes.
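Alternatively, you can set a variable from PowerShell using the built-in setx command, shown here with OLLAMA_HOST as an example; restart Ollama afterwards so the change takes effect:
> setx OLLAMA_HOST "0.0.0.0"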
Conclusion
You have successfully installed and configured Ollama to run large language models on your local system. Whether running models locally or on a remote machine, Ollama provides an efficient solution. Use environment variables to enhance performance and allow remote access. For further information, refer to the Ollama GitHub repository.