Running Large Language Models Locally with Ollama

Ollama is an open-source solution that enables users to run large language models (LLMs) directly on their personal computers. It supports a variety of open-source LLMs, such as Llama 3, DeepSeek R1, Mistral, Phi-4, and Gemma 2, allowing them to operate without an internet connection. This approach improves security, safeguards privacy, and offers complete control over model customization and performance optimization.

Key Features of Ollama

Ollama provides access to a curated model library, making it easy to find, download, and run LLMs locally. It also integrates with Open WebUI, which offers a user-friendly graphical interface for those who prefer not to use the command line. The platform runs on Linux, Windows, and macOS and executes models entirely on local hardware, removing the need for cloud-based APIs.

Setting Up Ollama and Running LLMs

This guide outlines the steps required to install Ollama and configure large language models (LLMs) with all necessary dependencies on a local workstation.

Downloading and Installing Ollama

Ollama is designed to run on Linux, macOS, and Windows, allowing users to install it seamlessly using the official release package or script. Follow the steps below to install the latest version of Ollama on your system.

Step 1: Open a Terminal

Begin by launching a new terminal session on your system.

Step 2: Install Ollama on Linux

To download and install Ollama on Linux, execute the following command:

$ curl -fsSL https://ollama.com/install.sh | sh

Step 3: Verify Installation

After installation, confirm that Ollama has been successfully installed by checking the version:
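$ ollama -v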

The output prints the installed Ollama version (for example, ollama version is 0.5.7; the exact number depends on the release you installed).

Step 4: List Available Models

To see all models available on your local machine, run:
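$ ollama list

If you have not pulled any models yet, the list is empty.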

Managing Ollama as a System Service on Linux

When installed on Linux, Ollama creates a system service called ollama.service to manage its operation. Follow these steps to check the service status and configure it to start at boot.

Check Ollama Service Status

To verify that Ollama is running, use:

$ sudo systemctl status ollama

Expected output:

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-02-26 13:33:41 UTC; 5min ago
   Main PID: 27138 (ollama)
      Tasks: 6 (limit: 2269)
     Memory: 32.2M (peak: 32.7M)
        CPU: 63ms
     CGroup: /system.slice/ollama.service
             └─27138 /usr/local/bin/ollama serve

Enable Ollama to Start at Boot

To configure Ollama to start automatically when your system boots up, execute:

$ sudo systemctl enable ollama

Restart Ollama Service

If necessary, restart the Ollama service using:

$ sudo systemctl restart ollama

Optional: Install AMD GPU ROCm Drivers for Ollama

For systems using AMD GPUs, download and install the ROCm-supported Ollama version:

$ curl -L https://ollama.com/download/ollama-linux-amd64-rocm.tgz -o ollama-linux-amd64-rocm.tgz
$ sudo tar -C /usr/ -xzf ollama-linux-amd64-rocm.tgz

Installing Ollama on macOS

To install Ollama on macOS:

  1. Visit the official Ollama website.
  2. Click “Download” and select the latest macOS package.
  3. Extract the downloaded .zip file.
  4. Move Ollama.app to the Applications folder.

Verify Installation

To confirm that Ollama is installed correctly, open a terminal and run:

$ ollama -v
$ ollama list
$ ollama serve

Installing Ollama on Windows

To install Ollama on Windows:

  1. Visit the official Ollama website.
  2. Download the latest .exe file.
  3. Run the installer and click “Install” to complete the setup.

Verify Installation

After installation, open Windows PowerShell and run:

> ollama -v
> ollama list
> ollama serve

Downloading Large Language Models (LLMs) with Ollama

Ollama allows users to fetch models using the ollama pull command. Follow these steps to download and run models locally.

Step 1: Pull a Model

Use the command below to fetch a model from the Ollama repository:
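$ ollama pull [model-name]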

Example: Download Mistral

To fetch the Mistral model, run:
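$ ollama pull mistral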

Example: Download DeepSeek-R1 with 1.5B Parameters

To retrieve the DeepSeek-R1-Distill-Qwen model, execute:

$ ollama pull deepseek-r1:1.5b

Example: Download Llama 3.3

Llama 3.3 is a large model (~40GB). Ensure sufficient storage before proceeding:
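$ ollama pull llama3.3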

Step 2: Verify Downloaded Models

To check which models have been downloaded, use:
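$ ollama list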

Example output:

NAME                ID              SIZE      MODIFIED
llama3.3:latest     a6eb4748fd29    42 GB     21 seconds ago
deepseek-r1:1.5b    a42b25d8c10a    1.1 GB    4 minutes ago
mistral:latest      f974a74358d6    4.1 GB    26 minutes ago

Using Ollama to Run AI Models

Ollama allows users to execute, pull, and initialize large language models directly from its repository or from locally stored models. Before running a model, ensure that your system meets the required hardware specifications. Follow the steps below to test models and analyze their performance on your workstation.

Step 1: List Available Models

Check all models currently installed on your machine by running the following command:
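$ ollama list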

Step 2: Run a Model

To execute a model, use the ollama run command. For example, to run the Qwen 2.5 instruct model with 1.5B parameters, a command along the following lines should work (check the Ollama model library for the exact tag; ollama run downloads the model automatically if it is not yet stored locally):
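$ ollama run qwen2.5:1.5b-instruct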

Step 3: Provide a Prompt

Once the model is running, enter a prompt.
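For example:

Prompt: Summarize the advantages of running large language models locally in two sentences.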

The model will generate a response in the terminal.

Step 4: Exit the Model

To exit Ollama, enter the following command:
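/bye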

Step 5: Run a Different Model

You can also execute a model that is already available on your workstation. For instance, if you previously downloaded the DeepSeek R1 model, you can run it with:

$ ollama run deepseek-r1:1.5b

Step 6: Enter a Prompt

For a test case, enter a mathematical challenge, such as:

Prompt: Generate a recursive fractal pattern description using only mathematical notations and symbolic logic.

Step 7: Observe the Model’s Response

The AI will process the request and output a structured mathematical description.

Okay, so I need to create a recursive fractal pattern using only mathematical notation and symbols. Hmm, that's an interesting challenge. Let me think about what I know about fractals and how they can be represented mathematically.
...

Step 8: Exit Ollama

To close the session, type:
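/bye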

Why Use Ollama to Run AI Models?

Different models have unique strengths and are designed for various tasks. Running models locally using Ollama allows users to benchmark their efficiency and effectiveness. The ollama run command enables you to execute available models instantly, while the ollama pull command fetches the latest versions from the official Ollama repository.

Managing Models with Ollama

Handling multiple large language models (LLMs) on your system requires effective management. Ollama provides commands to list, view details, stop, and remove models. Follow the steps below to manage models on your workstation.

Step 1: List Available Models

To see all models currently stored on your system, run:
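$ ollama list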

Step 2: Display Model Details

To view detailed information about a specific model, such as Llama 3.3, use:
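$ ollama show llama3.3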

Expected output:

 Model
   architecture        llama
   parameters          70.6B
   context length      131072
   embedding length    8192
   quantization        Q4_K_M

 Parameters
   stop    "<|start_header_id|>"
   stop    "<|end_header_id|>"
   stop    "<|eot_id|>"

 License
   LLAMA 3.3 COMMUNITY LICENSE AGREEMENT
   Llama 3.3 Version Release Date: December 6, 2024

Step 3: Stop a Running Model

If a model is actively running, stop it using the following command:

$ ollama stop [model-name]

For example, to stop the DeepSeek R1 model:

$ ollama stop deepseek-r1:1.5b

Step 4: Remove an Unused Model

To delete a model from your system, run:
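$ ollama rm [model-name]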

Ollama prints a short confirmation that the model has been deleted.

Setting Ollama Environment Variables

Ollama provides environment variables to fine-tune its behavior and optimize performance. Below are some commonly used variables:

  • OLLAMA_HOST: Sets the address and port the Ollama server binds to (default 127.0.0.1:11434).
  • OLLAMA_GPU_OVERHEAD: Reserves additional VRAM per GPU, in bytes, as overhead.
  • OLLAMA_MODELS: Sets a custom directory for storing downloaded models.
  • OLLAMA_KEEP_ALIVE: Controls how long a model stays loaded in memory after a request (default 5 minutes).
  • OLLAMA_DEBUG: Enables verbose debug logging.
  • OLLAMA_FLASH_ATTENTION: Enables flash attention to reduce memory usage with larger context sizes.
  • OLLAMA_NOHISTORY: Disables saving of the interactive prompt history.
  • OLLAMA_NOPRUNE: Prevents pruning of model blobs when the Ollama server starts.
  • OLLAMA_ORIGINS: Defines the allowed origins (CORS) for requests to the Ollama API, needed for remote or browser-based clients.

Configuring Ollama Variables on Linux

To set environment variables for Ollama on Linux:

Step 1: Open the Service File

$ sudo vim /etc/systemd/system/ollama.service

Step 2: Add Environment Variables

Insert the following lines under the [Service] section:

[Service]
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"

Step 3: Apply Changes

$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
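
With the service restarted and bound to 0.0.0.0:11434, you can optionally verify that the API is reachable by querying the /api/tags endpoint, which lists the locally installed models (replace localhost with the server's IP address when testing from another machine):

$ curl http://localhost:11434/api/tags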

Setting Ollama Variables on macOS

To configure environment variables on macOS:

$ launchctl setenv OLLAMA_HOST "0.0.0.0"
$ ollama serve

Setting Ollama Variables on Windows

To configure variables on Windows:

  1. Open the Windows search menu and search for “Environment Variables”.
  2. Select “Edit the system environment variables”.
  3. Click “Environment Variables”.
  4. Click “New” to create a new entry.
  5. Enter the variable name (for example, OLLAMA_HOST) and its value.
  6. Click “OK” to save the entry.
  7. Click “OK” again to close the dialogs, then restart Ollama so the new variables take effect.

Conclusion

You have successfully installed and configured Ollama to run large language models on your local system. Whether running models locally or on a remote machine, Ollama provides an efficient solution. Use environment variables to enhance performance and allow remote access. For further information, refer to the Ollama GitHub repository.

Source: vultr.com
