How to Run Ollama with Large Language Models Locally Using Docker


Are you interested in exploring the capabilities of large language models such as Llama 3, but don't want to rely on cloud services or complex setup processes? In this tutorial, we'll guide you through running Ollama with Docker, allowing you to run these powerful models from the comfort of your own machine.


Prerequisites:

Before we dive in, make sure you have the following:

  • Docker installed on your machine
  • A compatible NVIDIA or AMD GPU (optional; a CPU-only mode is available)

Step 1: Choose Your GPU

The Ollama Docker image can run in three modes: CPU-only, NVIDIA GPU, and AMD GPU. Choose the option that best suits your hardware (if you're not sure which GPU you have, see the quick check after this list):

  1. CPU-only: Perfect for machines without a supported GPU, or for quickly testing a model; expect slower inference than with a GPU.
  2. NVIDIA: For those with an NVIDIA GPU, this option provides the best performance.
  3. AMD: For those with an AMD GPU, this option allows you to run Ollama using the ROCm runtime.
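
Not sure which GPU your machine has? On Linux, one quick way to check is to list the graphics devices with lspci (the grep pattern below is just a common filter; your distribution may label devices slightly differently):

    lspci | grep -iE 'vga|3d|display'

If the output mentions NVIDIA or AMD/ATI, pick the matching option above; otherwise go with CPU-only.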

Step 2: Install the Necessary Tools (NVIDIA GPU Only)

If you're using an NVIDIA GPU, you'll need to install the NVIDIA Container Toolkit. Follow these steps:

    Install with Apt (Debian/Ubuntu-based systems):
  1. Configure the repository:
     curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
     curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update the package list: sudo apt-get update
  3. Install the NVIDIA Container Toolkit packages: sudo apt-get install -y nvidia-container-toolkit

    Install with Yum or Dnf (RPM-based systems):
  1. Configure the repository: curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
  2. Install the NVIDIA Container Toolkit packages: sudo yum install -y nvidia-container-toolkit

    Configure Docker (both package managers):
  1. Configure Docker to use the NVIDIA container runtime: sudo nvidia-ctk runtime configure --runtime=docker
  2. Restart Docker so the change takes effect: sudo systemctl restart docker
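
Before moving on, you can optionally verify that Docker can see the GPU. One way is to run nvidia-smi from a CUDA-enabled image (the image tag below is just an example; pick any available tag from the nvidia/cuda repository on Docker Hub):

    docker run --rm --gpus=all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If this prints a table with your GPU name and driver version, the NVIDIA Container Toolkit is configured correctly.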

Step 3: Run the Ollama Container

Once you've installed the necessary tools (if using an NVIDIA GPU), run the Ollama container using the command that matches your hardware:

  1. CPU-only: docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  2. NVIDIA GPU: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  3. AMD GPU: docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
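
Whichever command you use, the container runs in the background and maps port 11434 (Ollama's default API port) to your host. You can confirm it is up with standard Docker commands; the only assumption here is the container name ollama used above:

    docker ps --filter name=ollama
    docker logs ollama
    curl http://localhost:11434

The curl call should respond with "Ollama is running" once the server is ready.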

Step 4: Run a Model Locally

Now that the container is running, you can execute a model using the following command:

docker exec -it ollama ollama run llama3

This runs the llama3 model inside the Ollama container, downloading it first if it isn't already present. You can replace llama3 with any other model name from the Ollama library to try different models.
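
The container also exposes Ollama's REST API on port 11434, so you can send prompts programmatically instead of using the interactive prompt. Here is a minimal sketch using curl against the /api/generate endpoint (the model name llama3 matches the example above; adjust it to whichever model you pulled):

    curl http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'

Setting "stream": false returns the whole answer as a single JSON object instead of streaming it token by token.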

That's it! With these simple steps, you can now run Ollama with large language models locally using Docker. Experiment with different models and configurations to unlock the full potential of Ollama.


Reference: https://hub.docker.com/r/ollama/ollama


Author Info:

Rakesh (He/Him) has over 14 years of experience in Web and Application development. He is the author of insightful How-To articles for Code2care.

Follow him on: X

You can also reach out to him via e-mail: rakesh@code2care.org
