Running a Chatbot

This document provides step-by-step instructions on setting up and running a LLaMA chatbot on your Hyperstack virtual machine. By following these instructions, you can configure your environment, install the necessary tools, and deploy the chatbot.


Prerequisites

  • These instructions are for a Debian-based Linux virtual machine, like Ubuntu or Debian.
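
You can confirm which distribution and release your virtual machine is running with:

cat /etc/os-release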

Prepare the operating system

Open a terminal on your virtual machine, update the package list, and install the required packages: the Nvidia 515 driver and utilities, Docker, and Git LFS:

sudo apt update
sudo apt install nvidia-utils-515 nvidia-driver-515 docker.io git-lfs
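
After the installation completes (a reboot may be required before the driver loads), confirm that the GPUs are visible:

nvidia-smi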

Install Docker Compose

Install Docker Compose by running the following commands:

VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | grep -Po '"tag_name": "\K.*\d')
DESTINATION=/usr/local/bin/docker-compose

sudo curl -L https://github.com/docker/compose/releases/download/${VERSION}/docker-compose-$(uname -s)-$(uname -m) -o $DESTINATION
sudo chmod +x $DESTINATION

These commands download and install Docker Compose, a tool used for defining and running multi-container Docker applications. They retrieve the latest version of Docker Compose, place it in a designated location, and make it executable.
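
You can confirm the installation by checking the reported version:

docker-compose --version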

Install Nvidia Toolkit for Docker

Configure Nvidia Toolkit for Docker with the following commands:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

These commands are responsible for installing and configuring the Nvidia Toolkit for Docker. They ensure that Docker containers have access to Nvidia GPU resources by setting up the necessary package repositories, installing the toolkit, and configuring the runtime environment.
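
To verify that containers can now reach the GPUs, you can run a disposable CUDA container; the image tag below is just one example of a CUDA base image published on Docker Hub:

sudo docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi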

Download the LLaMA Chatbot web user interface

Execute the following commands to download and install the web UI components necessary for the LLaMA chatbot.

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
mkdir installers
cd installers
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb

These commands set up the web user interface (UI) for the LLaMA chatbot. They clone the text-generation-webui repository, create an installers directory, and download the local CUDA repository installer that the Dockerfile uses later.

For additional information about the text generation web UI, see the README in the text-generation-webui repository on GitHub.

Download the chatbot model

Navigate to the "models" directory and clone the chatbot model by running the following commands:

cd ../models
git lfs clone https://huggingface.co/decapoda-research/llama-13b-hf
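
The model weights are tens of gigabytes, so the clone can take a while. Once it finishes, you can sanity-check that the weight shards and tokenizer files are present:

ls -lh llama-13b-hf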

Prepare Docker files

  1. Create a Dockerfile by running the following command:
nano Dockerfile
  2. In the text editor that opens, paste the following content into the Dockerfile:
FROM python:3.10.6-slim-bullseye
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y git software-properties-common gnupg
COPY . /app
WORKDIR /app
RUN dpkg -i /app/installers/cuda-repo-debian11-11-7-local_11.7.1-515.65.01-1_amd64.deb
RUN cp /var/cuda-repo-debian11-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
RUN add-apt-repository contrib
RUN apt-get update
RUN apt-get -y install cuda \
&& apt -y remove nvidia-* \
&& rm -rf /var/cuda-repo-debian11-11-7-local
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/google_translate/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /app/extensions/silero_tts/requirements.txt
CMD python server.py --auto-devices --cai-chat --load-in-8bit --bf16 --listen --listen-port=8888

This Dockerfile builds an image based on Python 3.10.6 and Debian Bullseye, installs Git and the CUDA toolkit from the local installer downloaded earlier, and installs the web UI's Python dependencies.

  3. Save and exit the text editor (press Ctrl + O, then Enter, then Ctrl + X).
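
Optionally, you can build the image now to catch any Dockerfile problems early. Note that the RUN --mount cache directives require BuildKit, which older Docker releases only enable when asked; the image tag below is arbitrary:

sudo DOCKER_BUILDKIT=1 docker build -t text-generation-webui .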

Create a Docker Compose file

  1. Create a Docker Compose file by running the following command:
nano docker-compose.yml
  2. In the text editor, paste the following content into the docker-compose.yml file:
version: "3.3"
services:
  text-generation-webui:
    build: .
    ports:
      - "8889:8888"
    stdin_open: true
    tty: true
    volumes:
      - .:/app
    command: python server.py --auto-devices --cai-chat --model "llama-13b-hf" --listen --listen-port=8888 --gpu-memory 15 15 15 15
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

This Docker Compose file defines the chatbot service: it builds the image from the Dockerfile, maps host port 8889 to the container's port 8888, mounts the project directory into the container, and reserves all available GPUs. The --gpu-memory values "15 15 15 15" allot 15 GB of memory to each of 4 GPUs; adjust them to match your VM, as in the example after the final step below.

  3. Save and exit the text editor (press Ctrl + O, then Enter, then Ctrl + X).
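
If your VM has a different GPU layout, pass one --gpu-memory value per GPU. For example, on a hypothetical single-GPU machine with 24 GB of memory, the command line in docker-compose.yml might cap usage at 22 GB:

command: python server.py --auto-devices --cai-chat --model "llama-13b-hf" --listen --listen-port=8888 --gpu-memory 22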

Update the requirements file

Open the repository's requirements.txt file and append the following lines to it:

--extra-index-url https://download.pytorch.org/whl/cu117
torchaudio
torch==1.13.1+cu117
torchvision==0.14.1+cu117

These lines pin PyTorch and its companion libraries to builds compiled against CUDA 11.7, matching the CUDA toolkit installed in the image and enabling the LLaMA chatbot to use the GPUs.
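
After rebuilding the image with the updated requirements file (for example with the sudo DOCKER_BUILDKIT=1 docker build -t text-generation-webui . command suggested earlier), you can check that PyTorch inside the container sees the GPUs:

sudo docker run --rm --gpus all text-generation-webui python -c "import torch; print(torch.cuda.is_available())"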

Run the chatbot

Execute the following command to build the image (on the first run) and start the chatbot container:

docker-compose up
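
The first run builds the image and loads the model weights, which can take several minutes. To run the service in the background and follow its logs instead, you can use:

docker-compose up -d
docker-compose logs -f text-generation-webui

If you later change the Dockerfile or requirements.txt, add --build to force a rebuild. Once the server reports that it is listening, the UI is reachable on port 8889 of the VM, the host port mapped in docker-compose.yml.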

Chatbot user interface

[Screenshots: the chatbot web user interface]

To learn about the various features of the text generation web UI, see the documentation in the text-generation-webui repository.

