Docker compose build installs PyTorch 2.5.1 with the CUDA libraries automatically every time, which takes a lot of disk space and a lot of time. Yet the container still can't see the Nvidia GPU without proper setup; please read the details at How to Install PyTorch on the GPU with Docker. My solution is to run the LLM & embedding models with Ollama or vLLM outside the backend container and access them through endpoints. That way the backend container only needs the CPU version of PyTorch, and the CUDA libraries are no longer required.
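For illustration only, a minimal sketch of the "outside the container" setup with Ollama; the model name nomic-embed-text and the host.docker.internal address are assumptions for this example, not something the project prescribes:

# Run Ollama on the host, outside the backend container
ollama pull nomic-embed-text

# The backend then reaches the model over plain HTTP; host.docker.internal
# resolves to the host on Docker Desktop (on plain Linux, add
# --add-host=host.docker.internal:host-gateway to docker run)
curl http://host.docker.internal:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'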
So I modified the backend's Dockerfile as follows: torch 2.3.1+cpu works fine, as do the matching torchvision 0.18.1+cpu and torchaudio 2.3.1+cpu.
Installing them before the packages in requirements.txt is enough.
FROM python:3.10-slim
WORKDIR /code
ENV PORT=8000
EXPOSE 8000
# Install dependencies and clean up in one layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libmagic1 \
        libgl1-mesa-glx \
        libreoffice \
        cmake \
        poppler-utils \
        tesseract-ocr && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Set LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
# Copy requirements file and install Python dependencies
COPY requirements.txt /code/
# Install CPU-only PyTorch, torchvision, and torchaudio before the rest of
# the requirements, so pip never pulls in the CUDA wheels
RUN pip install --no-cache-dir \
        torch==2.3.1+cpu \
        torchvision==0.18.1+cpu \
        torchaudio==2.3.1+cpu \
        -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . /code
# Set command
CMD ["gunicorn", "score:app", "--workers", "2","--threads", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
The resulting backend image is only about 4 GB, down from the 13 GB it was before.
(base) root@10-60-136-78:~# docker image list
REPOSITORY                   TAG      IMAGE ID       CREATED       SIZE
llm-graph-builder-frontend   latest   2183b1722a12   5 hours ago   55.8MB
llm-graph-builder-backend    latest   31779e605998   6 hours ago   4.07GB
Best regards
Jean
Ideally we could do this via configuration and perhaps a multi-stage Dockerfile, so that the user doesn't need to deal with all these details (a rough sketch follows below).
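As one way that configuration idea could look, a build argument might select the wheel index at build time; the TORCH_VARIANT name and its values are hypothetical here, not anything the repo currently defines:

# Hypothetical build arg: "cpu" for CPU wheels, "cu121" for CUDA 12.1 wheels
ARG TORCH_VARIANT=cpu
RUN pip install --no-cache-dir \
        torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 \
        --index-url https://download.pytorch.org/whl/${TORCH_VARIANT}

Then docker build --build-arg TORCH_VARIANT=cu121 . would opt in to a CUDA image only when it's actually wanted.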
I think if you use an external embedding model you wouldn't need pytorch at all for the embedding?
The only place where I think it might be needed is the unstructured.io document loaders, but I'm not sure. And those should bring in their dependencies themselves?
Otherwise we can look into supporting external unstructured.io usage with an API key ... as an option.
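For reference, calling the hosted unstructured.io API instead of the bundled loaders could look roughly like this; the endpoint and header name are as I recall them from Unstructured's docs, so treat the details as an assumption:

# Hypothetical call to the hosted Unstructured API with an API key
curl -X POST https://api.unstructured.io/general/v0/general \
  -H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  -F "files=@document.pdf"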
Yes, I use an external embedding model by calling Ollama or another provider's endpoint online, so I don't need GPU capability inside the container. Many cloud services provide powerful embedding and LLM models online, so we don't need expensive local GPUs for research purposes. And if we do need a GPU locally, we can reach it through Ollama, vLLM, and so on, so it still isn't needed inside the container at all.