
Optimization for building a more efficient backend image that requires less disk space and time #1037

Open
icejean opened this issue Jan 26, 2025 · 2 comments
icejean commented Jan 26, 2025

Docker compose build installs PyTorch 2.5.1 with the CUDA libraries automatically every time, which takes a lot of disk space and time. But you can't see the NVIDIA GPU in the container without proper setup; please read the details at How to Install PyTorch on the GPU with Docker. My solution is to run the LLM & embedding models with Ollama or vLLM outside the backend container and access them through endpoints. Thus only the CPU version of PyTorch is needed in the backend container, and the CUDA libraries are no longer needed.
So I modified the backend's Dockerfile as follows. torch 2.3.1+cpu works, along with the matching torchvision 0.18.1+cpu and torchaudio 2.3.1+cpu.
Installing them before the packages in requirements.txt is enough.

FROM python:3.10-slim
WORKDIR /code
ENV PORT=8000
EXPOSE 8000
# Install dependencies and clean up in one layer
RUN apt-get update && \
   apt-get install -y --no-install-recommends \
       libmagic1 \
       libgl1-mesa-glx \
       libreoffice \
       cmake \
       poppler-utils \
       tesseract-ocr && \
   apt-get clean && \
   rm -rf /var/lib/apt/lists/*
# Set LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
# Copy requirements file and install Python dependencies
COPY requirements.txt /code/
# Install CPU-only builds of PyTorch, torchvision, and torchaudio
RUN pip install --no-cache-dir \
    torch==2.3.1+cpu torchvision==0.18.1+cpu torchaudio==2.3.1+cpu \
    -f https://download.pytorch.org/whl/torch_stable.html

# Install the remaining Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . /code
# Set command
CMD ["gunicorn", "score:app", "--workers", "2", "--threads", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]

The resulting backend image is only 4GB, down from 13GB before.

(base) root@10-60-136-78:~# docker image list
REPOSITORY                   TAG       IMAGE ID       CREATED         SIZE
llm-graph-builder-frontend   latest    2183b1722a12   5 hours ago     55.8MB
llm-graph-builder-backend    latest    31779e605998   6 hours ago     4.07GB

Best regards
Jean


jexp commented Feb 12, 2025

Ideally we could do this via configuration and perhaps a multi-stage Dockerfile, so that the user doesn't need to deal with all these details.

I think if you use an external embedding model you wouldn't need PyTorch at all for the embedding?

The only place where I think it might be needed is the unstructured.io document loaders, but I'm not sure. And those should bring in their dependencies themselves?

Otherwise we can look into supporting external unstructured.io usage with an API key ... as an option.
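One way the configurable approach could be sketched is with a build argument that selects the PyTorch wheel index. This is illustrative only: the `TORCH_VARIANT` argument and this layout are an assumption, not an existing project option, and the system packages from the original Dockerfile are omitted for brevity.

```dockerfile
# Illustrative sketch: choose CPU or CUDA torch wheels at build time.
FROM python:3.10-slim
# "cpu" for CPU-only wheels, or e.g. "cu121" for CUDA 12.1 wheels
ARG TORCH_VARIANT=cpu
WORKDIR /code
COPY requirements.txt /code/
RUN pip install --no-cache-dir torch torchvision torchaudio \
        --index-url https://download.pytorch.org/whl/${TORCH_VARIANT} && \
    pip install --no-cache-dir -r requirements.txt
COPY . /code
```

Building with `docker build --build-arg TORCH_VARIANT=cpu .` would then pull only the CPU wheels, while a CUDA variant such as `cu121` would restore a GPU-capable build from the same Dockerfile.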


icejean commented Feb 12, 2025

Yes, I use an external embedding model by calling an Ollama or other provider's endpoint online, so I don't need GPU capability within the container. Many cloud services provide powerful embedding and LLM models online, so we don't need expensive local GPUs for research purposes. And if we do need a GPU locally, we can access it through Ollama, vLLM, and so on, so it isn't needed inside the container at all.
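To make the external-endpoint idea concrete, here is a minimal sketch of calling Ollama's `/api/embeddings` endpoint from the backend using only the standard library. The endpoint URL and model name (`nomic-embed-text`) are assumptions about a local Ollama deployment; the actual send is left commented out since it requires a running server.

```python
import json
import urllib.request

# Assumed endpoint of a locally running Ollama server; adjust to your deployment.
OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"

def build_embedding_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/embeddings API."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBEDDINGS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request("nomic-embed-text", "hello world")

# Sending it requires a running Ollama server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     embedding = json.loads(resp.read())["embedding"]
```

With this in place, the container only needs an HTTP client, not PyTorch with CUDA, which is exactly why the CPU-only wheels suffice.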
