You can read our post breaking down the approach here.
- A VM or machine running a VNC server (e.g., TigerVNC). VNC server details (host, port, password) are configured when connecting via the UI or when running workflows.
- `OPENAI_API_KEY` environment variable set with your OpenAI API key.
- `HF_TOKEN` environment variable set with a Hugging Face token. This is used for the grounding model inference.
- (Optional) `OSATLAS_ENDPOINT_OVERRIDE` environment variable to specify a custom OS-ATLAS endpoint. If not set, it defaults to the public Hugging Face Space `maxiw/OS-ATLAS`, or to a pre-configured local URL if you've modified the source.
- The OS-ATLAS model can be run locally using an NVIDIA GPU with sufficient VRAM (see the original Hugging Face Space for details: https://huggingface.co/spaces/maxiw/OS-ATLAS/tree/main).
- For Apple Silicon users, a local inference option is available in the `os_atlas_run_local` directory (see below).
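The required variables can be exported in the shell that will launch the app. The values below are placeholders, not real credentials; substitute your own keys:

```shell
# Export the required credentials before starting the app (placeholder values).
export OPENAI_API_KEY="your-openai-key"   # your OpenAI API key
export HF_TOKEN="your-hf-token"           # Hugging Face token, used for grounding model inference

# Confirm both are visible to child processes.
env | grep -c -E '^(OPENAI_API_KEY|HF_TOKEN)='   # prints 2
```

Variables set this way are inherited only by processes started from the same shell, so run the server from there.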
You can use the provided `Dockerfile` to build and run a Debian-based Linux desktop environment with XFCE and TigerVNC. This is useful for testing, or if you don't have a separate VNC server.
- Build the Docker image:

  ```
  podman build -t linux-vnc-desktop .
  ```

  (You can replace `podman` with `docker` if you prefer.)

- Run the Docker container:

  ```
  podman run -d --rm --name linux-vnc -p 5901:5901 -e VNC_PASSWORD=123456 linux-vnc-desktop
  ```

  - This command starts the container in detached mode (`-d`), removes the container when it exits (`--rm`), names the container `linux-vnc`, maps port `5901` on your host to port `5901` in the container, and sets the VNC password to `123456`.
  - You can then connect to this VNC server at `127.0.0.1:5901` with the password `123456`.
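Before pointing a viewer at the container, you can verify that something is listening on the VNC port. A minimal sketch using bash's `/dev/tcp` redirection (`check_port` is a hypothetical helper, not part of this repo; port `5901` matches the container above):

```shell
# Report whether a TCP port accepts connections, without needing nc installed.
check_port() {
  local host="$1" port="$2"
  # bash opens a TCP connection for /dev/tcp/<host>/<port>; timeout guards hangs.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

check_port 127.0.0.1 5901
```

If this prints `closed`, check `podman ps` and the container logs before debugging the viewer.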
- Start the server:

  ```
  uv run planar dev
  ```
- VNC Viewer: Open http://localhost:8000 in your browser.
  - Enter your VNC server details (e.g., `127.0.0.1:5901`) and password (if any; defaults are often used if not specified in the UI).
  - Click "Start Stream" to connect and view the VNC session.
- Local OS-ATLAS Inference (for Apple Silicon):
  - Navigate to the `os_atlas_run_local` directory.
  - Ensure you have a Mac with Apple Silicon and sufficient VRAM (approx. 20 GB).
  - Run the local Gradio app: `uv run app.py`
  - This will start a local server listening on all addresses (http://0.0.0.0:7080) that can be used as the `OSATLAS_ENDPOINT_OVERRIDE`.
- Workflows: Open your Planar development environment (e.g., https://staging.app.coplane.dev/local-development/dev-planar-app/workflows/) to run workflows like
`perform_computer_task` or `highlight_ui_element`.
  - These workflows will prompt for VNC server details (host:port and password) when executed.
  - If using the local OS-ATLAS server, ensure `OSATLAS_ENDPOINT_OVERRIDE` is set accordingly (e.g., `http://127.0.0.1:7080`) in the environment where the Planar app is running, or modify `planar_computer_use/grounding.py` to use this endpoint.
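For the environment-variable route, a minimal sketch (the URL matches the local Gradio server described above; adjust if yours differs):

```shell
# Export the override in the same shell that will launch the Planar app.
export OSATLAS_ENDPOINT_OVERRIDE="http://127.0.0.1:7080"
```

Then start the app from that shell (e.g., `uv run planar dev`) so the process inherits the variable.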