A question about splitting LLM work across personal devices

The writer says they have several devices, including a MacBook, PC, and iPhone, and want to distribute LLM inference across them. They mention two routers, vllmAthena and llmproxy, and ask which one might fit their zero trust infrastructure. They also ask for an open source option like Tailscale that does not require sending all keys to the vendor.

Key points

  • The writer wants to split LLM inference across a MacBook, PC, and iPhone.
  • They are comparing vllmAthena and llmproxy as possible routers.
  • They describe vllmAthena as using small models to choose routing paths.
  • They want the setup to fit a zero trust infrastructure.
  • They also want an open source Tailscale-like option that does not require trusting a vendor with all keys.
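
The routing idea in the key points, where a small model chooses which path a request takes, can be sketched roughly as follows. This is a minimal illustration, not how vllmAthena or llmproxy actually work: the device names, the capacity scores, and the word-count heuristic standing in for the small routing model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    max_complexity: int  # hypothetical capacity score: 1 (light) to 3 (heavy)


# Hypothetical device pool; names and scores are illustrative only.
DEVICES = [
    Device("iphone", 1),
    Device("macbook", 2),
    Device("pc", 3),
]


def estimate_complexity(prompt: str) -> int:
    """Stand-in for a small routing model: a crude word-count heuristic."""
    words = len(prompt.split())
    if words < 20:
        return 1
    if words < 200:
        return 2
    return 3


def route(prompt: str, devices=DEVICES) -> Device:
    """Pick the least capable device that can still handle the prompt."""
    needed = estimate_complexity(prompt)
    candidates = [d for d in devices if d.max_complexity >= needed]
    if not candidates:
        raise ValueError("no device can handle this prompt")
    return min(candidates, key=lambda d: d.max_complexity)
```

Calling route("What time is it?") would select the lightest device in this pool, while a prompt of several hundred words would fall through to the PC. A real router would replace estimate_complexity with an actual classifier and would also account for device availability and load.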

Quick term guide

LLM inference
The process where an already trained AI model reads input and generates an answer.
zero trust infrastructure
A security setup that checks access every time instead of assuming anything is trusted.
infrastructure
The technical systems that keep a website or app running.
open source
Software whose code is available for people to view and often modify.
Tailscale
A tool that lets you securely access your home devices over the internet, as if they were on the same local network.
architecture
The overall structure and organization of a software project.
AI agents
AI tools that can carry out steps toward a goal, not just answer once.
model routing
The practice of sending each task to a different LLM depending on the task's complexity and the model's cost.
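
To make the "checks access every time" part of the zero trust entry above concrete, here is a minimal sketch of per-request verification between devices. It covers only one ingredient of zero trust (authenticating every request), and everything in it is an assumption for illustration: the function names, the shared per-device keys, and the 30-second freshness window. The keys stay on the user's own devices, which matches the writer's wish not to hand them to a vendor.

```python
import hashlib
import hmac
import time


def sign_request(key: bytes, device_id: str, body: bytes, timestamp: int) -> str:
    """Sign a request with the sending device's key (hypothetical scheme)."""
    message = f"{device_id}:{timestamp}:".encode() + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(keys, device_id, body, timestamp, signature,
                   now=None, max_age=30) -> bool:
    """Check every request; nothing is trusted because of where it came from."""
    now = int(time.time()) if now is None else now
    key = keys.get(device_id)
    if key is None:
        return False  # unknown device
    if abs(now - timestamp) > max_age:
        return False  # stale or replayed request
    expected = sign_request(key, device_id, body, timestamp)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

A receiving device would call verify_request on each incoming inference request and reject anything that fails, even if it arrives from inside the home network.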