A question about tools that let local AI control a computer

In a question posted to r/LocalLLaMA, the poster suggests that local vision language models may now be smart enough to be given control of the cursor inside a secure sandbox, and asks which computer-control harnesses are available for this.

Quick term guide

vision language models
AI models that can interpret both images (such as screenshots) and text.
secure sandbox
A restricted space where software can run with less risk to the rest of the system.
computer-control harnesses
Tools that connect an AI model to screen viewing, clicking, typing, and other computer actions.
r/LocalLLaMA
A Reddit community focused on running AI language models on personal hardware.
AI agents
AI tools that can carry out a series of steps toward a goal, rather than just answering once.
production-ready
Stable enough to be used by real users in a live service.
production
The live version of a service that real users use.
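
To make the idea of a computer-control harness more concrete, here is a minimal sketch of the observe-and-act loop such a tool might run. It is not taken from the post or from any particular harness: `FakeSandbox`, `scripted_model`, `run_harness`, and `Action` are hypothetical names invented for illustration, the "screenshot" is a text string rather than an image, and the "model" is a hard-coded script standing in for a local vision language model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step the model asks the harness to perform (hypothetical format)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeSandbox:
    """Stand-in for a sandboxed desktop: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image data; this returns a text description.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        # A real harness would move the cursor or send keystrokes here.
        self.log.append(action)


def scripted_model(goal: str, observation: str) -> Action:
    """Hypothetical stand-in for a local vision language model."""
    if observation == "screen after 0 actions":
        return Action("click", x=100, y=200)
    if observation == "screen after 1 actions":
        return Action("type", text=goal)
    return Action("done")


def run_harness(model, sandbox, goal: str, max_steps: int = 10) -> int:
    """Show the screen to the model, apply its action, and repeat until it says done."""
    for step in range(max_steps):
        observation = sandbox.screenshot()
        action = model(goal, observation)
        if action.kind == "done":
            return step
        sandbox.apply(action)
    return max_steps  # step limit reached without the model finishing


sandbox = FakeSandbox()
steps_taken = run_harness(scripted_model, sandbox, "hello")
print(steps_taken, [a.kind for a in sandbox.log])
```

The step limit and the recording-only sandbox are design choices for this sketch: they keep the loop from running indefinitely and keep the example from touching a real desktop.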