A low-cost RAG demo answers soccer questions with sources

A free retrieval-augmented generation (RAG) demo answers questions using international soccer data. It covers full match details from the 2022 World Cup, Euro 2024, and Copa America 2024, including shots, expected goals, and scorers. The 2026 World Cup schedule is already loaded, and results are added as matches are played.

Each answer points back to the match record it used, so the evidence can be checked directly. The setup is intentionally simple. The data is split into small chunks, converted into embeddings, and stored as vectors in SQLite with the sqlite-vec extension. For each question, the closest-matching chunks are retrieved and passed to an LLM.
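The pipeline described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the demo's actual code: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, the brute-force similarity loop stands in for the nearest-neighbour query that sqlite-vec would provide, and the records, ids, and function names are made-up placeholders.

```python
import hashlib
import json
import math
import sqlite3

DIM = 256  # toy embedding size (assumption); real models define their own

# Placeholder records, not real match data.
RECORDS = [
    ("match-001", "Team A beat Team B 2-1. Scorers: Player X, Player Y, Player Z."),
    ("match-002", "Team C drew with Team D 0-0. Expected goals were low for both sides."),
    ("match-003", "Team E beat Team F 3-0. Player Q scored a hat-trick."),
]


def embed(text):
    """Toy stand-in for an embedding model: hashed word counts, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,?!:;()")
        if not word:
            continue
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(records):
    """Store each chunk with its source match id and its vector in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, match_id TEXT, text TEXT, embedding TEXT)"
    )
    for match_id, text in records:
        db.execute(
            "INSERT INTO chunks (match_id, text, embedding) VALUES (?, ?, ?)",
            (match_id, text, json.dumps(embed(text))),
        )
    return db


def top_k(db, question, k=2):
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embed(question)
    scored = []
    for match_id, text, emb in db.execute("SELECT match_id, text, embedding FROM chunks"):
        score = sum(a * b for a, b in zip(q, json.loads(emb)))
        scored.append((score, match_id, text))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


def build_prompt(question, hits):
    """Put the retrieved chunks, tagged with their match ids, ahead of the question."""
    context = "\n".join(f"[{match_id}] {text}" for _, match_id, text in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {question}"
    )
```

Keeping the match id next to every chunk is what lets an answer point back to its source: whatever the LLM says, the ids in the prompt identify which records it was given.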

It does not need a separate vector database service or a heavy framework. It runs on free open data and can run fully local with Ollama.
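As a rough sketch of the local option, a prompt could be sent to an Ollama server over its HTTP API using only the standard library. This assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name `llama3.2` and the function names are placeholders, not details taken from the demo.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumes an unmodified local install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.2"):
    """Build a non-streaming generate request; the model name is a placeholder."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3.2"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask(...)` only works while the Ollama server is running locally; `build_request` can be inspected without one.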

Key points

  • The demo uses free open soccer data to answer questions.
  • It covers match details from the 2022 World Cup, Euro 2024, and Copa America 2024.
  • Every answer links back to the match record it used.
  • SQLite and sqlite-vec replace a separate vector database service.
  • The system can run fully locally with Ollama, which can lower operating cost.

Quick term guide

expected goals
A soccer measure that estimates how likely a shot is to become a goal.
embeddings
A way of converting text into numbers so that text with similar meanings can be found and compared mathematically.
sqlite-vec
An extension that lets SQLite search vector data such as embeddings.
extension
A small add-on that gives a program new features; here, a module loaded into SQLite.
vector database
A special type of storage that saves text as numbers so similar meanings can be found quickly. It is commonly used for AI memory.
AI agents
AI agents are AI tools that can carry out steps toward a goal, not just answer once.
infrastructure
The technical systems that keep a website or app running.