How to translate XLIFF files without corrupting XML tags
XLIFF is a standard file format used to translate app and website text into multiple languages. Accidentally modifying the XML tags inside these files during translation breaks the software. This post covers how to translate safely while keeping those tags intact.
XLIFF (XML Localization Interchange File Format) is the standard file type that developers use to hand off translatable text, such as button labels and menu items, to translators. Inside the file, the actual text sits alongside XML tags like '<trans-unit>' and '<source>'. If a translation tool or an AI model rewrites or deletes those tags, the file is no longer valid XLIFF, and the app or build tool that imports it will typically fail to read it.
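To make the failure concrete, here is a minimal Python sketch. The sample file, its IDs, and the German text are invented for illustration, and the helper name is_well_formed is hypothetical. It uses the standard library XML parser to show that a single altered closing tag is enough to make the whole file unreadable.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one XLIFF 1.2 translation unit with an inline <g> tag.
VALID = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="app.strings" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="save_button">
        <source>Click <g id="1">Save</g> to continue</source>
        <target>Klicken Sie auf <g id="1">Speichern</g>, um fortzufahren</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# The same file after a tool altered one closing tag inside <target>.
BROKEN = VALID.replace("</g>, um", "</gr>, um")


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(VALID))   # True
print(is_well_formed(BROKEN))  # False
```

A parse check like this only catches tags that break the XML structure. A tag that was deleted as a matched pair still parses, so it is a first line of defence rather than a complete one.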
The post shares practical strategies to avoid this problem: replacing tags with placeholders before translating, using dedicated CAT tools that automatically lock tags, or giving an LLM an explicit instruction to never touch anything inside angle brackets. For anyone building an automated translation pipeline with AI, this is a common pitfall worth knowing about upfront.
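The placeholder strategy can be sketched in a few lines of Python. This is an illustration under assumptions, not the post's own implementation: the [[0]] token format, the function names mask_tags and unmask_tags, and the sample strings are all invented here. The regular expression assumes literal angle brackets in the text are already escaped as &lt; and &gt;, as they are in a well-formed XLIFF file.

```python
import re

# Matches any XML-style tag: opening, closing, or self-closing.
TAG_RE = re.compile(r"<[^<>]+>")
# Matches the numbered tokens this sketch uses as placeholders.
TOKEN_RE = re.compile(r"\[\[(\d+)\]\]")


def mask_tags(text):
    """Replace each tag with a numbered token; return the masked text and the tag list."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"[[{len(tags) - 1}]]"

    return TAG_RE.sub(_swap, text), tags


def unmask_tags(text, tags):
    """Restore the original tags; raise if any token is missing, duplicated, or unknown."""
    found = sorted(int(n) for n in TOKEN_RE.findall(text))
    if found != list(range(len(tags))):
        raise ValueError("placeholders do not match the original tags")
    return TOKEN_RE.sub(lambda m: tags[int(m.group(1))], text)


source = 'Click <g id="1">Save</g> to continue'
masked, tags = mask_tags(source)
print(masked)  # Click [[0]]Save[[1]] to continue

# Stand-in for what a translator or model returns for the masked text.
translated = "Klicken Sie auf [[0]]Speichern[[1]], um fortzufahren"
print(unmask_tags(translated, tags))
# Klicken Sie auf <g id="1">Speichern</g>, um fortzufahren
```

Raising an error when a token goes missing is a deliberate choice: it is usually better to reject one segment and retry than to write a file with a silently dropped tag.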
Key points
- XML tags inside XLIFF files must never be changed during translation or the software breaks
- Replacing tags with placeholder symbols before translating is a reliable way to protect them
- When using an LLM to translate, explicitly tell it to leave anything inside angle brackets alone
- Dedicated CAT tools protect tags automatically, making them safer for XLIFF work
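For the LLM route, an instruction alone is hard to trust, so it may help to pair it with a check on the output. The sketch below is one possible approach, not a prescribed one: the prompt wording, the function names build_prompt and tags_preserved, and the sample strings are assumptions made for illustration, and no real model is called.

```python
import re
from collections import Counter

# Matches any XML-style tag: opening, closing, or self-closing.
TAG_RE = re.compile(r"<[^<>]+>")


def build_prompt(source_text, target_language):
    """Assemble a translation prompt that spells out the tag rule (wording is illustrative)."""
    return (
        f"Translate the text below into {target_language}.\n"
        "Copy everything inside angle brackets (< >) exactly as written. "
        "Do not translate, add, or delete any tag.\n\n"
        f"{source_text}"
    )


def tags_preserved(source_text, translated_text):
    """Return True if both strings contain the same tags the same number of times."""
    return Counter(TAG_RE.findall(source_text)) == Counter(TAG_RE.findall(translated_text))


source = 'Click <g id="1">Save</g> to continue'
good = 'Klicken Sie auf <g id="1">Speichern</g>, um fortzufahren'
bad = 'Klicken Sie auf <g Kennung="1">Speichern</g>, um fortzufahren'  # attribute name was translated

print(build_prompt(source, "German"))
print(tags_preserved(source, good))  # True
print(tags_preserved(source, bad))   # False
```

The check compares tags as exact strings and ignores their order, since word order often changes between languages. A segment that fails can be retried or sent for manual review.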
Quick term guide
- XLIFF
- A standard file format used to share translatable text between developers and translators.
- XML tags
- Short code snippets wrapped in angle brackets like '<tag>' that define the structure of a file.
- developers
- People who build software, apps, or websites.
- AI model
- A program that can understand prompts and produce text, code, or answers.
- placeholder
- A temporary symbol or token used to stand in for something else so it doesn't get changed by mistake.
- CAT tool
- Computer-assisted translation software that helps translators work faster and protects file formatting like tags.