Live translation for RTA Dubai powered by Eleven Labs
Serving Dubai's multilingual population required innovation. Discover how our AI-powered booth translation system transformed RTA customer service delivery.
Breaking Down Language Barriers in Public Service
Effective communication should not stop at a language barrier. We at Jay Softworks are thrilled to announce our strategic collaboration with the Roads and Transport Authority (RTA) of Dubai and Eleven Labs. Together, we are using AI models such as GPT, OSS Scribe, and cutting-edge voice models to bring live translation into a range of applications.
The Challenge: Serving a Truly Global City
Dubai stands as one of the most linguistically diverse cities in the world, with residents and visitors speaking dozens of languages daily. For the RTA, which serves millions of customers annually at service booths across the emirate, this diversity presented both an opportunity and a challenge: how to deliver exceptional customer service when agents and customers don't share a common language.
Traditional solutions—hiring multilingual staff or relying on translation apps—proved inadequate. Staff multilingualism, while valuable, couldn't cover all language combinations. Manual translation apps interrupted the natural flow of conversation and created friction in the customer experience.
The RTA needed something better: a solution that was invisible, instant, and intelligent.
Our Solution: The AI-Powered Booth Translation System
Working closely with RTA and powered by Eleven Labs' state-of-the-art voice AI technology, we developed a custom real-time translation system designed specifically for the unique demands of service booth environments.

How It Works
The system operates on a simple principle: bidirectional translation that feels natural to both customer and agent, and that is entirely hands-free for the customer.
For the Customer:
- Approach the booth and speak naturally in any of 32 supported languages
- The system automatically detects the language—no manual selection needed
- Hear the agent's response in your own language through the external speaker
For the Agent:
- Choose your preferred working language (English or Arabic)
- Hold a button to capture customer speech
- Receive instant translation through your internal speaker
- Respond naturally, knowing your words will reach the customer in their language
The Technology Stack
Our system combines multiple cutting-edge AI technologies into a seamless experience:
Speech Recognition & Transcription: Real-time audio processing powered by advanced speech-to-text models converts spoken words into text with remarkable accuracy, even in noisy booth environments.
Language Detection: Our AI automatically identifies which of 32 languages the customer is speaking—from Arabic to Vietnamese, Tamil to Turkish—without requiring any manual input.
Contextual Translation: This is where the magic happens. Rather than simple word-for-word translation, the system maintains a conversation buffer of the last three message exchanges. This contextual memory allows for more accurate, nuanced translations that understand idioms, technical terms, and conversation flow.
Natural Voice Synthesis: Powered by Eleven Labs' industry-leading text-to-speech technology, translations are delivered in natural-sounding voices that preserve the tone and intent of the original message.
All of this happens in under two seconds—fast enough that conversations flow naturally.
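The four stages above can be pictured as a single function that runs them in order and checks the result against the two-second target. This is only a minimal sketch: `run_turn`, `TurnResult`, and the four stage callables are hypothetical names, each stage stands in for whatever model or API the real system calls, and detecting the language from the transcript rather than from the audio is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable

LATENCY_BUDGET_S = 2.0  # the "under two seconds" target quoted in the article


@dataclass
class TurnResult:
    source_language: str
    translated_text: str
    audio: bytes
    elapsed_s: float

    @property
    def within_budget(self) -> bool:
        return self.elapsed_s < LATENCY_BUDGET_S


def run_turn(
    audio_in: bytes,
    target_language: str,
    transcribe: Callable[[bytes], str],
    detect_language: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TurnResult:
    """Run one speech-to-speech turn and time it end to end."""
    start = time.perf_counter()
    text = transcribe(audio_in)                              # speech -> text
    source = detect_language(text)                           # e.g. "hi"
    translated = translate(text, source, target_language)    # text -> text
    audio_out = synthesize(translated, target_language)      # text -> speech
    elapsed = time.perf_counter() - start
    return TurnResult(source, translated, audio_out, elapsed)
```

Passing the stages in as callables keeps the sketch independent of any particular vendor SDK, and makes it easy to measure where the latency budget is actually spent.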
Why This Matters: Beyond Translation
This isn't just about converting words from one language to another. It's about fundamentally reimagining public service delivery in a multilingual world.
Dignity in Service
Every customer deserves to be understood in their own language. Our system ensures that language is never a barrier to accessing essential government services. Whether you speak Hindi, Romanian, or Tagalog, you can interact with RTA services with the same ease as a native Arabic or English speaker.
Efficiency at Scale
With 32 languages supported out of the box (expandable to 74 on demand), a single service booth can effectively serve customers who would previously have required multiple specialized agents or lengthy waiting times for translation assistance.
Focus on What Matters
By handling translation automatically, agents can focus on what they do best: solving problems, providing guidance, and delivering excellent service. The cognitive load of language barriers is eliminated, allowing for more empathetic, effective customer interactions.
The Engineering Behind Simplicity
Creating something this simple to use required solving complex engineering challenges.
Directional Audio Design
The system uses a carefully designed stereo speaker configuration. The left speaker faces the customer (outside the booth), delivering agent translations. The right speaker faces the agent (inside the booth), delivering customer translations. This spatial separation ensures clarity and prevents audio confusion.
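One way to picture the speaker split is as channel routing on a single stereo output: speech meant for the customer goes only to the left channel, and speech meant for the agent goes only to the right. The sketch below assumes 16-bit mono PCM input and interleaved stereo output; `route_to_channel` and `CHANNEL_FOR_LISTENER` are illustrative names, not the system's actual audio code.

```python
# Left speaker faces the customer, right speaker faces the agent (per the article).
CHANNEL_FOR_LISTENER = {"customer": "left", "agent": "right"}

SILENCE = b"\x00\x00"  # one silent 16-bit sample


def route_to_channel(mono_pcm: bytes, channel: str) -> bytes:
    """Place 16-bit mono PCM on one stereo channel and silence on the other."""
    if channel not in ("left", "right"):
        raise ValueError(f"unknown channel: {channel!r}")
    if len(mono_pcm) % 2:
        raise ValueError("expected whole 16-bit samples")

    out = bytearray()
    for i in range(0, len(mono_pcm), 2):
        sample = mono_pcm[i:i + 2]
        # Interleaved stereo frames are ordered left sample, then right sample.
        out += sample + SILENCE if channel == "left" else SILENCE + sample
    return bytes(out)


def audio_for(listener: str, mono_pcm: bytes) -> bytes:
    """Route synthesized speech to the speaker facing the given listener."""
    return route_to_channel(mono_pcm, CHANNEL_FOR_LISTENER[listener])
```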
Contextual Memory Management
The AI doesn't just translate individual sentences in isolation. It maintains conversation context, using previous exchanges to inform current translations. This is crucial for understanding pronouns, references, and technical terminology that might otherwise be ambiguous.
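A rolling context like this can be sketched with a fixed-size queue. The example below is a simplified illustration, not the production code: it assumes each stored entry is one message, keeps the last three (the figure quoted earlier in this article), and uses a hypothetical `build_prompt` helper to show how the history might be handed to a translation model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Exchange:
    speaker: str   # "customer" or "agent"
    language: str  # language code, e.g. "hi"
    text: str      # transcribed source text


class ConversationBuffer:
    """Keeps only the most recent exchanges as translation context."""

    def __init__(self, max_exchanges: int = 3):
        self._items = deque(maxlen=max_exchanges)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, exchange: Exchange) -> None:
        self._items.append(exchange)  # the oldest entry drops off automatically

    def clear(self) -> None:
        self._items.clear()

    def build_prompt(self, text: str, source: str, target: str) -> str:
        """Combine recent history and the new utterance into one request."""
        history = "\n".join(
            f"{e.speaker} ({e.language}): {e.text}" for e in self._items
        )
        return (
            f"Conversation so far:\n{history or '(none)'}\n\n"
            f"Translate from {source} to {target}, "
            f"keeping terminology consistent:\n{text}"
        )
```

Capping the history keeps each request small, which matters when the whole turn has to finish within a tight latency target.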
The system treats eight minutes of inactivity as the end of a conversation and automatically clears its memory at that point. Agents can also reset the context manually with a simple double-click, ensuring each new customer starts with a fresh conversation state.
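The two reset paths could look something like the sketch below. The eight-minute timeout comes from the description above; the 0.4-second double-click window, the `SessionContext` class, and the injectable `clock` parameter (used here so the behaviour can be tested without waiting) are all assumptions made for illustration.

```python
import time

IDLE_TIMEOUT_S = 8 * 60       # eight minutes of inactivity, per the article
DOUBLE_CLICK_WINDOW_S = 0.4   # assumed value; the article does not state one


class SessionContext:
    """Conversation history that expires when idle or on a double-click."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._history = []
        self._last_activity = clock()
        self._last_click = None

    def _expire_if_idle(self) -> None:
        if self._clock() - self._last_activity >= IDLE_TIMEOUT_S:
            self._history.clear()

    def add(self, message: str) -> None:
        self._expire_if_idle()
        self._history.append(message)
        self._last_activity = self._clock()

    def history(self) -> list:
        self._expire_if_idle()
        return list(self._history)

    def on_reset_click(self) -> bool:
        """Register a button click; return True if it completed a double-click."""
        now = self._clock()
        is_double = (
            self._last_click is not None
            and now - self._last_click <= DOUBLE_CLICK_WINDOW_S
        )
        if is_double:
            self._history.clear()
            self._last_click = None
        else:
            self._last_click = now
        return is_double
```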
Robust Connectivity
Operating in a government service environment demands reliability. The system supports three connectivity options—WiFi, Ethernet, and cellular SIM card—with automatic failover to ensure uninterrupted service. A secure WireGuard VPN enables remote support and updates without compromising security.
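Automatic failover of this kind usually comes down to trying the available links in a fixed priority order and using the first one that passes a health check. The sketch below illustrates that idea only: the priority order, the `pick_link` function, and the `is_healthy` callback are assumptions, and a real device would typically delegate this to the operating system's network manager.

```python
from typing import Callable, Optional, Sequence

# Assumed preference order; the article lists the three options without ranking them.
LINK_PRIORITY = ("ethernet", "wifi", "cellular")


def pick_link(
    is_healthy: Callable[[str], bool],
    links: Sequence[str] = LINK_PRIORITY,
) -> Optional[str]:
    """Return the highest-priority link that passes its health check."""
    for link in links:
        if is_healthy(link):
            return link
    return None  # nothing is up; the caller decides how to retry or alert
```

For example, with a status table such as `{"ethernet": False, "wifi": True, "cellular": True}`, calling `pick_link(status.get)` would select `"wifi"`.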
Zero-UI Philosophy
Perhaps the most distinctive aspect of our design is what's not there: screens, menus, and complex interfaces. The entire system operates through just two buttons and audio feedback. This zero-UI approach eliminates training time, reduces errors, and keeps interactions focused on the human conversation rather than the technology.
Real-World Impact
The RTA AI-Powered Booth Translation System is currently in proof-of-concept (POC) deployment, with early testing showing promising results:
- Universal Access: Initial trials demonstrate that customers can communicate effectively in any of the 32 supported languages, eliminating the need for language-specific service windows
- Streamlined Service: Early observations suggest potential for improved queue efficiency by removing language-matching requirements from the service process
- Agent Confidence: Preliminary feedback from staff indicates reduced anxiety when serving customers in unfamiliar languages
- Natural Interactions: The system's contextual AI and sub-2-second response times enable conversations that flow naturally, which points to better problem resolution
As the POC progresses, we're gathering comprehensive data on system performance, user satisfaction, and operational impact to inform full-scale deployment across RTA facilities.
Security and Privacy First
In developing a system that processes customer conversations, we prioritized data security and privacy:
- End-to-End Encryption: All network communications are fully encrypted
- Zero Cloud Logging: Customer conversations are never permanently stored on remote servers
- Local Processing Priority: Where possible, processing happens on-device
- Automatic Data Purge: All conversation data is cleared at session end
- Secure Training Data: Any data used for model improvement is stored only on local devices, never transmitted to cloud storage
Looking Forward
This deployment represents just the beginning. The underlying technology—combining real-time speech recognition, intelligent translation, and natural voice synthesis—has applications far beyond service booths.
We envision a future where language barriers disappear across healthcare, education, emergency services, and countless other domains. Where technology doesn't replace human connection but instead enables it across linguistic divides.
Our collaboration with RTA and Eleven Labs demonstrates what's possible when government agencies, technology providers, and AI innovators work together with a shared vision: making the world more accessible, one conversation at a time.
About the Partnership
Jay Softworks brings expertise in custom AI implementation and hardware integration, designing systems that solve real-world problems with elegant simplicity.
The Roads and Transport Authority (RTA) of Dubai is committed to providing world-class service to one of the world's most diverse populations, continuously innovating to meet the needs of all residents and visitors.
Eleven Labs provides the voice AI technology that makes natural, multilingual communication possible, with text-to-speech and speech-to-text models that set the industry standard for quality and naturalness.
Together, we're proving that the future of public service is multilingual, accessible, and powered by AI that serves humanity.
For more information about the RTA AI-Powered Booth Translation System or to explore how similar technology could transform your organization, contact us.