Turn audio & video into text — 98+ languages
Upload or record audio and video, choose the processing model, and turn voice into multilingual text within seconds.
Accurate transcription
High-quality speech-to-text in 98+ languages and translation into 98+ languages.
Real-time capture
Record microphone, headphones, or system audio with live progress and minute tracking.
Lectures for students
Turn lecture recordings into ready-made notes. Perfect for exam prep.
Clip editing
Create short clips from long videos right in our editor.
Community and support
Collaborative chat, proposals board, and moderation tools that keep the platform safe.
Integrate into your app
Connect transcription and media processing to your product via REST API and keys in your account — built for developers.
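For developers, a request to such an API might be assembled roughly as follows. This is only a sketch: the endpoint URL, the bearer-token header and every JSON field name are placeholders invented for illustration, not the documented Polyglot Voice API. The request is built but never sent.

```python
import json
import urllib.request

# Placeholder endpoint: the real URL comes from the API documentation in your account.
API_URL = "https://api.example.invalid/v1/transcriptions"


def build_transcription_request(api_key, media_url, source_lang="auto",
                                target_lang="en", model="accurate"):
    """Build (but do not send) a JSON POST request for a transcription job.

    Field names and the auth scheme are assumptions made for this sketch.
    """
    payload = {
        "media_url": media_url,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "model": model,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcription_request("YOUR_API_KEY", "https://example.com/lecture.mp4")
    print(req.get_method(), req.full_url)
```

Once the real endpoint and field names are known, the request object can be passed to `urllib.request.urlopen`.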
Polyglot Voice Geography
The more popular the language, the higher the base confidence of the model. Rare languages are supported, but processing may take a bit longer.
Chinese (Simplified): zh
English: en
French: fr
German: de
Japanese: ja
Portuguese: pt
Russian: ru
Spanish: es
Afrikaans: af
Albanian: sq
Amharic: am
Arabic: ar
Armenian: hy
Assamese: as
Azerbaijani: az
Bashkir: ba
Basque: eu
Belarusian: be
Bangla: bn
Bosnian: bs
Breton: br
Bulgarian: bg
Burmese: my
Catalan: ca
Technical requirements
Audio/video upload and audio recording
- Supported audio and video formats: 3g2, 3gp, aac, aif, aiff, avi, flac, flv, m2ts, m4a, m4v, mkv, mov, mp3, mp4, mpeg, mpg, mpga, mts, oga, ogg, ogv, opus, ts, wav, webm, wmv, wma. You can upload a file or paste a video link (YouTube and other sources supported by the downloader). Main limit: up to 5 minutes of processed audio. For best stability we recommend MP3, WAV, M4A, MP4, or WEBM.
- Record audio from your microphone, headphones, or system sound. Maximum 5 minutes on the free plan.
- Select the same source and target language (e.g., Russian → Russian, English → English) to skip the translation step and get a text transcript as quickly as possible. Perfect for students recording lectures in any supported language.
- The fast model supports only English for translation and live recording. Use the mid or accurate model for other languages.
- If the recording language is set to auto, recognition may take longer than with an explicit language; for more predictable translation, pick the language manually.
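The rules in this list can be checked before uploading. The sketch below is illustrative only: the function names, the model identifiers (`"fast"`, `"mid"`, `"accurate"`) and the `"auto"` language value are assumptions, and the fast-model rule is read here as "the spoken language must be English", which is one possible interpretation of the requirement.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED = {
    "3g2", "3gp", "aac", "aif", "aiff", "avi", "flac", "flv", "m2ts", "m4a",
    "m4v", "mkv", "mov", "mp3", "mp4", "mpeg", "mpg", "mpga", "mts", "oga",
    "ogg", "ogv", "opus", "ts", "wav", "webm", "wmv", "wma",
}
RECOMMENDED = {"mp3", "wav", "m4a", "mp4", "webm"}


def check_file(name):
    """Classify a file name as 'recommended', 'supported' or 'unsupported'."""
    ext = Path(name).suffix.lower().lstrip(".")
    if ext not in SUPPORTED:
        return "unsupported"
    return "recommended" if ext in RECOMMENDED else "supported"


def preflight(filename, source_lang, target_lang, model, live=False):
    """Return a list of human-readable notes; an empty list means no concerns."""
    notes = []
    status = check_file(filename)
    if status == "unsupported":
        notes.append("file format is not in the supported list")
    elif status == "supported":
        notes.append("format accepted; MP3, WAV, M4A, MP4 or WEBM is recommended")
    if source_lang == "auto":
        notes.append("auto-detection may be slower; set the language explicitly")
    translating = source_lang != target_lang
    # Assumed reading of the fast-model rule: speech must be in English.
    if model == "fast" and (translating or live) and source_lang != "en":
        notes.append("fast model is English-only here; use mid or accurate")
    return notes


if __name__ == "__main__":
    for note in preflight("lecture.mkv", "ru", "ru", "fast", live=True):
        print("-", note)
```

An empty result for a same-language job such as English → English on an MP3 file corresponds to the quickest path described above, with no translation step.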
If your audio is longer than 5 minutes, split it into parts or use paid recording minutes so that it is not trimmed.
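Splitting a long recording comes down to simple arithmetic over its duration. A minimal sketch, assuming the limit is exactly 300 seconds per part; the function name is hypothetical, and the actual cutting would be done in whatever audio tool you prefer:

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, limit_s: float = 300.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds, each part no longer than limit_s."""
    if limit_s <= 0:
        raise ValueError("limit_s must be positive")
    duration_s = float(duration_s)
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + limit_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A 12-minute lecture becomes two full 5-minute parts and one 2-minute part.
    print(plan_chunks(720))
    # → [(0.0, 300.0), (300.0, 600.0), (600.0, 720.0)]
```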
Audio, video and speech workflows in one place
Polyglot Voice is built for people who need more than a simple transcript. The platform combines audio to text, video transcription, translation, subtitle workflows, clip creation, dubbing preparation and media utilities in one flow. This makes it useful for creators, students, researchers, marketers and multilingual teams that work with spoken content every day.
Instead of moving between separate tools for transcription, subtitles, translation and media conversion, you can upload once, choose the right workflow and export the result for publishing, studying, archiving or repurposing.
How it works
1. Upload audio or video, or record speech in real time.
2. Choose the language workflow: transcript, translation, subtitles or dubbing prep.
3. Export the result as text, subtitle-friendly output or a reusable media asset.
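To make "subtitle-friendly output" in the export step concrete, here is a sketch that renders timed transcript segments in the widely used SubRip (SRT) layout. The segment shape `(start_seconds, end_seconds, text)` and the function names are assumptions for illustration; the platform's own export formats may differ.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT files use."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Welcome to the lecture."), (2.5, 6.0, "Today: phonetics.")]
    print(to_srt(demo))
```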
Best for
- Students turning lectures into notes
- Creators converting video into subtitles and clips
- Teams translating interviews, meetings and training media
- Developers using the API for automated pipelines
Supported workflows
Audio to text, video to text, speech to text, subtitle generation, translation, clip extraction, format conversion and audio extraction from video.
Why users choose it
It combines multilingual coverage, export flexibility and a creator-friendly workflow instead of forcing separate tools for each step.
Frequently asked questions
Can I convert audio and video to text online?
Yes. Polyglot Voice is designed for audio-to-text and video-to-text workflows with support for multilingual transcription and export-friendly results.
Can I translate speech into another language?
Yes. You can use transcript and translation workflows together to turn spoken content into translated text for subtitles, notes and publishing.
Is it useful for lectures, interviews and podcasts?
Yes. The workflow is especially useful for lectures, interviews, meetings, podcasts and creator content that needs searchability, subtitles or repurposing.
Do you support many languages?
Yes. Polyglot Voice offers speech-to-text in 98+ languages and translation into 98+ languages, so both the spoken input and the text output can be multilingual.