Polyglot Voice

Turn audio & video into text — 98+ languages

Upload or record audio and video, choose the processing model, and turn voice into multilingual text within seconds.

Accurate transcription

High-quality speech-to-text in 98+ languages and translation into 98+ languages.

Real-time capture

Record microphone, headphones, or system audio with live progress and minute tracking.

Lectures for students

Turn lecture recordings into ready-made notes. Perfect for exam prep.

Clip editing

Create short clips from long videos right in our editor.

Community and support

Collaborative chat, proposals board, and moderation tools that keep the platform safe.

Integrate into your app

Connect transcription and media processing to your product through the REST API, using keys generated in your account. Built for developers.

Polyglot Voice Geography

The more widely used a language is, the higher the model's baseline confidence. Rare languages are supported, but processing may take a bit longer.

Chinese (Simplified)

zh

English

en

French

fr

German

de

Japanese

ja

Portuguese

pt

Russian

ru

Spanish

es

Afrikaans

af

Albanian

sq

Amharic

am

Arabic

ar

Armenian

hy

Assamese

as

Azerbaijani

az

Bashkir

ba

Basque

eu

Belarusian

be

Bangla

bn

Bosnian

bs

Breton

br

Bulgarian

bg

Burmese

my

Catalan

ca

Technical requirements

Audio/video upload and audio recording

  • Supported audio and video formats: 3g2, 3gp, aac, aif, aiff, avi, flac, flv, m2ts, m4a, m4v, mkv, mov, mp3, mp4, mpeg, mpg, mpga, mts, oga, ogg, ogv, opus, ts, wav, webm, wmv, wma. You can upload a file or paste a video link (YouTube and other sources supported by the downloader). Main limit: up to 5 minutes of processed audio. For best stability we recommend MP3, WAV, M4A, MP4, or WEBM.
  • Record audio from your microphone, headphones, or system sound. Maximum 5 minutes on the free plan.
  • Select the same source and target language (e.g., Russian → Russian or English → English) to get a text transcript as quickly as possible, since this skips the translation step. Perfect for students recording lectures in any supported language.
  • The fast model supports only English for translation and live recording. Use the mid or accurate model for other languages.
  • If the recording language is set to auto, recognition may take longer than with an explicit language; for more predictable translation, pick the language manually.
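
As a rough illustration of the rules above, the sketch below checks a file name against the supported and recommended formats and decides whether a language pair needs the translation step. The function names and the treatment of "auto" as always needing translation are assumptions made for this example and are not part of the Polyglot Voice API.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_EXTENSIONS = frozenset(
    "3g2 3gp aac aif aiff avi flac flv m2ts m4a m4v mkv mov mp3 mp4 mpeg mpg "
    "mpga mts oga ogg ogv opus ts wav webm wmv wma".split()
)
# Formats recommended above for best stability.
RECOMMENDED_EXTENSIONS = frozenset({"mp3", "wav", "m4a", "mp4", "webm"})


def check_format(filename: str) -> str:
    """Return 'recommended', 'supported', or 'unsupported' for a file name."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in RECOMMENDED_EXTENSIONS:
        return "recommended"
    if ext in SUPPORTED_EXTENSIONS:
        return "supported"
    return "unsupported"


def needs_translation(source_lang: str, target_lang: str) -> bool:
    """Same source and target language means a transcript only.

    Assumption for this sketch: with source 'auto' the language is not yet
    known, so the translation step is treated as required.
    """
    return source_lang == "auto" or source_lang != target_lang
```

For example, check_format("lecture.MP3") reports "recommended", and needs_translation("ru", "ru") reports that the translation step can be skipped.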

If audio is longer than 5 minutes, split it into parts or use paid recording minutes to avoid trimming.
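
One way to plan the split is to compute part boundaries up front. The sketch below is a minimal example that assumes the recording's duration in seconds is already known; the function name and the even 5-minute cut points are illustrative choices, and the actual cutting would be done in an audio tool of your choice.

```python
LIMIT_SECONDS = 5 * 60  # the 5-minute limit described above


def plan_parts(duration_seconds: float,
               limit_seconds: float = LIMIT_SECONDS) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds, each part no longer than the limit."""
    total = float(duration_seconds)
    if total <= 0 or limit_seconds <= 0:
        return []
    parts = []
    start = 0.0
    while start < total:
        end = min(start + limit_seconds, total)
        parts.append((start, end))
        start = end
    return parts


# A 12-minute recording becomes two full parts and one 2-minute remainder.
print(plan_parts(12 * 60))
```

Cutting at fixed offsets can split a word in half, so in practice it may be worth nudging each boundary to a nearby pause.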

Audio, video and speech workflows in one place

Polyglot Voice is built for people who need more than a simple transcript. The platform combines audio to text, video transcription, translation, subtitle workflows, clip creation, dubbing preparation and media utilities in one flow. This makes it useful for creators, students, researchers, marketers and multilingual teams that work with spoken content every day.

Instead of moving between separate tools for transcription, subtitles, translation and media conversion, you can upload once, choose the right workflow and export the result for publishing, studying, archiving or repurposing.

How it works

  1. Upload audio or video, or record speech in real time.
  2. Choose the language workflow: transcript, translation, subtitles or dubbing prep.
  3. Export the result as text, subtitle-friendly output or a reusable media asset.

Best for

  • Students turning lectures into notes
  • Creators converting video into subtitles and clips
  • Teams translating interviews, meetings and training media
  • Developers using the API for automated pipelines

Supported workflows

Audio to text, video to text, speech to text, subtitle generation, translation, clip extraction, format conversion and audio extraction from video.

Why users choose it

It combines multilingual coverage, export flexibility and a creator-friendly workflow instead of forcing separate tools for each step.

Frequently asked questions

Can I convert audio and video to text online?

Yes. Polyglot Voice is designed for audio-to-text and video-to-text workflows with support for multilingual transcription and export-friendly results.

Can I translate speech into another language?

Yes. You can use transcript and translation workflows together to turn spoken content into translated text for subtitles, notes and publishing.

Is it useful for lectures, interviews and podcasts?

Yes. The workflow is especially useful for lectures, interviews, meetings, podcasts and creator content that needs searchability, subtitles or repurposing.

Do you support many languages?

Yes. The platform supports speech-to-text in 98+ languages and translation into 98+ languages, so many spoken input languages can be combined with multilingual output workflows.