How Audio and Video Streams Work
A video file is not a single blob of data. It is a container (MP4, MKV, AVI, MOV) that holds multiple independent streams: a video stream, one or more audio streams, and sometimes subtitle streams and chapter metadata. Each stream is encoded with its own codec, independently of the others.
When you "extract audio from video," you are telling the converter to ignore the video stream entirely and keep only the audio stream. The process is called demuxing — separating the multiplexed streams. If the output format differs from the source audio codec, the audio is also transcoded (re-encoded) into the target format.
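A toy Python model can make the demuxing idea concrete. It is a simplified sketch, not how any real container library works. The `Packet` class and the `demux` function are hypothetical, and real packets also carry timestamps and codec parameters. Packets from several streams arrive interleaved, and demuxing keeps only the packets of one chosen stream, in their original order.

```python
from dataclasses import dataclass


# Toy model only: each packet records which stream it belongs to and its kind.
@dataclass(frozen=True)
class Packet:
    stream_index: int
    kind: str        # "video", "audio", or "subtitle"
    payload: bytes


def demux(packets: list, stream_index: int) -> list:
    """Keep only the packets of one stream, preserving their original order."""
    return [p for p in packets if p.stream_index == stream_index]


# Interleaved ("multiplexed") packets as they might appear inside a file.
muxed = [
    Packet(0, "video", b"V0"),
    Packet(1, "audio", b"A0"),
    Packet(0, "video", b"V1"),
    Packet(1, "audio", b"A1"),
    Packet(2, "subtitle", b"S0"),
]

audio_only = demux(muxed, 1)
print([p.payload for p in audio_only])  # [b'A0', b'A1']
```

Demuxing by stream index rather than by kind matters when a file has several audio tracks, for example an MKV with two language dubs. Each track is its own stream.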
How Convertio handles it: When you upload a video file and select MP3 as the output, the backend automatically detects the audio_extract conversion type. It first uses ffprobe to verify that your video has an audio track, then extracts that track with ffmpeg's -vn flag, which drops the video stream and keeps the audio.
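Convertio's backend code is not shown here, but the two steps it describes (probe for audio, then extract with -vn) can be sketched in Python. The sketch assumes ffprobe's JSON output (`-of json`) and an ffmpeg build that includes the LAME MP3 encoder (`libmp3lame`). The helper names `ffprobe_audio_cmd`, `audio_streams`, and `extract_cmd` are hypothetical. The functions only build commands and parse output, so they run without ffmpeg installed. In real use you would run each command with `subprocess.run(cmd, capture_output=True, text=True, check=True)`.

```python
import json


def ffprobe_audio_cmd(path: str) -> list:
    """Build an ffprobe command that lists only the audio streams, as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name,bit_rate",
        "-of", "json",
        path,
    ]


def audio_streams(ffprobe_json: str) -> list:
    """Parse ffprobe JSON output; an empty list means the file has no audio track."""
    return json.loads(ffprobe_json).get("streams", [])


def extract_cmd(path: str, out: str, bitrate_kbps: int = 192) -> list:
    """Build an ffmpeg command: -vn drops video, libmp3lame encodes CBR MP3."""
    return ["ffmpeg", "-i", path, "-vn",
            "-c:a", "libmp3lame", "-b:a", f"{bitrate_kbps}k", out]


# Abbreviated example of what ffprobe prints for an MP4 with one AAC track.
sample = '{"streams": [{"index": 1, "codec_name": "aac", "bit_rate": "128000"}]}'
streams = audio_streams(sample)
if not streams:
    # Refuse to continue rather than produce a silent MP3.
    raise ValueError("No audio track found")
print(streams[0]["codec_name"])  # aac
print(" ".join(extract_cmd("input.mp4", "output.mp3")))
# ffmpeg -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3
```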
What Audio Codecs Are Inside Video Files?
Different video containers use different audio codecs. Knowing what codec your source video uses helps you choose the right output bitrate:
| Video Format | Common Audio Codec | Typical Bitrate | Source Examples |
|---|---|---|---|
| MP4 | AAC | 128–256 kbps | iPhone recordings, YouTube downloads, screen captures |
| MKV | AAC, AC3, DTS, FLAC, Opus | 128–1,500+ kbps | Blu-ray rips, anime, media libraries |
| AVI | MP3, PCM | 128–320 kbps (MP3); 1,411–1,536 kbps (16-bit stereo PCM) | Legacy camcorder footage, older downloads |
| MOV | AAC, PCM | 128–256 kbps (AAC); 1,536–2,304 kbps (PCM) | iPhone/Mac recordings, Final Cut exports |
| WebM | Opus, Vorbis | 64–160 kbps | Browser recordings, web video |
| WMV | WMA | 128–192 kbps | Windows legacy recordings |
| FLV | AAC, MP3 | 64–128 kbps | Older Flash video files |
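AVI and MOV files can carry PCM tracks, which are uncompressed. Their bitrate is therefore set by the sample format rather than chosen by an encoder: sample rate × bit depth × channels. The short Python check below applies that formula to three common formats. The function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Uncompressed PCM bitrate: samples per second x bits per sample x channels, in kbps."""
    return sample_rate_hz * bit_depth * channels / 1000


print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2  (CD-style 16-bit stereo)
print(pcm_bitrate_kbps(48_000, 16, 2))  # 1536.0  (16-bit stereo at 48 kHz)
print(pcm_bitrate_kbps(48_000, 24, 2))  # 2304.0  (24-bit stereo at 48 kHz)
```

Even the smallest of these rates is more than four times the 320 kbps ceiling of MP3. A PCM source is effectively lossless, so it never limits the output quality.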
Convertio supports audio extraction from 17 video formats: MP4, MKV, AVI, MOV, WebM, WMV, FLV, M4V, 3GP, OGV, TS, MTS, M2TS, MPG, MPEG, VOB, and 3G2. All can output to MP3.
Choosing the Right MP3 Bitrate
The single most important rule when extracting audio: do not encode far above the source audio bitrate. Most video files contain audio at 128–256 kbps. A small bump can offset MP3 being less efficient than AAC or Opus at the same bitrate. Encoding a 128 kbps track to 320 kbps MP3, however, only inflates the file without adding any detail the source did not have.
| Source Audio Bitrate | Recommended MP3 Bitrate | Reasoning |
|---|---|---|
| 64–96 kbps | 96–128 kbps | Low-quality source; higher output wastes space |
| 128 kbps | 128–192 kbps | Match source; slight bump accounts for codec differences |
| 192–256 kbps | 192–256 kbps | Good quality source; match for transparent results |
| Lossless (FLAC, PCM) | VBR V0 or 320 kbps CBR | Lossless source; maximum MP3 quality is justified |
Rule of thumb: If you do not know the source bitrate, use 192 kbps. It covers the vast majority of video audio tracks without wasting space or losing noticeable quality.
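The table and the rule of thumb can be condensed into a small lookup function. This Python sketch is illustrative, not Convertio's actual logic, and `recommend_mp3_kbps` is a hypothetical name. Two assumptions go beyond the table. First, sources that fall between rows (for example 112 or 160 kbps) are assigned to the next row up. Second, lossy tracks above 256 kbps, such as high-bitrate AC3 or DTS, are treated like the lossless row.

```python
from typing import Optional


def recommend_mp3_kbps(source_kbps: Optional[int], lossless: bool = False) -> tuple:
    """Return a (low, high) MP3 bitrate range in kbps for a source audio track."""
    if lossless:
        return (320, 320)  # table: VBR V0 or 320 kbps CBR (CBR shown here)
    if source_kbps is None:
        return (192, 192)  # rule of thumb when the source bitrate is unknown
    # Assumption: sources between table rows fall into the next row up.
    if source_kbps <= 96:
        return (96, 128)   # low-quality source
    if source_kbps <= 128:
        return (128, 192)  # match, with a slight bump for codec differences
    if source_kbps <= 256:
        return (192, 256)  # good-quality source: match it
    return (320, 320)      # assumption: high-bitrate lossy tracks treated like lossless


for src in (None, 64, 128, 224):
    print(src, recommend_mp3_kbps(src))
# None (192, 192)
# 64 (96, 128)
# 128 (128, 192)
# 224 (192, 256)
```

The source bitrate itself can come from the `bit_rate` field that ffprobe reports, divided by 1000.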
Demuxing vs Transcoding: What Happens to Your Audio
There are two fundamentally different things that can happen when extracting audio from video:
| Method | What Happens | Quality Impact | When Used |
|---|---|---|---|
| Demux (stream copy) | Audio stream is copied out of the container unchanged | Zero loss — bit-for-bit identical | When source codec matches desired output (rare for MP3) |
| Transcode (re-encode) | Audio is decoded, then re-encoded into the target codec | Minimal loss at 192+ kbps; one generation of lossy encoding | When converting AAC/Opus/AC3 to MP3 (the common case) |
Since most video files contain AAC audio and you want MP3 output, the audio must be transcoded. This means one generation of lossy-to-lossy conversion. At 192 kbps and above, the quality impact is negligible for virtually all listening scenarios.
Stream copy (lossless extraction) only works when the source audio is already in the target format. For example, some AVI files contain MP3 audio — those can be extracted without re-encoding. But this is uncommon.
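The choice between stream copy and transcoding can be expressed as a choice of ffmpeg audio options. This Python sketch assumes the source codec name comes from ffprobe's `codec_name` field (`"mp3"` for MP3 audio). It also assumes ffmpeg includes `libmp3lame`. The function names are hypothetical. `-c:a copy` copies the audio packets unchanged, and any other codec is re-encoded once.

```python
def audio_codec_args(source_codec: str, bitrate_kbps: int = 192) -> list:
    """Stream-copy when the source audio is already MP3; otherwise transcode with LAME."""
    if source_codec.lower() == "mp3":
        return ["-c:a", "copy"]  # demux only: audio stays bit-for-bit identical
    # One generation of lossy re-encoding (AAC, Opus, AC3, ... to MP3).
    return ["-c:a", "libmp3lame", "-b:a", f"{bitrate_kbps}k"]


def build_cmd(src: str, dst: str, source_codec: str, bitrate_kbps: int = 192) -> list:
    """Full ffmpeg command: -vn drops the video, then copy or transcode the audio."""
    return ["ffmpeg", "-i", src, "-vn", *audio_codec_args(source_codec, bitrate_kbps), dst]


print(" ".join(build_cmd("old.avi", "old.mp3", "mp3")))
# ffmpeg -i old.avi -vn -c:a copy old.mp3
print(" ".join(build_cmd("clip.mp4", "clip.mp3", "aac")))
# ffmpeg -i clip.mp4 -vn -c:a libmp3lame -b:a 192k clip.mp3
```

No bitrate is passed in the copy case, because a copied stream keeps whatever bitrate it already had.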
Common Use Cases
- Save music from video files: extract audio from concert recordings, music videos, or downloaded video files where you only need the soundtrack.
- Extract podcast audio from video recordings: many podcasters record in video format (Zoom, OBS) and need to extract the audio track for their podcast feed.
- Lecture and presentation audio: extract speech from recorded lectures, webinars, or conference talks for listening on the go.
- Voice memo extraction: iPhone Voice Memos are M4A, but screen recordings and video messages are saved as video files (typically MP4 or MOV). Extract the audio when you only need the sound.
- Audio for editing: pull the audio track from raw footage to edit it separately in an audio editor, then sync it back later.
How to Extract Audio with Convertio
- Upload your video file using the converter widget above. Supported formats: MP4, MKV, AVI, MOV, WebM, WMV, FLV, and more.
- Select MP3 as output. The converter detects the video input and automatically switches to audio extraction mode.
- Choose your bitrate. Open the encoding options to set the bitrate. Use 192 kbps as a safe default, or 128 kbps for speech content.
- Convert and download. The audio track is extracted, transcoded to MP3, and ready for download. Files are auto-deleted within 2 hours.
No audio track? If your video file contains no audio stream (e.g., screen recordings with audio disabled, silent GIFs converted to video), Convertio will detect this and display an error rather than producing a silent file.
What NOT to Do
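- Do not convert to 320 kbps from a 128 kbps source. You get a file 2.5× larger with no quality improvement. The extra bits only describe the already-compressed source (including its compression artifacts) more precisely. They cannot restore detail the 128 kbps encode already discarded.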
- Do not extract audio and then re-encode it again. Each lossy encoding cycle degrades quality. Extract once at the right bitrate and keep that file.
- Do not assume higher numbers are always better. A 128 kbps MP3 from a 128 kbps AAC source can sound very good. A 320 kbps MP3 from the same source will not sound any better, but it takes 2.5× the storage.
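The 2.5× figure follows directly from constant-bitrate arithmetic: file size equals bitrate × duration ÷ 8 bits per byte. The Python sketch below applies it to one hour of audio. It ignores ID3 tags and frame headers, so real files are slightly larger. The function name is illustrative.

```python
def mp3_size_mb(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate CBR MP3 size in megabytes (10^6 bytes): bits/s x seconds / 8."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


hour = 60 * 60
for kbps in (128, 192, 320):
    print(kbps, round(mp3_size_mb(kbps, hour), 1))
# 128 57.6
# 192 86.4
# 320 144.0
```

At 320 kbps, an hour-long lecture takes about 144 MB instead of 57.6 MB. If the source track was 128 kbps, none of that extra space buys audible quality.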