🎬 What we mean by “the tube”
A thriller of haunting cup reflections, phantoms of phase distortion, and why you don’t hear them — but they’re there.
There’s no formal definition for it and no mention in the technical literature. Anyone whose work involves sound knows what we mean: parasitic phase-and-frequency distortion inside the headphone cup, where a clean signal at the input ends up sounding like it was pushed through a piece of tin pipe.
The paradox of the tube is that it's invisible on the usual graphs. The frequency response can look perfectly flat, and the headphones still "tube". You put them on and immediately hear that something is wrong compared to the previous pair, but you can't put your finger on what exactly is different. On a frequency response graph both pairs can look identical: on paper it's the same instrument. To the ear, different. That's the tube. It's felt, not described. Over time the brain learns to treat it as background noise to be filtered out, and that filtering consumes processing resources: hours of work in such headphones are more tiring than they should be.
This article is about the physics of the tube. Where it comes from, why it doesn’t show up on the usual graphs, why you still hear it anyway, and what we did in M1 to zero it out.
⚡ Where it comes from
The planar magnetic membrane is in constant motion. Excursion is on the order of 1–2 millimeters at the lowest frequencies; at high frequencies the amplitude is already microscopic. Oscillations run across the entire audio range — from tens of cycles per second in the bass to tens of thousands at the upper highs. On every cycle the membrane radiates sound in both directions at once — half the energy goes toward the ear, the other half goes back into the cup.
And that’s where it gets interesting.
🎾 Analogy — ball and wall
Throw a ball into a pillow — it stays in the pillow. Throw it into a wall — it bounces back. A sound wave is the same ball. If there’s an absorbing material behind the membrane, the wave dies out. If there’s a hard wall, it bounces back.
Behind the membrane in a headphone cup there’s a back wall. And side walls. And a complex geometry of mounting hardware. Every one of these surfaces is a potential “wall” the sound bounces off of. The sound wave from the back side of the membrane hits the cup walls and returns through multiple paths. What ends up at your ear is not one signal but the original signal plus its echo, arriving fractions of a millisecond later.
💧 Analogy — ripples in water
Throw two stones into water near each other, one slightly after the other. Two sets of ripples meet on the surface. Where one wave’s crest meets another’s crest — the height doubles. Where a crest meets a trough — they cancel each other out, and at that point the water doesn’t move at all. This phenomenon is called interference.
With sound — the same thing. The original wave and its reflection inside the cup meet. At some frequencies they add up, at others they cancel. What comes out of the cup into the ear is no longer the original signal but a distorted copy with peaks and dips at frequencies that weren’t in the recording. Plotted as a graph, this looks like a jagged comb pattern — the comb filter.
Comb filter
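For the curious, the effect fits in a few lines of math. A minimal sketch, assuming a single hypothetical reflection that arrives 0.3 ms after the direct sound at half its amplitude:

```python
import numpy as np

tau = 0.0003   # hypothetical reflection delay: 0.3 ms
a = 0.5        # hypothetical reflection amplitude, relative to the direct sound

# Direct sound plus one delayed reflection: y(t) = x(t) + a * x(t - tau).
# The magnitude response of that sum is |1 + a * exp(-j * 2*pi*f * tau)|.
f = np.linspace(20, 20_000, 2_000)                 # audio band, Hz
H_db = 20 * np.log10(np.abs(1 + a * np.exp(-2j * np.pi * f * tau)))

# Peaks appear where the reflection arrives in phase (f = k / tau),
# dips where it arrives out of phase (f = (k + 0.5) / tau).
print(f"first dip near {0.5 / tau:.0f} Hz, "
      f"ripple spans {H_db.min():+.1f} to {H_db.max():+.1f} dB")
```

One reflection, and a ruler-straight response turns into a ripple of several dB: peaks and dips at frequencies that were never in the recording.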
But the distorted frequency picture is only half the problem.
🥁 Analogy — a drummer behind the beat
In a band, the drummer sets the timing. If he hits an eighth note late on every beat — the whole groove collapses. The notes are the same, the sound is the same, but something doesn’t line up. Group delay works the same way — it’s the time offset with which different frequencies reach the ear. Ideally, all frequencies should arrive simultaneously. In reality — they don’t. And when one frequency lags another by a millisecond, the brain hears it as “smearing”, “lack of focus”, “muddiness”. The attack of a hit stops being a point in time — it spreads out.
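The same toy model (one hypothetical reflection, 0.3 ms, half amplitude) lets you see group delay directly: it is the negative slope of the filter's phase response.

```python
import numpy as np

tau, a = 0.0003, 0.5      # same hypothetical reflection as in the comb sketch

f = np.linspace(20, 20_000, 20_000)
w = 2 * np.pi * f
H = 1 + a * np.exp(-1j * w * tau)      # direct path + one reflection

# Group delay = -d(phase)/d(omega): how late each frequency's energy arrives.
phase = np.unwrap(np.angle(H))
gd_us = -np.gradient(phase, w) * 1e6   # in microseconds

# (Negative dips near the cancellation notches are a known quirk of comb filters.)
print(f"group delay swings from {gd_us.min():.0f} to {gd_us.max():.0f} µs "
      "across the band: different frequencies arrive at different times")
```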
🐚 Analogy — a seashell
Put a shell to your ear and you hear "the sound of the ocean", which isn't actually any ocean — it's resonance of the air inside the shell's cavity. The bigger the shell, the lower the hum. The smaller the shell, the higher. Same principle as resonance inside a headphone cup: a sound wave enters a cavity and starts oscillating at a frequency determined by the geometry. The simplest formula for a cavity open at one end (and a headphone cup is exactly that — a cavity open toward the ear):

f = c / (4 · L)

where c is the speed of sound in air (343 m/s) and L is the depth of the cavity in meters.
What the formula gives in practice
| Cavity | Depth | Resonance | Frequency zone |
|---|---|---|---|
| Seashell | 5 cm | ~1.7 kHz | Lower mids |
| Headphone cup | 2 cm | ~4.3 kHz | Upper mids / vocal formant zone |
| Headphone cup | 1 cm | ~8.5 kHz | Lower highs |
This is a heavily simplified model — the formula describes an ideal rigid resonator with no losses. In a real cup it’s more complex: the membrane itself vibrates and absorbs part of the energy, the earpad acts as a soft boundary, wall materials partially damp reflections, the shape is far from regular geometry. But the idea is correct: every cup has its own resonant frequency at which sound rings louder than it should. And if that frequency isn’t damped by design, it will color every sound in the recording with its own hum.
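If you want to check the table yourself, the quarter-wave formula is a one-liner:

```python
# Quarter-wave resonance of a cavity open at one end: f = c / (4 * L)
c = 343.0    # speed of sound in air, m/s

for name, depth_m in [("seashell, 5 cm", 0.05),
                      ("headphone cup, 2 cm", 0.02),
                      ("headphone cup, 1 cm", 0.01)]:
    print(f"{name}: ~{c / (4 * depth_m) / 1000:.1f} kHz")
# seashell ~1.7 kHz, cup 2 cm ~4.3 kHz, cup 1 cm ~8.6 kHz
# (rounding differs slightly from the table's 8.5)
```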
Interference with its consequence, the comb filter; group delay; resonance: all of these together are the tube. Not a single thing, but a whole complex of phenomena. And all of them play out in the time domain, not as isolated points on a frequency axis.
📊 Why you don’t see it on the frequency response
Frequency response is a still photo of WHICH frequencies the headphones reproduce. The impulse response is HOW those frequencies arrive and decay over time.
You can’t tell from a single frame of a football game whether anyone scored or whether the kick missed. You can’t tell from the frequency response what the membrane is doing over time.
To see the tube, you need graphs that show time-domain behavior.
🛀 Analogy — clapping in a bathroom
Clap your hands in a carpeted room — you get a short “clap” and silence. Clap in a tiled bathroom — “clap” and reverberation, echo, a hum that takes about a second to die. The loudness of the clap is the same. But the tail is completely different.
The impulse response (IR) is exactly that kind of graph. Feed an ideal short impulse into the headphones — and look at how the membrane reproduced it and how it died down afterward. In good headphones, the membrane returns to rest almost immediately after the main peak. In bad ones, it keeps twitching for several more milliseconds — and those twitches are the tube, in plain sight.
M1 impulse response, left channel, measured in REW.
Impulse response is a characteristic of the entire signal chain (DAC → amplifier → headphones → microphone), not of the membrane alone. For an IR to show the properties of the headphones specifically, the measurement chain must use a quality amplifier with low output impedance and high damping factor. Otherwise the graph primarily characterizes the amplifier: a low-impedance headphone membrane is poorly damped under high output impedance and keeps oscillating by inertia. The measurements presented above were captured in conditions where the contribution of the other chain elements to the IR is negligible. For context: if you measured headphones from well-known brands in the same quality category — but at 10× the price — through the same signal chain, their impulse response would often be slower.
In this graph, the main peak is sharp — that's the attack. Right after it, a negative bounce down to −60% — the membrane's natural return motion. After 100 µs, a residual positive bounce of about +20%. Then a series of oscillations with amplitude under 10%. By one millisecond the level is already below 5% of the main peak (around −26 dB). After that, almost a flat line.
In absolute numbers, this is one of the fastest impulse decays currently available in monitoring headphones. Most well-known models in this category decay substantially slower.
If you measure the period of those residual oscillations — it’s about 100–150 µs. That corresponds to a frequency of 7–10 kHz. And that’s exactly the frequency predicted by the formula for our cup geometry. The physics didn’t go anywhere — the 7 kHz resonance physically arises. But thanks to damping, its energy in the impulse response lives for less than a millisecond. That’s what “zeroed-out tube” means — not the absence of reflections, but the absence of their accumulation in time.
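To make the reading procedure concrete, here is a sketch on a synthetic impulse response: a sharp peak plus a damped 7 kHz ringing standing in for real data exported from REW. The two checks (tail level in dB, dominant ringing frequency) are exactly what you would run on an actual measurement:

```python
import numpy as np

fs = 192_000                          # a typical measurement sample rate
t = np.arange(int(0.003 * fs)) / fs   # first 3 ms of the response

# Toy IR: a sharp main peak plus a damped ~7 kHz cup-mode ringing.
# (Synthetic stand-in; a real IR would be exported from REW as WAV or text.)
ir = np.zeros_like(t)
ir[0] = 1.0
ir += 0.1 * np.exp(-t / 0.0004) * np.sin(2 * np.pi * 7_000 * t)

# Decay check: tail level after 1 ms, relative to the main peak, in dB.
tail = np.abs(ir[int(0.001 * fs):])
print(f"level after 1 ms: {20 * np.log10(tail.max() / np.abs(ir).max()):.0f} dB")

# Period check: dominant frequency of the residual ringing (main peak excluded).
spectrum = np.abs(np.fft.rfft(ir[1:]))
freqs = np.fft.rfftfreq(len(ir) - 1, 1 / fs)
print(f"residual ringing peaks near {freqs[spectrum.argmax()]:.0f} Hz")
```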
🎹 Analogy — an out-of-tune piano
There’s another graph called waterfall (or CSD, or burst decay). It’s like a piano on which you press every key in succession and watch how each note decays. On an ideal piano, every note decays equally smoothly with no foreign overtones. On an out-of-tune piano, one note rings longer than the others, another adds parasitic ringing, another dies too quickly. Same with headphones: waterfall shows how energy at each frequency dies over time. Long tails at some frequencies and fast decay at others — that’s the “out-of-tune piano”, that’s the tube.
M1 waterfall — measurements by Boitsov
On the M1 waterfall you can see that the main energy (the red-orange zone) decays virtually evenly across the entire range within 5–6 cycles. There are no "hung notes" in problem zones. The slight oscillation around 4–5 kHz is residual cup-mode ringing, the same family of reflections we saw in the impulse response, pushed 20–30 dB below the main energy.
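A waterfall isn't exotic to compute, either: it's the spectrum of the impulse response recalculated as the analysis window's start slides later and later. A minimal sketch (function name and parameters are our own, not REW's):

```python
import numpy as np

def cumulative_spectral_decay(ir, fs, n_slices=30, step_us=100):
    """Sketch of a CSD: the spectrum of everything from time t onward,
    recomputed as t slides forward in small steps."""
    n = len(ir)
    step = max(1, int(fs * step_us * 1e-6))
    rows = []
    for k in range(n_slices):
        seg = ir[k * step:]
        if len(seg) < 16:
            break
        fade = np.hanning(2 * len(seg))[len(seg):]    # fade the tail to zero
        spec = np.abs(np.fft.rfft(seg * fade, n=n))   # zero-pad: fixed bins
        rows.append(20 * np.log10(spec + 1e-12))
    return np.array(rows)   # rows = time slices, columns = frequency bins

# Reuse the synthetic 7 kHz-ringing IR from the impulse sketch above:
fs = 192_000
t = np.arange(int(0.003 * fs)) / fs
ir = np.zeros_like(t); ir[0] = 1.0
ir += 0.1 * np.exp(-t / 0.0004) * np.sin(2 * np.pi * 7_000 * t)

csd = cumulative_spectral_decay(ir, fs)
# A frequency column whose level stays high across later rows is a "hung
# note": the tube. (Real tools also fade each slice's leading edge; omitted.)
```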
Looking at the frequency response alone, we wouldn’t have seen this at all. On an FR graph, M1 looks flat. The time-domain graphs show what kind of work made that “flat” possible.
M1 frequency response
🧠 Psychoacoustics: how the brain hears time
You might ask: if the tube sits in the range of tenths of a dB or a few milliseconds, how is it possible to hear it at all?
It is possible. The mechanism is just not the one you might expect.
👁️ Analogy — binocular vision
We have two eyes. Each sees its own slightly different picture. The brain compares them and from the difference extracts information about distance, volume, and space. With one eye, we’d see the world flat.
Hearing works the same way. We have two ears. A sound source that isn’t directly in front of us reaches one ear slightly before the other. That difference is sometimes a fraction of a millisecond. The brain compares the arrival of sound at the left and right ear and from that difference builds the auditory space: where the source is, how far away it is, whether it’s moving.
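How large are those differences? Woodworth's classic spherical-head approximation gives a feel for the numbers (the head radius below is the textbook value, not a measurement of anyone's head):

```python
import numpy as np

# Woodworth's spherical-head approximation for the interaural time difference:
# ITD ≈ (r / c) * (theta + sin(theta)) for a source at azimuth theta.
r, c = 0.0875, 343.0          # head radius ~8.75 cm; speed of sound, m/s

for deg in (15, 45, 90):
    th = np.radians(deg)
    print(f"source at {deg}° -> ITD ≈ {(r / c) * (th + np.sin(th)) * 1e6:.0f} µs")
# 15° -> ~133 µs, 45° -> ~381 µs, 90° -> ~656 µs
```

Tenths of a millisecond at most. That is the scale at which the brain builds the entire auditory scene.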
When headphones add the tube — they break those time relationships. Not loudness, not frequency — time itself. And the brain feels it: “something’s wrong with the space”. The stage “compresses”, instruments “merge”, attacks “smear”. You can’t explain why — because you can only explain it if you know what to look for.
And one more thing. The brain constantly compares what it hears with what it expects to hear. It has a huge accumulated database of how real impacts, voices, instruments, and rooms sound. When something in a recording doesn’t match expectations — the brain spends resources to figure it out. Not consciously. At the level of background processing. And that’s exactly why bad headphones make you tired — the brain is working at its limit, trying to make sense of a stream of data in which something doesn’t add up. With good headphones you don’t get tired — because there’s nothing to figure out, everything is in its place.
🎶 Hearing is training
“I’m not a sound engineer, I won’t hear the difference.”
This is the most common thing we hear from people who learn about the tube for the first time. And it’s not true.
🔧 Analogy — a piano tuner
A tuner can tell what’s wrong with an instrument within seconds. Not because he has “golden ears”. But because over a lifetime he has tuned a thousand pianos and knows exactly how every possible defect sounds. Physiologically, his hearing is the same as anyone else’s. The difference is in training.
The same applies to hearing the tube. The first time you put on headphones without it, the most common first impression is: “something’s strange, the sound feels kind of dry”. That’s because your brain has gotten used to compensating for the tube as background noise, and now its absence feels like “something is missing”.
After a few days of work the habit shifts. After a few weeks you can go back to your old headphones and hear the tube as clearly as a tuner hears a detuned string, simply because you've trained yourself to listen in that direction.
And that’s exactly why veteran producers are so sensitive about this topic. They don’t hear any special frequencies — they have the same physiology as anyone else. They’ve just spent thousands of hours working with sound and learned to tell when something sounds “off”. This skill has nothing to do with innate “golden ears” — it’s pure practice.
🛠️ What we did in M1
When the team sat down to design the M1, the main problem we needed to solve wasn’t framed as “make yet another pair of headphones with a good frequency response”. The market is full of those. The main problem was — to zero out the tube. Without that, you can’t build the kind of “see-equals-hear microscope” a producer needs to control complex fast-tempo production — high BPM, dense synthesis, interlocking drums, where every millisecond in the mix matters.
That immediately determined the design choices. Each of them is a compromise with physics, and each was selected empirically, through measurements and listening. You can’t get reference-grade driver sound from pure theory — full theoretical modeling of a headphone is not analytically tractable. The Finite Element Method (FEM — where a complex shape is broken down by computer into millions of small elements and the behavior of each is calculated) is a powerful tool, but it solves the problem with major simplifications. A real cup with a real membrane on a real ear, especially at high frequencies where diffraction, partial membrane modes, and complex membrane behavior as a radiator come into play, produces deviations from the model that nothing but listening can close.
🍞 Analogy — baking bread
A bit more yeast — different crumb structure. Changed the oven temperature — different crust. Switched flour — rewrite the recipe. Many parameters, all interdependent, and the only way to find out whether it worked is when the bread is baked and you cut it open.
Same with a driver: change the membrane tension and you have to readjust the damping. Tweak the damping and the bass sensitivity changes. Change the suspension geometry and you redo the magnetic system. And the only way to verify whether it worked is one and the same: you have to taste it. With bread — literally. With a driver — listen to it.
Over a year of M1 development we produced an enormous number of drawings, ten construction prototypes of the headphone itself, and 64 membrane iterations. Every membrane was not only measured — it went through listening tests. The test material was dozens of reference tracks, half of which is the personal production output of one of the creators. These are tracks where every millisecond is known down to the instrument: which transient where, which phase pattern where, what exactly should be happening in every section of the wave. And by what specifically dropped out or got distorted in each new driver iteration, we could tell which way to turn the next adjustment.
The specific solutions that made it into the final version
1. An oval cup shape with no sharp angles
In a rectangular or cylindrical cavity, sound bounces between parallel walls and accumulates as standing waves — the most efficient mechanism for forming the tube. In a cup with smoothed geometry, sound reflections scatter in different directions, never returning twice to the same point.
Everyday analogy: echo in a rectangular room with hard parallel walls lives long and hums — anyone who’s been in an empty room before renovation has heard it. In a room with skewed non-parallel walls (the way recording studios are professionally designed) the echo dies almost instantly.
2. A precisely tuned venting path
This is a collective name for the entire system of excess pressure release and resonance reduction: where the energy generated behind the membrane goes, how it passes through the construction elements, and in what volume it exits.
In the full-size planar market, cup depth lies in the range of 18–26 mm based on available measurements — which gives a first cup mode in the area of 3–4 kHz.
In the M1, total depth is 12 mm (3 mm earpad + 9 mm from membrane to back wall). This shifts the resonance from the critical 3–4 kHz to ~7 kHz. At 7 kHz the wavelength is shorter — substantially easier to damp with thin acoustic materials. Sound at this frequency, even if it managed to accumulate at all, would pass through the damping layers and lose energy before becoming an audible resonance.
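Plugging both geometries into the quarter-wave formula from earlier shows the shift (24 mm stands in for a typical cup from the 18–26 mm range):

```python
c = 343.0    # speed of sound in air, m/s

for label, depth_mm in [("typical planar cup", 24), ("M1 total depth", 12)]:
    f_khz = c / (4 * depth_mm / 1000) / 1000
    print(f"{label} ({depth_mm} mm): ~{f_khz:.1f} kHz")
# typical planar cup (24 mm): ~3.6 kHz; M1 total depth (12 mm): ~7.1 kHz
```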
3. Even force distribution across the membrane and control of partial modes
A membrane of 46×60 mm isn’t a point source — it’s an area. If the magnetic field is distributed unevenly across it, different sections of the membrane move with different amplitudes. At certain frequencies the membrane stops moving as a single piston and breaks up into independently oscillating zones — these are partial modes. Each such mode adds its own ringing.
In the M1, the magnetic system is designed so that force is distributed as evenly as possible — this removes partial modes as a source of coloration.
Together these solutions produce what you see in the impulse response: a clean main peak, fast decay, no long tails. And what you see in the waterfall: even decay across the entire range with no long-ringing “notes”.
🎯 In place of a conclusion: see = hear
Modern production is work with the visible waveform. Open a DAW, zoom in, and you can drill all the way down to an individual sample. At a 48 kHz sample rate that's a resolution of about 0.02 milliseconds per sample. Which means a producer today literally sees on screen events lasting tenths of a millisecond — transients, phase conflicts, zigzags. Sees a phase mismatch between the kick and the bass in the 60–100 Hz region — those couple-of-sample offsets that make the sub lose impact. Sees a conflict between the snare and the mid-bass at 200–400 Hz — in the zone where these instruments overlap and mask each other. Sees where there's a clean sustain and where a micro-click appeared.
This is historically a very recent capability. Before convenient DAWs in the early 2000s, this resolution was either unavailable entirely, or available only on extremely expensive studio gear. The producer of the 80s–90s could hear the problem, but couldn’t precisely see it. Real-time analyzers that show honest millisecond resolution of what’s happening to the sound right now only became widespread around the mid-2000s.
And here’s the paradox that explains why this problem still hasn’t been solved at scale. Over those decades, sound engineers learned to see what’s happening to the sound at millisecond resolution. But the headphone manufacturers — especially the big legendary brands — barely involve those engineers in full-cycle development over the same years. Large corporations are inertial, and their R&D is tuned for the mass market and impressive, audiophile-pleasing sound, not for accurate monitoring. Development relies on engineering intuition, on the personal preferences of the creators, on testing with random focus groups aimed at selling more — not at making it more accurate. Random hits on good characteristics happen for individual models, but they’re random, in the absence of a systematic approach.
This gap hits both sides, producers and listeners alike. The producer spends years moving from one "legendary" model to another, re-training their internal ADC in each, raising their perceptual thresholds, but never reaching that "see = hear" state, because no one ever set up the task systematically. The audiophile, lacking the skill to tell truth from coloration, spends endlessly too: they buy "legendary" headphones, listen for a year, retrain their ADC inside that specific tube, start hearing its limitations, feel that something is missing, and go out for the next pair, which houses a different kind of tube. And so it goes in a loop, year after year. The industry is perfectly tuned for this cycle: every new model is sold as a "new level", when in reality it's just a different coloration at a different price.
Now the situation for the modern sound engineer looks like this:
👁️ Seeing at millisecond resolution
Any modern DAW gives you this capability out of the box.
👂 Hearing at the same resolution
99% of headphones on the market don’t let you do this.
It comes down to the headphones. They show “the big picture” — does the stage feel right, is the instrument recognizable. But 2–4-millisecond errors in a dense mix get drowned in them. Drowned in the tube. And the producer ends up in a strange position: eyes see the problem, but the ears can’t catch it, because the headphones smear that difference into their own coloration.
When the tube is zeroed out — visual and auditory resolution finally synchronize. You see a glitch — you hear it in the headphones. You see a 2-ms phase conflict in the bass — you hear that exact thing, not “something wrong with the bass, need to figure it out”. The information the DAW gives your eyes starts to match the information the headphones give your ears.
That’s where the name “audio microscope” came from. A microscope gives the eyes a resolution they don’t naturally have. We have headphones that give the ears a resolution equivalent to what the DAW has long given the eyes. Hearing finally catches up with sight.
In that sense, the tube is not just an acoustic defect. It’s the gap between what the producer sees on the screen and what they’re capable of hearing in reality.
That gap is what we closed in M1.
✨ A final word
M1 is not a classic commercial product. The idea grew out of the practical needs of a producer who has worked in neurofunk for thirty years: high BPM, dense synthesis, jewelry-grade sound design, interlocking rhythmic parts. With material like that, headphones with the tube turn every mix into a fight against background noise that isn’t actually in the signal.
Over those thirty years many "legendary" models from major audiophile brands passed through our hands. Each promised studio monitoring, and each in practice colored the sound in its own way, making it more impressive, more pleasant, "warmer", but not the truth. The closest thing to honest reproduction turned out to be the Fostex RP MK3. Even those had their downsides, owing to the plastic cup and Kapton membrane: a mild case of the tube, a smeared midrange around 300 Hz, insufficient sub depth, hyped highs that can trigger tinnitus in some listeners, and a stiff, hard headband. M1 was built with the goal of preserving honest monitoring while removing those weaknesses.
Expensive planars from respected manufacturers are great headphones, and for classical, jazz, audiophile vinyl they work beautifully. But for modern EDM production they color the sound, making it sweeter to perceive. That’s normal and acceptable for the listener. For the producer it means the final mix is balanced against one acoustic picture, while the listener on their headphones or speakers will hear a completely different one.
When it became clear that the instrument worked, the decision was made to set up production for colleagues and students at an accessible price. The goal: a producer without a million-dollar studio gets to control sound at the same level as a producer with top-tier gear, and competes not with finances but with hands and ideas.
M1 is idea-driven headphones. From user to user. Built on personal experience and the solution to a personal pain, not on a desire to squeeze money out of the buyer.
🎧 For producers
With M1, your music will get better — you’ll hear what’s actually happening in it. Every phase mistake, every uncleaned artifact, every click and rustle — visible on the analyzer and audible at the same time.
🎵 For audiophiles
In M1, music becomes the truth. Not “warmer”, not “airier”, not “more musical” — but the way the sound engineer recorded it. The truth has to be accepted. Once you accept it, you don’t go back. Because once you hear what a good mix really sounds like, the difference between it and a bad mix becomes obvious. And every other pair of headphones starts to sound like varying degrees of coloration — some pleasant, some not — the truth distorted.