How the Fourier Transform Converts Sound Into Frequencies

Why This Piece Exists

This is an intuition piece on the Fourier Transform, based on what I’ve learned from it and its application to sound frequency analysis. The goal here is to build intuition for how the Fourier Transform helps us get to frequency domain features from time domain features. We won’t get into heavy math and derivations; instead, we’ll try to simplify the meaning conveyed by the complex equations.

Before we get into the Fourier Transform, you should have a basic understanding of how digital sound is stored, specifically sampling and quantization. Let me quickly cover it here so we’re on the same page.

Sound in the real world is a continuous wave: air pressure changing smoothly over time. But computers can’t store continuous things. They need numbers, discrete values. To store sound digitally, we do two things.

First, sampling: we take “snapshots” of the sound wave’s amplitude at regular intervals. How many snapshots per second? That’s the sampling rate. CD-quality audio takes 44,100 snapshots per second (44.1 kHz). For speech in ML pipelines, 16,000 per second (16 kHz) is common and mostly sufficient. I’ve worked with 16 kHz speech data extensively, and it captures pretty much everything that matters for speech. The key idea is that we’re converting a smooth continuous wave into a sequence of values at discrete points in time.

Second, quantization: each snapshot needs to record how loud the wave is at that moment, and with how much precision. This is the bit depth. With 16-bit audio, each amplitude value can be one of 65,536 possible levels (2¹⁶). That’s fine enough that the human ear can’t notice any difference from the original. With only 8-bit, you’d have just 256 levels, and the audio would sound rough and grainy because the gap between the true amplitude and the nearest storable value (this gap is called quantization error) becomes audible.

After sampling and quantization, what we have is a sequence of numbers (amplitude values at evenly spaced time steps) stored in the computer. That’s our time domain signal. That’s g(t). And that’s what the Fourier Transform takes as input.
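To make sampling and quantization concrete, here is a minimal sketch. The 440 Hz tone, the 10 ms duration, and the quantize() helper are all assumptions made up for illustration; real audio codecs handle rounding and clipping in their own ways.

```python
import numpy as np

sr = 16_000                    # sampling rate in Hz (assumed for this sketch)
n = 160                        # 160 samples = 10 ms of audio at 16 kHz
t = np.arange(n) / sr          # evenly spaced sample times

# Stand-in for the continuous wave: a 440 Hz tone with amplitude in [-1, 1]
wave = np.sin(2 * np.pi * 440 * t)

def quantize(x, bits):
    """Hypothetical helper: snap values in [-1, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)              # spacing between adjacent levels
    return np.round((x + 1.0) / step) * step - 1.0

q16 = quantize(wave, 16)
q8 = quantize(wave, 8)

# Quantization error: gap between the true amplitude and the stored value
err16 = np.max(np.abs(wave - q16))
err8 = np.max(np.abs(wave - q8))
print(f"samples: {len(t)}, max error 16-bit: {err16:.2e}, 8-bit: {err8:.2e}")
```

The 8-bit error should come out a couple of hundred times larger than the 16-bit one, which is the graininess described above.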

I’ve spent a good amount of time working hands-on with audio data preprocessing and model training, mostly dealing with speech data. While this piece builds everything from first principles, a lot of what’s written here comes from actually running into these things in real pipelines, not just textbook reading.

Also a promise: no AI slop here. Let’s get into it.

The Setup: What We’re Starting With

The original audio signal, for complex sounds (including harmonic ones) like the human voice or musical instruments, is generally made up of a combination of frequencies: constituent frequencies, or a superposition of frequencies.

The continuous sound we’re talking about is in the time domain. It would be an amplitude vs. time graph. That’s how the sampled points from the original sound are stored in a computer in digital format.

The Fourier Transform (FT) is the mechanism by which we convert that graph from the time domain (X-axis → Time, Y-axis → Amplitude) into a frequency domain representation (X-axis → Frequency, Y-axis → Amplitude of contribution).

Figure 1: Time domain signal converted to frequency domain via FT, showing peaks at 300 Hz and 700 Hz (generated by Google Nano Banana)

If you’ve ever used librosa.stft() or np.fft.rfft() in your ML pipeline and wondered what’s actually happening under the hood when you go from raw audio to a spectrogram, this is it. The Fourier Transform is the foundation beneath all of it.

Let’s talk more at an intuition level about what we’re aiming for and how the Fourier Transform delivers it. We’ll try to understand this in an organized way.

Our Goal

We want to find the values of those frequencies whose combination makes up the original sound. By “original sound,” I mean the digital signal that we’ve stored by sampling and quantization via an ADC into our digital system. In simpler terms, we want to extract the constituent frequencies from which the complex sound is composed.

It’s analogous to having a bucket in which all colours are mixed, and we want to segregate the constituent colours. The bucket of mixed colours is the original audio signal. The constituent colours are the constituent frequencies.

We want a graph that simply tells us which frequencies have what amplitude of contribution in making the original sound. The x-axis of that graph should have all the frequency values, and the y-axis should have the amplitude of contribution corresponding to each frequency. The frequencies that are actually present in the signal will show up as peaks. Everything else will be near zero.

Our input would be the amplitude-time graph, and the output would be the amplitude-frequency graph from the Fourier Transform.

Since these graphs look so different, there is obviously mathematics involved. And to be honest, advanced mathematical tools like the Fourier Transform and complex numbers are used to convert from our input (time domain graph) to our output (frequency domain graph). But to get the intuition of why the Fourier Transform does the job correctly, it’s important to understand what the Fourier Transform does such that our goal is achieved. Then we’ll get to know how it helps us achieve it at an intuition level.

The WHAT, the HOW, and the WHY.

The WHAT: What Does FT Actually Do?

In answering the WHAT, we don’t need to see what math is happening inside. We just want to know what input it takes and what output it gives. We will treat it like a black box.

Here’s the thing: the input to the FT is the entire original audio signal g(t), the complete time domain waveform. We evaluate the FT at a particular frequency value f, and the output for that frequency f is a single complex number. This complex number is called the Fourier coefficient for frequency f.

The next question is: what is that complex number that the FT outputs? What do we get from it?

From this complex number, we extract two things:

Magnitude = √(Real² + Imaginary²): this tells us the amplitude of contribution of frequency f in the original signal. A high magnitude means f is strongly present in the original audio. A low magnitude means it’s barely there or not there at all.

Phase = arctan(Imaginary / Real): this tells us the phase offset of that frequency component. It indicates where in its cycle that frequency starts. (In code, use the two-argument arctangent, which is what np.angle() does, so the quadrant comes out right.) We’ll talk about phase properly later; don’t worry about it right now. Just know that this information also comes out of the same complex number.

What happens is that we do this for every frequency we care about. For each f, we get one complex number, extract the magnitude, and plot it. The collection of all these (frequency, magnitude) pairs gives us the frequency domain graph. That’s the WHAT.
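As a black-box sketch of the WHAT: whole signal in, one complex number out. The function name fourier_coefficient and the 8000 Hz sampling rate are assumptions for illustration, and the averaging inside the function is the mechanism explained in the HOW section.

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # 1 second of sample times
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

def fourier_coefficient(signal, times, f):
    """Hypothetical black box: entire signal in, one complex number out."""
    return np.mean(signal * np.exp(-2j * np.pi * f * times))

c = fourier_coefficient(g, t, 300.0)
magnitude = np.hypot(c.real, c.imag)        # sqrt(Real^2 + Imaginary^2)
phase = np.arctan2(c.imag, c.real)          # quadrant-aware arctan(Imag / Real)
print(c, magnitude, phase)
```

With this averaging convention, a unit-amplitude sine shows up with magnitude of about 0.5 rather than 1, and its phase comes out near −π/2 because the angle is measured against a cosine.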

Let’s see HOW that complex number actually comes about: what’s the mechanism inside the FT that produces it?

The HOW: How Does FT Compute This?

Here’s where things get really beautiful, believe me.

The Winding Machine

The core idea is that we wrap the original signal around a circle in the complex plane. The speed at which we wrap depends on the input frequency f.

Mathematically, for a given frequency f, we compute:

g(t) · e^(−2πift)

at every point in time t, and plot the result on the complex plane (real axis, imaginary axis). Let’s break this down, because it’s important to understand how to visualize and interpret what’s happening here.

Here’s an important thing to visualize: in the original g(t) graph, as time t increases, we’re simply moving from left to right along the time axis. It’s a straight line, and we never come back. But in the complex plane, we’re moving in a circle around the origin (0,0). As time progresses, we keep coming back to the same angular positions: every time one full loop is completed, we start over from the same angle. The speed at which one full circle is completed depends on f: one full rotation happens when 2πtf = 2π, which means t·f = 1, so it takes 1/f seconds to complete one loop. Higher f → faster looping. Lower f → slower looping.

The time domain graph is a one-way journey from left to right. The complex plane graph is a circular journey that keeps looping, and the rate of looping is controlled by the input frequency f.

You might think: since we keep coming back to the same angular positions, does the second loop trace the exact same path as the first? In the time domain, each individual constituent frequency is a repeating sine wave, right? The 300 Hz component repeats every 1/300 seconds, the 700 Hz component repeats every 1/700 seconds. Each one individually has a clean repeating pattern. When we wind g(t) around the complex plane, shouldn’t the path from 0 to T (one period, T = 1/f) and from T to 2T be exactly the same? Shouldn’t the loops overlap perfectly?

No. And this is a subtle but important thing to understand early.

The individual constituent frequencies inside g(t) do repeat, yes. But g(t) itself is not a single frequency. It’s a superposition of multiple frequencies mixed together. Even though the angular position in the complex plane resets every 1/f seconds (the e^(−2πift) part completes one full loop), the distance from the origin, which is g(t), is different at time t versus time t + 1/f. That’s because g(t) has other frequency components in it that don’t repeat at the same rate as f. The value of g(t) at the same angular position changes from one loop to the next.

Each loop traces a slightly different path in the complex plane. This is why, when we compute the Centre of Mass later, we compute it over the entire path for the full duration, not just one loop. If g(t) happened to be a single pure sine wave at exactly frequency f and nothing else, then yes, every loop would be identical. But for any real-world signal with multiple frequencies, each loop is different, and we need to consider all of them.

Keep this in mind. It’ll make more sense once we get to the COM section below.

At any particular time t:

g(t) is the amplitude of the original signal at that moment. This becomes the distance from the origin in the complex plane. Think of it as the magnitude of a complex number. (When g(t) is negative, the point simply lands on the opposite side of the origin, at distance |g(t)|.)

e^(−2πift) gives the angle: specifically, an angle of 2πft radians measured clockwise from the positive real axis. The minus sign in the exponent is what makes the rotation clockwise.

At each time t, we’re placing a point at distance g(t) from the origin, at an angle determined by 2πtf.

As time progresses, the angle keeps rotating (because t increases), and the distance from the origin keeps changing (because g(t) changes with the audio signal). The result is a path, a curve in the complex plane.

We can interpret this as wrapping or winding the original sound signal g(t) around a circle, where the speed of winding depends on the input frequency f. Higher f means the curve wraps around faster. Lower f means slower wrapping. One full circle is completed when t·f = 1, so the time period of one full rotation is 1/f.
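The winding can be sketched in a few lines. The 12 kHz sampling rate is an assumption chosen only so that one 300 Hz loop is exactly 40 samples, which makes it easy to compare one loop against the next; the variable names are mine.

```python
import numpy as np

sr = 12_000                     # assumed rate: one 300 Hz loop = exactly 40 samples
t = np.arange(sr) / sr          # 1 second of sample times
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

f = 300                                   # winding frequency in Hz
rotor = np.exp(-2j * np.pi * f * t)       # unit-length pointer, rotating clockwise
points = g * rotor                        # the wound-up curve, one point per sample

loop = sr // f                            # samples per full rotation (1/f seconds)

# The rotating pointer returns to the same angle after every loop...
same_angle = np.allclose(rotor[:-loop], rotor[loop:])
# ...but g(t) is not the same one loop later, because of the 700 Hz component.
same_radius = np.allclose(g[:-loop], g[loop:])

print(same_angle, same_radius)
```

This should print True False: the angle resets every 1/f seconds, but the distance from the origin does not, so successive loops trace different paths.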

To visualize how this winding happens at different frequencies, see this video, which shows the shape of the wound-up graph in the complex plane at different frequencies → 3Blue1Brown: But what is the Fourier Transform? (https://www.youtube.com/watch?v=spUNpyF58BY). One of the best resources out there for building this intuition.

The Centre of Mass (COM)

Here’s where the magic happens. Once we have this wound-up curve in the complex plane, we calculate its Centre of Mass (COM).

Think of the wound-up curve as if it has uniform mass density, like a wire. The COM is the single point that represents the average position of the entire curve. We want the coordinates (Real, Imaginary) of this COM. Let’s see how we actually calculate this.

Our original sound g(t), as a digitally stored signal in a computer, won’t be continuous; we have sampled points of the original sound. The corresponding sampled points will be there on the complex plane too after applying g(t)·e^(−2πift). The more sampled points there are in the original audio, the more corresponding points there will be on the complex plane.

A quick note before the formulas: what we’ve been discussing so far (the winding, the circular motion, the COM) is the same whether we’re talking about the continuous version (with integrals) or the discrete version (with summations). The core concept of what the Fourier Transform does doesn’t change. Don’t get confused when you see a summation (Σ) in one formula and an integral (∫) in another; they’re doing the same thing conceptually. Summation is for our finite sampled points; the integral is for the theoretical continuous case. For building intuition, you can think of either one, since the idea is identical. Just different tools for the same job.

For our discrete digital signal with N sampled points, the COM coordinates are:

COM = (1/N) Σ g(t_n) · e^(−2πif·t_n)

This is the discrete version, and it is essentially what’s happening when you call np.fft.rfft() or np.fft.fft() in Python: that one function call does this winding + COM calculation across every frequency bin simultaneously. One detail: NumPy returns the plain sum without the 1/N factor by default, so its output is N times the COM. Divide by N if you want the average.
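A quick check of that relationship, as a sketch under assumed values (8000 Hz sampling rate, 1 second of signal, so bins land on whole Hz): the COM computed by direct summation should agree with the matching np.fft.rfft bin once that bin is divided by N.

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
N = sr                                      # 1 second of audio, so bins are 1 Hz apart
t = np.arange(N) / sr
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

f = 300
# Direct COM: wind the signal at frequency f, then average the points
com_direct = np.sum(g * np.exp(-2j * np.pi * f * t)) / N

# Same quantity from the FFT: pick the bin for f and apply the 1/N factor ourselves
spectrum = np.fft.rfft(g)
freqs = np.fft.rfftfreq(N, d=1 / sr)
k = int(np.argmin(np.abs(freqs - f)))       # index of the bin closest to f
com_fft = spectrum[k] / N

print(com_direct, com_fft, np.isclose(com_direct, com_fft))
```

The direct sum costs one pass over the signal per frequency; the FFT gets all bins at once, which is why nobody computes it the slow way in practice.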

Now just imagine this isn’t done digitally. In that case, we don’t need sampled points and we can work on a continuous function. That means we have infinitely many points of original audio and correspondingly infinitely many points on the complex plane. Instead of summation, we can integrate:

ĝ(f) = ∫ g(t) · e^(−2πift) dt

Integration limits → t₁ and t₂ (the time span of the original sound), integrand → g(t)·e^(−2πift), and the output is the complex Fourier coefficient for that frequency f. This is the continuous Fourier Transform formula. (Strictly, the integral is a total rather than an average; dividing it by t₂ − t₁ gives the COM.) In practice we always work with the discrete version since we’re dealing with digital audio, but the continuous form is good to know because it shows the same idea without the distraction of indices and array lengths.

One thing worth noting: the limits t₁ and t₂ matter. The final COM you get actually depends on how much of the signal you’re including. A different time segment can give a different COM for the same frequency. For this article, we’re applying FT to the full signal, so t₁ and t₂ are simply the start and end of our entire audio. But when you later get into STFT (Short-Time Fourier Transform), you’ll see that deliberately choosing short time segments and applying FT to each one is exactly the idea, and that’s where window size becomes a design decision.
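To see how the time limits change the answer, here is a small sketch with a made-up signal whose content changes halfway through: 300 Hz for the first half second, then 700 Hz. The signal, the com() helper, and the sampling rate are all assumptions for illustration.

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # 1 second of sample times
half = sr // 2

# Hypothetical test signal: 300 Hz for the first half second, 700 Hz after that
g = np.where(t < 0.5, np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 700 * t))

def com(signal, times, f):
    """Centre of mass of the wound-up curve over the given time segment."""
    return np.mean(signal * np.exp(-2j * np.pi * f * times))

first = abs(com(g[:half], t[:half], 300))   # segment where 300 Hz is present
second = abs(com(g[half:], t[half:], 300))  # segment where it is absent
whole = abs(com(g, t, 300))                 # full signal: the two get averaged

print(f"first half: {first:.3f}, second half: {second:.3f}, whole: {whole:.3f}")
```

Same frequency, three different COMs depending on the segment. Looking at short segments one at a time is what lets the STFT say when a frequency is present, not just whether it is.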

Now that we have the COM coordinates, we calculate its distance from the origin:

Magnitude = √(Real² + Imaginary²)

This magnitude is the amplitude of contribution of frequency f in the original audio signal. That’s what gets plotted as the y-value for this frequency in the frequency domain graph.

The intuition for what this magnitude means: if the COM is at a significant distance from the origin, that frequency has a strong contribution in the original signal. If the COM is sitting at or near the origin, that frequency is barely present or not present at all. The distance from the origin directly tells us how much that frequency matters.

And remember what we discussed earlier about the loops not overlapping? This is where it pays off. The COM averages over all those slightly different loops, and that averaging is what makes the non-matching frequencies cancel out (their contributions point in different directions across loops and sum to near zero) while the matching frequencies pile up (their contributions consistently point in the same direction across loops).

Why the COM Works: The Key Insight

This is the part that makes the whole thing click. Read this carefully.

When the winding frequency f matches a constituent frequency of the signal, something special happens. The wound-up curve becomes lopsided: the points pile up on one side of the complex plane. The COM lands far from the origin. High magnitude. We detect that frequency.

When f does not match any constituent frequency, the wound-up curve distributes roughly evenly around the origin. Points on one side get cancelled out by points on the opposite side. The COM lands near the origin. Low magnitude. That frequency isn’t really present.

Match → lopsided → COM far from origin → peak in the frequency domain.

No match → balanced → COM near origin → flat in the frequency domain.

That’s it. That’s how the Fourier Transform figures out what frequencies are inside the original signal.

Worked Example: Walking Through the Numbers

Let’s make this concrete with actual numbers. This is where the intuition becomes rock solid. Trust me on this one.

Setup: Suppose our original audio signal is:

g(t) = sin(2π·300·t) + sin(2π·700·t)

This is a signal made up of exactly two frequencies: 300 Hz and 700 Hz. In the real world, this might sound like two pure tones playing simultaneously. We know the answer already: the frequency domain graph should show peaks at 300 and 700, and nothing else. Let’s see if the FT gets it right.

We apply the Fourier Transform at three frequencies: f = 300 Hz, f = 700 Hz, and f = 500 Hz.

Figure 3: When the winding frequency matches (300 Hz, 700 Hz), the curve becomes lopsided. COM vector (pink arrow) points far from the origin: high magnitude (generated by Google Nano Banana)

FT at f = 300 Hz (a constituent frequency)

We wind g(t) around the complex plane at 300 rotations per second.

Think about what happens: the 300 Hz component of g(t) is rotating at the exact same speed as our winding. Because of this, the 300 Hz part of the signal consistently lands on the same side of the complex plane. It doesn’t cancel itself out. The wound-up curve becomes heavily lopsided in one direction.

What about the 700 Hz component? It’s rotating at a different speed than our 300 Hz winding. Over time, it traces out a roughly symmetric path around the origin and averages out to near zero. It doesn’t contribute to the lopsidedness.

Result: The COM is far from the origin. The magnitude is high. The frequency domain graph gets a tall peak at f = 300 Hz. Correct: 300 Hz is indeed a constituent frequency.

FT at f = 700 Hz (the other constituent frequency)

Same logic, just reversed. The 700 Hz component of g(t) matches the winding speed, so it piles up on one side. The 300 Hz component, being at a different speed, averages out.

Result: The COM is far from the origin. High magnitude. A tall peak at f = 700 Hz. Correct again.

FT at f = 500 Hz (NOT a constituent frequency)

Figure 4: FT at f = 500 Hz (non-constituent). Wound-up curve distributes evenly. COM near origin: magnitude near zero (generated by Google Nano Banana)

We wind g(t) at 500 rotations per second. Here’s the thing: neither the 300 Hz component nor the 700 Hz component matches this winding speed. Both of them trace roughly symmetric paths around the origin in the complex plane. Nothing piles up consistently on one side. Everything just cancels out; the curve is pretty much centered around the origin.

Result: The COM is very close to the origin. The magnitude is near zero. The frequency domain graph is flat at f = 500 Hz, correctly telling us this frequency is not present in the signal.

The Frequency Domain Graph

After doing this for all frequencies, our frequency domain graph would show exactly two sharp peaks, one at 300 Hz and one at 700 Hz, with everything else near zero. We have successfully decomposed g(t) into its constituent frequencies. That’s the Fourier Transform doing its job.
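The three cases from the worked example can be checked numerically. This is a sketch under assumed settings (8000 Hz sampling rate, 1 second of signal); com_magnitude() is a helper name I made up.

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # 1 second of sample times
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

def com_magnitude(signal, times, f):
    """Wind the signal at frequency f and return the COM's distance from the origin."""
    wound = signal * np.exp(-2j * np.pi * f * times)
    return abs(wound.mean())

mags = {f: com_magnitude(g, t, f) for f in (300, 500, 700)}
for f, m in mags.items():
    print(f"{f} Hz -> COM magnitude {m:.4f}")
```

The two constituent frequencies should come out close to 0.5 (half the sine’s amplitude, because of the averaging), and 500 Hz should come out at essentially zero.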

The colour bucket analogy holds perfectly: we had a mixture (300 Hz + 700 Hz mixed together in the time domain), and the Fourier Transform segregated the constituent colours.

Seeing It in Code

For those who want to see this working in Python, here’s the worked example in actual code. It’s really just a few lines:

import numpy as np

# Create the signal: 300 Hz + 700 Hz
sr = 8000  # sampling rate
t = np.linspace(0, 1, sr, endpoint=False)  # 1 second of audio
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

# Apply Fourier Transform - this is doing the winding + COM for all frequencies at once
fft_result = np.fft.rfft(g)

# Get magnitudes (amplitude of contribution for each frequency)
magnitudes = np.abs(fft_result)

# Get the frequency values corresponding to each bin
freqs = np.fft.rfftfreq(len(g), d=1/sr)

# The peaks in magnitudes will be at 300 Hz and 700 Hz
# Everything else will be near zero

That’s it. np.fft.rfft(g) is doing the entire winding + COM process we discussed above, for every frequency bin simultaneously (without the final 1/N averaging, so the peak values scale with the number of samples). np.abs() extracts the magnitude (distance of the COM from the origin), and np.angle() would give you the phase offset if you needed it. rfft specifically gives you only the useful half of the spectrum (up to the Nyquist frequency), since for a real-valued signal the other half is a mirror image. If you’ve read the aliasing article, you know why.
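To read the peaks off programmatically, a small follow-up sketch: it repeats the setup so it runs on its own, picks the two largest bins, and rescales magnitudes to sine amplitudes. The 2/N scaling is the usual convention for a one-sided spectrum and is my addition; it applies to every bin except DC and Nyquist.

```python
import numpy as np

sr = 8000                                   # sampling rate, as in the example above
t = np.linspace(0, 1, sr, endpoint=False)   # 1 second of audio
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

magnitudes = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(len(g), d=1 / sr)

# Frequencies of the two largest bins, sorted low to high
top2 = np.sort(freqs[np.argsort(magnitudes)[-2:]])

# Rescale raw FFT magnitudes to sine-wave amplitudes (not valid for DC/Nyquist bins)
amplitudes = 2 * magnitudes / len(g)

print(top2, amplitudes[300], amplitudes[700])
```

With 1 second of audio the bins are 1 Hz apart, so index 300 is the 300 Hz bin. Both recovered amplitudes should be close to 1.0, matching the two unit-amplitude sines we mixed.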

Phase: The Hidden Variable

Let’s talk about something that confused me for a while: the phase. This concept is easier to understand if you already have some understanding of phase and phase difference in terms of waves and sinusoidal signals, but I’ll try to explain what I understood.

I know a lot of ML audio pipelines work with magnitude spectrograms only and throw the phase away entirely. That’s fine for many tasks, but understanding what phase is and what you’re discarding gives you a deeper understanding of the signal. And there are tasks where phase matters (speech synthesis, audio reconstruction, vocoder design), so this section is worth reading even if you’re only doing magnitude-based feature extraction right now.

The COM we get from the FT is a complex number. It has a magnitude (distance from the origin) and also an angle associated with it:

Phase = arctan(Imaginary(COM) / Real(COM))

That angle tells us the phase offset of the frequency component f as it exists inside the original signal. In simple terms, it tells you where in its cycle that frequency component starts at t = 0.

A Misconception I Had

I initially thought that for constituent frequencies, this phase would always be 0. If a frequency is part of the original signal, the COM should just lie on the real axis, right? Phase 0, maximum sync, all that. It makes sense intuitively, no?

That’s not true, and here’s why.

If the original signal is g(t) = sin(2π·300·t + π/4), the frequency 300 Hz is absolutely a constituent frequency; it’s literally the only frequency in the signal. But its phase offset is π/4, not 0. The 300 Hz component doesn’t start at zero amplitude at t = 0; it starts shifted by π/4.

The FT will correctly output a high magnitude at f = 300 Hz, and the angle of the complex number recovers the phase with which the 300 Hz component exists in the signal. One convention to be aware of: the FT measures phase relative to a cosine, and since sin(x + π/4) = cos(x − π/4), the reported angle is −π/4. That is the same π/4 offset, expressed against a cosine reference instead of a sine.

Phase is 0 only if the component happens to start at exactly the right reference point at t = 0 (the peak of a cosine, in the FT’s convention). Otherwise, it can be anything. The magnitude tells you how much of that frequency is present. The phase tells you where in its cycle it starts. Both pieces of information come from the same complex number.

In code, you’d get these separately:

magnitude = np.abs(fft_result)    # how much of each frequency
phase = np.angle(fft_result)      # where in its cycle each frequency starts

When you compute a magnitude spectrogram (which is what most ML pipelines do), you’re keeping the first and discarding the second. Now at least you know what you’re throwing away.
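One caveat when checking the sin(2π·300·t + π/4) example numerically: np.angle() reports phase relative to a cosine, not a sine. Since sin(x + π/4) = cos(x − π/4), the angle at the 300 Hz bin should come out as −π/4. A minimal sketch, with the sampling rate and duration as assumptions:

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # 1 second, so bin k corresponds to k Hz
g = np.sin(2 * np.pi * 300 * t + np.pi / 4)

spectrum = np.fft.rfft(g)
k = 300                                     # bin index for 300 Hz
magnitude = np.abs(spectrum[k]) / len(g)    # COM magnitude (1/N applied by hand)
phase = np.angle(spectrum[k])               # phase relative to a cosine

# To express the phase against a sine reference instead, add pi/2
phase_vs_sine = phase + np.pi / 2

print(magnitude, phase, phase_vs_sine)
```

So the π/4 offset is recovered; it just shows up as −π/4 against a cosine, or as π/4 once you shift the reference back to a sine.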

For Non-Constituent Frequencies

For frequencies that aren’t part of the original signal (like f = 500 Hz in our worked example), the magnitude is near zero. The phase you get in this case is essentially meaningless: it’s the angle of a near-zero vector pointing in some arbitrary direction. Think of it as noise. The direction doesn’t mean anything when the vector has no length.

It’s quite intuitive when you think about it: for a non-constituent frequency, whatever the COM coordinates come out to be, they’re so close to the origin that the angle is just numerical noise, not meaningful information about the signal.

Why FT Handles Phase Automatically (This One Really Confused Me)

Okay, so this is a subtle point that took me a while to get. And I want to explain it clearly because it’s the kind of thing that bugs you once you start thinking about it.

Here’s the question: the FT only takes frequency f as input, right? We don’t give it a phase angle. But for a particular input frequency, we could get different correlations if we vary the phase alignment between our test wave and the original signal. So how does FT find the “best” phase, the one that gives the maximum possible magnitude for input frequency f?

The answer: FT doesn’t search or optimize over phase at all. It doesn’t need to.

Here’s why, and the key is Euler’s formula:

e^(−2πift) = cos(2πft) − i·sin(2πft)

When we compute FT at frequency f, we’re simultaneously correlating the signal with both cos(2πft) and sin(2πft). The real part of the output captures the cosine correlation. The imaginary part captures the sine correlation.

Now here’s the important thing: any sinusoid at frequency f with any arbitrary phase φ can be decomposed as:

A·cos(2πft + φ) = A·cos(φ)·cos(2πft) − A·sin(φ)·sin(2πft)

Regardless of what phase the component has in the original signal, the FT automatically captures it. Averaging cos² or sin² over whole cycles gives 1/2, and the cos·sin cross terms average to zero, so the COM comes out as (A/2)·(cos(φ) + i·sin(φ)):

Real part = (A/2)·cos(φ), the cosine correlation.
Imaginary part = (A/2)·sin(φ), the sine correlation.
Magnitude = √(real² + imag²) = A/2, proportional to the true amplitude regardless of φ (multiply by 2 to get A itself).
Angle = arctan(imag/real) = φ, which recovers the actual phase.

It’s like measuring the length of a vector by projecting it onto both the x-axis and y-axis. No matter which direction the vector points, you always recover its full length via √(x² + y²). The complex exponential is testing all phases simultaneously because cosine and sine together cover all possible phase angles; they’re orthogonal to each other.

No optimization. No searching. No iterating over phase values. Just the fact that cosine and sine are orthogonal and together they capture any phase. The math does it in one shot.

This is where I finally understood why complex numbers are used here and not just regular correlation with a single sine wave. Euler’s formula is doing something very clever: it’s correlating with two things at once, and the complex number neatly packages both results together.
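A numerical sanity check of the “no searching over phase” claim, as a sketch: the amplitude 0.8, the list of phases, and the sampling rate are arbitrary values I picked. For a component A·cos(2πft + φ), the COM magnitude should stay fixed at A/2 whatever φ is, and the angle should return φ.

```python
import numpy as np

sr = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # 1 second of sample times
f, A = 300, 0.8                             # test frequency and amplitude (arbitrary)

results = []
for phi in (0.0, np.pi / 6, np.pi / 2, -2.0):
    g = A * np.cos(2 * np.pi * f * t + phi)
    com = np.mean(g * np.exp(-2j * np.pi * f * t))   # one winding, no phase search
    # real part ~ (A/2)cos(phi), imaginary part ~ (A/2)sin(phi)
    results.append((phi, abs(com), np.angle(com)))

for phi, mag, ang in results:
    print(f"phi = {phi:+.4f}  ->  magnitude {mag:.4f}, angle {ang:+.4f}")
```

Every row should show the same magnitude (0.4, which is A/2) and an angle equal to the φ that went in. One winding per frequency is enough.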

Putting It All Together

Here is the full picture of how we get from the time domain to the frequency domain:

Figure 5: The full FT pipeline: Signal → Pick Freq → Wind → Find COM → Plot → Repeat for all f (generated by Google Nano Banana)

1. Take the original audio signal g(t), our time domain data

2. Pick a frequency f

3. Wind g(t) around the complex plane at speed f using g(t)·e^(−2πift)

4. Calculate the COM of the wound-up curve

5. The distance of the COM from the origin → amplitude of contribution of f

6. The angle of the COM → phase offset of f

7. Plot the point (f, magnitude) on the frequency domain graph

8. Repeat for all frequencies

The frequencies that are actually present in the original signal produce lopsided winding → COM far from the origin → peaks in the graph. Frequencies that aren’t present produce balanced winding → COM near the origin → flat regions.
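The eight steps map almost line for line onto a deliberately slow loop. This is a sketch for intuition, not how you would compute it in practice (np.fft.rfft gives the same information far faster); naive_spectrum() is a name I made up, and the 0 to 1000 Hz scan range and the 0.25 peak threshold are arbitrary choices.

```python
import numpy as np

def naive_spectrum(signal, sr, freqs):
    """Hypothetical helper: wind, take the COM, and read off magnitude and phase."""
    times = np.arange(len(signal)) / sr
    mags, phases = [], []
    for f in freqs:                                       # steps 2 and 8: each frequency
        wound = signal * np.exp(-2j * np.pi * f * times)  # step 3: wind
        com = wound.mean()                                # step 4: centre of mass
        mags.append(abs(com))                             # step 5: distance from origin
        phases.append(np.angle(com))                      # step 6: angle
    return np.array(mags), np.array(phases)

# Step 1: the time domain signal
sr = 8000
t = np.arange(sr) / sr
g = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 700 * t)

freqs = np.arange(0, 1001)                  # scan 0..1000 Hz in 1 Hz steps
mags, phases = naive_spectrum(g, sr, freqs)

# Step 7: instead of plotting, list the frequencies whose COM is far from the origin
peaks = freqs[mags > 0.25]
print(peaks)
```

Only 300 and 700 should survive the threshold; every other frequency in the scan winds into a balanced curve with its COM at the origin.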

After doing this across all frequencies, we have the complete frequency domain graph. The peaks tell us the constituent frequencies of the original sound. That’s the Fourier Transform: decomposing a complex signal into its building blocks.

The math is a tool to justify the intuition. The real understanding is in the winding, the Centre of Mass, and the way the complex exponential handles phase automatically through Euler’s formula. Once these three things click, you get the Fourier Transform at an intuition level, and the heavy math derivations are just formalizing what you already understand. And once this clicks, you’ll see the FT everywhere in signal processing, and it’ll all start making sense.

The WHY

Why does the Fourier Transform work? The intuitive answer is what we’ve built through this entire piece: matching frequencies create lopsided windings, non-matching frequencies create balanced ones that cancel out. The winding machine is essentially a correlation detector: it measures how much the original signal correlates with a pure sinusoid at each frequency. High correlation means the COM is far from the origin, which gives a peak; low correlation means the COM is near the origin, and we get a flat region in the graph.

Showing rigorously why this works would require heavy math derivation involving orthogonality of sinusoidal functions and properties of complex exponentials, which isn’t the goal of this piece. But the intuition we’ve built should be more than enough to understand what’s happening and why the output makes sense. It works!

What Comes Next

This piece covers the continuous/conceptual Fourier Transform, the foundation. In practice, when you work with digital audio in ML pipelines, you’re using the DFT (Discrete Fourier Transform) and its fast implementation, the FFT. And when you compute spectrograms, you’re using the STFT (Short-Time Fourier Transform), which applies the FT to small overlapping windows of the signal; that’s where window size N, hop length, and overlap come in. But that’s a topic for another writeup.

All of that builds directly on top of what we covered here. The winding machine, the COM, the magnitude and phase: it’s the same mechanism, just applied to short chunks of audio instead of the whole thing at once. If this piece clicked for you, the rest will follow naturally. I might write about the DFT and STFT in detail later.

Thanks for your patience if you’ve read this far, and thanks to Grammarly for helping with the editing.

Feel free to reach out with any questions:

E-mail: [email protected]

Twitter: @r4plh

GitHub: github.com/r4plh

LinkedIn: linkedin.com/in/r4plh
