Sound of Interference

We present a novel electromagnetic (EM) side-channel attack that enables acoustic eavesdropping on devices using modern MEMS microphones. These microphones transmit audio via pulse-density modulation (PDM), where each harmonic of the digital pulses retains acoustic data. Using simple FM demodulation with standard radio receivers, an attacker can remotely recover the audio heard by the microphone—without any software compromise or physical access.

We validate the attack through real-world tests on various PDM microphones and devices, including laptops and smart speakers. The attack achieves up to 94.2% digit recognition accuracy from 2 meters away, even through a 25 cm concrete wall. Using speech-to-text APIs not trained on EM signals, we recover speech with as little as 14% error on the Harvard Sentences dataset. Comparable results are obtained using a low-cost copper tape antenna. We also show that existing defenses like resampling are ineffective and propose a new hardware mitigation based on clock randomization.

To appear in USENIX 2025. The artifacts of this work, data collected through the SoI attack, and scripts to fine-tune and evaluate speech recognition and transcription models are available at Zenodo.

Read the preprint

Attack Principles

Recovering baseband digital signals typically requires strong, wideband coupling. For example, TEMPEST Comeback used 25 MHz of bandwidth for low-rate serial data, far exceeding audio bandwidth. Through simulation, we investigate whether the original audio can be recovered from the narrow sub-bands that are more commonly accessible to attackers.
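As background, the PDM encoding that these microphones use can be sketched with a first-order sigma-delta modulator. This is a minimal illustration and not the simulation model used in this work: the function names pdm_encode and pulse_density are hypothetical, and real microphones typically use higher-order modulators. It only shows that the density of 1-bits in the output stream tracks the input amplitude.

```python
def pdm_encode(samples):
    """First-order sigma-delta modulator (illustrative assumption).

    Takes floats in [-1, 1] and returns a list of 0/1 bits whose
    local density of ones follows the input amplitude.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit mapped to -1.0 / +1.0
    for x in samples:
        # Accumulate the error between the input and the last output.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of ones in the bitstream."""
    return sum(bits) / len(bits)


# A constant input of 0.5 should give a pulse density of about 0.75,
# because density = (amplitude + 1) / 2 for a bipolar modulator.
print(pulse_density(pdm_encode([0.5] * 2000)))
```

A low-pass filter over the bitstream would recover the audio on the receiving side; the attack described here instead works on the narrow-band EM emission around a harmonic of the pulse train.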

Using MATLAB/Simulink with the Mixed Signal Blockset, we simulate a linear chirp (1–100 Hz at 100 Hz/s) and find that the original signal persists as FM around each harmonic. This indicates attackers could exploit these harmonics to eavesdrop on microphone-captured audio.
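The FM demodulation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the receiver chain used in this work: it assumes the radio already delivers complex baseband (IQ) samples centered on one harmonic, and the names fm_modulate and fm_demodulate, the sample rate, and the 500 Hz frequency deviation are all hypothetical. The test signal mirrors the 1–100 Hz linear chirp described above.

```python
import cmath
import math


def fm_modulate(message, fs, deviation_hz):
    """Complex-baseband FM: the phase is the running sum of the message."""
    phase = 0.0
    iq = []
    for m in message:
        phase += 2.0 * math.pi * deviation_hz * m / fs
        iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq, fs, deviation_hz):
    """Polar discriminator: the phase step between neighbouring samples
    is proportional to the instantaneous frequency offset."""
    out = []
    for prev, cur in zip(iq, iq[1:]):
        dphi = cmath.phase(cur * prev.conjugate())
        out.append(dphi * fs / (2.0 * math.pi * deviation_hz))
    return out


fs = 8000            # assumed sample rate in Hz
deviation_hz = 500   # assumed peak frequency deviation in Hz
f0, sweep = 1.0, 100.0  # chirp start frequency (Hz) and sweep rate (Hz/s)

# Linear chirp lasting one second.
message = [
    math.sin(2.0 * math.pi * (f0 * t + 0.5 * sweep * t * t))
    for t in (n / fs for n in range(fs))
]

iq = fm_modulate(message, fs, deviation_hz)
recovered = fm_demodulate(iq, fs, deviation_hz)

# recovered[i] corresponds to message[i + 1] because the discriminator
# needs two samples to form one phase difference.
max_err = max(abs(r - m) for r, m in zip(recovered, message[1:]))
print(max_err)
```

In this noise-free setting the recovered samples match the message up to floating-point error. A real capture would also need tuning, filtering, and decimation before the discriminator.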

Simulation Results — Feasibility analysis results in the simulation environment (a)–(c) and in the real-world environment (d)–(f). (a) and (d) are the original acoustic signals. (b) and (e) are spectrograms of the narrow-band EM leakage, in which the trace of the peak frequency correlates strongly with the original chirp signal. (c) and (f) are the acoustic signals recovered by applying FM demodulation to the narrow-band EM leakage.

To validate our simulation, we extract EM signals from a Lenovo ThinkPad T480's microphones using a probe antenna on the back panel, while a speaker emits 64 dB speech-like audio. FM demodulation reveals the chirp’s frequency sweep, confirming that the attack is feasible despite some high-frequency attenuation and minor distortion.

We replay the Harvard Sentences and evaluate intelligibility using three transcription models—HuBERT (4.6% WER), Microsoft STT (3.1%), and OpenAI STT (2.6%)—demonstrating high reconstruction accuracy.
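For reference, word error rate (WER) is the word-level edit distance between the reference and the transcript, divided by the number of reference words. The sketch below shows this standard definition; the function name word_error_rate is hypothetical, and the only text normalization it applies is lowercasing, which may differ from the evaluation scripts used in this work.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(
                prev[j] + 1,         # deletion of a reference word
                cur[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# One dropped word out of eight reference words gives a WER of 0.125.
print(word_error_rate(
    "The birch canoe slid on the smooth planks",
    "the birch canoe slid on smooth planks",
))
```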

Attacker Capability

We assess the attacker's ability to extract signals by measuring SNR under varying volume, antenna types, orientations, and distances. Using probe, loop, and Yagi antennas, we record up to 25 dB PSNR near the device and 11.59 dB at 25 cm, with the Yagi antenna outperforming the others at longer ranges.
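This page does not spell out how PSNR is computed. As a point of reference only, the sketch below uses the common definition, 10·log10(peak² / MSE), and assumes a time-aligned clean reference waveform is available. The function name psnr_db is hypothetical.

```python
import math


def psnr_db(reference, recovered):
    """Peak signal-to-noise ratio in dB between two equal-length waveforms.

    Assumes the waveforms are already time-aligned and on the same scale.
    """
    if not reference or len(reference) != len(recovered):
        raise ValueError("waveforms must be non-empty and of equal length")
    peak = max(abs(a) for a in reference)
    if peak == 0.0:
        raise ValueError("reference waveform is silent")
    mse = sum((a - b) ** 2 for a, b in zip(reference, recovered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical waveforms
    return 10.0 * math.log10(peak ** 2 / mse)


# A 10% amplitude error on a unit-peak signal is about 20 dB.
print(psnr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

Under this definition, 0 dB means the mean squared error equals the squared peak of the reference, that is, the error is as large as the signal itself.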

Antenna Orientation — Examined antenna configurations (A_Loop and A_Yagi) for the SNR evaluation.

SNR Antenna — SNR over distance for different antenna configurations. Optimal quality achieved for A_Loop perpendicular in short-distance scenarios and A_Yagi horizontal for long-range attacks.

Reducing the volume from 64 dB to 58 dB reveals a threshold effect: audio quality degrades significantly once PSNR drops below 0 dB, which occurs beyond 10 cm.

Sound Quality Impact — Sound quality degradation over increasing distances.

Evaluation

Behind-the-wall scenario: We assess the accuracy of executing the attack in two adjacent rooms separated by a plaster wall (≈15 cm thick). The victim's laptop is placed on one side, and the attacker places the A_Loop antenna on the other side. We evaluate distances of 15, 20, and 25 cm.

The speaker classification accuracy reaches 99% at 20 cm and drops to 97.3% at 25 cm, confirming the high intelligibility of the recorded leakage. We achieve a word error rate of 6.5%.

Behind-the-wall Results — Classification accuracy and word error rate (WER) in behind-the-wall scenarios.

Original audio sample

Reconstructed audio using loop antenna

Reconstructed audio using cheap copper foil antenna

Long Distance Evaluation: We analyze intelligibility at distances of 1 meter and beyond using a horizontal A_Yagi at 461.887 MHz. The classification accuracy reaches 96.0% at 1 meter and drops to 91.6% at 2 meters. Beyond 4 meters, recovery accuracy declines significantly.

Long Distance Results — Classification accuracy with the A_Yagi antenna placed in an adjacent room.

Room Scenarios — Evaluated room scenarios, including different victim device orientations, occlusion, and wall materials/thicknesses.

Attack Generality

To evaluate the generality of the vulnerability, we test multiple devices with PDM microphones, including laptops (Lenovo L580, ASUS Chromebook, Redacted), a smart speaker (Google Home), and a headset (Jabra Evolve2 40 SE).

Tested laptops show consistent performance with >98% classification accuracy and ≤19% WER. The smart speaker and Redacted Laptop achieve ≥86% speaker and ≥90.3% digit classification, but with higher WER. The Jabra headset shows limited vulnerability (10% digit, 13.8% speaker classification, 100% WER), yet remains susceptible with an STOI of 0.5490 using the A_Probe antenna.

Original audio sample (Harvard Sentences Dataset)

Lenovo ThinkPad T480

Lenovo ThinkPad L580

Chromebook C204MA

Google Home

Redacted Laptop

Jabra Evolve2 40 SE Headset

Countermeasures

Sampling Rate Randomization: Shielding is insufficient to fully mitigate PDM leakage, and other defenses like encryption or EM blinding incur power and performance costs. We explore sampling rate randomization as a mitigation, evaluating attack performance at 8, 16, 32, 40, and 48 kHz. Digit classification accuracy remains 96% at 8 kHz and 97% at 16 kHz, with WER improving slightly across all rates, so changing the sampling rate does not stop the attack.

Clock Randomization: Spread-Spectrum Clocking (SSC) mitigates SoI by randomizing clock signals, making EM side-channels harder to exploit. We test deviations of 0.0%, 0.1%, 0.3%, and 1.0%, finding that as deviation increases, EM signal quality deteriorates (STOI drops from 0.7254 to 0.0490 at 1.0%). Meanwhile, the microphone’s performance remains stable (≈0.98 STOI), demonstrating SSC's effectiveness.
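One way to build intuition for these numbers: if the microphone clock moves by a fraction d, every harmonic of that clock moves by the same fraction, so the absolute shift grows with the harmonic frequency. The sketch below works through that arithmetic. It rests on assumptions that this page does not confirm: that "deviation" means a peak fractional clock offset, and that the 461.887 MHz frequency from the long-distance evaluation is representative of the harmonic being received. The function name harmonic_shift_hz is hypothetical.

```python
def harmonic_shift_hz(harmonic_hz, deviation_percent):
    """Peak frequency shift of a clock harmonic under a fractional
    clock deviation (the harmonic scales with the clock)."""
    return harmonic_hz * deviation_percent / 100.0


HARMONIC_HZ = 461.887e6  # assumed example harmonic, from the Yagi evaluation

for deviation in (0.0, 0.1, 0.3, 1.0):
    shift_khz = harmonic_shift_hz(HARMONIC_HZ, deviation) / 1e3
    print(f"{deviation:.1f}% deviation -> {shift_khz:.1f} kHz peak shift")
```

Under these assumptions, a 1.0% deviation moves the harmonic by several megahertz, which is very large compared with the audio bandwidth that a narrow-band FM receiver tracks.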

Acknowledgments

This research was funded by the JSPS KAKENHI Grant Number 22H00519, JST CREST JPMJCR23M4, and a gift from Meta. We thank the anonymous reviewers for their insightful feedback, Kohei Doi for the initial exploration that stimulated this work, and Daniel Olszewski & Tyler Tucker for proofreading support.