Sound of Interference

Electromagnetic Eavesdropping Attack on Digital Microphones Using Pulse Density Modulation

Arifu Onishi
Sri Hrushikesh Varma Bhupathiraju*
Rishikesh Bhatt*
Sara Rampazzi*
Takeshi Sugawara

We present a novel electromagnetic (EM) side-channel attack that enables acoustic eavesdropping on devices using modern MEMS microphones. These microphones transmit audio via pulse-density modulation (PDM), where each harmonic of the digital pulses retains acoustic data. Using simple FM demodulation with standard radio receivers, an attacker can remotely recover the audio heard by the microphone—without any software compromise or physical access.


We validate the attack through real-world tests on various PDM microphones and devices, including laptops and smart speakers. The attack achieves up to 94.2% digit recognition accuracy from 2 meters away, even through a 25 cm concrete wall. Using speech-to-text APIs not trained on EM signals, we recover speech with as little as 14% error on the Harvard Sentences dataset. Comparable results are obtained using a low-cost copper tape antenna. We also show that existing defenses like resampling are ineffective and propose a new hardware mitigation based on clock randomization.


To appear in USENIX Security 2025. The artifacts of this work, the data collected through the SoI attack, and the scripts to fine-tune and evaluate speech recognition and transcription models are available on Zenodo.

Attack Demonstration


Demonstration of the attack in a real-world scenario, where a cheap antenna made with copper foil is placed across the wall from the target device.

Attack Principles


Recovering baseband digital signals typically requires wideband reception and strong coupling. For example, TEMPEST Comeback used a 25 MHz bandwidth for low-rate serial data, far exceeding audio bandwidth. We investigate, through simulation, whether the original audio can be recovered from the narrow sub-bands that are more commonly accessible to attackers.
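As background, the baseband digital signal in question is the microphone's PDM bitstream. The following is a minimal sketch of how such a bitstream can be produced, assuming a first-order delta-sigma modulator; real MEMS microphones commonly use higher-order modulators, and the names pdm_encode and pulse_density are hypothetical, not taken from the paper.

```python
def pdm_encode(samples):
    """Encode samples in [-1, 1] as a 1-bit pulse-density stream.

    Assumption: a first-order delta-sigma loop, chosen for illustration only.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output mapped to -1/+1 (0 before the first bit)
    for x in samples:
        # Accumulate the error between the input and the last output bit.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of ones in the stream; tracks (x + 1) / 2 for a constant input x."""
    return sum(bits) / len(bits)


# A constant input of 0.5 should give a density close to 0.75.
dc_bits = pdm_encode([0.5] * 1000)
dc_density = pulse_density(dc_bits)
```

The density of ones follows the input amplitude, which is why the audio survives in the pulse train that the microphone sends over its data line.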

Using MATLAB/Simulink with the Mixed Signal Blockset, we simulate a linear chirp (1–100 Hz at 100 Hz/s) and find that the original signal persists as FM around each harmonic. This indicates attackers could exploit these harmonics to eavesdrop on microphone-captured audio.
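The FM recovery step can be illustrated with a small sketch. It does not reproduce the Simulink model: it synthesizes an idealized complex-baseband FM signal and recovers the message with a quadrature discriminator. The sample rate, frequency deviation, chirp range, and function names are all assumptions chosen for illustration.

```python
import cmath
import math


def fm_modulate(message, fs, deviation_hz):
    """Complex-baseband FM: the phase is the running integral of the message."""
    phase = 0.0
    iq = []
    for m in message:
        phase += 2.0 * math.pi * deviation_hz * m / fs
        iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq, fs, deviation_hz):
    """Quadrature discriminator: phase step between consecutive samples."""
    out = []
    for prev, cur in zip(iq, iq[1:]):
        dphi = cmath.phase(cur * prev.conjugate())
        out.append(dphi * fs / (2.0 * math.pi * deviation_hz))
    return out


fs = 48_000            # assumed sample rate in Hz
deviation_hz = 5_000   # assumed peak frequency deviation in Hz
n = fs // 10           # 0.1 s of signal

# Assumed test message: a linear chirp from 200 Hz to 1000 Hz over 0.1 s.
t = [i / fs for i in range(n)]
message = [math.sin(2.0 * math.pi * (200.0 * ti + 0.5 * 8000.0 * ti * ti)) for ti in t]

iq = fm_modulate(message, fs, deviation_hz)
recovered = fm_demodulate(iq, fs, deviation_hz)

# recovered[k] corresponds to message[k + 1] (one phase step per sample pair).
max_error = max(abs(r - m) for r, m in zip(recovered, message[1:]))
```

In a real capture, the receiver would first tune to one harmonic and band-limit the signal before this step; noise and filtering then determine how closely the recovered waveform matches the original.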

Simulation Results
Feasibility analysis results in the simulation environment (a)–(c) and in the real-world environment (d)–(f). (a) and (d) show the original acoustic signal. (b) and (e) are spectrograms of the narrow-band EM leakage, in which the trace of the peak frequency correlates strongly with the original chirp signal. (c) and (f) show the acoustic signals recovered by applying FM demodulation to the narrow-band EM leakage.


To validate our simulation, we extract EM signals from a Lenovo ThinkPad T480's microphones using a probe antenna on the back panel, while a speaker emits 64 dB speech-like audio. FM demodulation reveals the chirp’s frequency sweep, confirming the attack, despite some high-frequency attenuation and minor distortion.

We replay the Harvard Sentences and evaluate intelligibility using three transcription models—HuBERT (4.6% WER), Microsoft STT (3.1%), and OpenAI STT (2.6%)—demonstrating high reconstruction accuracy.
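For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the transcript, divided by the number of reference words. Below is a minimal sketch; the lowercasing and whitespace tokenization are assumptions, and the evaluation scripts on Zenodo may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# One dropped word out of eight reference words gives a WER of 0.125.
example_wer = word_error_rate(
    "the birch canoe slid on the smooth planks",
    "the birch canoe slid on smooth planks",
)
```

Because insertions count as errors, WER can exceed 100% when the transcript contains many spurious words.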

ThinkPad Experiment


Original audio sample from the Harvard Sentences dataset
Audio signal reconstructed from EM Radiation using probe antenna

Attacker Capability


We assess the attacker's ability to extract signals by measuring SNR under varying volume, antenna types, orientations, and distances. Using probe, loop, and Yagi antennas, we record up to 25 dB PSNR near the device and 11.59 dB at 25 cm, with the Yagi antenna outperforming the others at longer ranges.

Antenna Orientation
Examined antenna configurations (ALoop and AYagi) for the SNR evaluation.
SNR Antenna
SNR over distance for different antenna configurations. Optimal quality achieved for ALoop perpendicular in short-distance scenarios and AYagi horizontal for long-range attacks.

Reducing volume from 64 dB to 58 dB reveals a threshold effect: audio quality significantly degrades when PSNR drops below 0 dB beyond 10 cm.

Sound Quality Impact
Sound quality degradation over increasing distances.

Evaluation


Behind-the-wall scenario: We assess the attack's accuracy across two adjacent rooms separated by a plaster wall (≈15 cm thick). The victim's laptop is placed on one side of the wall, and the attacker places the ALoop antenna on the other. We evaluate distances of 15, 20, and 25 cm.

The speaker classification accuracy reaches 99% at 20 cm and drops to 97.3% at 25 cm, confirming the high intelligibility of the recorded leakage. We achieve a word error rate of 6.5%.

Behind-the-wall Scenario
Attacker eavesdropping on the target laptop through a 15 cm plasterboard wall.
Behind-the-wall Results
Classification accuracy and word error rate (WER) in behind-the-wall scenarios.
Original audio sample
Reconstructed audio using loop antenna
Reconstructed audio using cheap copper foil antenna

Long Distance Evaluation: We analyze intelligibility at distances of 1 meter and beyond using a horizontal AYagi at 461.887 MHz. The classification accuracy reaches 96.0% at 1 meter and drops to 91.6% at 2 meters. Beyond 4 meters, recovery accuracy declines significantly.

Long Distance Results
Classification accuracy with the AYagi antenna placed in an adjacent room.
Room Scenarios
Evaluated room scenarios, including different victim device orientations, occlusion, and wall materials/thicknesses.

Attack Generality


To evaluate the generality of the vulnerability, we test multiple devices with PDM microphones, including laptops (Lenovo L580, ASUS Chromebook, Redacted), a smart speaker (Google Home), and a headset (Jabra Evolve2 40 SE).

The tested laptops show consistent performance, with >98% classification accuracy and ≤19% WER. The smart speaker and the Redacted Laptop achieve ≥86% speaker and ≥90.3% digit classification accuracy, but with higher WER. The Jabra headset is harder to attack (10% digit and 13.8% speaker classification accuracy, 100% WER), yet some leakage remains: the signal captured with the AProbe antenna reaches an STOI of 0.5490.

Original audio sample (Harvard Sentences Dataset)
Lenovo ThinkPad T480
Lenovo ThinkPad L580
Chromebook C204MA
Google Home
Redacted Laptop
Jabra Evolve2 40 SE Headset

Countermeasures


Sampling Rate Randomization: Shielding is insufficient to fully mitigate PDM leakage, and other defenses like encryption or EM blinding incur power and performance costs. We explore sampling rate randomization as a mitigation, evaluating performance at 8, 16, 32, 40, and 48 kHz. Digit classification accuracy remains 96% at 8 kHz and 97% at 16 kHz, with WER improving slightly across all rates, so changing the sampling rate does not stop the attack.

Clock Randomization: Spread-Spectrum Clocking (SSC) mitigates SoI by randomizing clock signals, making EM side-channels harder to exploit. We test deviations of 0.0%, 0.1%, 0.3%, and 1.0%, finding that as deviation increases, EM signal quality deteriorates (STOI drops from 0.7254 to 0.0490 at 1.0%). Meanwhile, the microphone’s performance remains stable (≈0.98 STOI), demonstrating SSC's effectiveness.
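One way to build intuition for these deviation levels is to note that a harmonic sits at an integer multiple of the clock frequency, so a relative clock deviation moves the harmonic by the same relative amount. The sketch below works this out, assuming the deviation is a fraction of the nominal clock frequency and reusing the 461.887 MHz harmonic from the long-distance evaluation purely as an example. The SSC experiments may have used a different frequency, and harmonic_shift_hz is a hypothetical helper.

```python
def harmonic_shift_hz(harmonic_hz, relative_deviation):
    """Frequency excursion of a clock harmonic under a relative clock deviation.

    A harmonic at n * f_clk moves by n * f_clk * deviation, which is simply
    harmonic_hz * deviation.
    """
    return harmonic_hz * relative_deviation


EXAMPLE_HARMONIC_HZ = 461.887e6  # assumption: borrowed from the long-distance test

# Excursion in Hz for each tested deviation level (given here as fractions).
shifts = {
    deviation: harmonic_shift_hz(EXAMPLE_HARMONIC_HZ, deviation)
    for deviation in (0.0, 0.001, 0.003, 0.01)
}
```

Under these assumptions, even a 0.1% deviation moves the harmonic by several hundred kilohertz, which is large compared with the audio band that the attacker is trying to demodulate.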

Acknowledgments


This research was funded by the JSPS KAKENHI Grant Number 22H00519, JST CREST JPMJCR23M4, and a gift from Meta. We thank the anonymous reviewers for their insightful feedback, Kohei Doi for the initial exploration that stimulated this work, and Daniel Olszewski & Tyler Tucker for proofreading support.