We present a novel electromagnetic (EM) side-channel attack that enables acoustic eavesdropping on devices using modern MEMS microphones. These microphones transmit audio via pulse-density modulation (PDM), where each harmonic of the digital pulses retains acoustic data. Using simple FM demodulation with standard radio receivers, an attacker can remotely recover the audio heard by the microphone—without any software compromise or physical access.
We validate the attack through real-world tests on various PDM microphones and devices, including laptops and smart speakers. The attack achieves up to 94.2% digit recognition accuracy from 2 meters away, even through a 25 cm concrete wall. Using speech-to-text APIs not trained on EM signals, we recover speech with as little as 14% error on the Harvard Sentences dataset. Comparable results are obtained using a low-cost copper tape antenna. We also show that existing defenses like resampling are ineffective and propose a new hardware mitigation based on clock randomization.
Recovering baseband digital signals typically requires strong, wideband coupling; for example, TEMPEST Comeback used 25 MHz of bandwidth for low-rate serial data, far exceeding audio bandwidth. Through simulation, we investigate whether the original audio can be recovered from the narrow sub-bands that are more commonly accessible to attackers.
Using MATLAB/Simulink with the Mixed Signal Blockset, we simulate a linear chirp (1–100 Hz at 100 Hz/s) and find that the original signal persists as FM around each harmonic. This indicates attackers could exploit these harmonics to eavesdrop on microphone-captured audio.
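To make the PDM encoding concrete, here is a minimal Python sketch of a first-order sigma-delta modulator, a common textbook way to produce a PDM bitstream. It is not the MATLAB/Simulink model described above. The names `pdm_encode` and `pdm_decode`, the modulator order, the moving-average decoder, and the rates in the demo are all assumptions chosen for illustration.

```python
import math


def pdm_encode(samples):
    """Encode samples in [-1, 1] as 0/1 bits with a first-order sigma-delta loop."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output mapped to -1.0 / +1.0 (0.0 before the first bit)
    for x in samples:
        # Accumulate the error between the input and the last emitted pulse.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pdm_decode(bits, window):
    """Crude decoder: average each non-overlapping window of bits back to [-1, 1]."""
    out = []
    for start in range(0, len(bits) - window + 1, window):
        chunk = bits[start:start + window]
        out.append(2.0 * sum(chunk) / window - 1.0)
    return out


if __name__ == "__main__":
    # A slow, heavily oversampled sine (hypothetical parameters).
    n = 64 * 200
    audio = [0.5 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
    recovered = pdm_decode(pdm_encode(audio), window=64)
    print(recovered[:5])
```

The density of ones in the bitstream tracks the input amplitude, which is why the pulse train still carries the audio. The sketch does not model the EM emission or the FM behavior around the clock harmonics.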
To validate our simulation, we extract EM signals from a Lenovo ThinkPad T480's microphones using a probe antenna on the back panel, while a speaker emits 64 dB speech-like audio. FM demodulation reveals the chirp’s frequency sweep, confirming the attack, despite some high-frequency attenuation and minor distortion.
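The FM demodulation step can be sketched in a few lines of Python. This sketch assumes the receiver already delivers complex baseband (IQ) samples centered on one harmonic. It uses a polar discriminator, one standard way to demodulate FM, which is not necessarily the method used in the experiments. The function names, the 5 kHz deviation, and the 48 kHz sample rate are hypothetical.

```python
import cmath
import math


def fm_modulate(message, deviation_hz, sample_rate):
    """Produce unit-amplitude IQ samples whose phase integrates the message."""
    phase = 0.0
    iq = []
    for m in message:
        phase += 2.0 * math.pi * deviation_hz * m / sample_rate
        iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq, deviation_hz, sample_rate):
    """Polar discriminator: the phase step between consecutive samples
    is proportional to the instantaneous frequency, i.e. the message."""
    out = []
    for prev, cur in zip(iq, iq[1:]):
        dphi = cmath.phase(cur * prev.conjugate())
        out.append(dphi * sample_rate / (2.0 * math.pi * deviation_hz))
    return out


if __name__ == "__main__":
    fs = 48_000          # hypothetical sample rate
    dev = 5_000          # hypothetical peak frequency deviation
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(480)]
    recovered = fm_demodulate(fm_modulate(tone, dev, fs), dev, fs)
    # recovered[i] corresponds to tone[i + 1]; the first sample has no predecessor.
    print(max(abs(r - t) for r, t in zip(recovered, tone[1:])))
```

A real capture would also need tuning, filtering, and decimation before this step; those are omitted here.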
We replay the Harvard Sentences and evaluate intelligibility using three transcription models—HuBERT (4.6% WER), Microsoft STT (3.1%), and OpenAI STT (2.6%)—demonstrating high reconstruction accuracy.
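For reference, word error rate (WER) is conventionally the word-level edit distance between the reference and the transcript, divided by the number of reference words. The sketch below follows that standard definition. The function name and the simple lowercase and whitespace normalization are assumptions, and the scoring pipeline used for the reported numbers may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the birch canoe slid on the smooth planks"
    hyp = "the birch canoe slid on smooth planks"
    print(word_error_rate(ref, hyp))  # one deleted word out of eight -> 0.125
```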
We assess the attacker's ability to extract signals by measuring SNR under varying volume, antenna types, orientations, and distances. Using probe, loop, and Yagi antennas, we record up to 25 dB PSNR near the device and 11.59 dB at 25 cm, with the Yagi antenna outperforming the others at longer ranges.
Reducing volume from 64 dB to 58 dB reveals a threshold effect: audio quality significantly degrades when PSNR drops below 0 dB beyond 10 cm.
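To give a sense of what the 0 dB threshold means, the sketch below computes a plain signal-to-noise ratio in decibels from two sample lists. This section does not give the exact PSNR definition behind the reported figures, so the power-ratio formula and the function names here are assumptions for illustration only.

```python
import math


def mean_power(samples):
    """Average power of a sequence of real-valued samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


if __name__ == "__main__":
    noise = [1.0, -1.0, 1.0, -1.0]
    print(snr_db([1.0, -1.0, 1.0, -1.0], noise))    # equal power: 0 dB
    print(snr_db([10.0, -10.0, 10.0, -10.0], noise))  # 100x power: about 20 dB
```

Under this definition, 0 dB is the point where the leaked signal and the noise carry equal power.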
Behind-the-wall scenario: We assess the attack's accuracy across two adjacent rooms separated by a plaster wall (≈15 cm thick). The victim's laptop is placed on one side of the wall, and the attacker places the ALoop antenna on the other. We evaluate distances of 15, 20, and 25 cm.
The speaker classification accuracy reaches 99% at 20 cm and drops to 97.3% at 25 cm, confirming the high intelligibility of the recorded leakage. We achieve a word error rate of 6.5%.
Long Distance Evaluation: We analyze intelligibility at distances of 1 meter and beyond using a horizontal AYagi at 461.887 MHz. The classification accuracy reaches 96.0% at 1 meter and drops to 91.6% at 2 meters. Beyond 4 meters, recovery accuracy declines significantly.
To evaluate the generality of the vulnerability, we test multiple devices with PDM microphones, including laptops (Lenovo L580, ASUS Chromebook, Redacted), a smart speaker (Google Home), and a headset (Jabra Evolve2 40 SE).
Tested laptops show consistent performance with >98% classification accuracy and ≤19% WER. The smart speaker and Redacted Laptop achieve ≥86% speaker and ≥90.3% digit classification, but with higher WER. The Jabra headset shows limited vulnerability (10% digit, 13.8% speaker classification, 100% WER), yet remains susceptible with an STOI of 0.5490 using the AProbe antenna.
Sampling Rate Randomization: Shielding is insufficient to fully mitigate PDM leakage, and other defenses like encryption or EM blinding incur power and performance costs. We explore sampling rate randomization as a mitigation, evaluating performance at 8, 16, 32, 40, and 48 kHz. Digit classification accuracy remains high (96% at 8 kHz and 97% at 16 kHz), with WER improving slightly across all rates, so changing the sampling rate does not prevent recovery.
Clock Randomization: Spread-Spectrum Clocking (SSC) mitigates SoI by randomizing clock signals, making EM side-channels harder to exploit. We test deviations of 0.0%, 0.1%, 0.3%, and 1.0%, finding that as deviation increases, EM signal quality deteriorates (STOI drops from 0.7254 to 0.0490 at 1.0%). Meanwhile, the microphone’s performance remains stable (≈0.98 STOI), demonstrating SSC's effectiveness.
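A back-of-the-envelope calculation shows how a small clock deviation grows at a high harmonic. The n-th harmonic of a clock at frequency f sits at n·f, so a fractional deviation d in the clock shifts that harmonic by n·f·d. The 3.072 MHz clock and harmonic number 150 below are hypothetical values, not measurements from the tested devices. Treating the stated percentage as a fractional change in clock frequency is also an assumption, since SSC implementations differ on whether the figure is one-sided or peak-to-peak.

```python
def harmonic_frequency_hz(clock_hz, harmonic):
    """Center frequency of the given clock harmonic."""
    return clock_hz * harmonic


def harmonic_spread_hz(clock_hz, harmonic, deviation_fraction):
    """Frequency shift of a harmonic when the clock deviates by the given fraction."""
    return harmonic * clock_hz * deviation_fraction


if __name__ == "__main__":
    clock_hz = 3.072e6  # hypothetical PDM clock
    harmonic = 150      # hypothetical harmonic number
    center = harmonic_frequency_hz(clock_hz, harmonic)
    for percent in (0.0, 0.1, 0.3, 1.0):  # deviations tested in the text
        spread = harmonic_spread_hz(clock_hz, harmonic, percent / 100.0)
        print(f"{percent:.1f}% -> harmonic at {center / 1e6:.1f} MHz "
              f"moves by {spread / 1e3:.1f} kHz")
```

Under these assumptions, even a 0.1% deviation moves the harmonic by hundreds of kilohertz, which suggests why a narrowband FM receiver tuned to a fixed harmonic struggles once the clock is randomized.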
This research was funded by the JSPS KAKENHI Grant Number 22H00519, JST CREST JPMJCR23M4, and a gift from Meta. We thank the anonymous reviewers for their insightful feedback, Kohei Doi for the initial exploration that stimulated this work, and Daniel Olszewski & Tyler Tucker for proofreading support.