
Screenless fitness trackers are winning as Fitbit Air joins Whoop and Oura
A quiet shift is under way in wearable technology. Screenless fitness trackers from Whoop, Oura, and now Fitbit are proving that less can be more for health monitoring.
When Massimiliano de Zambotti and his colleagues at SRI International put the Oura Ring against laboratory polysomnography in 2019, the finding that lodged in the scientific record wasn’t the number the company wanted to lead with. The ring detected sleep with 96 percent sensitivity across 41 healthy adolescents and young adults, catching nearly every minute a person was genuinely asleep. Wakefulness was another matter entirely: the device correctly identified it only 48 percent of the time. Half the moments a wearer lay still but awake, staring at a ceiling or scrolling a phone, the tracker reported sleep. That asymmetry, brilliant at sensing unconsciousness and nearly blind to quiet wakefulness, has shaped every debate about screenless fitness trackers since. It reads differently now that Google has thrown a $99.99 wristband into a category already defined by that tension.
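In confusion-matrix terms, the asymmetry is straightforward to state: sensitivity is the share of truly asleep epochs the device labels as sleep, and specificity is the share of truly awake epochs it labels as wake. A minimal sketch, using invented epoch counts chosen only to reproduce the study's rough proportions (not its actual data):

```python
# Sensitivity and specificity from hypothetical 30-second epoch counts.
# The counts below are invented for illustration; they are picked only to
# land near the ~96% / ~48% split de Zambotti reported.
true_sleep_detected = 960  # asleep, tracker says asleep (true positives)
true_sleep_missed = 40     # asleep, tracker says awake  (false negatives)
wake_detected = 48         # awake, tracker says awake   (true negatives)
wake_missed = 52           # awake, tracker says asleep  (false positives)

sensitivity = true_sleep_detected / (true_sleep_detected + true_sleep_missed)
specificity = wake_detected / (wake_detected + wake_missed)

print(f"sensitivity: {sensitivity:.0%}")  # 96%
print(f"specificity: {specificity:.0%}")  # 48%
```

Because sleep is the "positive" class and most of a night is genuinely spent asleep, a device can post high overall accuracy while still misclassifying half of all quiet wakefulness, which is exactly the pattern the 2019 study documented.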
Published in Behavioral Sleep Medicine, the de Zambotti study was never an indictment of Oura. It was a clear-eyed description of what a photoplethysmography sensor on a finger can and cannot do. PPG detects the autonomic shifts that accompany sleep onset, when heart rate slows, respiratory rate steadies, and peripheral blood flow changes. These signals are strong enough to flag sleep-wake transitions with high sensitivity. Distinguishing motionless wakefulness from light sleep, however, requires EEG, which no consumer device has. A ring can catch nearly every minute of sleep. Optical sensors alone cannot correctly label every minute of stillness that isn’t sleep. The physics of the sensor imposes the ceiling, and that ceiling hasn’t moved since 2019.
What followed de Zambotti was unusually transparent for a product category that generates billions in revenue. Oura kept funding independent validation, and the sleep-research community saw consumer wearables as a cohort-scale data opportunity too useful to ignore. In 2024, Thomas Svensson and colleagues at the University of Tokyo published the most comprehensive validation to date in Sleep Medicine, testing the Oura Ring Gen3 with the updated OSSA 2.0 algorithm against PSG in 96 healthy adults. Overall sleep-wake classification accuracy landed at 91.7 to 91.8 percent, with sensitivity holding at 94.4 to 94.5 percent. The ring did not significantly differ from PSG on total sleep time, sleep onset latency, or time spent in light and deep sleep. It overestimated REM by roughly 17 minutes and underestimated N3 by roughly 20, biases consistent across PPG-based devices and largely attributable to the same autonomic-versus-cortical gap that de Zambotti documented five years earlier.
Notably absent from Svensson 2024 is a standalone wake-specificity figure, the counterpart to the 48 percent headline from 2019. For anyone reading these studies as a consumer rather than a researcher, the omission is meaningful. Improvement is real but difficult to quantify without that counterpart number, and the omission itself signals which metrics the validation community considers actionable and which it treats as known limitations of the modality.
That gap, between what a PPG sensor can detect and what a consumer wants to know, is the quiet architecture under every screenless tracker launch of the past three years. And the launches keep coming.
On May 7, 2026, Google unveiled the Fitbit Air, a $99.99 wristband with no screen, no haptics beyond a single notification LED, and a sensor suite (optical heart rate, accelerometer, gyroscope, skin temperature, SpO2) that mirrors the hardware Whoop and Oura have been shipping for years. It pairs with the Google Health app and a Gemini-powered Health Coach that costs $9.99 per month after a three-month trial. Battery life is seven days. The band, including strap, weighs 12 grams. By a considerable margin, this is the cheapest entry point into continuous physiological monitoring from a major platform company, entering a market defined by two sharply different business models. Whoop charges $239 per year for its Peak membership and includes the band. Oura sells the Gen4 ring starting at $349 with an optional $5.99 monthly subscription for advanced features. Fitbit Air decouples hardware from software entirely: $99.99 buys the band, and the AI coaching layer is an opt-in add-on.
Will Ahmed, Whoop’s founder and CEO, has been making a version of the same argument since before the category had a name. “If it has a screen, then it’s a watch,” he told the Wall Street Journal in the week of the Fitbit Air launch. “If it’s a watch, then you can’t wear two watches.” The logic is plain and backed by behavioral data manufacturers collect internally: Whoop users wear the device 24 hours a day at retention rates the company says exceed 80 percent at 12 months. Removing the display is not a cost decision. It is the feature that makes continuous wearing possible without competing for wrist real estate against the Apple Watch or Garmin a user already owns.
The research record, however, asks a question the marketing does not answer. If the device is always on, and the data it produces carries known error margins, what is the user actually getting? The answers that emerge from the published validation literature are more nuanced than accuracy percentages convey. A screenless tracker’s value does not hinge on whether it matches PSG on any given night; nothing short of EEG will. The question is whether its output is consistent enough, over weeks and months, to surface patterns the wearer can act on. Dean Miller and his group at Central Queensland University demonstrated exactly this in a 2022 multi-device study in Sensors: across 53 adults wearing six different wearables, the between-night variability in sleep-stage estimates was often larger than the between-device differences. Which device someone wore mattered less than the fact of wearing one consistently. The form factor that enables consistency, with no charging breaks and nothing to take off, may be the variable that actually drives the outcome.
Market data suggests consumers are drawing the same conclusion. U.S. purchases of screenless fitness trackers grew 88 percent year over year from 2024 to 2025, according to Circana data reported by the Journal. Smart ring purchases jumped 195 percent in the same window. Forecasts for 2026 project another 67 percent growth.
The sensitivity-specificity trade-off no marketing page will show you
For a consumer, the numbers that matter most are not concordance coefficients or epoch-by-epoch agreements. They are the two questions every wearable user eventually asks: did I really sleep that poorly, and is my resting heart rate actually trending down? On nocturnal heart rate variability, the metric underpinning readiness and recovery scores across Whoop, Oura, and now Fitbit Air, a 2025 multi-device validation in Physiological Reports offers the most direct comparison available. Across 536 nights and 13 adults, Oura’s Gen4 ring achieved a concordance correlation coefficient of 0.99 against ECG for nocturnal HRV. Whoop 4.0 hit 0.94. Both figures are strong enough for longitudinal trend tracking. Neither device is a medical instrument, and neither claims to be.
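Those 0.99 and 0.94 figures are concordance correlation coefficients (Lin's CCC), which, unlike an ordinary Pearson correlation, penalize systematic offsets from the reference as well as scatter. A minimal sketch with invented nightly HRV values (the numbers are illustrative, not data from the Physiological Reports study):

```python
# Lin's concordance correlation coefficient (CCC):
#   CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
# A constant bias between device and reference lowers CCC even when
# the two series rise and fall together perfectly.

def lin_ccc(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)

# Invented nightly HRV readings (rMSSD, ms): ECG reference vs. a wearable
ecg = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0]
wearable = [43.1, 54.2, 39.0, 60.3, 47.8, 51.5]

print(round(lin_ccc(ecg, wearable), 3))  # a value near 1 means near-perfect agreement
```

A device that tracked every night-to-night change flawlessly but read 10 ms high across the board would score a near-perfect Pearson correlation and a visibly lower CCC, which is why validation papers prefer CCC for agreement claims.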
The clinical-population caveat is starker and gets far less attention. Ingo Fietze and colleagues at Charité in Berlin published a study in Scientific Reports in 2025 that ran the Oura ring against PSG in sleep-lab patients, people with suspected disorders rather than the healthy volunteers who populate the Svensson and de Zambotti cohorts. Four-stage sleep classification accuracy fell to 53.18 percent. The ring could not reliably distinguish N1 from N2 from N3 from REM in symptomatic individuals. This does not invalidate the consumer use case, but it draws a bright line that no product page will volunteer: screenless trackers are population-level tools, not diagnostic instruments. A healthy person curious about their sleep architecture is getting informative data. Someone with symptoms needs a lab.
The split between healthy-volunteer validation (91.7 percent accuracy in Svensson 2024) and patient-cohort validation (53 percent in Fietze 2025) is the single most important fact about this product category, and it appears in precisely zero marketing materials from any of the three companies. Seven years of published validation studies, most of them funded or co-authored by the companies whose devices are being tested, represents an unusual degree of transparency for consumer electronics. What the literature says and what the average buyer believes remain different stories.
Why the form factor matters more than the numbers
One reading of the evidence goes like this: the devices are directionally accurate but not precise, the algorithms are proprietary and change without notice, the hardware revision cycle outpaces the validation cycle, and the clinical-grade bar remains distant. All of it is true. And yet the category is growing at 67 percent.
What that growth indicates is that accuracy, past a certain floor, is not what drives adoption. The behavioral loop does. A device always on the body, one that never demands attention, surfacing a single recovery score in the morning, creates a fundamentally different relationship than a smartwatch buzzing with notifications and asking to be charged every 18 hours. Omair Khaliq Sultan, writing in Digital Trends, called this “letting the device serve you” rather than the reverse. The phrase has the ring of marketing, but the behavioral mechanism is real: notification fatigue is measurable, and a screenless tracker by definition cannot add to it.
Accountability matters as well. When a Whoop or Oura user sees a recovery score of 42 after a night of drinking, the number is not medically precise, but the directional signal is loud enough to change behavior. Multiple studies in the broader behavior-change literature, outside the narrow device-validation canon, find that passive self-monitoring, even with imperfect instruments, produces measurable improvements in sleep duration and subjective quality. Effect sizes are modest but replicable, and they do not require clinical-grade hardware to emerge.
Google’s entry with the Fitbit Air validates the category in the way only a platform company’s participation can. The Health Coach AI, powered by Gemini and trained on the combined data of Fitbit’s install base numbering in the tens of millions, represents a bet that the interpretative layer matters more than the sensor layer going forward. If the raw PPG signal from a $99 band is good enough to feed an AI that can tell a user what it means, differentiation moves from hardware to software. That is a different thesis from Whoop’s, which has invested heavily in proprietary sensor calibration and in-house validation, and from Oura’s, which has built the deepest peer-reviewed validation record of any consumer wearable company. Three business models, one sensor modality, and a shared dependence on the fact that consumers do not actually need clinical precision to find the data worth paying for.
The de Zambotti finding from 2019, 96 percent sensitivity paired with 48 percent specificity, was less a verdict on one ring and more a preview of the entire category’s epistemological ceiling. Seven years and a dozen validation studies later, the ceiling has not moved much. But the category under it has filled with devices, millions of users, and now a tech giant willing to sell the band at cost and make money on the AI that interprets it. The science did not need to be flawless. It needed to be consistent enough for the market to decide it was good enough.
References
- de Zambotti M, Rosas L, Colrain IM, et al. The sleep of the ring: comparison of the ŌURA sleep tracker against polysomnography. Behavioral Sleep Medicine. 2019;17(2):124-136. https://doi.org/10.1080/15402002.2017.1300587
- Svensson T, Madhawa K, Ta HN, et al. Validity and reliability of the Oura Ring Gen3 with OSSA 2.0 for sleep parameter estimation in healthy adults. Sleep Medicine. 2024;115:88-97. https://doi.org/10.1016/j.sleep.2024.01.020
- Miller DJ, Sargent C, Roach GD. A validation of six wearable devices for estimating sleep, heart rate and heart rate variability. Sensors. 2022;22(17):6317. https://doi.org/10.3390/s22176317
- Miller DJ, Lastella M, Scanlan AT, et al. A validation study of the WHOOP strap against polysomnography to assess sleep. Journal of Sports Sciences. 2020;38(22):2631-2636. https://doi.org/10.1080/02640414.2020.1797448
- Validation of nocturnal resting heart rate and heart rate variability in consumer wearable devices. Physiological Reports. 2025;13(8):e70297. https://doi.org/10.14814/phy2.70297
- Fietze I, et al. Wearable finger ring trackers for diagnostic sleep measurement: a comparative validation study in clinical populations. Scientific Reports. 2025;15:93774. https://doi.org/10.1038/s41598-025-93774-z
Mira Chen
General assignment health reporter covering nutrition science, wellness trends, and clinical research. Reports from Toronto.


