Classification of Infant Sleep–Wake States from Natural Overnight In-Crib Sleep Videos

Northeastern University and University of Maine
WACVW CV4Smalls 2025

Abstract

Infant sleep is critical for healthy development, and disruptions in sleep patterns can have profound implications for infant brain maturation and overall well-being. Traditional methods for monitoring infant sleep often rely on intrusive equipment or time-intensive manual annotations, which hinder their scalability in clinical and research applications. We present our dataset, SmallSleeps, which includes 152 hours of overnight recordings of 17 infants aged 4–11 months captured in real-world home environments. Using this dataset, we train a deep learning algorithm for classification of infant sleep–wake states from short 90 s video clips drawn from natural, overnight, in-crib baby monitor footage, based on a two-stream spatiotemporal model that integrates rich RGB frames with optical flow features. Our binary classification algorithm was trained and tested on "pure" state clips featuring a single state dominating the timeline (i.e., over 90% sleep or over 90% wake) and achieves over 80% precision and recall. We also perform a careful experimental study of the results of training and testing on "mixed" clips featuring specified levels of heterogeneity, with a view towards applications to infant sleep segmentation and sleep quality classification in longer, overnight videos, where local behavior is often mixed. This local-to-global approach allows for deep learning to be effectively deployed on the strength of tens of thousands of video clips, despite a relatively modest sample size of 17 infants.

Overall Pipeline

Infant Sleep-Wake Classification Overview

Architecture of our two-stream network for infant sleep–wake classification. The model processes 90-second video clips through parallel RGB and optical flow streams. Each stream consists of: (1) a Feature Extractor using a 3D ConvNet to extract spatiotemporal features from K sequential frames, and (2) a Classifier incorporating self-attention mechanisms for temporal dependency modeling, temporal averaging, and fully connected layers with rectified linear unit (ReLU) activation. A modality selection/fusion module allows flexible stream combination (RGB alone, flow alone, or RGB + flow) before final classification into Sleep/Wake states. The diagram highlights two main processing stages: feature extraction (green blocks) and classification (purple blocks).
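
For concreteness, the following is a minimal PyTorch sketch of a two-stream network in this style. The backbone depth, feature dimensions, attention configuration, and class names (StreamBranch, TwoStreamSleepWake) are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class StreamBranch(nn.Module):
    """One stream (RGB or flow): 3D-conv feature extraction, then
    self-attention over time, temporal averaging, and an FC + ReLU head."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        # Hypothetical lightweight 3D ConvNet backbone.
        self.extractor = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # pool space, keep time axis
        )
        self.proj = nn.Linear(64, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())

    def forward(self, x):                     # x: (B, C, T, H, W)
        f = self.extractor(x)                 # (B, 64, T', 1, 1)
        f = f.flatten(2).transpose(1, 2)      # (B, T', 64)
        f = self.proj(f)                      # (B, T', feat_dim)
        f, _ = self.attn(f, f, f)             # self-attention over time
        f = f.mean(dim=1)                     # temporal averaging
        return self.head(f)                   # (B, 64)

class TwoStreamSleepWake(nn.Module):
    """Fuses the RGB and flow branches; a single-stream variant would
    simply drop one branch before the final classifier."""
    def __init__(self):
        super().__init__()
        self.rgb = StreamBranch(in_channels=3)    # RGB frames
        self.flow = StreamBranch(in_channels=2)   # (dx, dy) flow fields
        self.classifier = nn.Linear(64 * 2, 2)    # Sleep / Wake logits

    def forward(self, rgb_clip, flow_clip):
        z = torch.cat([self.rgb(rgb_clip), self.flow(flow_clip)], dim=1)
        return self.classifier(z)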

SmallSleeps Dataset Creation

The SmallSleeps dataset represents a significant contribution to infant sleep research, comprising 152 hours of overnight video recordings from 17 infants aged 4–11 months captured in real-world home environments. Unlike traditional sleep studies that rely on intrusive equipment, SmallSleeps uses non-invasive video monitoring installed in cribs by caregivers. The dataset features comprehensive behavioral coding performed by research assistants who annotated sleep-wake states based on visible behavioral cues, with a particular focus on arousal events characterized by eye opening and body movements. For our final dataset, the videos were segmented into 90-second clips with varying levels of "purity" (percentage of time in a single state), creating a robust foundation for developing and evaluating automated sleep classification algorithms despite the relatively modest sample size.
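
To make the segmentation step concrete, here is a minimal sketch that cuts a per-second sleep/wake annotation track into non-overlapping 90-second clips and computes each clip's purity. The one-second annotation resolution, the function name clip_purity, and the record format are assumptions for exposition.

import numpy as np

def clip_purity(labels_per_second, clip_len=90):
    """Split a per-second label track (1 = sleep, 0 = wake) into
    non-overlapping clips and report each clip's dominant state and
    purity (fraction of seconds spent in that state)."""
    labels = np.asarray(labels_per_second)
    clips = []
    for i in range(len(labels) // clip_len):
        window = labels[i * clip_len:(i + 1) * clip_len]
        sleep_frac = window.mean()
        clips.append({
            "start_s": i * clip_len,
            "state": "sleep" if sleep_frac >= 0.5 else "wake",
            "purity": max(sleep_frac, 1.0 - sleep_frac),  # always >= 0.5
        })
    return clips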

SmallSleeps Dataset Creation Process

Purity Bins: Capturing the Complexity of Infant Sleep Transitions

The creation of different "purity bins" in the SmallSleeps dataset addresses a fundamental challenge in infant sleep classification: the natural heterogeneity of sleep-wake transitions. Rather than simplistically categorizing video clips as either "sleep" or "wake," we recognized that real-world infant sleep patterns frequently involve mixed states and gradual transitions. By introducing five threshold bins (90–100%, 80–90%, 70–80%, 60–70%, and 50–60%), each representing the percentage of time an infant spends in a particular state within a 90-second clip, the dataset provides a more nuanced foundation for algorithm development.
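
As a minimal sketch of the bin assignment, assuming half-open bins with the top bin closed at 100% (the boundary convention is our assumption, and purity comes from the clip records sketched earlier):

PURITY_BINS = [(0.90, 1.00), (0.80, 0.90), (0.70, 0.80), (0.60, 0.70), (0.50, 0.60)]

def purity_bin(purity):
    """Return the bin index (0 = 90-100%, ..., 4 = 50-60%) for a clip."""
    for i, (lo, hi) in enumerate(PURITY_BINS):
        # Half-open [lo, hi); the top bin is closed so purity 1.0 maps to bin 0.
        if lo <= purity < hi or (hi == 1.00 and purity == 1.00):
            return i
    return None  # unreachable: purity is always >= 0.5 by construction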

This stratified approach serves multiple purposes. First, it allows us to train models on "pure" state clips (90–100% in one state) to establish clear baseline performance. Second, it enables systematic evaluation of how algorithm performance degrades when faced with increasingly mixed state clips—a critical consideration for real-world deployment. Finally, this methodology creates a pathway toward more sophisticated sleep segmentation in longer videos, where local behavior often contains mixed states. The paper specifically highlights how the "Pure-State Model" (trained on 90–100% purity clips) and "Mixed-State Model" (trained on 70–100% purity clips) demonstrate different capabilities in handling sleep-wake classification across varying levels of state heterogeneity.
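
Building on the sketches above, training-set selection for the two models could then be written as follows, where all_clips is a hypothetical list of clip records with precomputed purities and purity_bin is the bin-assignment helper from the previous sketch.

def training_clips(all_clips, bins):
    """Keep clips whose purity bin index is in the given set."""
    return [c for c in all_clips if purity_bin(c["purity"]) in bins]

# Usage (all_clips: clip_purity() output pooled over all videos):
pure_train  = training_clips(all_clips, bins={0})        # Pure-State Model: 90-100%
mixed_train = training_clips(all_clips, bins={0, 1, 2})  # Mixed-State Model: 70-100%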

Results and Analysis

Classification Results Analysis

In our analysis of feature performance, we found compelling evidence for the superiority of optical flow features over RGB features for infant sleep-wake classification. Our RGB-only models achieved moderate accuracy (71–72%), while models incorporating optical flow features demonstrated substantial improvements, reaching 78% accuracy. The most dramatic enhancement was in recall, which increased by 11–12 percentage points when optical flow was included, while maintaining or slightly improving precision.
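
The page does not name the optical flow estimator used, so the sketch below uses OpenCV's dense Farneback method purely as an illustrative stand-in for producing the two-channel (dx, dy) fields that a flow stream consumes.

import cv2
import numpy as np

def flow_frames(frames):
    """Compute dense optical flow between consecutive BGR frames.
    Returns an array of shape (T - 1, H, W, 2) of per-pixel (dx, dy)."""
    flows = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        nxt = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)
        prev = nxt
    return np.stack(flows)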

The benefits of optical flow were particularly pronounced in the challenging 70–80% purity bin, where recall improved by 18 points and precision by 3 points compared to RGB-only models. This suggests optical flow excels at capturing the subtle movements characteristic of wake states, especially in ambiguous mixed-state clips. Our t-SNE visualization further supported this finding, showing distinct subject-specific clusters for wake states in the optical flow feature space, while RGB features showed considerable overlap between sleep and wake states within each subject's cluster.

Feature Space Analysis

Our t-SNE visualization of the feature space reveals distinct patterns in how RGB and optical flow features capture sleep-wake states. The optical flow feature space shows clear separation between sleep and wake states, with subject-specific clusters that maintain their distinctiveness. This visualization helps explain the superior performance of optical flow features in our classification tasks, particularly in capturing the subtle movement patterns that distinguish wake states from sleep.
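
A visualization in this spirit can be produced with scikit-learn's t-SNE; the perplexity, plotting style, and variable names below are illustrative choices rather than the paper's settings.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, title):
    """Embed (N, D) clip features in 2-D and color by sleep/wake label."""
    labels = np.asarray(labels)
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    for state, name in [(0, "sleep"), (1, "wake")]:
        mask = labels == state
        plt.scatter(emb[mask, 0], emb[mask, 1], s=4, label=name)
    plt.legend()
    plt.title(title)
    plt.show()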

CV4Smalls 2025 Presentation

BibTeX

@InProceedings{Moezzi_2025_WACV,
    author    = {Moezzi, Shayda and Wan, Michael and Manne, Sai Kumar Reddy and Mathew, Amal and Zhu, Shaotong and Galoaa, Bishoy and Hatamimajoumerd, Elaheh and Grace, Emma and Rowan, Cassandra B. and Zimmerman, Emily and Taylor, Briana J. and Hayes, Marie J. and Ostadabbas, Sarah},
    title     = {Classification of Infant Sleep-Wake States from Natural Overnight In-Crib Sleep Videos},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {42-51}
}