Imaging through scattering media has significant application potential in remote sensing, biomedical diagnostics, and industrial inspection. However, conventional imaging systems struggle to remain robust and to generalize across diverse environments. Here, we demonstrate a photon-level single-pixel imaging system that exploits data-domain alignment to overcome this limitation. By coupling a physical preprocessing module with a deep neural network, the system maps scattering-induced degradations from different scattering media into a unified data domain while preserving the essential structure of the optical information. Under natural fog and rain, the proposed method clearly reconstructs fine target details at a distance of 150 m, demonstrating strong robustness across scattering media. With only 0.088 photons per pattern per pixel in a single measurement, 256 × 256 dynamic scenes are well reconstructed. These results establish a generalizable framework for photon-level imaging in diverse scattering media and highlight its promise for robust optical imaging under extreme atmospheric conditions.
- Advanced Photonics Nexus
- Vol. 5, Issue 2, 026007 (2026)
1 Introduction
Imaging through scattering media plays a critical role in applications such as underwater rescue, autonomous driving, and biomedical imaging.1
Single-pixel imaging (SPI), owing to its unique imaging mechanism and broad spectral response, offers an effective balance between physical and computational methods.17,18 Its single-point detection scheme, combined with efficient post-processing algorithms, enables SPI to suppress scattering-induced noise and enhance image quality when imaging through turbid media.19 Photon-level SPI extends conventional SPI to extremely low-light scenarios and is a key advance for such conditions.20
In this work, we propose a photon-level single-pixel imaging technique through scattering media based on data-domain alignment. This approach significantly suppresses the noise introduced by multiple scattering and improves the robustness of cross-media imaging. A preprocessing module first normalizes the images reconstructed through different scattering media, minimizing the data-domain gap among the various types of degraded images. The normalized images are then processed by the proposed histogram prior compensation network (HPCnet), which incorporates histogram-based priors to compensate for medium-specific degradation while preserving physical consistency. As a result, the system recovers fine structural details through various real scattering media and shows strong generalization capability. The method achieves dynamic imaging at a frame rate of 15 frames per second (fps) with 0.088 photons per pattern per pixel. The design is compatible with both active and passive illumination in SPI, allowing flexible deployment in a wide range of scattering scenarios. When integrated with multimodal sensors such as automotive radar and light detection and ranging (LiDAR) systems, it can further enhance the robustness of environmental perception. Such fusion can reduce the impact of weather variations on imaging results and provide reliable support for intelligent driving in complex environments, as shown in Fig. 1(a). The proposed technique is expected to become an important approach for addressing the perception challenges that intelligent driving systems face in adverse weather.
Figure 1. Principle of deep photon-level single-pixel imaging through scattering media. (a) Schematic of scattering-media imaging scenarios for intelligent driving. (b) Workflow of photon-level single-pixel imaging through scattering media. (c) TVAL reconstruction results under various scattering noise conditions (original), preprocessing results of the PODC algorithm, and enhancement results obtained with U-Net, U-Net+, and HPCnet. U-Net is trained on the original image dataset, whereas U-Net+ denotes PODC preprocessing applied directly before a standard U-Net; this configuration evaluates the effectiveness of the PODC module combined with a standard network architecture.
2 Methods
The proposed photon-level single-pixel imaging process through scattering media is as follows. First, image data are acquired using photon-level single-pixel imaging through various scattering media. Then, a deep neural network based on data-domain alignment is applied to enhance the image quality, as illustrated in Fig. 1(b).
The basic principle of photon-level SPI is to encode the incident light field with a spatial light modulator and to record the corresponding light intensity with a single-pixel detector. The target image is then recovered by an inverse reconstruction algorithm.36
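To make the encode, detect, and invert loop concrete, here is a minimal Python sketch of the single-pixel forward model. It is not the authors' pipeline: Hadamard patterns and a simple linear inverse stand in for the patterns used here and for the TVAL reconstruction. A DMD can only display binary (0/1) patterns, so each ±1 Hadamard row is shown as a differential pair of binary patterns, which is one common approach. The Poisson photon model ignores dark counts and detector dead time. All function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n: int) -> np.ndarray:
    """Return an n x n Hadamard matrix with +1/-1 entries (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]], dtype=np.int64)
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def measure(image, patterns, photons_per_pixel=None, rng=None):
    """Simulate single-pixel (bucket) measurements.

    Each +1/-1 row is displayed as two binary DMD patterns (P+ and P-), and the
    measurement is the difference of the two bucket signals. If
    photons_per_pixel is given, each bucket signal is replaced by Poisson photon
    counts (a simplified photon-level model) and rescaled to intensity units.
    """
    x = np.asarray(image, dtype=float).ravel()
    b_plus = (patterns > 0).astype(float) @ x
    b_minus = (patterns < 0).astype(float) @ x
    if photons_per_pixel is None:
        return b_plus - b_minus
    rng = np.random.default_rng() if rng is None else rng
    counts_plus = rng.poisson(photons_per_pixel * b_plus)
    counts_minus = rng.poisson(photons_per_pixel * b_minus)
    return (counts_plus - counts_minus) / photons_per_pixel


def reconstruct(y, patterns, shape):
    """Linear inverse using H^T H = n I; unmeasured rows are treated as zero."""
    n = patterns.shape[1]
    return (patterns.T @ y / n).reshape(shape)


# Usage: an 8 x 8 scene at full sampling (noise-free) and at 25 % sampling
# with a photon-limited detector.
scene = np.zeros((8, 8))
scene[2:6, 3:5] = 1.0
H = sylvester_hadamard(scene.size)
exact = reconstruct(measure(scene, H), H, scene.shape)
m = scene.size // 4
noisy = reconstruct(
    measure(scene, H[:m], photons_per_pixel=0.5, rng=np.random.default_rng(0)),
    H[:m],
    scene.shape,
)
```

At full sampling the linear inverse recovers the scene exactly. At reduced sampling it only returns a projection onto the measured patterns, which is why regularized solvers such as TVAL are used in practice.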
A prior-oriented denoising compensation (PODC) algorithm is introduced for domain normalization in our imaging system. By performing statistical normalization on the local neighborhood of each pixel in the original image, PODC maps the pixel distribution to a unified prior distribution, thereby enhancing the representation of structural features. PODC not only improves the clarity of image details but also aligns degraded images obtained through different scattering media into the same data domain.39 This alignment helps the neural network extract features more effectively and improves its robustness. Specifically, for each pixel in the image, the local mean and the local standard deviation are computed within its neighborhood.40
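For reference, the standard definitions of these local statistics over a square window $\Omega(x,y)$ of $(2r+1)\times(2r+1)$ pixels centered on pixel $(x,y)$ are given below. The window shape and the radius $r$ used in PODC are not stated in this section, so both are assumptions.

```latex
\mu(x,y) = \frac{1}{|\Omega(x,y)|} \sum_{(i,j)\in\Omega(x,y)} I(i,j),
\qquad
\sigma(x,y) = \sqrt{\frac{1}{|\Omega(x,y)|} \sum_{(i,j)\in\Omega(x,y)} \bigl[I(i,j)-\mu(x,y)\bigr]^{2}},
\qquad |\Omega(x,y)| = (2r+1)^{2}
```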
Based on the local mean and local standard deviation, each local region of the image is normalized as
To further improve the stability of domain alignment and suppress the enhancement of abnormal pixels, an optional nonlinear control operation is incorporated into PODC
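The following Python sketch shows why local normalization aligns data domains. The exact normalization and nonlinear-control formulas of PODC are not reproduced here, so the choices below are assumptions: a local z-score, an optional `limit * tanh(z / limit)` soft limiter standing in for the nonlinear control step, and a mapping onto a hypothetical common prior (`target_mean`, `target_std`). Names and default values are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_stats(img: np.ndarray, radius: int = 3):
    """Local mean and standard deviation over a (2r+1) x (2r+1) window."""
    k = 2 * radius + 1
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="reflect")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))


def podc_sketch(img, radius=3, eps=1e-6, limit=3.0,
                target_mean=0.5, target_std=0.15):
    """Hypothetical PODC-style local normalization.

    1. z-score each pixel against its local neighborhood statistics;
    2. optionally soft-limit outliers with limit * tanh(z / limit);
    3. map the result onto a common prior (target_mean, target_std).
    """
    img = np.asarray(img, dtype=float)
    mean, std = local_stats(img, radius)
    z = (img - mean) / (std + eps)
    if limit is not None:
        z = limit * np.tanh(z / limit)
    return np.clip(target_mean + target_std * z, 0.0, 1.0)


# Usage: the same scene seen with two different gains and offsets, a crude
# stand-in for two scattering media, lands in (nearly) the same domain.
rng = np.random.default_rng(0)
scene = rng.random((32, 32))
fog_like = 0.2 * scene + 0.7     # low contrast, high background
dark_like = 0.5 * scene + 0.05   # different gain and offset
a, b = podc_sketch(fog_like), podc_sketch(dark_like)
```

Because the local z-score cancels any positive gain and constant offset (up to the small `eps`), `a` and `b` are nearly identical. This is the sense in which a statistical normalization can pull differently degraded inputs toward one data domain before the network sees them.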
However, statistical normalization alone is insufficient to fully recover the texture details of the image. To better suppress the noise induced by multiple scattering and to improve the quality of the reconstructed images, a preprocessing-aware module is incorporated into the U-Net. Based on this modification, a specialized image enhancement network, termed HPCnet, is designed to adapt to the aligned data, as shown in Fig. 1(b). The computational efficiency of different networks is compared in Table S2 in the Supplementary Material.41,42 Specifically, a lightweight feature extraction module first processes the PODC-preprocessed reconstruction to obtain initial features. These features are then passed to the backbone network for hierarchical encoding and decoding. The backbone adopts a symmetric encoder–decoder architecture consisting of three downsampling modules and three upsampling modules, with 64, 128, and 256 channels in the convolutional layers, respectively. Skip connections between the encoder and decoder preserve feature information at different scales.
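The pure-Python sketch below traces how feature-map shapes might flow through such a symmetric encoder–decoder with 64, 128, and 256 channels. Several layout details are assumptions, not taken from the paper: each encoder stage outputs `channels[i]` at the current resolution before halving it, the bottleneck keeps `channels[-1]`, each decoder stage concatenates the matching skip and projects back to `channels[i]`, and the lightweight feature extractor is folded into the first stage. The function name is hypothetical.

```python
def trace_unet_shapes(height, width, in_ch=1, channels=(64, 128, 256), out_ch=1):
    """Trace (channels, H, W) through a hypothetical symmetric U-Net."""
    factor = 2 ** len(channels)
    if height % factor or width % factor:
        raise ValueError(f"height and width must be divisible by {factor}")
    shapes = [("input", (in_ch, height, width))]
    skips = []
    h, w = height, width
    for i, c in enumerate(channels):
        shapes.append((f"enc{i}", (c, h, w)))   # conv block at this scale
        skips.append(c)                         # source of the skip connection
        h, w = h // 2, w // 2                   # downsampling module
    shapes.append(("bottleneck", (channels[-1], h, w)))
    prev_c = channels[-1]
    for i in reversed(range(len(channels))):
        h, w = h * 2, w * 2                     # upsampling module
        shapes.append((f"concat{i}", (prev_c + skips[i], h, w)))
        prev_c = channels[i]
        shapes.append((f"dec{i}", (prev_c, h, w)))
    shapes.append(("output", (out_ch, h, w)))
    return shapes


for name, shape in trace_unet_shapes(256, 256):
    print(f"{name:>10}: {shape}")
```

Under these assumptions a 256 × 256 input reaches a 32 × 32 bottleneck after three downsamplings. The divisibility check is there because three stride-2 stages need input sizes that are multiples of 8 for the skip connections to line up.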
The training dataset consists of both experimental and simulated data. The experimental dataset contains 5,000 images acquired through scattering media (fat emulsion, sediment suspension, and fog) at different concentrations. The simulated dataset is generated from the experimental images by rotation, scaling, and the addition of a small amount of synthetic noise, with 20 simulated images produced from each experimental image. In total, the final dataset contains more than 100,000 images. This large-scale dataset provides sufficient diversity for training and improves the generalization capability of the proposed network. The robustness of HPCnet arises from two key design aspects. First, PODC preprocessing substantially reduces the domain discrepancy across different scattering media, guiding the network to learn structure-dependent representations rather than medium-specific degradation patterns and thereby improving cross-media generalization.43 Second, the training dataset incorporates a physics-informed scattering degradation model, providing physically meaningful supervisory signals that enhance the interpretability and physical consistency of the learned features.44
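The following Python sketch shows one way to produce 20 augmented copies per experimental image. The specific transforms are assumptions, since the paper does not give them here: rotations by multiples of 90°, nearest-neighbour rescaling by a random factor in [0.8, 1.2] followed by center crop or zero padding, and additive Gaussian noise with a hypothetical `noise_std`. The random "experimental" images below are placeholders, not data from the study.

```python
import numpy as np


def center_fit(x, h, w):
    """Center-crop or zero-pad a 2-D array to shape (h, w)."""
    out = np.zeros((h, w), dtype=float)
    sh, sw = min(h, x.shape[0]), min(w, x.shape[1])
    r0, c0 = (x.shape[0] - sh) // 2, (x.shape[1] - sw) // 2  # source offset
    r1, c1 = (h - sh) // 2, (w - sw) // 2                    # target offset
    out[r1:r1 + sh, c1:c1 + sw] = x[r0:r0 + sh, c0:c0 + sw]
    return out


def rescale_nearest(x, s):
    """Nearest-neighbour rescale of a 2-D array by factor s."""
    rows = np.minimum((np.arange(max(1, round(x.shape[0] * s))) / s).astype(int),
                      x.shape[0] - 1)
    cols = np.minimum((np.arange(max(1, round(x.shape[1] * s))) / s).astype(int),
                      x.shape[1] - 1)
    return x[np.ix_(rows, cols)]


def augment(image, n_variants=20, noise_std=0.01, rng=None):
    """Return n_variants rotated, rescaled, noisy copies of one image in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    variants = []
    for _ in range(n_variants):
        x = np.rot90(img, k=int(rng.integers(4)))         # 0/90/180/270 degrees
        x = rescale_nearest(x, rng.uniform(0.8, 1.2))     # random zoom
        x = center_fit(x, h, w)                           # back to original size
        x = x + rng.normal(0.0, noise_std, size=x.shape)  # small synthetic noise
        variants.append(np.clip(x, 0.0, 1.0))
    return variants


# Usage with placeholder images (random data, for illustration only).
rng = np.random.default_rng(0)
experimental = [rng.random((64, 64)) for _ in range(3)]
dataset = [v for img in experimental for v in augment(img, rng=rng)]
```

With 20 variants per image, 5,000 experimental images yield 100,000 simulated ones. Together with the originals, this is consistent with the stated total of more than 100,000.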
The MS-SSIM is adopted as the loss function, which is defined as45
Since MS-SSIM does not exceed 1, the final loss function is defined as
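For reference, the standard multiscale SSIM of Wang et al. (Ref. 45) combines luminance, contrast, and structure terms over $M$ scales, where scale $j$ is obtained by successive 2× low-pass downsampling and $C_1, C_2, C_3$ are small stabilizing constants. The loss below is an assumption inferred from the sentence above (one minus MS-SSIM, which is nonnegative because MS-SSIM is at most 1). Here $\mu$ and $\sigma$ denote windowed statistics of the two images $x$ and $y$ being compared.

```latex
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},\quad
c_j(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},\quad
s_j(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}

\mathrm{MS\text{-}SSIM}(x,y) = \bigl[l_M(x,y)\bigr]^{\alpha_M}
  \prod_{j=1}^{M} \bigl[c_j(x,y)\bigr]^{\beta_j} \bigl[s_j(x,y)\bigr]^{\gamma_j},
\qquad
\mathcal{L} = 1 - \mathrm{MS\text{-}SSIM}(x,y)
```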
3 Results
To quantitatively evaluate the robustness and generalization ability of the proposed imaging system under different scattering conditions, simulation tests are conducted. A series of synthetic scattering degradation models, including fog and turbid-water models, is constructed (see Fig. S2 in the Supplementary Material).46,47 The degradation parameter ranges from a value that denotes no degradation to one that represents complete degradation, in which the target information is indistinguishable. A degradation type is considered dominant when its proportion exceeds 0.5 while the proportions of the other types remain below 0.5. The scattering degradation model is applied to the resolution chart at varying proportions, and the original images are then reconstructed by TVAL, as shown in the first column of Fig. 1(c).
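The models of Refs. 46 and 47 are not reproduced here. As a stand-in, this Python sketch uses the standard atmospheric scattering (haze) model $I = J\,t + A(1-t)$ for fog and implements the dominance rule stated above. The mapping $t = 1 - p$, which makes $p = 0$ mean no degradation and $p = 1$ complete degradation, is an assumption, as are the airlight value and the function names.

```python
import numpy as np


def fog_degrade(image, p, airlight=1.0):
    """Haze model I = J * t + A * (1 - t) with assumed transmission t = 1 - p.

    p = 0 leaves the image unchanged; p = 1 replaces it with the uniform
    airlight A, so the target can no longer be distinguished.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    t = 1.0 - p
    return np.asarray(image, dtype=float) * t + airlight * (1.0 - t)


def dominant_type(proportions, threshold=0.5):
    """Return the type whose proportion exceeds threshold while all others stay
    below it; return None if no single type dominates."""
    above = [name for name, value in proportions.items() if value > threshold]
    if len(above) != 1:
        return None
    others_below = all(v < threshold for n, v in proportions.items() if n != above[0])
    return above[0] if others_below else None


# Usage: a gradient "chart" at increasing fog levels, plus the dominance rule.
chart = np.linspace(0.0, 1.0, 16).reshape(4, 4)
levels = [fog_degrade(chart, p) for p in (0.0, 0.5, 1.0)]
label = dominant_type({"fog": 0.6, "turbid_water": 0.3})
```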
The degraded images are then enhanced using the proposed PODC algorithm, U-Net, U-Net+, and the designed HPCnet, as shown in the second to fifth columns of Fig. 1(c). U-Net is trained on original images dominated by one degradation type, whereas U-Net+ and HPCnet are trained on PODC-preprocessed images dominated by that type. First, the enhancement results of the different algorithms and networks on original images dominated by this degradation type are compared. U-Net+ achieves significantly better enhancement than U-Net, effectively reconstructing the noise-corrupted digits, which can be attributed to the PODC preprocessing. Furthermore, HPCnet achieves markedly better enhancement than U-Net+, fully reconstructing the detailed digit information. This result demonstrates the superior adaptability of the proposed network.
To further verify the robustness of the proposed imaging system across different scattering media, enhancement performance is also compared for images dominated by the other degradation types. The enhancement performance of both U-Net and U-Net+ decreases when the dominant noise type changes: residual noise remains in the reconstructed images, and the digits are not clearly recovered. In contrast, HPCnet clearly reconstructs the digits while maintaining a clean background. To quantify this robustness, the PSNR and MS-SSIM of the reconstructed images are calculated in Figs. 1(d) and 1(e). As the noise type changes, the PSNR of the images enhanced by HPCnet remains above 19.01 dB and the MS-SSIM remains above 0.9. These results demonstrate that the proposed imaging system can effectively suppress noise under different scattering conditions and achieve high robustness in cross-media image reconstruction.
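PSNR is the standard ratio $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The short Python sketch below computes it, assuming intensities normalized to [0, 1] (`data_range=1.0`). Under that assumption, the reported 19.01 dB floor corresponds to a root-mean-square error of roughly 0.11. The function name is illustrative.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


gt = np.zeros((64, 64))
print(psnr(gt, gt + 0.1))  # MSE = 0.01, so about 20 dB
```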
To verify the feasibility of the imaging system, an experimental setup is constructed as shown in Fig. S3 in the Supplementary Material. A 532 nm laser (MGL-III-532 nm) is used as the illumination source. The signal light reflected from the target passes through the scattering medium and is collected by an imaging lens, which projects it onto a digital micromirror device (DMD, UPOLabs HDSLM136D70-DDR). The modulated signal is then detected by a single-photon avalanche diode (SPAD, Siminics SPD500), and the ground truth images of targets are obtained by a CCD. Because quasi-static and dynamic scattering media impose markedly different effects on light propagation, the imaging system requires different parameter settings for sampling rate, frame rate, and illumination conditions under different scattering media. To accommodate imaging across multiple scattering environments and dynamic target scenarios, we systematically analyze the influence of pattern playback frame rate and illumination intensity on the quality of reconstructed images at different sampling rates.
Figure 2(a) shows the effect of the playback frame rate on the quality of the reconstructed image under sampling rates ranging from 1.0% to 4.0%. Due to the playback frame rate limitation of the DMD, the maximum achievable imaging frame rates corresponding to sampling rates of 1.0%, 2.0%, 3.0%, and 4.0% are 30.4, 15.2, 10.1, and 7.6 fps, respectively. As the playback frame rate increases, the image quality decreases across all sampling rates. This degradation mainly results from the shorter display time of each pattern, which reduces the number of photons captured by the SPAD. The corresponding reconstructed images are shown in Fig. S4 in the Supplementary Material. At the same frame rate, increasing the sampling rate does not yield higher-quality reconstructed images. This is because once the imaging quality reaches saturation, additional patterns only introduce more environmental noise.
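The listed frame rates are consistent with a fixed DMD pattern rate divided by the number of patterns per frame. This Python sketch infers that rate from the 30.4 fps figure at 1.0% sampling, assuming a 256 × 256 image (the size quoted in the abstract). Whether each measurement uses a single pattern or a differential pair is not stated, so the inferred value (about 1.99 × 10⁴ patterns per second) should be read as an effective rate. The function name is hypothetical.

```python
def max_frame_rate(sampling_rate, n_pixels, pattern_rate_hz):
    """Upper bound on imaging frame rate for a DMD showing pattern_rate_hz patterns/s."""
    patterns_per_frame = sampling_rate * n_pixels
    return pattern_rate_hz / patterns_per_frame


N = 256 * 256                   # assumed image size
rate = 30.4 * 0.01 * N          # effective pattern rate inferred from 1.0 % -> 30.4 fps
for sr in (0.01, 0.02, 0.03, 0.04):
    print(f"{sr:.1%}: {max_frame_rate(sr, N, rate):.1f} fps")
```

Running it reproduces 30.4, 15.2, 10.1, and 7.6 fps, which shows that the frame-rate ceiling scales as the inverse of the sampling rate.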
Figure 2. Robustness verification of the proposed system under different scattering coefficients. (a) Effect of the imaging frame rate on the MS-SSIM of original images at different sampling rates. (b) Effect of the number of photons per pattern per pixel on the MS-SSIM of original images at different sampling rates. (c) Reconstructed images obtained using SCU-Net, DPIR, U-Net+, and HPCnet under different concentrations of fat emulsion (corresponding to different scattering coefficients). The “MS-SSIM/PSNR” values are indicated below the corresponding images.
Figure 2(b) shows the effect of photon number on the MS-SSIM of reconstructed images at sampling rates ranging from 1.0% to 4.0%. When the photon number per pattern per pixel is sufficiently large, the quality difference among reconstructed images at different sampling rates becomes negligible. As the photon number decreases, the reconstruction quality at low sampling rates deteriorates rapidly. In contrast, reconstructed images at higher sampling rates exhibit a clear advantage. The corresponding reconstructed images are shown in Fig. S5 in the Supplementary Material. These experimental results show that in low-light environments, excessively low sampling rates result in the loss of fine structural details. Conversely, extremely high sampling rates limit the achievable imaging frame rate, which becomes insufficient for imaging through dynamic scattering media or dynamic targets. Therefore, a trade-off strategy balancing frame rate and sampling rate is adopted in the following experiments. For quasi-static scattering media, a higher sampling rate of 4.0% is employed to ensure the reconstruction of target details under low-light conditions. For dynamic scattering media (fog environment), a 2.0% sampling rate is employed for static targets, while a 1.0% sampling rate is employed for dynamic targets.
To verify the effectiveness of the designed PODC, comparative experiments are conducted using different preprocessing algorithms. Diluted fat emulsions of varying concentrations are used as scattering media.48 The optical thickness and scattering coefficients of the mixed solutions are controlled by adding different volumes of standard fat emulsion into a fixed amount of water, as shown in Table S1 in the Supplementary Material. Figure S8 in the Supplementary Material shows the reconstruction results using Gamma, Retinex, and PODC under different scattering coefficients.49
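Optical thickness links the scattering coefficient to how much unscattered light survives the sample. The Python sketch below uses the standard relations $\tau = \mu_s L$ and Beer–Lambert ballistic attenuation $e^{-\tau}$, neglecting absorption. The path length and coefficients are hypothetical values, not those in Table S1.

```python
import math


def ballistic_fraction(mu_s_per_mm, path_mm):
    """Optical thickness tau = mu_s * L and unscattered (ballistic) fraction exp(-tau)."""
    tau = mu_s_per_mm * path_mm
    return tau, math.exp(-tau)


# Hypothetical values: a 50 mm path and three scattering coefficients.
for mu_s in (0.02, 0.05, 0.10):
    tau, frac = ballistic_fraction(mu_s, 50.0)
    print(f"mu_s = {mu_s:.2f}/mm -> tau = {tau:.1f}, ballistic fraction = {frac:.3f}")
```

The exponential dependence is why modest increases in emulsion volume quickly push the ballistic signal toward the noise floor, leaving mostly multiply scattered light for reconstruction.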
To further validate the robustness of the proposed imaging system under different scattering conditions, imaging experiments are performed with a resolution chart placed behind fat emulsion, fog, and sediment suspension. Figure 3(a) compares the results reconstructed by TVAL, SCU-Net, DPIR, U-Net+, and HPCnet. Through fat emulsion, the original images exhibit extremely low contrast and severely degraded edges. SCU-Net, DPIR, and U-Net+ all improve the overall contrast of the reconstructed images to some extent, but their image quality remains significantly worse than that of HPCnet: considerable speckle noise is left unsuppressed, and the recovery of detailed structures is limited. The corresponding PSNR and MS-SSIM values are listed in Table S3 in the Supplementary Material. When the scattering medium is changed to fog or sediment suspension, the robustness of these comparison methods drops significantly: the reconstructed images are seriously degraded, and some target structures become difficult to identify. In contrast, the proposed method maintains stable and consistent imaging performance across the different scattering media. It effectively suppresses the speckle noise introduced by strong scattering and preserves clear structural recovery across scattering environments, reflecting better robustness and generalization. The histogram distributions obtained through the various scattering media are shown in Fig. S9 in the Supplementary Material, illustrating the adaptability of the proposed method across different scattering media.48
Figure 3. Robustness verification of the proposed system across different scattering media. (a) Imaging results of resolution charts through fat emulsion, fog, and sediment suspension (abbreviated as Sed. Susp.) using TVAL, SCU-Net, DPIR, U-Net+, and HPCnet. (b) Imaging results of speed-limit signs using TVAL, SCU-Net, DPIR, U-Net+, and HPCnet. (c) Cross-sectional profiles along the dashed lines of the reconstructed images. (d) PSNR and (e) MS-SSIM of reconstructed speed-limit signs obtained by TVAL, SCU-Net, DPIR, U-Net+, and HPCnet.
To validate the practical applicability of the proposed imaging system, long-distance outdoor experiments are conducted with an active illumination scheme. A traffic sign 70 m away is selected as the target, and a fog environment is created by placing a humidifier in front of the imaging system, as shown in Fig. S10(a) in the Supplementary Material. Figure 3(b) shows the reconstructed images of the speed-limit sign obtained using TVAL, SCU-Net, DPIR, U-Net+, and HPCnet. In dense fog, the original images exhibit severe noise contamination. SCU-Net improves image contrast but retains substantial noise. Images reconstructed by DPIR exhibit poor contrast, with severe smearing between target edges and the background, resulting in unclear boundaries. Although U-Net+ significantly improves image contrast, it also loses structural detail. In contrast, our imaging system reconstructs high-quality images of the speed-limit sign with clear edge details. Additional imaging results of traffic signs are provided in Fig. S10(b) in the Supplementary Material. Furthermore, Fig. 3(c) shows the intensity profiles along the central cross-sections of the reconstructed images. Compared with the other networks, the proposed system produces steeper edge transitions and smoother background regions, demonstrating its dual advantage in edge enhancement and background suppression. In addition, the PSNR and MS-SSIM of the reconstructed images are shown in Figs. 3(d) and 3(e). The proposed system achieves a maximum PSNR of 23.06 dB and an MS-SSIM of 0.83, both significantly higher than those of the conventional methods.
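The profile comparison above is qualitative. As an illustration only, this Python sketch extracts a central cross-section and summarizes it with two hypothetical metrics that are not defined in the source: edge steepness (largest jump between neighbouring samples) and background roughness (standard deviation over samples assumed to be background). The background mask is taken as known.

```python
import numpy as np


def central_profile(image, axis=0):
    """Intensity profile through the image center (a row if axis=0, a column if axis=1)."""
    image = np.asarray(image, dtype=float)
    if axis == 0:
        return image[image.shape[0] // 2, :]
    return image[:, image.shape[1] // 2]


def edge_and_background(profile, background_mask):
    """Hypothetical summary metrics for a 1-D profile.

    steepness: largest absolute difference between neighbouring samples;
    roughness: standard deviation over the samples flagged as background.
    """
    profile = np.asarray(profile, dtype=float)
    steepness = np.max(np.abs(np.diff(profile)))
    roughness = np.std(profile[np.asarray(background_mask, dtype=bool)])
    return float(steepness), float(roughness)


# Usage: an ideal vertical bar gives a sharp edge and a perfectly flat background.
img = np.zeros((32, 32))
img[:, 12:20] = 1.0
p = central_profile(img)
bg = np.ones(32, dtype=bool)
bg[10:22] = False
print(edge_and_background(p, bg))  # (1.0, 0.0)
```

Higher steepness and lower roughness would correspond to the "steeper edge transitions and smoother background" described for HPCnet. Real profiles would also need smoothing, or a fitted edge width, to be robust to photon noise.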
To verify the imaging performance of the proposed system under extreme weather conditions, passive imaging of a clock located 150 m away is performed under natural illumination, as shown in Fig. 4. Figure 4(a) shows the real testing environment and the corresponding photograph of the passive imaging system. Figure 4(b) shows the reconstructed images obtained using TVAL, SCU-Net, DPIR, U-Net+, and the proposed system under different weather conditions, including heavy rain and dense fog. Due to the nonuniformity of natural illumination, the original images reconstructed directly with TVAL contain significant noise, with target details nearly submerged and difficult to recover. Under daytime heavy rain, the digits on the clock face reconstructed by SCU-Net, DPIR, and U-Net+ appear largely blurred. Under daytime dense fog, SCU-Net, DPIR, and U-Net+ all fail to fully recover the submerged edge structures of the target. Moreover, in nighttime heavy rain, the background intensity of the reconstructed images increases, leading to further degradation of image contrast. The reconstructions by SCU-Net, DPIR, and U-Net+ also exhibit more severe smearing artifacts. In contrast, benefiting from the designed preprocessing module and deep enhancement mechanism, the proposed system successfully reconstructs the complete contour and fine details of the clock under all weather conditions, demonstrating high robustness for imaging through atmospheric environments.
Figure 4. Robustness verification of the proposed system under different extreme weather conditions. (a) Passive imaging environment under natural illumination. (i) and (ii) Photograph and schematic of the imaging setup. (b) Reconstructed images of a clock located 150 m away under different weather conditions using TVAL, SCU-Net, DPIR, U-Net+, and the proposed system. (c) Extracted frames over time from the imaging of a flashing traffic light located 70 m away in a fog environment using the proposed system.
Furthermore, to evaluate the reconstruction performance of the proposed imaging system for dynamic targets, passive imaging experiments on a traffic light 70 m away are conducted in a fog environment. Figure 4(c) shows frames extracted from the recorded video; the corresponding dynamic traffic-light video is provided in Supplementary Movie 1. With 0.088 photons per pattern per pixel, the proposed imaging scheme consistently identifies the luminous regions and reconstructs their shapes across multiple time frames at an imaging frame rate of 15 fps. These results demonstrate that our system is robust and applicable to both static and dynamic targets under natural heavy rain and dense fog, highlighting its potential for practical implementation.
4 Conclusion
In summary, a photon-level single-pixel imaging technique through scattering media based on data-domain alignment is proposed, which effectively addresses the severe image degradation encountered by conventional imaging systems under diverse scattering conditions. Comprehensive experiments demonstrate that the technique exhibits high robustness and stability across diverse scattering environments, with clear advantages in structural detail preservation and contrast enhancement. The effects of imaging frame rate and photon number on image quality at different sampling rates are systematically analyzed, leading to optimized system parameters for imaging through various scattering media. Then, by comparing imaging performance with various preprocessing algorithms under different scattering coefficients, the effectiveness and necessity of the designed PODC module are verified. Next, a speed-limit sign located 70 m away is imaged in fog. Unlike conventional enhancement networks, which are often overfitted to a single scattering scenario, HPCnet explicitly models cross-domain adaptation, enabling robust generalization across diverse scattering environments. Under natural fog and rain, the proposed method clearly reconstructs the details of the clock tower 150 m away, demonstrating strong cross-media generalization and practical applicability. Remarkably, with only 0.088 photons per pattern per pixel, the proposed system also achieves dynamic imaging of a traffic signal at a frame rate of 15 fps. These results highlight the promise of the proposed technique for practical applications such as traffic monitoring, long-range security surveillance, underwater detection, and biomedical imaging, providing a feasible pathway for reliable target perception in low-light and strongly scattering environments.54
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62305239, U23A20380, 62127817, and 6191101445), the Science and Technology Major Special Project of Shanxi Province (Grant No. 202201010101005), the National Key Research and Development Program of China (Grant No. 2022YFA1404201), and the Fundamental Research Program of Shanxi Province (Grant No. 202203021222133).
Liantuan Xiao is a professor and PhD supervisor at the College of Physics and Optoelectronic Engineering, Taiyuan University of Technology, and a distinguished professor under the Changjiang Scholars Program (Ministry of Education, China). He received his BS degree (1989), MS degree (1997), and PhD (2001) in physics from Shanxi University. His research focuses on precision measurement physics and single-photon communication and imaging. He has published over 200 papers, including in Nature Physics, Nature Communications, and Physical Review Letters.
Biographies of the other authors are not available.
References
[10] K. He et al., Single image haze removal using dark channel prior, pp. 1956-1963 (2009).
[13] J. Wang et al., Fast non-local algorithm for image denoising, pp. 1429-1432 (2006).
[27] X. Liu et al., Photon-limited single-pixel imaging, Opt. Express 28, 8132 (2020).
[40] A. Ortiz et al., Local context normalization: revisiting local normalization, pp. 11276-11285 (2020).
[45] Z. Wang et al., Multiscale structural similarity for image quality assessment, pp. 1398-1402 (2003).
[51] A. Hore and D. Ziou, Image quality metrics: PSNR vs. SSIM (2010).
[56] J. Bertolotti and O. Katz, Imaging in complex media, Nat. Phys. 18, 1008-1017 (2022).
[57] G. Satat et al., Towards photography through realistic fog, pp. 1-10 (2018).
