- Advanced Photonics Nexus
- Vol. 5, Issue 3, 036015 (2026)
Abstract
1 Introduction
Photoacoustic computed tomography (PACT) is an innovative hybrid imaging technique that combines optical absorption contrast with high-resolution ultrasound detection in a single system.1
An inherent challenge in PACT is the limited-view problem,19
Over the past decade, researchers have devoted great efforts to developing hardware and computational methods to address the limited-view problem. Approaches such as using curved transducers24 and multiple receive configurations25 can widen the receiving angle. By applying sparse illumination26,27 or introducing exogenous probes28
Regarding computational methods, model-based reconstruction can help mitigate the limited-view problem, but is computationally intensive and not yet feasible for efficient implementation.31
Recently, deep learning approaches have been developed to address the limited-view problem.39 For instance, various deep learning approaches, including U-Net,40
Ultrasound has long been used to support PA imaging,52 including efforts to improve signal quality. A study correlated the phase of ultrasound echoes with PA waves to form a coherently combined signal with a higher signal-to-noise ratio (SNR),53 whereas others used structural priors from ultrasound or other modalities to alleviate the limited-view problem by model-based reconstruction.54 However, many demonstrations have been conducted primarily in simulation or on phantoms, and in vivo gains are often curtailed by clutter and speckle in conventional B-mode ultrasound, which reduces the effectiveness of downstream PA enhancement.
Ultrasound power Doppler (US-PD) imaging is a powerful tool for vascular imaging with a high SNR.55 The introduction of singular value decomposition (SVD) filtering56 and post-processing algorithms57,58 has enabled the detection of blood flow59 and visualization of small vessels, resulting in more detailed vascular images.60 These advancements make US-PD an effective complement to PACT. Several studies have demonstrated the advantage of dual-modal US-PD and PA imaging, particularly in complex vascular environments.61,62 Falco et al.61 used US-PD to estimate and compensate fluence. Tang et al.62 employed vessel segmentation results from the ultrasound Doppler image as a mask to analyze the PA features. Although simple integration of US-PD and PA images can utilize complementary information in both modalities, lost PA features due to limited view still cannot be effectively restored.
Here, we present a Doppler-enhanced PA network (DEPANet) that leverages deep learning to fuse rich anatomical information in US-PD with the functional data provided by PACT. In this network, multi-wavelength PA images and a co-registered US-PD image are first integrated via channel attention and spatial attention; then, multi-scale feature fusion and cross-attention are used to prioritize full-view vascular details in the US-PD image and focus on functional information in the multi-wavelength PA images. DEPANet addresses the limited-view problem and reveals many features that are otherwise invisible. We evaluate our model on simulation, phantom, animal, and human data. The results demonstrate significant improvements in contrast-to-noise ratio (CNR) and oxygen saturation ($sO_2$) measurement, particularly in large tilted vessels. DEPANet preserves critical information from both modalities while enhancing overall image quality, providing a more accurate and comprehensive representation of vascular structures and oxygenation.
2 Method and Materials
2.1 Doppler-Enhanced Photoacoustic Network
As illustrated in Fig. 1, DEPANet is designed to enable efficient cross-modal feature extraction and fusion, fully leveraging the complementary strengths of PA and US-PD for vascular visualization. The inputs consist of the original PA image and a binarized US-PD image, which respectively provide information on vascular optical absorption and hemodynamic structure. Both inputs are processed through identical multi-level downsampling branches, producing spatially aligned feature maps that capture intensity-based and morphological vascular characteristics at multiple scales.
Figure 1. Flowchart of the method for enhancing the PACT result by fusing power Doppler.
The first stage of the network is the Doppler-PA feature integration (DPFI) module, which integrates modality-specific features from both branches. The DPFI incorporates channel and spatial attention mechanisms63: channel attention64 is applied to the PA branch to highlight hemoglobin-related optical absorption features; spatial attention65 is applied to the US-PD branch to enhance geometric structures, particularly oblique vessels that are less prominent in PA imaging.
The second stage, multi-scale feature fusion (MSFF), implements a top-down, coarse-to-fine fusion pathway to balance global context and local detail. The third stage introduces cross-attention,66,67 where queries ($Q$) from the US-PD branch guide the weighted aggregation of fused features (keys $K$ and values $V$), improving structural fidelity and detail restoration.
The detailed structure is shown in Fig. 2. Both PA and US-PD branches adopt progressive residual68 stacks (see details in Note S1 and Fig. S1 in the Supplementary Material): Res(3) at S1 to S2 for low-level feature extraction and Res(5) at S3 to S5 to model longer-range dependencies—yielding stable and efficient multi-scale learning. At each stage (S1 to S5), features from the two branches are integrated via a DPFI module [Fig. 2(b)]: a bottleneck convolutional block first reduces dimensionality and aggregates local features before restoring the channel dimension; the PA branch applies a two-stage channel attention mechanism [Fig. 2(c)] for coarse-to-fine information selection; the US-PD branch employs spatial attention with kernel sizes of and [Fig. 2(d)] to reinforce vessel boundaries and fine details. The outputs from both attention pathways are concatenated and passed through a three-unit residual stack to generate modality-aware feature representations.
Figure 2. Details of the DEPANet. (a) Overall architecture of the fusion network. (b) DPFI module. (c) Channel attention block. (d) Spatial attention block. (e) MSFF module.
The fusion pathway consists of four MSFF modules [Fig. 2(e)]. Each MSFF uses a transposed convolution (stride = 2) to upsample the deeper map and fuses it with the next shallower output:69
Finally, the cross-attention module, guided by vectors from the US-PD branch, selectively focuses on vessel regions. To ensure accurate dot-product attention computation, convolutions are applied before and after the attention operation to match channel dimensions.
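The cross-attention step can be illustrated with a minimal, dependency-free sketch of scaled dot-product attention, in which the queries stand in for US-PD features and the keys and values stand in for the fused features. The function names, the toy vectors, and the scaling by the square root of the feature length are assumptions made for illustration; the actual DEPANet layer operates on convolutional feature maps and may differ in detail.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on lists of feature vectors.

    queries: n_q vectors of length d (illustrative US-PD features)
    keys:    n_k vectors of length d (illustrative fused features)
    values:  n_k vectors of any common length (illustrative fused features)
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    output = []
    for q in queries:
        # Similarity of this query to every key
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


# Toy usage: a query aligned with the first key pulls in the first value.
attended = cross_attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```

In this toy case the output is dominated by the value whose key matches the query, which mirrors how structure-bearing queries can steer the aggregation toward vessel regions.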
By combining multi-scale modality-specific feature extraction, attention-driven integration, hierarchical feature fusion, and structure-aware refinement, DEPANet achieves anatomically consistent and high-contrast vascular reconstructions with low computational cost, making it well suited to linear-array PA imaging applications.
2.2 Dataset Preparation
The vessel data are from the public photoacoustic microscopy dataset.72 We retained only the blood vessels with larger connected areas and binarized them to emulate the binarized US-PD result. This binary mask serves as one input to the network, providing structural information about the vessels. In the k-Wave simulation environment,73 the initial sound pressure is applied to the binary structure, representing the ground truth for the network. The sound pressure then propagates outward, and the simulated transducer collects the data. Key parameters are shown in Note S2 and Table S1 in the Supplementary Material. The weighted delay-and-sum (DAS) algorithm74 reconstructs the PA image, which is used as the other input to the network. The preparation workflow diagram is shown in Fig. S2 in the Supplementary Material.
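The step of keeping only vessels with larger connected areas can be sketched as a connected-component filter on a binary mask. The sketch below is an assumption about how such a filter might look: it uses 4-connectivity and a hypothetical `min_area` threshold, neither of which is specified in the text.

```python
from collections import deque


def keep_large_components(mask, min_area):
    """Return a binary mask that keeps only 4-connected components
    containing at least min_area pixels (min_area is a hypothetical threshold)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search to collect one connected component
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out


# Toy usage: the isolated pixel on the right is dropped, the 3-pixel vessel is kept.
toy_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
filtered = keep_large_components(toy_mask, min_area=2)
```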
In total, we prepared 4184 datasets. Each dataset consists of three images: a US-PD mask, a reconstructed PA image, and an initial pressure distribution. These datasets were then used for training, validation, and testing of the deep learning model, as shown in Table 1.
| | Training | Validation | Testing | All |
| --- | --- | --- | --- | --- |
| Amount | 2530 | 414 | 1240 | 4184 |
Table 1. Datasets for simulated training, validation, and testing.
2.3 Deep Learning Configuration
Deep learning optimizes the enhanced output of through an end-to-end mechanism. The following loss function is used in the optimization:
In our deep learning framework, the models were trained using a learning rate of and a batch size of 5 across 350 epochs. We used the Adam optimizer with a weight decay rate of for optimization, ensuring stable and effective training. We applied early stopping to mitigate overfitting based on validation loss.
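Early stopping on validation loss can be sketched in a few lines. The `patience` and `min_delta` parameters below are hypothetical, since the text does not state the stopping criterion that was used; the sketch only shows the general mechanism.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. Both parameters are
    illustrative assumptions, not values from the paper.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never triggered: training ran through every recorded epoch
    return len(val_losses) - 1, best_epoch


# Toy usage with made-up validation losses
stop_epoch, best_epoch = early_stopping_epoch(
    [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9], patience=3
)
```

In practice the weights saved at `best_epoch` would be the ones kept for evaluation.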
Hyperparameters were selected using a small pilot search on the training/validation split. For the learning rate, we evaluated for 10 to 15 epochs each and chose the largest value that yielded stable optimization with monotonically decreasing validation loss (final: ). Weight decay was tested in and was selected. Batch size {1, 2, 5, 8} was compared, and 5 was chosen as a trade-off between throughput/memory and validation performance. Loss-related hyperparameters (, ) were set empirically following common practice and kept fixed across experiments.
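The learning-rate selection rule described above (take the largest candidate whose pilot run shows a stable, monotonically decreasing validation loss) can be written as a short sketch. The candidate values and loss curves in the usage example are invented for illustration and are not the values that were searched.

```python
import math


def is_monotone_decreasing(losses):
    """True if every validation loss is strictly lower than the previous one."""
    return all(later < earlier for earlier, later in zip(losses, losses[1:]))


def pick_learning_rate(pilot_runs):
    """pilot_runs maps a candidate learning rate to its per-epoch validation losses.

    Returns the largest candidate whose losses are finite and monotonically
    decreasing, or None if no candidate qualifies.
    """
    stable = [
        lr for lr, losses in pilot_runs.items()
        if all(math.isfinite(v) for v in losses) and is_monotone_decreasing(losses)
    ]
    return max(stable) if stable else None


# Hypothetical pilot results (made-up numbers)
chosen = pick_learning_rate({
    1e-2: [0.9, 1.4, 0.8],    # unstable: loss goes up
    1e-3: [0.9, 0.6, 0.5],    # stable
    1e-4: [0.9, 0.85, 0.8],   # stable but slower
})
```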
The network was implemented using the PyTorch library (version 1.12) and executed on an NVIDIA GeForce RTX 4090 D GPU. A single calculation completes in 0.141 s, with the model occupying 13.6 GB of GPU memory.
2.4 Dual-Modal Ultrasound and Photoacoustic Tomography System
The dual-modal imaging system uses a high-energy Nd:YAG pulsed laser and an optical parametric oscillator (Amplitude, SL I 20, San Francisco, California, United States) for PA excitation. The laser can generate 700- and 750-nm nanosecond pulses at 20 Hz. To minimize motion artifacts during human experiments, we incorporated an additional Nd:YAG pulsed laser (Spectra-Physics, Milpitas, California, United States) capable of emitting a 1064-nm wavelength. The laser beam is guided through a customized fiber bundle, which spreads the beam into two rectangular-shaped illumination areas on two sides of a linear ultrasound probe. The laser beams illuminate at 1.5 cm away from the end surface of the ultrasound probe. The ultrasound probe is a 128-element linear array (Verasonics, L11-4V, Kirkland, Washington, United States) and has a central frequency of 6.25 MHz with 96% bandwidth.
Ultrasound and PA data acquisition are implemented in an ultrasound research platform (Verasonics Vantage System, Kirkland, Washington, United States) at 25 MHz. The system generates three ultrasound plane waves at angles of −10, 0, and 10 deg. This sequence is repeated 80 times to capture sufficient US-PD and vector Doppler data within 38.4 ms. After the ultrasound acquisition, the system triggers the laser to acquire PA data. In both phantom and rat experiments, we first acquired data at 700 nm, then switched to 750 nm for the subsequent acquisition. In human studies, we employed a synchronized dual-wavelength triggering scheme with a 5-ms interval between 700 and 1064 nm. The Verasonics system enabled continuous acquisition at both wavelengths. The dual-modality system operated at a working frequency of 20 Hz. The system and sequence diagram are shown in Fig. S3 in the Supplementary Material.
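The timing figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the plane-wave transmissions are evenly spaced over the 38.4-ms window, which the text does not state explicitly; the function and key names are illustrative.

```python
def doppler_timing(n_angles, n_repeats, total_time_s, laser_rate_hz):
    """Derive per-transmit timing from the acquisition summary,
    assuming evenly spaced plane-wave transmissions."""
    n_transmits = n_angles * n_repeats
    transmit_interval_s = total_time_s / n_transmits
    return {
        "n_transmits": n_transmits,
        "transmit_interval_us": transmit_interval_s * 1e6,
        "transmit_rate_hz": 1.0 / transmit_interval_s,
        "compounded_frame_rate_hz": n_repeats / total_time_s,
        "laser_period_ms": 1e3 / laser_rate_hz,
    }


# Three angles, 80 repeats, 38.4 ms, 20-Hz system rate (values from the text)
timing = doppler_timing(3, 80, 38.4e-3, 20.0)
```

Under this assumption the sequence amounts to 240 transmissions at roughly 160 µs intervals (about 6.25 kHz), or about 2.08 kHz after three-angle compounding. The 38.4-ms ultrasound block also fits within the 50-ms period implied by the 20-Hz working frequency, leaving time for the PA acquisition.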
PA and ultrasound images are reconstructed using a weighted DAS algorithm. The reconstruction process is implemented using MATLAB (2021a) and executed on a computer with an Intel Core i5 processor (1.60 GHz) and 16 GB of RAM.
2.5 US-PD Imaging
To estimate the US-PD signal, the spatiotemporal structure of the data was analyzed using SVD. First, DAS beamforming was applied to temporally windowed blocks of the compounded ultrasound signal with three angles, where for and sampling interval . An SVD-based clutter filter was then used to suppress tissue components and retain the higher-temporal-frequency blood flow signal, yielding . The filtered data were subsequently integrated to form the US-PD image PDI using the following equation:
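For orientation, a commonly used formulation of SVD clutter filtering followed by power Doppler integration is sketched here. The notation is chosen for this sketch and may differ from the authors' own symbols: $S$ is the Casorati (space by time) matrix of the beamformed block with $N_t$ time samples, $k_c$ is an assumed cutoff index below which components are treated as tissue, and $\Delta t$ is the sampling interval.

```latex
\begin{aligned}
S &= U \Sigma V^{H} = \sum_{k=1}^{N_t} \sigma_k \, u_k v_k^{H},
  \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{N_t}, \\
S_{\mathrm{blood}} &= \sum_{k=k_c+1}^{N_t} \sigma_k \, u_k v_k^{H}, \\
\mathrm{PDI}(x,z) &= \frac{1}{N_t} \sum_{n=1}^{N_t}
  \left| s_{\mathrm{blood}}(x,z,n\,\Delta t) \right|^{2}.
\end{aligned}
```

Here the first $k_c$ singular components, which carry the slowly varying and highly coherent tissue signal, are discarded, and the power of the remaining signal is averaged over the time window at each pixel.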
2.6 Animal Experiments
PACT in vivo experiments were performed on 10-week-old and 12-week-old male Sprague–Dawley rats for neck imaging. Anesthesia was administered using an inhaled mixture of isoflurane and air. Prior to imaging, hair was removed from the neck to the chest to ensure proper acoustic coupling. Ultrasound gel was applied between the neck and the water tank interface to maintain effective signal transmission while allowing unobstructed breathing. A heating pad set to 37°C was placed beneath the rat to maintain body temperature during the procedure. For the human studies, a 25-year-old male volunteer was recruited for arm imaging, and a 30-year-old male volunteer was recruited for neck imaging. Both participants wore laser safety goggles throughout the procedure to ensure eye protection. All animal and human studies were approved by the Animal Ethics Committee of City University of Hong Kong (Protocol No. 11105021).
3 Results
3.1 Predicted Results for the Testing Set
After completing model training, we assessed the prediction accuracy by comparing the model output with the ground truth in the test, validation, and training datasets. All numerical results are reported to three significant digits for consistency, as shown in Table 2. The loss function, calculated using intensity-normalized images, yielded 0.0209 for the testing set, which is closely aligned with the values from the training (0.0191) and validation (0.0183) sets. This similarity suggests that the model works well across different datasets.
| | Training | Validation | Testing |
| --- | --- | --- | --- |
| Number | 2530 | 414 | 1240 |
| Loss | 0.0191 | 0.0183 | 0.0209 |
| SSIM | 0.981 | 0.980 | 0.979 |
| PSNR (dB) | 25.8 | 25.7 | 25.2 |
Table 2. Evaluation results for the simulated training, validation, and testing sets.
We further evaluated the structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) to quantify the output quality. The SSIM for the test set reached 0.979, and the PSNR was 25.2 dB. Both metrics indicate that the model effectively captures the relevant features and produces high-quality predictions with minimal distortion in the testing dataset. These results confirm that the network can effectively use the vascular structural information in US-PD to enhance the PA images across these datasets.
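As a reminder of what the PSNR figure measures, the sketch below computes PSNR for two small images. It assumes intensities normalized so that the peak value is 1, which is an assumption about the evaluation convention rather than something the text specifies; the helper names are illustrative.

```python
import math


def psnr_db(pred, truth, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    squared_error = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            squared_error += (p - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr, max_value=1.0):
    """Root-mean-square error implied by a PSNR value under the same convention."""
    return max_value * 10.0 ** (-psnr / 20.0)


# Toy usage with made-up pixel values
example_psnr = psnr_db([[0.0, 1.0], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]])
implied_rmse = rmse_from_psnr(25.2)
```

Under this convention, a PSNR of 25.2 dB corresponds to a root-mean-square error of roughly 5.5% of the peak intensity.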
Given that the precision of oxygen saturation ($sO_2$) calculations is closely tied to the SNR of the PA signal, we simulated PA data using the k-Wave toolbox at two optical wavelengths, 700 and 750 nm, for further testing. The $sO_2$ calculation method is shown in Note S3 in the Supplementary Material. The initial pressure intensity was set based on the absorption coefficients of the paired wavelengths. Other settings followed the data preparation parameters in Table S1 in the Supplementary Material. Figure 3 illustrates the improvements observed in the simulated set.
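The authors' calculation is given in Note S3 and is not reproduced here. As a generic illustration, two-wavelength linear unmixing solves a 2 × 2 system for oxy- and deoxyhemoglobin and takes their ratio. The sketch assumes equal (or already compensated) fluence at both wavelengths, and the extinction coefficients in the usage example are made-up placeholders, not tabulated hemoglobin values.

```python
def so2_two_wavelength(p1, p2, eps_hbo2, eps_hb):
    """Estimate oxygen saturation from PA amplitudes at two wavelengths.

    p1, p2:    PA amplitudes at wavelength 1 and 2 (fluence assumed equal)
    eps_hbo2:  (wavelength 1, wavelength 2) extinction of oxyhemoglobin
    eps_hb:    (wavelength 1, wavelength 2) extinction of deoxyhemoglobin
    """
    det = eps_hbo2[0] * eps_hb[1] - eps_hbo2[1] * eps_hb[0]
    if det == 0:
        raise ValueError("extinction spectra are not linearly independent")
    # Cramer's rule for p = eps_hbo2 * c_hbo2 + eps_hb * c_hb at each wavelength
    c_hbo2 = (p1 * eps_hb[1] - p2 * eps_hb[0]) / det
    c_hb = (eps_hbo2[0] * p2 - eps_hbo2[1] * p1) / det
    return c_hbo2 / (c_hbo2 + c_hb)


# Placeholder coefficients (NOT real hemoglobin values), used only to
# forward-simulate a vessel with 80% saturation and recover it again.
EPS_HBO2 = (1.0, 2.0)
EPS_HB = (4.0, 3.0)
p_first = EPS_HBO2[0] * 0.8 + EPS_HB[0] * 0.2
p_second = EPS_HBO2[1] * 0.8 + EPS_HB[1] * 0.2
recovered = so2_two_wavelength(p_first, p_second, EPS_HBO2, EPS_HB)
```

Because the estimate is a ratio of small differences, noise in either amplitude propagates directly into the result, which is why low SNR in tilted vessels degrades the saturation map.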
Figure 3. Testing set results. (a) and (b) Raw simulated PA reconstructions at 700 and 750 nm. (c)
Figures 3(a) and 3(b) show the raw simulated PA images at 700 and 750 nm with high noise (SNR: 10.1 dB). As indicated by the arrows in Fig. 3(a), some PA features are distorted due to the limited view and boundary build-up. In particular, the tilted vessels have a poor SNR due to reduced signal magnitude. These distortions significantly affect $sO_2$ calculations, as demonstrated in Fig. 3(c). As shown in Figs. 3(d) and 3(e), the trained model can restore the structural integrity. Compared with the raw images, the model prediction results are closer to the initial pressure in Figs. 3(g) and 3(h). The calculated $sO_2$ results in Fig. 3(f) are also more consistent with the ground truth in Fig. 3(i) than the raw results.
Intensity profiles along the three dashed lines in Fig. 3(b) were evaluated for the raw image, the model output, and the ground truth [Figs. 3(j)–3(o)]. In the raw image, signals from obliquely oriented vessels were largely indistinguishable from background noise. By contrast, the model output clearly resolves these vessels, owing to US-PD-guided signal restoration and background suppression. The reconstructed vessel intensities closely match the ground truth.
We evaluated $sO_2$ imaging across 10 randomized configurations. The SNR ranged from 5 to 40 dB. Arterial $sO_2$ ranged from 0.95 to 1.00, whereas venous $sO_2$ ranged from 0.65 to 0.75. Before the $sO_2$ calculation, we compared the mean absolute error (MAE) of PA intensity in the vessel region, as shown in Fig. 3(p). The MAE at 700 nm decreased from 0.41 to 0.13. The MAE of decreased from 0.46 to 0.16. Model-predicted $sO_2$ values were then compared with the ground truth for each vessel indicated by the purple arrows [Fig. 3(c)]. As shown in Fig. 3(q), the average MAE of the raw results in different vessels exceeded 0.15, whereas the average MAE of the model’s outputs was below 0.07, indicating a significant improvement in accuracy for $sO_2$ quantification.
3.2 Phantom Experiments
We conducted phantom experiments to verify the DEPANet model, as illustrated in Fig. 4(a). Two tubes (BD Intramedic, Franklin Lakes, New Jersey, United States) with an inner diameter of 1.19 mm were bent to two semicircles, each with an radius. The tubes were filled with bovine blood. The blood in the left tube has , and the blood in the right one was treated with sodium dithionite () and a 5-min infusion of 99.5% gaseous carbon dioxide and thus has a of .79 Before imaging, we validated the levels in both tubes as described in Fig. S4(a) in the Supplementary Material. During image acquisition, blood flowed through the tubes at a constant speed. Ultrasound signals and 700- and 750-nm PA signals were collected.
Figure 4. Results of phantom experiments. (a) Setup of the phantom. (b) US-PD images. (c) Binarized US-PD images. (d) and (e) Raw PA images at 700 and 750 nm. (f)
To compute the US-PD images, we applied an SVD filter, discarded the lowest-rank 10% of the singular vectors, and reconstructed the image from the remaining 90%. As shown in Fig. 4(b), the US-PD images provided a complete structural representation of the blood sample. Figure 4(c) shows the binarized US-PD image, which was used in the DEPANet model to enhance the PA image. Vector Doppler results are available in Fig. S4(b) in the Supplementary Material.
The raw PA results at the two wavelengths are presented in Figs. 4(d) and 4(e). The PA images reveal incomplete structures at the ends of the semicircular tubes, which is due to the limited view of the linear transducer. Along the semicircular arc, the PA signal intensity drops to half in range. Figure 4(f) presents the $sO_2$ image, which shows distinguishable oxygenation in the two tubes, but the limited-view problem still exists in the image.
Figures 4(g) and 4(h) show the results enhanced by DEPANet. Guided by the US-PD image, DEPANet successfully restored the PA structures, particularly at the ends of the tubes. In Fig. 4(g), the PA intensity along the center of the tube drops to half in a 163-deg range, representing a 10.2-fold improvement. The intensity profiles without and with the enhancement can be found in Fig. S4(c) in the Supplementary Material. In addition to the structural improvements, DEPANet improved the $sO_2$ accuracy. The $sO_2$ values [Fig. 4(i)] aligned more closely with the true values. We also performed enhancement tests on tubes with different $sO_2$, and detailed results are presented in Fig. S4(d) in the Supplementary Material. Furthermore, we performed a 2D k-space analysis on the PA result at 700 nm. The analysis demonstrates that DEPANet effectively recovers the missing wedge information in the raw data within the spatial frequency domain. Detailed results are provided in Note S6 in the Supplementary Material. The phantom experiments demonstrate that DEPANet can both augment the view angle and improve $sO_2$ imaging.
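One way to quantify the angular range over which the intensity stays above half of its maximum is to walk outward from the peak of the profile sampled along the tube center. This is a hedged sketch of such a measurement, not the authors' procedure; the function name and the toy profile are illustrative.

```python
def half_max_angular_range(angles_deg, intensities):
    """Angular span of the contiguous region around the peak where the
    intensity stays at or above half of its maximum."""
    peak = max(range(len(intensities)), key=intensities.__getitem__)
    half = intensities[peak] / 2.0
    left = peak
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(intensities) - 1 and intensities[right + 1] >= half:
        right += 1
    return angles_deg[right] - angles_deg[left]


# Toy profile sampled every 30 deg along a semicircle (made-up values)
toy_angles = [0, 30, 60, 90, 120, 150, 180]
toy_profile = [0.1, 0.4, 0.7, 1.0, 0.8, 0.5, 0.2]
toy_range = half_max_angular_range(toy_angles, toy_profile)
```

Comparing this span before and after enhancement gives a fold-improvement figure of the kind quoted above.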
3.3 Imaging of the Cervical Vasculature in Rats
To evaluate the in vivo imaging performance of DEPANet, we observed the cervical vasculature of a rat. In these experiments, 12.5% of the low-rank singular vectors were discarded during the SVD clutter removal process. The laser fluences on the skin surface were maintained at for 700 nm and for 750 nm, both of which remain within the ANSI safety limits.80 These two specific wavelengths were selected for the rat experiments due to their similar spectral characteristics, which effectively minimizes the difference in light fluence caused by tissue scattering. Furthermore, to ensure the accuracy of image reconstruction, we corrected the speed of sound for different media. Specifically, for the internal carotid artery (ICA) experiments, we employed (water coupling environment) and (biological tissue). For the internal jugular vein (IJV), sound speeds of 1510 and were utilized for compensation.
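The effect of using separate sound speeds for the coupling water and the tissue can be illustrated with a simplified two-layer time-of-flight calculation. The sketch assumes a straight vertical path with no refraction and one-way propagation, as is the case for PA signals. The speeds and layer thicknesses in the usage example are hypothetical placeholders, and the 25-MHz figure is assumed here to be the sampling rate.

```python
def two_layer_delay_s(depth_water_m, depth_tissue_m, c_water, c_tissue):
    """One-way time of flight through a water layer followed by a tissue layer
    (vertical path, refraction ignored)."""
    return depth_water_m / c_water + depth_tissue_m / c_tissue


def sample_index(delay_s, sampling_rate_hz):
    """Nearest receive-sample index for a given arrival time."""
    return round(delay_s * sampling_rate_hz)


# Hypothetical geometry and speeds, chosen only to give round numbers
delay = two_layer_delay_s(0.0148, 0.0154, c_water=1480.0, c_tissue=1540.0)
index = sample_index(delay, 25e6)
```

Using a single averaged speed instead would shift the computed arrival times of deeper pixels, which defocuses the delay-and-sum reconstruction.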
We imaged the ICA of a 12-week-old male rat, as depicted in Fig. 5(a). The ultrasound results are shown in Fig. S6(a) in the Supplementary Material. Figures 5(b)–5(d) show the raw PA images at 700 and 750 nm, along with the calculated $sO_2$ map. The raw PA images suffer from the limited-view problem inherent to linear-array transducers, where tilted vessels exhibit significant signal discontinuity or dropout [indicated by green arrows in Fig. 5(b)]. Figure 5(e) displays the corresponding US-PD image, which clearly identifies the ICA structure without being affected by limited-view artifacts.
Figure 5. In vivo imaging of the rat ICA. (a) Anatomical schematic. (b)–(d) Raw
To demonstrate the correspondence among the dual-modal structures, we overlaid the US-PD images with the raw PA signals [Figs. 5(f)–5(h)], showcasing the superiority of US-PD as a structural prior. Furthermore, the overlay results confirm that although the raw PA signals are discontinuous, their spatial distribution is highly aligned with the priors provided by US-PD. These aligned vascular signals subsequently serve as numerical references for the DEPANet input.
Subsequently, the US-PD mask shown in Fig. 5(i) was used as structural guidance for DEPANet; the raw enhanced results are shown in Figs. S8(a) and S8(b) in the Supplementary Material. Because PA imaging can detect signals from capillaries and small vessels with slow blood flow, to which US-PD is insensitive, we overlay the enhanced vessel results onto the original PA image to create a more comprehensive depiction, as shown in Figs. 5(j)–5(l). The enhanced PA images significantly restore vascular continuity, in particular recovering the previously missing deep curved vessel. Quantitative analysis further validates the enhancement effect. Figures 5(m) and 5(n) display the CNRs within the US-PD mask along the depth direction. The DEPANet-enhanced results significantly outperform the raw PA results. Averaged across all depths, the 700-nm signal improved by 7.5 dB and the 750-nm signal by 7.6 dB. In addition, the histogram in Fig. 5(o) shows that the enhanced $sO_2$ distribution (blue) is more concentrated than the original distribution (orange), with peaks more densely aggregated above 0.95, which is highly consistent with the physiological expectation for arterial blood oxygen saturation.
As shown in Fig. 6(a), we imaged the IJV of a 10-week-old male rat. The corresponding ultrasound and vector Doppler results are presented in Figs. S6(b) and S7(a) in the Supplementary Material. The raw PA images at 700 and 750 nm, shown in Figs. 6(b) and 6(c), demonstrate that although PA maintains high sensitivity in superficial tissues, it struggles to effectively characterize deep tilted vessels, as indicated by the green arrows. This discrepancy highlights the severe limited-view problem inherent in linear-array PACT systems. Figure 6(d) illustrates the $sO_2$ results derived from the raw PA data, where the limited-view issue and low SNR in deep tissues persist, leading to noisy measurements. Many $sO_2$ values within the IJV do not align with the normal physiological expectations for healthy animals.
Figure 6. In vivo imaging of the rat IJV. (a) Anatomical schematic of the imaging region. (b)–(d) Raw PA images at
Figure 6(e) displays the US-PD image, with the IJV indicated by the white arrow. The US-PD image reveals thick vertical vessels that pose significant challenges to conventional PA imaging due to their specific orientation and depth. In Figs. 6(f)–6(h), we present the overlaid imaging of PA and US-PD. By leveraging the overlapping signal components between the two modalities, the IJV signal in the PA image can be clearly identified. Subsequently, the DEPANet was employed to fuse the raw PA images with the US-PD mask [Fig. 6(i)]. To preserve the high sensitivity of PA signals in superficial regions less affected by the limited-view problem, we manually excluded superficial signals based on the initial binarization. The raw enhanced results are shown in Figs. S8(c) and S8(d) in the Supplementary Material. The overlaid PA images at 700 and 750 nm [Figs. 6(j) and 6(k)] exhibit significant structural restoration, particularly in the regions marked by the green dashed lines. In the green dashed box, we observed vascular connections that were invisible in any single imaging modality, further demonstrating the advantages of the fusion-based imaging approach. Figure 6(l) displays the enhanced $sO_2$ results, where the IJV shows values more consistent with venous oxygenation. The vector Doppler results in Fig. S7(a) in the Supplementary Material were used as a flow reference to verify $sO_2$ accuracy across different arteriovenous vessels.
Quantitative analysis further characterizes the performance gains of this enhancement method by calculating the changes in CNR within the US-PD ROI. Within the 2- to 14-mm depth, the average CNRs increased by 8.1 dB at 700 nm and 7.9 dB at 750 nm [Figs. 6(m) and 6(n)]. Furthermore, the $sO_2$ distribution within the IJV improved markedly. The peak interval shifted from 0.85–0.9 to 0.80–0.85 [Fig. 6(o)], a correction that aligns more closely with the actual physiological status of venous blood oxygen in healthy rats.
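The CNR definition used for these figures is not spelled out in this section. A common convention, assumed in the sketch below, is the absolute difference between the mean signal inside the ROI and the mean background, divided by the background standard deviation and expressed as 20·log10. The function names and pixel values are illustrative.

```python
import math


def cnr_db(signal_pixels, background_pixels):
    """Contrast-to-noise ratio in dB under an assumed convention:
    20 * log10(|mean_signal - mean_background| / std_background)."""
    mean_signal = sum(signal_pixels) / len(signal_pixels)
    mean_background = sum(background_pixels) / len(background_pixels)
    variance = sum((v - mean_background) ** 2 for v in background_pixels) / len(background_pixels)
    return 20.0 * math.log10(abs(mean_signal - mean_background) / math.sqrt(variance))


def db_gain_to_ratio(gain_db):
    """Linear CNR ratio corresponding to a dB gain under the same convention."""
    return 10.0 ** (gain_db / 20.0)


# Toy usage with made-up pixel values
toy_cnr = cnr_db([1.0, 1.2, 0.8], [0.1, 0.3, 0.1, 0.3])
ratio_for_8_db = db_gain_to_ratio(8.1)
```

Under this convention, a gain of about 8 dB corresponds to roughly a 2.5-fold increase in the linear contrast-to-noise ratio.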
3.4 Vascular Imaging in Human Volunteers
To evaluate the in vivo imaging performance of DEPANet, we observed the vasculature in the human arm and neck. During these experiments, 12.5% of the low-rank singular vectors were discarded during the SVD clutter removal process. The laser fluences on the skin surface were maintained at for 700 nm and for 1064 nm, both adhering to ANSI safety standards.80 To improve the accuracy of blood oxygenation, we utilized Beer’s law to compensate for the discrepancies in light fluence between 700 and 1064 nm along the depth direction. Furthermore, to ensure the accuracy of image reconstruction, a dual-speed-of-sound calibration scheme was implemented. For the anterior interosseous artery (AIA), compensation was performed using in the water environment and in tissue. For the superior thyroid artery (STA) and superior thyroid vein (STV), the compensation values were set at 1470 and for the water environment and 1580 and for tissue, respectively.
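Depth-dependent fluence compensation based on Beer's law can be sketched as dividing each PA sample by an exponential attenuation factor. The effective attenuation coefficient is wavelength dependent; the value used below is a hypothetical placeholder, and the single-exponential model is a simplification of light transport in tissue.

```python
import math


def compensate_fluence(pa_profile, depths_cm, mu_eff_per_cm):
    """Divide PA amplitudes by the Beer's-law fluence decay exp(-mu_eff * z).

    mu_eff_per_cm is a hypothetical effective attenuation coefficient for
    one wavelength; each wavelength would use its own value.
    """
    return [
        amplitude / math.exp(-mu_eff_per_cm * depth)
        for amplitude, depth in zip(pa_profile, depths_cm)
    ]


# Toy usage: a uniform absorber whose measured signal decays with depth
toy_depths = [0.0, 1.0, 2.0]
toy_measured = [math.exp(-1.0 * z) for z in toy_depths]
toy_compensated = compensate_fluence(toy_measured, toy_depths, mu_eff_per_cm=1.0)
```

Applying the compensation separately at each wavelength before unmixing reduces the depth-dependent bias that differing attenuation would otherwise introduce into the saturation estimate.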
We conducted human experimental validation on a 25-year-old volunteer to evaluate the oxygenation status of the AIA, with the anatomical schematic shown in Fig. 7(a). The vector Doppler result is shown in Fig. S7(b) in the Supplementary Material. The raw PA images at 700 nm [Fig. 7(b)] and 1064 nm [Fig. 7(c)] present partial outlines, but both show missing vascular signals. This deficiency directly leads to severe non-physiological fluctuations in the $sO_2$ derived from the raw data [Fig. 7(d)]. Figure 7(e) displays the US-PD image overlaid on the ultrasound structural image, clearly delineating the complete structure of the AIA and providing a reliable structural reference for subsequent PA enhancement. Figures 7(f)–7(h) overlay the US-PD images with the PA images, demonstrating the reference for subsequent PA numerical input. To combine the advantages of both modalities, we extracted the US-PD mask shown in Fig. 7(i) and employed the DEPANet for deep learning enhancement.
Figure 7. In vivo imaging of the AIA. (a) Anatomical schematic of the imaging site on the volunteer’s forearm. (b)–(d) Raw PA images at 700 and 1064 nm, along with the derived raw
The enhanced multi-wavelength PA results [Figs. 7(j) and 7(k)] exhibit significant structural restoration [raw enhanced results in Figs. S8(e) and S8(f) in the Supplementary Material], with previously broken or missing vascular boundaries being restored. With the structural information completed, the enhanced $sO_2$ image [Fig. 7(l)] demonstrates excellent numerical uniformity. Quantitative analysis further validates the effectiveness of this method: the pixel-wise CNR scatter plot in Fig. 7(m) shows that the enhanced data points deviate significantly from the isopleth, with the average CNRs for 700 and 1064 nm improving by 5.8 and 9.8 dB, respectively. Meanwhile, the statistical histogram in Fig. 7(n) indicates that the $sO_2$ distribution of the AIA after enhancement significantly narrows and concentrates toward the high-value region, with its peak interval closely aligning with the normal physiological range of arterial blood oxygen in healthy adults.
We also performed human experimental validation on a 30-year-old volunteer to assess the $sO_2$ of the STA and its accompanying STV. The anatomical schematic of the STA is shown in Fig. 8(a). In raw PA imaging, both the 700-nm [Fig. 8(b)] and 1064-nm [Fig. 8(c)] channels exhibit vessel signal degradation due to limited-view constraints. This signal loss results in erratic, non-physiological fluctuations in the initial $sO_2$ derivation [Fig. 8(d)]. Figure 8(e) presents the US-PD image, which provides a reliable structural prior by clearly delineating the STA’s trajectory. Guided by this structural information, the PA vascular signals are clearly observable along the vessel path [Figs. 8(f)–8(h)]. Notably, by leveraging the penetration depth and better sensitivity to arterial blood at 1064 nm, the imaging results demonstrate high fidelity to the US-PD data, thereby confirming the reliability of the raw PA signals even for deep-seated vasculature (). To further refine the imaging quality, we employed a US-PD mask [Fig. 8(i)] as an input to our DEPANet for deep-learning-based enhancement.
Figure 8. In vivo imaging of the human STA. (a) Anatomical schematic of the imaging site in the neck. (b)–(d) Raw PA images at 700 and 1064 nm and the corresponding raw
The enhanced multi-wavelength PA images [Figs. 8(j) and 8(k)] demonstrate substantial structural restoration [raw enhanced results in Figs. S8(g) and S8(h) in the Supplementary Material], effectively recovering previously fragmented signals. Consequently, the enhanced $sO_2$ map [Fig. 8(l)] exhibits numerical stability and spatial uniformity. Quantitative analysis confirms this performance improvement: the pixel-wise CNR scatter plot [Fig. 8(m)] shows that enhanced data points deviate significantly from the isopleth, with mean CNR gains of 5.1 and 3.3 dB for the 700- and 1064-nm channels, respectively. Furthermore, the statistical histogram [Fig. 8(n)] reveals that 90% of the $sO_2$ values in the enhanced STA are concentrated above 0.9, consistent with physiological expectations for arterial blood oxygen in healthy adults.
Building upon the arterial assessment, we further investigated the STV with the anatomical schematic shown in Fig. 9(a). Consistent with the findings in the STA, the raw PA images at 700 nm [Fig. 9(b)] and 1064 nm [Fig. 9(c)] suffer from signal fragmentation due to the limited-view effect, leading to unreliable $sO_2$ estimation [Fig. 9(d)]. However, due to the relatively superficial location of the STV (), both the 700- and 1064-nm images demonstrate a high degree of spatial correlation with the US-PD maps [Fig. 9(e)], as shown in Figs. 9(f)–9(h). By utilizing the US-PD image as a structural constraint [Fig. 9(i)], the DEPANet-enhanced results [Figs. 9(j) and 9(k)] effectively reconstruct the STV’s vascular morphology [raw enhanced results in Figs. S8(i) and S8(j) in the Supplementary Material]. The enhanced $sO_2$ map [Fig. 9(l)] demonstrates a marked reduction in non-physiological noise compared with the raw data. Quantitative validation further underscores the efficacy of the proposed enhancement. The CNR scatter plot in Fig. 9(m) shows that the majority of data points for both wavelengths are distributed above the identity line. More importantly, the statistical histogram [Fig. 9(n)] shows that the distribution of $sO_2$ values after enhancement narrows significantly and centers within the range of 0.70 to 0.85. This range is in agreement with the physiological venous blood oxygen saturation levels in healthy adults.
Figure 9. In vivo imaging of the human STV. (a) Anatomical schematic illustrating the venous imaging site. (b)–(d) Raw PA images at 700 and 1064 nm and the corresponding raw
Finally, to quantitatively demonstrate the robustness and improvement of our enhancement method, a stability analysis of sO2 was performed across all five major vessels imaged in the in vivo experiments, with detailed metrics and results provided in Note S8 in the Supplementary Material.
4 Discussion
PACT holds great potential for clinical diagnosis, particularly in providing high-resolution imaging of deep tissues that can offer crucial information on sO2, vascular morphology, and functional status. However, the linear-array transducer used in PACT faces significant challenges due to the limited view, especially when imaging thick or oblique blood vessels. This limitation severely affects the accuracy and completeness of the images, which can hinder the effective clinical application of PACT, especially in research on diseases that require high-precision vascular and functional imaging, such as cardiovascular and cerebrovascular diseases.
To address this challenge, we propose DEPANet, a deep-learning-based approach that fuses US-PD with multi-wavelength PACT to overcome the limited-view problem. US-PD, a mature ultrasound technique with widespread clinical application, has the advantage of being unaffected by the limited view and provides stable vascular structural information. By integrating US-PD with the optical absorption information from PACT, DEPANet uses deep learning models to enhance image quality, effectively compensating for the limitations of PACT in vascular morphology reconstruction. The enhancement is most significant for oblique and thick vessels.
A core advantage of DEPANet is that it requires no additional hardware and can deliver high-quality imaging results using only the existing linear-array system and deep learning models. Through a series of simulations, phantom experiments, animal experiments, and human studies, we have demonstrated that DEPANet not only enhances image contrast but also improves vascular morphology reconstruction, making sO2 measurements more consistent. This provides a new technical pathway for the clinical application of PA imaging, particularly in disease research that requires high-precision vascular and functional imaging.
Despite the promising results shown by DEPANet, several challenges remain. First, although DEPANet performed well in the regions examined in this study, its enhancement in high-density vascular areas may be hindered by interference from neighboring vessels, which can degrade image quality. Second, its performance is inherently constrained when the US-PD priors are unreliable, for example in slow-flow microvasculature or thrombosis. To overcome this, more sensitive modalities, such as contrast-enhanced ultrasound, would need to be integrated to provide robust priors for high-fidelity imaging in low-flow states. Third, the current training data for DEPANet are primarily based on simulation results. Although the models trained on these data proved effective in this study, discrepancies remain between the model outputs and real-world data, which constitutes a primary limitation of the current research. Finally, future research on DEPANet could explore fusion with additional modalities, such as CT and MRI, to further improve multi-modal imaging.
Overall, DEPANet offers an innovative solution to the limited-view problem in PA imaging and shows significant potential for clinical and biomedical applications, particularly in efficient high-resolution vascular and functional imaging. With continued optimization of the model and advances in imaging technology, DEPANet can be developed further within multi-modal image fusion. We believe that, with further development, DEPANet will play a larger role in clinical diagnosis and early disease screening, especially in precision healthcare, by providing more reliable and precise imaging data.
5 Conclusion
We present a deep learning approach to address the limited-view problem in linear-array PACT. Vascular morphological information in US-PD is fused with the optical absorption contrast in PA signals using the DEPANet model. The deep learning model effectively integrates the complementary strengths of both modalities, significantly improving the CNR and restoring structural fidelity in areas prone to signal loss. Our results, validated through simulations, phantom experiments, and in vivo studies, demonstrate improved integrity of vascular reconstruction and improved precision of sO2 measurements. Furthermore, deep-tissue functional imaging is demonstrated in human studies. These advancements highlight the potential of DEPANet in both research and clinical settings, particularly for applications that require precise vascular imaging. The deep integration of US-PD with linear-array PACT holds great potential to enhance its diagnostic capabilities.
Acknowledgments
This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region (Grant No. 11104922) and the National Natural Science Foundation of China (Grant No. 61805102).
Zheng Qu is a PhD student at the Department of Biomedical Engineering, City University of Hong Kong. He received his bachelor’s and master’s degrees from Tianjin University. His research focuses on photoacoustic imaging and ultrasound imaging.
Xuanhao Zhang is a PhD student at the Department of Biomedical Engineering, City University of Hong Kong. He received his bachelor’s degree from Nottingham University. His research focuses on photoacoustic imaging and ultrasound imaging.
Bin Ouyang is a PhD student at the Department of Biomedical Engineering, City University of Hong Kong. He received his bachelor’s degree from Sun Yat-sen University. His research focuses on photoacoustic imaging and flexible electronics.
Cong Mai is a medical doctor in the Critical Care Department of Guangdong Province People’s Hospital. He obtained his master’s degree and PhD from South China University of Technology. His research focuses on the development and application of photoacoustic imaging techniques, with a particular emphasis on their use in studying circulatory dynamics in post-resuscitation brain injury and other critical care diseases.
Xu Tang is a PhD student at the Department of Biomedical Engineering, City University of Hong Kong. He received his bachelor’s degree from Beihang University. His research focuses on photoacoustic imaging.
Lidai Wang received his bachelor’s and master’s degrees from Tsinghua University, and received his PhD from the University of Toronto, Canada. After serving as a postdoctoral research fellow in Prof. Lihong Wang’s group, he joined the City University of Hong Kong in 2015. His research focuses on biophotonics, biomedical imaging, wavefront engineering, instrumentation, and their biomedical applications.



