W2WDiff: Generalizing Underwater Diffusion Model via Unsupervised Underwater Conversion
Yuan-Lin Zhang1,2,#    Jie-Yu Yuan2,#    Xiao Chen1,*     Xiong-Xin Tang3,*    Qiao Chen3    Yi-Quan Wang1    Chong-Yi Li2
1Minzu University of China    2Nankai University    3Chinese Academy of Sciences

🔥 A More Generalizable Underwater Enhancement Method 🔥

We propose a novel diffusion-based underwater image enhancement (UIE) framework that integrates water-to-water domain transformation with latent-space operations to achieve superior generalization and enhancement performance. This design effectively addresses critical limitations of existing diffusion-based UIE methods, particularly their susceptibility to distribution shifts and semantic inconsistencies. To further validate the robustness and applicability of our approach, we present representative enhancement results across six common and challenging degradation scenarios: yellowish haze, greenish casts, light blue attenuation, deep blue absorption, low-light conditions, and fog-like veils.

Abstract

Underwater image enhancement (UIE) is essential for underwater information acquisition across marine science and ocean remote sensing. Diffusion-based UIE methods demonstrate remarkable enhancement capabilities but suffer significant performance degradation when test-time observations diverge from training-time assumptions. Moreover, the inherent stochasticity of the diffusion process often manifests as inconsistent and unstable enhancement results, compromising both reproducibility and quality assurance. To address these limitations, we propose W2WDiff, a novel framework built on an unsupervised water-to-water transformation strategy. By mapping heterogeneous underwater degradations to a tractable intermediate domain, our method circumvents the distribution shift problem inherent in direct enhancement approaches, achieving superior generalization across diverse underwater scenarios. In contrast to general pixel-space methods, we establish the feasibility of latent-space UIE and introduce a corresponding diffusion paradigm. Our approach employs a custom Markov chain specifically designed for underwater characteristics, substantially reducing the number of sampling steps while mitigating color distortion. Furthermore, we propose a three-stage training scheme along with a Content Consistency Module (CCM) to mitigate pixel-level misalignment and enhance local structural fidelity and detail preservation. Comprehensive experiments demonstrate that W2WDiff achieves consistent and robust enhancement across a wide range of challenging underwater conditions, exhibiting strong zero-shot generalization performance.
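To make the latent-space diffusion idea concrete, below is a minimal, hypothetical sketch of a DDPM-style Gaussian forward process operating on compact latents. W2WDiff's actual custom Markov chain is tailored to underwater characteristics and is not reproduced here; the schedule length T, the linear beta schedule, and the latent shape are all illustrative assumptions.

```python
import torch

# Sketch of a standard DDPM-style forward (noising) process in latent space.
# This is NOT W2WDiff's custom chain; it only illustrates why a short schedule
# is plausible when diffusing compact latents instead of full-resolution pixels.

T = 50                                    # hypothetical short schedule
betas = torch.linspace(1e-4, 2e-2, T)     # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0: torch.Tensor, t: int) -> torch.Tensor:
    """Diffuse a clean latent z0 to timestep t, i.e., sample from q(z_t | z_0)."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(z0)
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise

# Example: a 4x32x32 latent (e.g., from a VAE encoder) noised to mid-schedule.
z0 = torch.randn(1, 4, 32, 32)
zt = q_sample(z0, t=T // 2)
print(zt.shape)  # torch.Size([1, 4, 32, 32])
```

Each reverse step then has far less high-frequency detail to model than a pixel-space chain, which is one common rationale for why latent diffusion can tolerate substantially fewer sampling steps.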

Motivation

Comparison between Previous Diffusion-Based UIE Models and Our W2WDiff Framework. (a) General diffusion-based UIE models directly learn a supervised mapping from the underwater degradation domain D to the reference domain R, often struggling with domain shifts across different datasets. (b) Our proposed W2WDiff framework introduces an unsupervised water-to-water transformation, which first maps the original underwater domain D to an intermediate, more tractable underwater domain D'. This common underwater space bridges the distribution gap between diverse training and testing datasets, ensuring more effective adaptation. A diffusion model is then employed to reconstruct the reference domain R from D'. By leveraging this intermediate transformation, our approach mitigates domain mismatches and enhances zero-shot generalization across various underwater environments.
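For intuition, the two-stage pipeline in (b) can be sketched as below. All class and function names (WaterToWater, LatentDiffusionEnhancer, enhance) are hypothetical placeholders standing in for the components described above, not the released implementation.

```python
import torch
import torch.nn as nn

class WaterToWater(nn.Module):
    """Unsupervised D -> D' converter (placeholder: a single conv stands in
    for the full transformation network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class LatentDiffusionEnhancer(nn.Module):
    """Placeholder for the diffusion model that reconstructs R from D'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    @torch.no_grad()
    def sample(self, cond: torch.Tensor) -> torch.Tensor:
        return self.net(cond)  # stands in for the full reverse diffusion loop

def enhance(image, w2w: WaterToWater, diff: LatentDiffusionEnhancer):
    intermediate = w2w(image)         # D -> D': collapse diverse degradations
    return diff.sample(intermediate)  # D' -> R: diffusion-based reconstruction

# Usage with a dummy 256x256 RGB underwater image:
img = torch.rand(1, 3, 256, 256)
out = enhance(img, WaterToWater(), LatentDiffusionEnhancer())
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

The key design choice this sketch highlights is the decoupling: the diffusion model only ever sees the common intermediate domain D', so unseen test-time degradations are absorbed by the unsupervised converter rather than breaking the enhancer.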

More Visual Comparisons

In this section, we present additional qualitative results that could not be included in the main paper due to space limitations. Specifically, visualizations on the out-of-distribution EUVP, U45, and UCCS datasets demonstrate the strong generalization capability of our method. Moreover, comparisons on the more challenging C60 dataset reveal that our approach consistently yields superior visual results compared to existing methods.

Visual comparisons on the challenging C60 dataset. Our method achieves superior visual enhancement under extremely adverse conditions, including low-light environments, compound degradation types, and turbid water. The results demonstrate robust detail preservation and effective color correction, validating the method's capability in real-world, high-complexity underwater scenarios.


Generalization performance on out-of-distribution datasets EUVP and U45. Compared with baseline and diffusion-based methods, our approach exhibits strong generalization and robustness across diverse domains. It effectively enhances distant regions, seabed textures, and areas with scattered sediment, all of which are typically overlooked by existing methods, demonstrating its adaptability to varying underwater distributions.

Enhancement consistency on the UCCS dataset. Our method maintains consistent enhancement across frames, addressing the stochasticity and semantic instability commonly observed in diffusion-based UIE methods. This consistency is particularly critical for video-based applications, underscoring the reliability and temporal stability of our framework.

Contact

Feel free to contact us at zason_zyl@163.com

© Yuan-Lin Zhang | Last updated: April 2025