Thesis title: Degradation-Aware Lightweight Super-Resolution with Adaptive Feature Modulation
Abstract—Image Super-Resolution (SR) is the task of reconstructing a high-resolution image from a low-resolution input. It is useful in many fields such as surveillance, medical imaging, satellite imaging, and consumer photography. However, most existing SR models are trained only on clean, synthetic images; consequently, they fail on real-world photos affected by noise, blur, or compression artifacts. In addition, these models are computationally inefficient, architecturally complex, and offer users no control over the output quality. This work proposes a deep learning-based SR model that is efficient, robust, and controllable. The proposed model is built on the Real-ESRGAN backbone with several key improvements. It replaces the typical nearest-neighbor up-sampling with a learnable Pixel-Shuffle-based convolutional up-sampling module, which adds trainable parameters and produces sharper output. The model also uses the Sigmoid Linear Unit (SiLU) activation instead of ReLU or Leaky ReLU to ensure better gradient flow and more stable training. To reduce computational cost, 1×1 bottleneck layers are inserted between the core residual-in-residual blocks. To handle real-world images, the training data is processed with a multi-order degradation pipeline that simulates real-world degradation, including Poisson noise, motion blur, color noise, JPEG compression, and resizing applied in random order. This exposure enables the model to handle unseen scenarios and learn from more realistic examples, improving its performance on actual photographs. A key feature of the proposed model is its Peak Signal-to-Noise Ratio (PSNR) guided restoration control: a small network estimates the quality of the input image in terms of PSNR, and this value is fed to the model as a conditioning label during training. The model then adjusts how strongly it restores the image.
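The learnable up-sampling mentioned above can be illustrated with a minimal NumPy sketch of the Pixel-Shuffle rearrangement combined with the SiLU activation; this is not the thesis implementation, and the function names and feature sizes here are hypothetical (in practice a learnable convolution produces the `C*r^2` channels that are then rearranged):

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x), used in place of (Leaky)ReLU
    return x / (1.0 + np.exp(-x))

def pixel_shuffle(x, r):
    # Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r),
    # mirroring torch.nn.PixelShuffle; the convolution preceding this
    # step carries the learnable parameters of the up-sampling module.
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

feat = np.random.randn(12, 8, 8)      # 12 = 3 channels * r^2 for scale r = 2
up = pixel_shuffle(silu(feat), r=2)
print(up.shape)                       # (3, 16, 16): spatial size doubled
```

Because Pixel-Shuffle only rearranges values, every activated feature value survives into the up-sampled output; only its spatial position changes.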
Users can also provide this value manually to control the final output quality at inference time. The model is trained in a single-stage pipeline that combines pixel, perceptual, and adversarial losses, avoiding the complexity of the multi-phase training used in earlier models. The model performs well on both synthetic and real datasets. Quantitative assessment shows that it achieves a PSNR of 32.75 dB and a Structural Similarity Index (SSIM) of 0.906 while maintaining a low Learned Perceptual Image Patch Similarity (LPIPS) of 0.062 on the synthetic dataset, and a PSNR of 26.84 dB, an SSIM of 0.809, and an LPIPS of 0.118 on real-world datasets such as RealSR and HQ-50K, which is close to many state-of-the-art methods while offering roughly 2 to 3 times faster inference. Qualitative evaluation confirms that the model produces visually better images than its counterparts. Ablation studies show that each proposed component contributes to the overall performance improvement. In summary, this work presents an efficient, adaptive, and real-world-ready super-resolution model suitable for deployment in a variety of applications.
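For reference, the PSNR metric that drives both the quality-conditioning signal and the quantitative evaluation follows the standard definition; below is a minimal NumPy sketch (the helper name and example pixel values are illustrative, not taken from the thesis code):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # Peak Signal-to-Noise Ratio in dB between a reference HR image and a
    # restored output: 10 * log10(MAX^2 / MSE); higher means less distortion.
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((4, 4), 128.0)   # toy 4x4 gray reference image
noisy = ref + 10.0             # uniform error of 10 gray levels -> MSE = 100
print(round(psnr(ref, noisy), 2))   # 28.13
```

A higher input PSNR estimate tells the model the image is already close to clean, so only gentle restoration is needed; a lower estimate calls for stronger restoration.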