@@ -61,3 +61,87 @@ image:
slides:
---
+ ## Background
+
+ Recent SOTA image Super-Resolution (SR) techniques are mainly based on neural
+ networks (NNs), which can better capture the non-linearity of the upscaling
+ task and hence improve image quality. However, NN-based SR models are
+ computationally expensive for mobile devices with limited computing power.
+ A better alternative is to involve the specialized hardware AI
+ accelerators that are readily available in mobile SoCs,
+ such as Neural Processing Units (NPUs), in addition to traditional
+ processors (e.g., CPU and GPU), for faster inference.
+ However, their use of fixed-point
+ arithmetic can result in low quality of upscaled images
+ when applied to regression-based SR tasks.
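To make the fixed-point issue concrete, the sketch below quantizes hypothetical FP32 outputs of a regression-based SR model to INT8 (a common NPU format) and measures the resulting error. The data and the symmetric per-tensor scheme are illustrative assumptions, not FYE-SR's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical FP32 residuals predicted by a regression-based SR model.
fp32 = rng.uniform(-1.0, 1.0, size=1000).astype(np.float32)

# Uniform symmetric INT8 quantization, as commonly used on NPUs.
scale = float(np.abs(fp32).max()) / 127.0
q = np.clip(np.round(fp32 / scale), -128, 127).astype(np.int8)
deq = q.astype(np.float32) * scale

# Rounding error is bounded by half a quantization step per value.
max_err = float(np.abs(fp32 - deq).max())
print(f"quantization step: {scale:.6f}, max error: {max_err:.6f}")
```

Every value incurs up to half a quantization step of error; for a regression task that predicts continuous pixel values, this error shows up directly in the output image.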
+
+
+ To mitigate such image quality drop, existing schemes
+ split input images into small patches and dispatch these
+ patches to traditional processors and AI accelerators.
+ However, when the upscaled patches
+ are re-stitched to form a complete image, such image-based
+ splitting of SR computations often leads to color mismatch and
+ visual inconsistency across image patches, as shown in the
+ figure below. This inconsistency may not impact the structural
+ image quality when only a small portion of patches mismatch,
+ but it can largely affect the human perception of images.
+
+ ![Quality drop and visual inconsistency](2024-fye-sr/fye-sr-fig1.png)
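The seam artifacts of image-based splitting can be reproduced with a toy experiment: split an image into four patches, upscale half of them with coarse quantization standing in for a fixed-point accelerator, and re-stitch. All numbers here are illustrative, and nearest-neighbour upscaling merely stands in for a real SR network.

```python
import numpy as np

def upscale_nn(patch, s=2):
    # Nearest-neighbour upscaling stands in for a real SR network here.
    return np.kron(patch, np.ones((s, s), dtype=patch.dtype))

rng = np.random.default_rng(1)
img = rng.uniform(0.0, 1.0, size=(8, 8)).astype(np.float32)

# Image-based split: each 4x4 patch is dispatched to a different "processor".
patches = [img[r:r + 4, c:c + 4] for r in (0, 4) for c in (0, 4)]
outs = []
for i, p in enumerate(patches):
    up = upscale_nn(p)
    if i % 2 == 1:                   # patches sent to the fixed-point NPU
        up = np.round(up * 15) / 15  # coarse quantization shifts patch values
    outs.append(up)

# Re-stitch the upscaled patches into a 16x16 image.
stitched = np.vstack([np.hstack(outs[:2]), np.hstack(outs[2:])])

# Reference: upscale the whole image at full precision.
ref = upscale_nn(img)
print("mean abs mismatch vs. full-precision output:",
      float(np.abs(stitched - ref).mean()))
```

Because only half of the patches pass through the quantized path, the mismatch is concentrated along patch boundaries, which is exactly the visual inconsistency described above.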
+
+
+ ## Overview
+
+ ### Our Idea
+
+ Our work addresses the visual inconsistency
+ in upscaled images by introducing a new procedure-based
+ approach to splitting SR computations among heterogeneous
+ processors, as opposed to the traditional image-based
+ splitting. As shown below, we split the SR model and adaptively
+ dispatch its different NN layers to heterogeneous
+ processors, according to the computing complexity
+ of these NN layers and how the SR computations in these layers
+ are affected by the reduced arithmetic precision. Our goal
+ is to maximize the utilization of AI accelerators within the
+ given time constraints on SR computations, while minimizing
+ their impact on perceptual image quality.
+
+ ![FYE-SR basic idea](2024-fye-sr/fye-sr-fig2.png)
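A minimal sketch of this procedure-based (layer-wise) split, assuming a toy four-layer model: layers before the split point run at full precision, and layers after it run with simulated INT8 activations, so one can see how moving the split point trades precision against accelerator use. The model and the quantizer are illustrative stand-ins, not our actual SR network.

```python
import numpy as np

def fake_quant(x, bits=8):
    # Simulate fixed-point NPU arithmetic by quantizing activations.
    scale = float(np.abs(x).max()) / (2 ** (bits - 1) - 1)
    if scale == 0.0:
        return x
    return np.round(x / scale) * scale

# A toy 4-layer "model": each layer is a simple elementwise transform.
layers = [
    lambda x: np.tanh(x),
    lambda x: x + 0.1 * x ** 2,
    lambda x: np.maximum(x, 0.0),  # ReLU-like
    lambda x: 1.5 * x,
]

rng = np.random.default_rng(2)
feat = rng.normal(size=(4, 4)).astype(np.float32)

def run_split(split):
    # Layers [0, split) run at full precision (GPU);
    # layers [split, n) run with simulated INT8 activations (NPU).
    x = feat
    for i, layer in enumerate(layers):
        x = layer(x)
        if i >= split:
            x = fake_quant(x)
    return x

ref = run_split(len(layers))  # everything at full precision
for split in range(len(layers) + 1):
    err = float(np.abs(run_split(split) - ref).mean())
    print(f"split after layer {split}: mean deviation {err:.6f}")
```

Earlier split points push more layers onto the accelerator (faster) at the cost of more accumulated quantization error; the layer-wise split makes this a tunable trade-off rather than an all-or-nothing choice per patch.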
+
+
+ ### System Design
+
+ ![FYE-SR system overview](2024-fye-sr/fye-sr-fig8.png)
+
+ As shown in the figure above,
+ our design of FYE-SR consists
+ of three main modules. During the
+ offline phase, we first use an SR Timing Profiler to measure the
+ computing latencies of the SR model's different NN layers on
+ traditional processors (e.g., GPU) and AI accelerators (e.g.,
+ NPU), respectively. Knowledge about such latencies
+ is then used to train a Model Split Learner to solve Eq. (2) for
+ the optimal split of the SR model.
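As an illustration of the decision the Model Split Learner makes, the sketch below brute-forces a single split point from hypothetical profiled latencies and quality penalties. The real system trains a learner to solve Eq. (2) rather than enumerating candidates, and every number here is a made-up placeholder.

```python
# Hypothetical per-layer numbers; a real deployment would obtain the
# latencies from the SR Timing Profiler and the penalties from training.
gpu_ms = [4.0, 6.0, 6.0, 3.0]            # per-layer latency on the GPU (ms)
npu_ms = [1.0, 1.5, 1.5, 0.8]            # per-layer latency on the NPU (ms)
quality_loss = [0.30, 0.05, 0.02, 0.01]  # est. penalty if run on the NPU

DEADLINE_MS = 10.0  # time constraint on SR computations

best = None
n = len(gpu_ms)
# Enumerate single split points: layers [0, k) on GPU, [k, n) on NPU.
for cand in range(n + 1):
    total = sum(gpu_ms[:cand]) + sum(npu_ms[cand:])
    penalty = sum(quality_loss[cand:])
    if total <= DEADLINE_MS and (best is None or penalty < best[1]):
        best = (cand, penalty, total)

k, loss, latency = best
print(f"split at layer {k}: latency {latency:.1f} ms, quality penalty {loss:.2f}")
```

With these placeholder numbers the search keeps the quantization-sensitive first layer on the GPU and offloads the rest, meeting the deadline with the lowest penalty among feasible splits.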
+
+
+ During the online phase, FYE-SR enforces such model
+ split, and uses a Data Format Converter to convert the intermediate
+ feature maps into the right data formats (e.g.,
+ INT8 and FP32) for properly switching SR computations
+ between heterogeneous processors.
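One possible shape for this conversion step, assuming per-tensor affine quantization (the exact scheme in FYE-SR may differ): convert FP32 feature maps to INT8 at the GPU-to-NPU boundary and back to FP32 on the way out.

```python
import numpy as np

def to_int8(feat):
    # FP32 -> INT8 with per-tensor affine quantization parameters
    # (an illustrative sketch, not FYE-SR's exact converter).
    lo, hi = float(feat.min()), float(feat.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(feat / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def to_fp32(q, scale, zero_point):
    # INT8 -> FP32 for the next full-precision layer.
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(3)
feat = rng.normal(size=(1, 8, 8)).astype(np.float32)

q, s, zp = to_int8(feat)          # handoff to the fixed-point processor
restored = to_fp32(q, s, zp)      # handoff back to the FP32 processor
print("max round-trip error:", float(np.abs(feat - restored).max()))
```

The scale and zero point must travel with the tensor across the processor boundary, since the INT8 payload alone is meaningless without them.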
+
+
+ ## Results
+
+ As shown in the figures below,
+ compared to other SOTA image SR approaches,
+ our method achieves the best overall result
+ considering both the structural image quality and
+ the perceptual quality, while meeting the preset deadline
+ requirement.
+
+ ![FYE-SR comparison results](2024-fye-sr/fye-sr-fig15.png)
+
+
+ Looking into the output images, FYE-SR can effectively
+ suppress the distortions and visual inconsistency on
+ detailed objects (e.g., windows on buildings).
+
+ ![FYE-SR comparison: GPU-only, NPU-only](2024-fye-sr/fye-sr-fig16ab.png)
+
+ ![FYE-SR comparison: MobiSR, FYE-SR](2024-fye-sr/fye-sr-fig16cd.png)