Commit b75933f

Update all FYE-SR
1 parent 3b199e6 commit b75933f

File tree

10 files changed: +104 −6 lines changed
Six binary image files changed (856 KB, 91 KB, 1.17 MB, 1.16 MB, 247 KB, 249 KB).

config/_default/menus.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -24,6 +24,10 @@ main:
     weight: 52
   - name: Projects
     weight: 53
+  - name: Inference-time Computation in Generative AI
+    url: /projects/infer-time-generative-ai/
+    parent: Projects
+    weight: 75
   - name: On-device AI
     url: /projects/on-device-ai/
     parent: Projects
```
content/publication/2024-fye-sr/cite.bib

Lines changed: 16 additions & 0 deletions

```bibtex
@inproceedings{10.1145/3636534.3690698,
  author    = {Huang, Kai and Yin, Xiangyu and Gu, Tao and Gao, Wei},
  title     = {Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices},
  year      = {2024},
  isbn      = {9798400704895},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3636534.3690698},
  doi       = {10.1145/3636534.3690698},
  booktitle = {Proceedings of the 30th Annual International Conference on Mobile Computing and Networking},
  pages     = {1361--1376},
  numpages  = {16},
  keywords  = {image super-resolution, perceptual quality, neural networks, heterogeneous computing, mobile devices},
  location  = {Washington D.C., DC, USA},
  series    = {ACM MobiCom '24}
}
```

content/publication/2024-fye-sr/cite.bib.bk

Lines changed: 0 additions & 6 deletions
This file was deleted.

content/publication/2024-fye-sr/index.md

Lines changed: 84 additions & 0 deletions

The following content (@@ -61,3 +61,87 @@) was appended after the closing `---` of the front matter:

## Background

Recent state-of-the-art (SOTA) image super-resolution (SR) techniques are mainly based on neural networks (NNs), which better capture the non-linear mapping from low-resolution to high-resolution images and hence improve image quality. However, NN-based SR models are computationally expensive for mobile devices with limited computing power. A better alternative is to involve the specialized AI accelerators readily available in mobile SoCs, such as Neural Processing Units (NPUs), in addition to traditional processors (e.g., CPU and GPU), for faster inference. However, their use of fixed-point arithmetic can result in low quality of upscaled images when applied to regression-based SR tasks.
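
For intuition, here is a minimal sketch (not from the paper or its implementation) of the precision loss introduced by the symmetric per-tensor INT8 quantization commonly used on NPUs; the tensor shape and statistics are illustrative:

```python
# Illustrative only: round-trip a small FP32 feature map through INT8 to
# expose the quantization error that fixed-point NPU arithmetic introduces.
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)

# Symmetric per-tensor quantization: one scale for the whole tensor.
scale = np.abs(feature_map).max() / 127.0
quantized = np.clip(np.round(feature_map / scale), -128, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

# The residual is precision permanently lost to fixed-point arithmetic;
# regression-based SR relies on exactly such small values for fine detail.
print("max abs round-trip error:", np.abs(feature_map - dequantized).max())
```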

To mitigate this image quality drop, existing schemes split input images into small patches and dispatch these patches to traditional processors and AI accelerators. However, when the upscaled patches are re-stitched into a complete image, such image-based splitting of SR computations often leads to color mismatch and visual inconsistency across patches, as shown in the figure below. With only a small portion of mismatched patches, this inconsistency may not affect structural image quality, but it can largely degrade the human perception of images.

![Quality drop and visual inconsistency](2024-fye-sr/fye-sr-fig1.png)
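
To make the failure mode concrete, the hypothetical sketch below splits an image into four patches, runs alternate patches through an INT8 round trip (standing in for the NPU), and re-stitches them; the differing per-patch quantization outcomes are what create visible blocks. Nothing here comes from an actual SR scheme:

```python
# Illustrative only: image-based splitting with mixed-precision patches
# leaves quantization error in some patches but not others, so the
# re-stitched image shows block-shaped inconsistency at patch borders.
import numpy as np

def int8_roundtrip(patch: np.ndarray) -> np.ndarray:
    """Quantize a patch to INT8 and back with its own per-patch scale."""
    scale = np.abs(patch).max() / 127.0
    return np.clip(np.round(patch / scale), -128, 127).astype(np.float32) * scale

rng = np.random.default_rng(1)
img = rng.random((64, 64), dtype=np.float32)
patches = [img[r:r + 32, c:c + 32] for r in (0, 32) for c in (0, 32)]

# Dispatch alternate patches to the "NPU" (INT8) and the "GPU" (FP32):
out = [int8_roundtrip(p) if i % 2 else p for i, p in enumerate(patches)]
stitched = np.vstack([np.hstack(out[:2]), np.hstack(out[2:])])

# Error is zero in FP32 patches and nonzero in INT8 ones: visible blocks.
print([float(np.abs(o - p).mean()) for o, p in zip(out, patches)])
```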
## Overview

### Our Idea

Our work addresses the visual inconsistency in upscaled images by introducing a new procedure-based approach to splitting SR computations among heterogeneous processors, as opposed to the traditional image-based splitting. As shown below, we split the SR model and adaptively dispatch its different NN layers to heterogeneous processors, according to the computing complexity of these layers and how their SR computations are affected by the reduced arithmetic precision. Our goal is to maximize the utilization of AI accelerators within the given time constraints on SR computations, while minimizing their impact on perceptual image quality.

![FYE-SR basic idea](2024-fye-sr/fye-sr-fig2.png)
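
As a rough sketch of this layer-placement idea (FYE-SR solves it as a formal optimization; the greedy rule, profile fields, and example numbers below are our own illustrative assumptions):

```python
# Illustrative greedy stand-in for procedure-based model splitting: offload
# the layers least sensitive to INT8 first, until the latency deadline fits.
from dataclasses import dataclass

@dataclass
class LayerProfile:
    name: str
    gpu_ms: float        # measured FP32 latency on the GPU
    npu_ms: float        # measured INT8 latency on the NPU
    quality_cost: float  # estimated perceptual-quality drop if run in INT8

def split_model(layers: list[LayerProfile], deadline_ms: float) -> dict[str, str]:
    placement = {l.name: "GPU" for l in layers}
    total = sum(l.gpu_ms for l in layers)
    # Offload in order of increasing perceptual cost per millisecond saved.
    for l in sorted(layers,
                    key=lambda l: l.quality_cost / max(l.gpu_ms - l.npu_ms, 1e-6)):
        if total <= deadline_ms:
            break
        if l.npu_ms < l.gpu_ms:
            placement[l.name] = "NPU"
            total -= l.gpu_ms - l.npu_ms
    return placement

layers = [LayerProfile("head_conv", 4.0, 1.0, 0.03),
          LayerProfile("body_block", 12.0, 3.0, 0.01),
          LayerProfile("upsampler", 6.0, 2.0, 0.08)]
print(split_model(layers, deadline_ms=12.0))
```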

### System Design

![FYE-SR system overview](2024-fye-sr/fye-sr-fig8.png)

As shown in the figure above, our design of FYE-SR consists of three main modules. During the offline phase, we first use an SR Timing Profiler to measure the computing latencies of the SR model's different NN layers on traditional processors (e.g., GPU) and AI accelerators (e.g., NPU), respectively. Knowledge of these latencies is then used to train a Model Split Learner, which solves Eq. (2) in the paper for the optimal split of the SR model.
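
As a hedged illustration of what the profiling step could look like (the real SR Timing Profiler targets mobile GPU/NPU backends; this PyTorch version just times each layer of a toy model on whatever device runs it):

```python
# Illustrative per-layer timing, loosely in the spirit of the SR Timing
# Profiler: measure each layer's latency and feed activations forward.
import time
import torch
import torch.nn as nn

def profile_layers(model: nn.Sequential, x: torch.Tensor, runs: int = 20) -> dict:
    latencies_ms = {}
    with torch.no_grad():
        for name, layer in model.named_children():
            for _ in range(3):           # warm-up iterations
                layer(x)
            start = time.perf_counter()
            for _ in range(runs):
                y = layer(x)
            latencies_ms[name] = (time.perf_counter() - start) / runs * 1e3
            x = y                         # next layer sees this layer's output
    return latencies_ms

toy_sr = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(32, 3, 3, padding=1))
print(profile_layers(toy_sr, torch.randn(1, 3, 128, 128)))
```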

During the online phase, FYE-SR enforces this model split, and uses a Data Format Converter to convert the intermediate feature maps into the right data formats (e.g., INT8 and FP32) for properly switching SR computations between heterogeneous processors.
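
A minimal sketch of what such a converter could do, reusing the symmetric INT8 scheme from the Background sketch (the function names and scale handling are assumptions, not FYE-SR's actual converter):

```python
# Illustrative Data Format Converter: re-encode an intermediate feature map
# when execution crosses between an FP32 processor and an INT8 accelerator.
import numpy as np

def fp32_to_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.abs(x).max()) / 127.0 or 1.0   # avoid a zero scale
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_to_fp32(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

fmap = np.random.default_rng(2).normal(size=(1, 32, 64, 64)).astype(np.float32)
q, s = fp32_to_int8(fmap)       # hand off GPU -> NPU
restored = int8_to_fp32(q, s)   # hand back NPU -> GPU
```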

## Results

As shown in the figures below, compared to other SOTA image SR approaches, our method achieves the best overall result considering both structural and perceptual image quality, while meeting the preset deadline requirement.

![FYE-SR comparison results](2024-fye-sr/fye-sr-fig15.png)

Looking into the output images, FYE-SR effectively suppresses the distortions and visual inconsistency around detailed objects (e.g., windows on buildings).

![FYE-SR comparison: GPU-only, NPU-only](2024-fye-sr/fye-sr-fig16ab.png)

![FYE-SR comparison: MobiSR, FYE-SR](2024-fye-sr/fye-sr-fig16cd.png)
