
Commit

Update README.md
ShengYun-Peng authored Nov 16, 2023
1 parent 76d6d52 commit bbdf620
Showing 1 changed file with 8 additions and 6 deletions.
README.md (14 changes: 8 additions & 6 deletions)
@@ -3,18 +3,20 @@
[![arxiv badge](https://img.shields.io/badge/arXiv-2311.05565-red)](https://arxiv.org/abs/2311.05565)
[![license](https://img.shields.io/badge/License-MIT-success)](https://github.com/poloclub/wizmap/blob/main/LICENSE)

- This is a PyTorch implementation of the NeurIPS'23 TRL Workshop Oral paper [High-Performance Transformers for Table Structure Recognition Need Early Convolutions](https://arxiv.org/abs/2311.05565).
+ [High-Performance Transformers for Table Structure Recognition Need Early Convolutions](https://arxiv.org/abs/2311.05565). ShengYun Peng, Seongmin Lee, Xiaojing Wang, Rajarajeswari Balasubramaniyan, Duen Horng Chau. In *NeurIPS 2023 Second Table Representation Learning Workshop*, 2023. (Oral)

- 📖 <a href="https://arxiv.org/abs/2311.05565"> Research Paper</a> &nbsp;&nbsp;&nbsp;&nbsp;
- 🚀 <a href="https://shengyun-peng.github.io/tsr-convstem"> Project Page</a> &nbsp;&nbsp;&nbsp;&nbsp;
+ - [x] <a href="https://arxiv.org/abs/2311.05565">Research Paper 📖</a>
+ - [x] <a href="https://shengyun-peng.github.io/tsr-convstem">Project Page 🚀</a>

<p align="center">
- <img src="imgs/pipeline.png" alt="drawing" width="600"/>
+ <img src="imgs/pipeline.png" alt="drawing" width="100%"/>
</p>

Table structure recognition (TSR) aims to convert tabular images into a machine-readable format, where a visual encoder extracts image features and a textual decoder generates table-representing tokens. Existing approaches use classic convolutional neural network (CNN) backbones for the visual encoder and transformers for the textual decoder. However, this hybrid CNN-Transformer architecture introduces a complex visual encoder that accounts for nearly half of the total model parameters, markedly reduces both training and inference speed, and hinders the potential for self-supervised learning in TSR. In this work, we design a lightweight visual encoder for TSR without sacrificing expressive power. We discover that a convolutional stem can match classic CNN backbone performance, with a much simpler model. The convolutional stem strikes an optimal balance between two crucial factors for high-performance TSR: a higher receptive field (RF) ratio and a longer sequence length. This allows it to "see" an appropriate portion of the table and "store" the complex table structure within sufficient context length for the subsequent transformer.
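
Below is a minimal PyTorch sketch of this idea: a short stack of strided 3x3 convolutions replaces the CNN backbone and flattens its feature map into the token sequence consumed by the transformer decoder. The class name `ConvStemEncoder`, the layer count, and the hidden size are illustrative assumptions, not the exact configuration used in this repository.

```python
# Minimal sketch of a convolutional-stem visual encoder (illustrative only;
# layer sizes and names are assumptions, not this repo's exact configuration).
import torch
import torch.nn as nn

class ConvStemEncoder(nn.Module):
    """A few strided 3x3 convolutions that replace a CNN backbone and emit
    a token sequence for the transformer decoder."""

    def __init__(self, in_channels: int = 3, hidden_dim: int = 512, num_layers: int = 4):
        super().__init__()
        layers, channels = [], in_channels
        for i in range(num_layers):
            out_channels = hidden_dim // 2 ** (num_layers - 1 - i)
            layers += [
                nn.Conv2d(channels, out_channels, kernel_size=3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ]
            channels = out_channels
        layers.append(nn.Conv2d(channels, hidden_dim, kernel_size=1))  # final channel projection
        self.stem = nn.Sequential(*layers)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.stem(images)                 # (B, hidden_dim, H/16, W/16)
        return feats.flatten(2).transpose(1, 2)   # (B, seq_len, hidden_dim)

# Example: a 448x448 table image becomes a 28x28 = 784-token sequence.
tokens = ConvStemEncoder()(torch.randn(1, 3, 448, 448))
print(tokens.shape)  # torch.Size([1, 784, 512])
```

In this layout, each additional stride-2 convolution roughly doubles the receptive field of every output token while quartering the sequence length, which is the receptive-field-ratio versus sequence-length trade-off described above.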

## News
- `Oct. 2023` - Paper accepted by [NeurIPS'23 2nd Table Representation Learning Workshop](https://table-representation-learning.github.io/)
+ `Oct. 2023` - Paper accepted by [NeurIPS'23 Table Representation Learning Workshop](https://table-representation-learning.github.io/)

`Oct. 2023` - Paper selected as [oral](https://openreview.net/group?id=NeurIPS.cc/2023/Workshop/TRL)

@@ -47,4 +49,4 @@ make experiments/r18_e2_d4_adamw/.done_teds_structure
}
```
## Contact
If you have any questions, feel free to [open an issue](https://github.com/poloclub/tsr-convstem/issues/new) or contact [Anthony Peng](https://shengyun-peng.github.io/) (CS PhD @Georgia Tech).

