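OCR stands for Optical Character Recognition. The term now serves as an umbrella for text recognition in general: it is no longer limited to text in documents or books and also covers recognizing text in natural scenes, a task also known as STR (Scene Text Recognition).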
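Figure 1 shows three broad categories: text detection, text recognition, and text spotting. Text detection locates where text appears in an image. Text recognition takes the detected regions and determines what text they contain. Text spotting integrates detection and recognition into a single end-to-end network that performs the whole task.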
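The division of labor between detection and recognition can be sketched as a minimal pipeline. Everything in this sketch is an assumption made for illustration: the `Box`, `Image`, and `TextInstance` types, the `two_stage_ocr` helper, and the `fake_detect` and `fake_recognize` stand-ins are hypothetical and do not correspond to the API of any model listed here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), axis-aligned
Image = List[List[int]]           # rows of grayscale pixel values


@dataclass
class TextInstance:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by `box` out of `image`."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def two_stage_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextInstance]:
    """Detect text regions first, then recognize each cropped region."""
    return [TextInstance(box, recognize(crop(image, box))) for box in detect(image)]


# Stand-ins for trained models: a fixed detector and a recognizer that
# merely reports the size of the patch it received.
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 3, 2), (3, 2, 6, 4)]


def fake_recognize(patch: Image) -> str:
    return f"{len(patch[0])}x{len(patch)}"


demo_image: Image = [[0] * 6 for _ in range(4)]
results = two_stage_ocr(demo_image, fake_detect, fake_recognize)
for item in results:
    print(item.box, item.text)
```

A text spotter would expose the same input and output as `two_stage_ocr`, but the two steps would live in one jointly trained network rather than being chained as separate models.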
Text detection
1. Methods Inspired by Object Detection
- An Efficient and Accurate Scene Text Detector (EAST) (Zhou et al. 2017)
2. Methods Based on Sub-text Components
2.1. Pixel-level methods
- PixelLink (Deng et al. 2018)
2.2. Component-level methods
- Connectionist Text Proposal Network (CTPN) (Tian et al. 2016)
- SegLink (Shi et al. 2017a)
- TextSnake (Long et al. 2018)
- Differentiable Binarization (DB) (Minghui Liao et al. 2019)
2.3. Character-level representation
- Character Region Awareness for Text Detection (CRAFT) (Baek et al. 2019b)
- Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection (DRRG) (Shi-Xue Zhang et al. 2020)
Text recognition
1. Connectionist Temporal Classification (CTC) Based methods
- CRNN (Baoguang Shi et al. 2016)
2. Encoder-decoder methods
- An Attentional Scene Text Recognizer with Flexible Rectification (ASTER) (Baoguang Shi et al. 2018)
3. Adaptations for irregular text recognition
- Alchemy (Shangbang Long et al. 2019)
- Semantic Reasoning Network (SRN) (Deli Yu et al. 2020)
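To make the CTC-based category above more concrete, here is a minimal sketch of greedy (best-path) CTC decoding: take the highest-scoring class in each frame, merge consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode`, the choice of index 0 as the blank, and the toy scores are assumptions for illustration; this is one common decoding strategy, not necessarily the exact procedure of any method listed here.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(scores: Sequence[Sequence[float]], labels: str) -> str:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks.

    `scores[t][c]` is the score of class `c` at frame `t`.
    Class `c > 0` maps to the character `labels[c - 1]`.
    """
    best_path: List[int] = [
        max(range(len(frame)), key=lambda c: frame[c]) for frame in scores
    ]
    decoded: List[str] = []
    previous = BLANK
    for index in best_path:
        # Emit a character only when it is not blank and differs from the
        # previous frame; a blank in between allows a genuine double letter.
        if index != BLANK and index != previous:
            decoded.append(labels[index - 1])
        previous = index
    return "".join(decoded)


# Toy per-frame scores over (blank, 'a', 'b'); the best path is a, a, blank, a, b.
frames = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
]
decoded_text = ctc_greedy_decode(frames, "ab")
print(decoded_text)
```

The blank between the second and fourth frames is what lets the decoder keep two separate `a` characters instead of merging them into one.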
Text spotting
1. Two-step pipelines
- TextBoxes (Liao et al. 2017)
2. Two-stage pipelines
- A Feasible Framework for Arbitrary-Shaped Scene Text Recognition (AttentionOCR) (Jinjin Zhang et al. 2019)
- Character Region Attention For Text Spotting (CRAFTS) (Youngmin Baek et al. 2020)
3. One-stage pipelines
- Convolutional Character Networks (Linjie Xing et al. 2019)