A Roundup of Text Recognition Methods

Dec 30, 2021

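OCR stands for Optical Character Recognition. The term is now used as an umbrella for text recognition in general: it is no longer limited to text in documents or books, but also covers recognizing text in natural scenes, a task also known as STR (Scene Text Recognition).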

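Figure 1 shows three major categories: text detection, text recognition, and text spotting. Text detection locates where text appears in the image. Text recognition takes the detected regions and identifies the text they contain. Text spotting integrates detection and recognition into a single end-to-end network that performs the whole task.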
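To make the split between the stages concrete, here is a minimal Python sketch of a detect-then-recognize flow. Everything in it is an assumption for illustration: the `Box` format, the `TextInstance` type, and the `dummy_detect` / `dummy_recognize` stand-ins are hypothetical and do not correspond to any of the models listed below. A real system would plug in a trained detector and recognizer in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical conventions, chosen only for this sketch.
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max values exclusive
Image = List[List[int]]          # toy grayscale image as nested lists of pixel values


@dataclass
class TextInstance:
    box: Box   # where the text was found
    text: str  # what the recognizer read inside that box


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def two_step_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextInstance]:
    """Run detection first, then recognition on each detected region."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        results.append(TextInstance(box=box, text=recognize(patch)))
    return results


def dummy_detect(image: Image) -> List[Box]:
    """Stand-in detector: always returns two fixed boxes."""
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def dummy_recognize(patch: Image) -> str:
    """Stand-in recognizer: reports the patch size instead of reading text."""
    return f"{len(patch[0])}x{len(patch)}"


if __name__ == "__main__":
    toy_image = [[1, 2, 3], [4, 5, 6]]
    for instance in two_step_ocr(toy_image, dummy_detect, dummy_recognize):
        print(instance.box, instance.text)
```

In this sketch the two stages only communicate through cropped patches. A text spotting network instead shares features between detection and recognition and is trained end to end, which this toy code does not attempt to show.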

Text detection

1. Methods Inspired by Object Detection

An Efficient and Accurate Scene Text Detector (EAST) (Zhou et al. 2017)

2. Methods Based on Sub-text Components

2.1. Pixel-level methods

PixelLink (Deng et al. 2018)

2.2. Component-level methods

Connectionist Text Proposal Network (CTPN) (Tian et al. 2016)
SegLink (Shi et al. 2017a)
TextSnake (Long et al. 2018)
Differentiable Binarization (DB) (Minghui Liao et al. 2019)

2.3. Character-level representation

Character Region Awareness for Text Detection (CRAFT) (Baek et al. 2019b)
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection (DRRG) (Shi-Xue Zhang et al. 2020)

Text recognition

1. Connectionist Temporal Classification (CTC) Based methods

CRNN (Baoguang Shi et al. 2016)
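As a rough illustration of how a CTC-based recognizer turns per-frame predictions into a string, the sketch below shows greedy (best-path) CTC decoding: take the highest-scoring label per frame, merge consecutive repeats, then drop the blank label. This is a simplified sketch and not the CRNN implementation. The function names, the choice of index 0 as the blank, and the mapping of label `k` to `alphabet[k - 1]` are assumptions made for this example.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # assumption: label 0 is the CTC blank; label k maps to alphabet[k - 1]


def ctc_collapse(path: Sequence[int], alphabet: str) -> str:
    """Merge consecutive repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(path)]
    return "".join(alphabet[label - 1] for label in merged if label != BLANK)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best label for each frame, then collapse the resulting path."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    return ctc_collapse(best_path, alphabet)


if __name__ == "__main__":
    # The blank between the two runs of label 1 keeps them as two separate "a"s.
    print(ctc_collapse([1, 1, 0, 1, 2, 2, 0], "abc"))  # prints "aab"
```

The blank label is what allows repeated characters to survive decoding: without a blank between them, two adjacent runs of the same label would be merged into a single character.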

2. Encoder-decoder methods

An Attentional Scene Text Recognizer with Flexible Rectification (ASTER) (Baoguang Shi et al. 2018)

3. Adaptations for irregular text recognition

Alchemy (Shangbang Long et al. 2019)
Semantic Reasoning Network (SRN) (Deli Yu et al. 2020)

Text spotting

1. Two-step pipelines

TextBoxes (Liao et al. 2017)

2. Two-stage pipelines

A Feasible Framework for Arbitrary-Shaped Scene Text Recognition (AttentionOCR) (Jinjin Zhang et al. 2019)
Character Region Attention For Text Spotting (CRAFTS) (Youngmin Baek et al. 2020)

3. One-stage pipelines

Convolutional Character Networks (Linjie Xing et al. 2019)

Reference

Scene Text Detection and Recognition: The Deep Learning Era