An Overview of Text Recognition Methods

張家銘
Dec 30, 2021

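OCR stands for Optical Character Recognition (光學字元識別 in Chinese). Today it serves as an umbrella term for text recognition: it is no longer limited to text in documents or books, but also covers recognizing text in natural scenes, a task also known as STR (Scene Text Recognition).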

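Figure 1 shows three broad categories: text detection, text recognition, and text spotting. Text detection locates where text appears in an image. Text recognition takes the detected regions and identifies the text they contain. Text spotting integrates detection and recognition into a single end-to-end network that performs the whole task.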

Figure 1. Schematic overview of text recognition.
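To make the relationship between detection and recognition concrete, here is a minimal sketch of how the two steps can be chained. Everything in it is an assumption for illustration: the `Box` format, the `two_step_ocr` function, and the toy `toy_detect` / `toy_recognize` stand-ins are hypothetical and do not come from any of the methods listed below. A real system would plug a trained detector and recognizer into the same slots, while a text spotting network would replace both with one jointly trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]
# Toy grayscale image represented as a list of pixel rows.
Image = List[List[int]]


@dataclass
class TextResult:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def two_step_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextResult]:
    """Run detection first, then recognition on each detected region."""
    results = []
    for box in detect(image):
        results.append(TextResult(box=box, text=recognize(crop(image, box))))
    return results


# Toy stand-ins (assumptions): the "detector" returns fixed boxes and the
# "recognizer" reads each pixel value as a character code.
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (2, 1, 4, 2)]


def toy_recognize(patch: Image) -> str:
    return "".join(chr(value) for row in patch for value in row)


image = [
    [72, 105, 0, 0],
    [0, 0, 79, 75],
]
results = two_step_ocr(image, toy_detect, toy_recognize)
print([(r.box, r.text) for r in results])
```

Passing the detector and recognizer in as plain callables keeps the two stages independent, which mirrors how the detection and recognition methods below can be mixed and matched.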

Text detection

1. Methods Inspired by Object Detection

  • An Efficient and Accurate Scene Text Detector (EAST) (Zhou et al. 2017)

2. Methods Based on Sub-text Components

2.1. Pixel-level methods

  • PixelLink (Deng et al. 2018)

2.2. Component-level methods

  • Connectionist Text Proposal Network (CTPN) (Tian et al. 2016)
  • SegLink (Shi et al. 2017a)
  • TextSnake (Long et al. 2018)
  • Differentiable Binarization (DB) (Minghui Liao et al. 2019)

2.3. Character-level representation

  • Character Region Awareness for Text Detection (CRAFT) (Baek et al. 2019b)
  • Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection (DRRG) (Shi-Xue Zhang et al. 2020)

Text recognition

1. Connectionist Temporal Classification (CTC) Based methods

  • CRNN (Baoguang Shi et al. 2016)
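As a rough illustration of what CTC-based recognition involves at inference time, the sketch below shows greedy (best-path) CTC decoding: take the most likely label for each frame, merge consecutive repeats, then drop the blank symbol. It is not CRNN's actual code, and it covers only decoding, not the CTC training loss. The function name `ctc_greedy_decode`, the charset layout (index 0 reserved for the blank), and the per-frame scores are assumptions made up for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    charset: str,
    blank: int = 0,
) -> str:
    """Greedy CTC decoding over per-frame label scores.

    `charset[i]` is the character for label i; the entry at index `blank`
    is a placeholder and is never emitted.
    """
    # Step 1: pick the highest-scoring label in every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=lambda i: scores[i])
        for scores in frame_scores
    ]

    # Step 2: merge consecutive repeats, then skip blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(charset[label])
        previous = label
    return "".join(decoded)


# Assumed toy charset: label 0 is the blank, 1 is "t", 2 is "o".
charset = "-to"
frame_scores = [
    [0.10, 0.80, 0.10],  # t
    [0.20, 0.70, 0.10],  # t (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # o
    [0.80, 0.10, 0.10],  # blank separates the two o's
    [0.20, 0.10, 0.70],  # o
    [0.10, 0.20, 0.70],  # o (repeat, merged)
]
decoded_text = ctc_greedy_decode(frame_scores, charset)
print(decoded_text)  # too
```

The blank between the two "o" frames is what lets the decoder output a doubled letter; without it, the repeats would collapse into a single "o".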

2. Encoder-decoder methods

  • An Attentional Scene Text Recognizer with Flexible Rectification (ASTER) (Baoguang Shi et al. 2018)

3. Adaptations for irregular text recognition

  • Alchemy (Shangbang Long et al. 2019)
  • Semantic Reasoning Network (SRN) (Deli Yu et al. 2020)

Text spotting

1. Two-step pipelines

  • TextBoxes (Liao et al. 2017)

2. Two-stage pipelines

  • A Feasible Framework for Arbitrary-Shaped Scene Text Recognition (AttentionOCR) (Jinjin Zhang et al. 2019)
  • Character Region Attention For Text Spotting (CRAFTS) (Youngmin Baek et al. 2020)

3. One-stage pipelines

  • Convolutional Character Networks (Linjie Xing et al. 2019)

Reference

Scene Text Detection and Recognition: The Deep Learning Era
