OCR: Optical Character Recognition
Datasets
- pixparse/idl-wds · Datasets at Hugging Face
- pixparse/pdfa-eng-wds · Datasets at Hugging Face
- yifeihu/TFT-ID-1.0 · Hugging Face
- lightonai/fc-amf-ocr · Datasets at Hugging Face
Text Detection
Text Recognition
Layout, Tables, etc.
- FinTabNet dataset
- GitHub - opendatalab/OmniDocBench: A Comprehensive Benchmark for Document Parsing and Evaluation
End to End
- stepfun-ai/GOT-OCR2_0 · Hugging Face
- GitHub - Topdu/OpenOCR: OpenOCR: A general OCR system with accuracy and efficiency. Supports 24 scene text recognition methods trained from scratch on large-scale real datasets, with the latest methods to be added over time.
- GitHub - clovaai/donut: Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
VLM
- GitHub - QwenLM/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
- smol-vision/ColPali_+_Qwen2_VL.ipynb at main · merveenoyan/smol-vision · GitHub
- GitHub - THUDM/CogVLM2: GPT4V-level open-source multi-modal model based on Llama3-8B
- GitHub - Ucas-HaoranWei/GOT-OCR2.0: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model