OCR: Optical Character Recognition
Datasets
- pixparse/idl-wds (Hugging Face dataset)
- pixparse/pdfa-eng-wds (Hugging Face dataset)
- yifeihu/TFT-ID-1.0 (Hugging Face model)
- lightonai/fc-amf-ocr (Hugging Face dataset)
Text Detection
Text Recognition
End to End
VLM
- QwenLM/Qwen2-VL (GitHub): Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
- merveenoyan/smol-vision (GitHub): ColPali_+_Qwen2_VL.ipynb notebook
- THUDM/CogVLM2 (GitHub): GPT4V-level open-source multi-modal model based on Llama3-8B
- Ucas-HaoranWei/GOT-OCR2.0 (GitHub): official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model