All Questions
2,161 questions
1
vote
0
answers
48
views
Performing OCR of Seven Segment Display Multimeter
Firstly, I am very very new to these things, and I have come this far with the help of ChatGPT.
We recorded some videos of two multimeters that have seven-segment displays. I want to OCR them to use ...
0
votes
0
answers
20
views
Tesseract OCR misreads coloured labels in a scanned image despite correct recognition of other coloured text in the same image
I'm working on processing OCT scan images using Tesseract OCR. My goal is to extract patient information, including eye labels "OD" (right eye) and "OS" (left eye), from these ...
-2
votes
0
answers
47
views
Pytesseract not able to extract vehicle number plate text
I have written code that detects number plates successfully, but a problem arises when I need to extract the number plate information using pytesseract and store it in Excel. It is not extracting number ...
0
votes
0
answers
24
views
lstm-unicharset file is unable to be created during tesseract training
I am trying to fine-tune an Optical Character Recognition (OCR) model for Japanese using Tesseract's tesstrain repository. I tried encoding the bash commands into Python in VSCode as I wanted ...
-1
votes
2
answers
84
views
How can I improve Tesseract OCR accuracy on rotated images in C++?
I am using Tesseract OCR (v5.5.0) in C++ to extract text from images, but I’m encountering issues when the images are rotated. Tesseract’s PSM_AUTO_OSD (PageSegMode::PSM_AUTO_OSD) works well for ...
0
votes
0
answers
45
views
Improving OCR Accuracy for Digits Inside Circles from Vector PDF Images (AutoCAD Export)
I'm working on an OCR task where I need to detect and read numbers that are inside circles. The original data comes from .dwg (AutoCAD) files, and the PDFs I use are not scanned — they’re exported ...
0
votes
0
answers
17
views
PyAutoGUI - text detection, issue finding alphanumeric words
When using the PyAutoGUI module to read text, words containing alphanumeric characters are not being detected, especially the numbers in them.
For example, I want to detect 'apple1', but it is reading it in other context and ...
1
vote
1
answer
70
views
How to prioritize French OCR over Arabic when using Tesseract (fra+ara) on bilingual documents?
I'm working on scanned documents (registers) that contain both French and Arabic text.
When I run Tesseract OCR with lang='fra', all the French text is extracted perfectly.
But when I use lang='ara+...
0
votes
0
answers
11
views
Tesseract HOCR to a structured text for LLMs
I want to use the HOCR that I get from TesseractJS (I work in JavaScript) and somehow transform it to be readable by an LLM.
The goal is to read technical documents with prices, tabs, header, lines, ...
0
votes
0
answers
68
views
Tesseract OCR Command in ocrmypdf Fails with 'SubprocessOutputError' on Windows
ExitCodeException _common.py:271
Traceback (most recent call last):
File "C:\<USER>\apps\python\...
0
votes
0
answers
24
views
How to prevent Tesseract OCR from re-ordering the sentence in RTL context?
I have a large collection of structured text, in Hebrew & English, mixed in every sentence.
Tesseract is re-ordering the words in these sentences, without success.
How can I tell Tesseract just to ...
0
votes
0
answers
40
views
Tesseract Training: Error 'Integer (fast) model' When Using Apex.lstm
I’ve been following this tutorial from YouTube:
Guide to Tesseract Training
https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s
and its corresponding GitHub repository: astutejoe/tesseract_tutorial.
...
-1
votes
1
answer
52
views
I'm having trouble trying to convert image to text in python
I'm trying to convert the attached image using the pytesseract and opencv libraries in python, but the conversion is not satisfactory, since many characters are converted incorrectly. Does anyone have ...
0
votes
2
answers
67
views
python cv2 replace color with white
I am trying to replace the turquoise part (the words) of the image with a white background to get a clean source for tesseract-ocr.
The picture loads fine and the image mask is created fine.
My question is how to ...
0
votes
0
answers
43
views
Tesseract, OCR and text based layout
I'm trying to build a small application (C#) that can OCR some images, extracting the raw text with layout roughly intact (using tabs, spaces or whatever, to position the text in the output ...