
All Questions

1 vote
0 answers
48 views

Performing OCR of Seven Segment Display Multimeter

Firstly, I am very very new to these things, and I have come this far with the help of ChatGPT. We recorded some videos of two multimeters that have seven-segment displays. I want to OCR them to use ...
AP3X
  • 11
0 votes
0 answers
20 views

Tesseract OCR misreads coloured labels in a scanned image despite correct recognition of other coloured text in the same image

I'm working on processing OCT scan images using Tesseract OCR. My goal is to extract patient information, including eye labels "OD" (right eye) and "OS" (left eye), from these ...
User901036845
-2 votes
0 answers
47 views

Pytesseract not able to extract vehicle number plate text

I have written code that detects number plates successfully, but a problem arises when I need to extract the number plate text using pytesseract and store it in Excel. It is not extracting number ...
Raj
  • 19
0 votes
0 answers
24 views

lstm-unicharset file is unable to be created during tesseract training

I am trying to fine-tune an Optical Character Recognition (OCR) model for Japanese using Tesseract's tesstrain repository. I tried encoding the bash commands into Python in VSCode as I wanted ...
Jiansen Chan
-1 votes
2 answers
84 views

How can I improve Tesseract OCR accuracy on rotated images in C++?

I am using Tesseract OCR (v5.5.0) in C++ to extract text from images, but I’m encountering issues when the images are rotated. Tesseract’s PSM_AUTO_OSD (PageSegMode::PSM_AUTO_OSD) works well for ...
OMKAR GULAMBE
0 votes
0 answers
45 views

Improving OCR Accuracy for Digits Inside Circles from Vector PDF Images (AutoCAD Export)

I'm working on an OCR task where I need to detect and read numbers that are inside circles. The original data comes from .dwg (AutoCAD) files, and the PDFs I use are not scanned — they’re exported ...
Jules Angebault
0 votes
0 answers
17 views

PyAutoGUI - text detection, issue finding alphanumeric words

When using the PyAutoGUI module to read text, alphanumeric characters are not being detected, especially the numbers. For example, I want to detect 'apple1', but it is reading it in other context and ...
19131A0453 DOKALA CHARAN TEJA
1 vote
1 answer
70 views

How to prioritize French OCR over Arabic when using Tesseract (fra+ara) on bilingual documents?

I'm working on scanned documents (registers) that contain both French and Arabic text. When I run Tesseract OCR with lang='fra', all the French text is extracted perfectly. But when I use lang='ara+...
nevermiind
0 votes
0 answers
11 views

Tesseract HOCR to a structured text for LLMs

I want to use the HOCR that I get from TesseractJS (I work in JavaScript) and somehow transform it to be readable by an LLM. The goal is to read technical documents with prices, tabs, header, lines, ...
Blovnar
  • 55
0 votes
0 answers
68 views

Tesseract OCR Command in ocrmypdf Fails with 'SubprocessOutputError' on Windows

ExitCodeException _common.py:271 Traceback (most recent call last): File "C:\<USER>\apps\python\...
Username
0 votes
0 answers
24 views

How to prevent Tesseract OCR from re-ordering the sentence in RTL context?

I have a large collection of structured text, in Hebrew & English, mixed in every sentence. Tesseract is re-ordering the words in these sentences, without success. How can I tell Tesseract just to ...
Berry Tsakala
0 votes
0 answers
40 views

Tesseract Training: Error 'Integer (fast) model' When Using Apex.lstm

I’ve been following this tutorial from YouTube: Guide to Tesseract Training https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s and its corresponding GitHub repository: astutejoe/tesseract_tutorial. ...
Impetus
-1 votes
1 answer
52 views

I'm having trouble trying to convert image to text in Python

I'm trying to convert the attached image using the pytesseract and opencv libraries in Python, but the conversion is not satisfactory, since many characters are converted incorrectly. Does anyone have ...
Cristi Garcia
0 votes
2 answers
67 views

python cv2 replace color with white

I am trying to replace the turquoise part (words) of the image with a white background to have a clean source for tesseract-ocr. The picture loads OK and the image mask is created OK. My question is how to ...
HEP
  • 13
0 votes
0 answers
43 views

Tesseract, OCR and text based layout

I'm trying to build a small application (C#) that can OCR process some images, extracting the raw text with layout roughly intact (using tabs, spaces or whatever, to position the text in the output ...
Aidal
  • 869
