All Questions

0 votes
0 answers
26 views

I am extracting images from a PDF file and then deskewing them to correct the orientation. However, it fails in certain cases

Fixture details: In this case, Fixture 5 is rotated properly, whereas Fixture 6 is not rotated properly and is in a downright position. Please help. def deskew2(image): coords = np....
Nishit Sadual
2 votes
0 answers
512 views

Using Python to extract timestamp text from frame of videos

I have a series of videos that contain a timestamp in a specific ROI, which I have already found. The text can be black or white, and both can appear within the same timestamp. I am struggling to use ...
Andrew Li
-1 votes
2 answers
751 views

Tesseract not recognising text in image [closed]

I am trying to use Tesseract in Python 3.11 to convert some images to text on Windows 11. I have tried preprocessing the images, including morphing, enlarging, greyscaling and thresholding, but nothing ...
Vik • 549
0 votes
1 answer
1k views

How to find numbers and letters in images and blur (mask) them?

I have to blur the identification number without hard-coding it. I want to find the identification number automatically and blur it. For example, in this image I need to blur the Account ...
Nik • 1
1 vote
1 answer
962 views

Tesseract OCR is inaccurate for images with letter spacing

I'm trying to use Tesseract OCR to extract a string of characters (not a valid word) from an image. The issue is that the characters in the image are spaced out, like in the picture below. With ...
qanpi • 33
0 votes
0 answers
99 views

Preprocessing for OCR [duplicate]

I am relying heavily on OCR for a project I have been working on; however, with my limited understanding of the field, I am not sure how to proceed. I have a list of PDF documents that need to be ...
Chinmay • 171
0 votes
0 answers
446 views

create bounding box on meaningful word

I'm using pytesseract.image_to_data() on this image. Code to create the bounding box: import pytesseract from pytesseract import Output import cv2 img = cv2.imread('Page_2.jpg') d = pytesseract....
aditya • 33
1 vote
0 answers
40 views

Optically detect specific characters in a text document

We try to detect certain rare characters when doing OCR of scans of old documents. What is the state-of-the-art approach to text-object detection in document image analysis and OCR? Are standard SURF ...
barium • 63
0 votes
0 answers
2k views

Which are the best practices for Tesseract OCR on low quality images?

I've been working for a while on an OCR solution for my business and I can't seem to get the hang of image filtering for low quality images. The balance between removing the noise and not breaking ...
Francisco Ferraz
1 vote
1 answer
2k views

How to detect text using Tesseract on images with poor camera angles?

I'm working on extracting text from images that are similar to the one shown below: warehouse boxes with all kinds of different labels. The images often have poor angles. My code: im = cv2.imread('1.jpg') ...
Royce Ho
1 vote
1 answer
268 views

How to detect digits from images using pytesseract?

I am trying to detect the text in the images, but it fails for some unknown reason. import pytesseract as pt from PIL import Image import re image = Image.open('sample.jpg') custom_config = r'--oem 3 ...
scee • 15
1 vote
1 answer
1k views

Tesseract fine tuning error - Compute CTC targets failed

I'm trying to fine-tune Tesseract 4.1.1 on my own specific data according to this guide. I want it to be able to detect and recognize text in boxes like this: I have generated a number of images ...
Ilya Fedorov
2 votes
1 answer
727 views

Detect and remove rectangles surrounding character

How can I remove rectangles surrounding characters and digits so I can perform OCR afterwards? Here is an example: I assume that the lines are continuous. I tried to do it with OpenCV contours but so far ...
Ilya Fedorov
0 votes
1 answer
962 views

Tesseract - digit recognition with many errors

I want to be able to recognize digits from images, so I have been playing around with Tesseract and Python. I looked into how to prepare the image and tried running Tesseract on it and I must say I am ...
Dynamicnotion
0 votes
0 answers
88 views

Improving the quality of OCR using pytesseract

I'm trying to use pytesseract to recognize text from this image but I'm unable to get satisfactory results. I've tried a number of things to make it easier for Tesseract to recognize the text. My ...
WorkerBee