All Questions
43 questions
0
votes
0
answers
26
views
I am extracting images from a PDF file and then deskewing them to correct the orientation. However, this fails in certain cases.
Fixture details
In this image, Fixture 5 is rotated properly, whereas Fixture 6 is not rotated properly and is in a downright position. Please help.
def deskew2(image):
    coords = np....
2
votes
0
answers
512
views
Using Python to extract timestamp text from frame of videos
I have a series of videos that contain a timestamp in a specific ROI, which I have already found. The text can vary between black and white, and both can be mixed in the same timestamp. I am struggling to use ...
-1
votes
2
answers
751
views
Tesseract not recognising text in image [closed]
I am trying to use Tesseract in Python 3.11 to convert some images to text on Windows 11. I have tried preprocessing the images, including morphing, enlarging, greyscaling and thresholding, but nothing ...
0
votes
1
answer
1k
views
How to find numbers and letters in images and blur them (mask them)?
I have to blur the identification number without hard-coding. I want to find the identification number automatically and blur it. For example:
In this image I need to blur the Account ...
1
vote
1
answer
962
views
Tesseract OCR is inaccurate for images with letter spacing
I'm trying to use Tesseract OCR to extract a string of characters (not a valid word) from an image. The issue is that the characters in the image are spaced out, like in the picture below.
With ...
0
votes
0
answers
99
views
Preprocessing for OCR [duplicate]
I am relying heavily on OCR for a project I have been working on; however, with my limited understanding of the field, I am not sure how to proceed.
I have a list of PDF documents that need to be ...
0
votes
0
answers
446
views
Create bounding box on meaningful word
I'm using pytesseract.image_to_data() on this image:
Code to create the bounding box:
import pytesseract
from pytesseract import Output
import cv2
img = cv2.imread('Page_2.jpg')
d = pytesseract....
1
vote
0
answers
40
views
Optically detect specific characters in a text document
We are trying to detect certain rare characters when doing OCR on scans of old documents.
What is the state-of-the-art approach to text-object detection in document image analysis and OCR? Are standard SURF ...
0
votes
0
answers
2k
views
What are the best practices for Tesseract OCR on low quality images?
I've been working for a while on an OCR solution for my business and I can't seem to get the hang of image filtering for low quality images. The balance between removing the noise and not breaking ...
1
vote
1
answer
2k
views
How to detect text using Tesseract on images with poor camera angles?
I'm working on extracting text from images similar to the one shown below: warehouse boxes with all kinds of different labels. The images often have poor angles.
My code:
im = cv2.imread('1.jpg')
...
1
vote
1
answer
268
views
How to detect digits from images using pytesseract?
I am trying to detect the text in the images, but it fails for some unknown reason.
import pytesseract as pt
from PIL import Image
import re
image = Image.open('sample.jpg')
custom_config = r'--oem 3 ...
1
vote
1
answer
1k
views
Tesseract fine tuning error - Compute CTC targets failed
I'm trying to fine-tune Tesseract 4.1.1 on my own specific data according to this guide. I want it to be able to detect and recognize text in boxes like this:
I have generated a number of images ...
2
votes
1
answer
727
views
Detect and remove rectangles surrounding character
How can I remove the rectangles surrounding characters and digits so I can perform OCR afterwards? Here is an example:
I assume that the lines are continuous. I tried to do it with OpenCV contours but so far ...
0
votes
1
answer
962
views
Tesseract - digit recognition with many errors
I want to be able to recognize digits from images. So I have been playing around with Tesseract and Python. I looked into how to prepare the image and tried running Tesseract on it and I must say I am ...
0
votes
0
answers
88
views
Improving the quality of OCR using pytesseract
I'm trying to use pytesseract to recognize text from this image, but I'm unable to get satisfactory results.
I've tried a number of things to make it easier for Tesseract to recognize the text. My ...