Newest 'ocr pdf' Questions

1 vote

1 answer

67 views

How to extract bold text from a PDF file [closed]

I'm working on a project where I need to extract only the bold text from PDF files using Python. At first, I tried using libraries like PyMuPDF (fitz) and pdfminer, extracting the PDF as HTML and ...

Marco Floriano

359

asked yesterday

0 votes

0 answers

49 views

PyMuPDF - Extract table contents

I am trying to extract the table text of a PDF. With the following code I get: page 0 of page-1-ocr.pdf Tables rowsasf 49 texysdft [['', '', 'Staatlic', 'he Fische', 'rprüfung', 'in Bayern - Prü', '...

Marc

3,934

asked Apr 18 at 19:39

0 votes

0 answers

45 views

Improving OCR Accuracy for Digits Inside Circles from Vector PDF Images (AutoCAD Export)

I'm working on an OCR task where I need to detect and read numbers that are inside circles. The original data comes from .dwg (AutoCAD) files, and the PDFs I use are not scanned — they’re exported ...

Jules Angebault

9

asked Apr 9 at 12:25

1 vote

0 answers

46 views

Preserve Empty Columns When Extracting Tables from PDF

I have 25–30 different types of PDF documents, each containing tables with varying structures. My ultimate goal is to extract table data from specific headings (i.e., between certain titles) and ...

Requiet

85

asked Mar 19 at 12:13

4 votes

2 answers

827 views

Mistral AI OCR not returning anything useful

I am trying to extract a table from a PDF. I was able to use the Le Chat feature of Mistral and get a super great result, but when I try to use the API to programmatically get the same result, I am ...

Shelly Liu

41

asked Mar 9 at 15:12

1 vote

2 answers

113 views

Read numbers under barcode (not barcode stripes itself) in .NET 8

I need help reading the numbers under a barcode (not the stripes) from a PDF file. My idea was to convert the page to an image and then read it. This is an example of a barcode that is located in the top right part of ...

Pelle Woah

11

asked Oct 15, 2024 at 11:59

0 votes

0 answers

33 views

How to detect PDF pages with scanned content?

I'm trying to build an algorithm that can detect pages that need AWS Textract because they contain scanned content. The use case is that some documents have plain text content but ...

Sebastian Chavarry Gutierrez

25

asked Jul 25, 2024 at 15:36

0 votes

0 answers

73 views

How to convert non-readable PDF into readable PDF with OcrMyPdf: troubles with tesseract and configparser

I'm trying to convert a scanned PDF into a readable one. The original PDF contains text, tables, images/logos. The desired output file should be exactly the same as the original file. I found ...

eljamba

407

asked Jul 18, 2024 at 12:14

1 vote

1 answer

172 views

Embed/Insert/Add JSON OCR data generated by 'Google Cloud Vision (OCR)' inside a PDF file and make the PDF searchable

I am using Google Cloud Vision API (OCR) to detect text in PDF files using the PHP API Library. The OCR is done perfectly and I have saved the complete set of JSON output files (ex. output-1-to-2.json)...

sariDon

7,991

asked Jun 23, 2024 at 20:54

0 votes

1 answer

197 views

Does Datacap have a way to read searchable pdf instead of using OCR?

I'm working with searchable PDFs, so there is no need to OCR the documents. Is there a rule/action in Datacap that can read the data instead of using OCR?

user3067752

65

asked Jun 6, 2024 at 16:00

1 vote

2 answers

1k views

OCR - Azure Document Intelligence to recreate document digitally

Where I work we have lots of scanned documents. We want to digitize them without losing the general format of each document. A document can have many key-value pairs, like forms, titles, paragraphs, ...

I NN_

185

asked May 23, 2024 at 2:57

1 vote

1 answer

42 views

Definite OCR Quality [closed]

I have a low-quality English PDF file with no image, no table, single-column and completely black and white - not even gray. I used ABBYY FineReader and it detected the text just fine and I can search ...

Ebrahim Mehri

83

asked Apr 25, 2024 at 2:41

0 votes

1 answer

692 views

How can I extract the PDF section/chapter titles with Python?

I want to add the page titles in the PDF to an array with a loop. I have tried many ways so far but have not succeeded. How can it be done? I tried to do it by selecting the first lines on the page, ...

gofQ

1

asked Apr 19, 2024 at 19:46

0 votes

0 answers

168 views

Extract PDF data in C# using OCR which include datatables

I have a PDF that contains tables plus some header and value fields. I want to extract this data into C# objects using OCR so that I can insert it into a database. The PDF data is in the form as ...

Radha

81

asked Apr 5, 2024 at 15:24

0 votes

0 answers

49 views

Issues with Extracting Tables from bank transaction PDFs

I am working on Python code for extracting tables from bank transaction PDFs (not image-based PDFs). So far I have tried extracting tabular data with Tabula and Camelot, but these are not ...

Gaurav Nambiar

1

asked Feb 19, 2024 at 6:38
