335 questions
-1 votes, 1 answer, 69 views
Extraction of redlines from a Word document converted to PDF using a Python library
I have a set of documents that were edited in Microsoft Word. Take for example a document that says "This is a test document." The document is then edited, with track changes on, to read ...
2 votes, 1 answer, 90 views
How can I extract the form information from a .mdb file?
I am after the form specification data: all the parameters you would enter in the design view for a form.
I have used Jackcess to access the .mdb file.
I fiddled with permissions on MSysObjects ...
1 vote, 0 answers, 49 views
Need raw text segregated into just title and content from a Wikipedia dump (English) [duplicate]
I am working on a Full Text Search Implementation (sort of a matching algorithm) in a tool called Tantivy_py. I tried it with a small text source and it worked smoothly. Now I want to test it on a very ...
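A minimal sketch for the Wikipedia dump question, assuming the standard MediaWiki XML export layout (`<page>` elements containing a `<title>` and a `<revision>` with a `<text>` element). It uses only Python's built-in `xml.etree.ElementTree.iterparse`, so the dump is streamed rather than loaded whole. The names `iter_pages` and `local_name` are hypothetical. The yielded text is still wikitext markup; turning it into plain prose is a separate step this sketch does not attempt.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union


def local_name(tag: str) -> str:
    """Drop the namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML dump.

    Assumes each <page> holds one <title> and, for an articles-only dump,
    one <revision> with a single <text> element.
    """
    title, text = "", ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # drop the finished page's children to keep memory use down
```

For a compressed dump, the file object returned by `bz2.open(path, "rb")` can be passed instead of a filename, and each `(title, text)` pair could then be handed to the indexer one document at a time.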
1 vote, 1 answer, 104 views
Extracting phylogenetic tree information from images using machine learning
There are various machine learning models (Claude, ChatGPT, etc.) that can be used to extract machine-readable information from images. Has anyone seen cases of successfully extracting Newick format ...
1 vote, 1 answer, 151 views
Is there any OCR or technique that can recognize/identify radio buttons printed out in the form of a PDF document?
I have a PDF document with radio responses like the attached screenshot. I want to extract only the selected response using Python or any OCR technique. Is there any way of doing it?
(https://i.sstatic....
0 votes, 0 answers, 20 views
Extracting conditional numeric values from character data in R
Stuck on a data tidying problem, and not sure how to work around it. I have messy character data on whisky, which I'm looking to organise so that I can conduct some analyses. Specifically, I'm looking ...
0 votes, 0 answers, 135 views
Remove background fill from tables in PDF using PyMuPDF/fitz or pdfminer/pdfplumber
I want to remove the background fill in table cells. I tried using get_drawings() from fitz. I'm able to change the fill value in the drawing object, but it resets back to the original value before the PDF is saved.
...
0 votes, 2 answers, 366 views
Is there a way to extract unmatched data from a cell string in excel?
I have been given an Excel file which contains columns, and within each cell of the column there are multiple entries separated by commas, as follows:
Column 1
Column 2
Column 3
A1, A7, A11, B12, B15
A1, A7, A11,...
1 vote, 3 answers, 102 views
Identify sequences in alphanumeric strings in R
I am attempting to create a flag for when transaction IDs are sequential. For reasons that I will not get into here, these can be a red flag. The problem that I am having is that the IDs are not ...
0 votes, 0 answers, 17 views
Extracting a particular portion from a text file using Python
As shown below, I have specific sets of information, A1 to A300, in a single text file named full change.txt
If *****
Begin
****A1
End
Go
If *****
Begin
****A2
End
Go
………..
I have 300 files and each ...
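A minimal sketch for splitting full change.txt into its A1 to A300 sections, assuming every section ends with a line containing only `Go`, as in the sample above. The question is cut off before saying what should happen to each section, so this only returns the sections in order; the function name `split_blocks` is hypothetical.

```python
from typing import List


def split_blocks(text: str, terminator: str = "Go") -> List[str]:
    """Split text into sections, each ending with a line that is exactly `terminator`."""
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == terminator:
            # Section closed: store it and start collecting the next one.
            blocks.append("\n".join(current))
            current = []
    # Keep any trailing lines that were never closed by a terminator line.
    if any(part.strip() for part in current):
        blocks.append("\n".join(current))
    return blocks
```

With the file read into a string, `split_blocks(text)[0]` would be the A1 section, `[1]` the A2 section, and so on, provided the sections appear in order in the file.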
-1 votes, 2 answers, 90 views
How to get a number from a string without regex
Without using regex, is there a way to get number from a string in JavaScript?
For example, if the input is "id is 12345", then I want the number 12345 as output.
I know a lot of regex solutions, ...
0 votes, 1 answer, 158 views
Is there a way to tell spaCy that certain words are related to a certain number? e.g. Feed rate and aspirator rate were 3l/hr and 100% respectively
I'm very new to Python, spaCy, and even Stack Overflow in general, so forgive me if my question is too vague. I would like to ask if there's a way to tell spaCy that certain words in a sentence are ...
0 votes, 1 answer, 174 views
Using GPT-3 to identify relationships in a corpus
I have a corpus of 15K news articles. I would like to train a GPT model (3 or 4) to ingest these texts and then output how the locations, events, actions, participants, and things described in the ...
0 votes, 1 answer, 146 views
How can I extract one value from this .xml into a string? (C#)
The xml file
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="JRSidecar" PathSeparator="\">
&...
1 vote, 1 answer, 204 views
Is there a way to return multiple values from a CSV file in a function with statistics? (string & float)
I'm pretty new to this. I'm working on a basic sales CSV file to extract multiple values. The CSV contains a list of months and the number of sales for that month as well as other columns but these ...
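A minimal sketch of returning a string and floats together from one function: a Python function can return a tuple, which the caller unpacks into separate variables. The column names `Month` and `Sales` and the function name `sales_summary` are assumptions, since the question does not show the file's header.

```python
import statistics
from typing import Dict, Iterable, List, Tuple


def sales_summary(rows: Iterable[Dict[str, str]],
                  month_col: str = "Month",
                  sales_col: str = "Sales") -> Tuple[str, float, float]:
    """Return (best month, its sales figure, mean sales) from CSV-style rows.

    The default column names are assumptions; pass the real header names if they differ.
    """
    months: List[str] = []
    sales: List[float] = []
    for row in rows:
        months.append(row[month_col])        # kept as a string
        sales.append(float(row[sales_col]))  # CSV fields are strings, so convert
    if not sales:
        raise ValueError("no rows to summarise")
    # Index of the largest sales figure.
    best = max(range(len(sales)), key=sales.__getitem__)
    # Returning several values at once just builds a tuple.
    return months[best], sales[best], statistics.mean(sales)
```

Called as `month, top, avg = sales_summary(csv.DictReader(f))` on a file opened with `newline=""`, each row is a dict keyed by the header, so the other columns are simply ignored.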