Newest 'text-mining' Questions

0 votes

1 answer

62 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows in df['text']. I also have 5 words, and for each of them I want to know how much it represents the text (a score between 0 and 1). Every score will be in df["word1&...

rafine

471

asked Dec 19, 2024 at 10:16

0 votes

1 answer

66 views

Similarity from word to sentence after word embedding

I have a dataframe with 1000 text rows. I ran word2vec on it. Now I want to create a new field that gives me the distance from each sentence to a word of my choice, let's say the word "king". I ...

rafine

471

asked Dec 9, 2024 at 8:14

0 votes

0 answers

39 views

spaCy lemmatization not correctly lemmatizing adjectives

I am using spaCy to lemmatize some text, and it lemmatizes the word robotic to robotic instead of robot. Could someone help me with this? Here is the code: import spacy nlp = spacy.load('en') ...

Mnnth

1

asked Sep 14, 2024 at 13:53

0 votes

0 answers

60 views

How can I do text pre-processing in IBM SPSS Modeler?

I want to do the standard text cleaning: removing stop words, stemming, tokenisation, etc. I think SPSS has limitations, but how do people get around them using SPSS? ChatGPT said to use Python ...

Aisyah

1

asked Aug 13, 2024 at 14:46

3 votes

1 answer

79 views

Regex for Parsing Japanese Parliamentary Speeches in Python

I'm a beginner in Python and am working on a project to preprocess Japanese text data for argument mining. I need to extract metadata (e.g., parliamentary session, date, speaker) and the speech ...

Ana17

31

asked Aug 13, 2024 at 10:32

0 votes

1 answer

47 views

Extract Keywords from Text Vector -- one set of keywords for each element

Please consider the reprex at the end of the post. It works along the lines of https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html It extracts a set ...

larry77

1,533

asked Jun 18, 2024 at 20:30

0 votes

0 answers

38 views

Errors attaching metadata to corpus

I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...

Nicolette

1

asked Jun 14, 2024 at 20:00

0 votes

1 answer

88 views

Unordered txt file contents: how to structure them into a proper dictionary

I have a txt file whose contents are unordered, like the sample below. I must select the first row because it has the exact train run time. My txt file has a couple of summaries (1, 2, and so on); hence, the keys are the same ...

eric

53

asked Jun 7, 2024 at 17:07

0 votes

0 answers

61 views

pdftools – How to skip errors?

I have an R script that converts all pdf files to text, but the "pdftools" package runs into various errors and stops the process. I would like to include in the code that if it finds an ...

onlyjust17

135

asked May 29, 2024 at 20:48

1 vote

1 answer

50 views

Extracting Text via Web Scraping: Loop with several optional start/end strings

I would like to web-scrape the text of several press statements. The problem I'm currently having is defining several strings where the scraping of the text should start/end. For example the ...

Alexandra

11

asked Apr 17, 2024 at 12:59

0 votes

1 answer

71 views

Export txt files from a corpus after preprocessing

I am struggling to export files from my corpus after preprocessing. I currently have 26 documents in my corpus, and I want to export them as txt files in their preprocessed form so I can combine ...

Bilal Rashid

1

asked Apr 13, 2024 at 23:49

1 vote

1 answer

54 views

I cannot get past data(stop_words) to analyze text in text mining

It's my first attempt at text mining and I have run into a wall. This is what I have done thus far: library(tm) library(tidytext) library(dplyr) library(ggplot2) text1 <- c("Dear land of ...

Rohan Sagar

31

asked Apr 13, 2024 at 18:02

-1 votes

1 answer

26 views

Error while creating the TDM - "No applicable method for 'meta' applied to an object of class "character""

While creating a TermDocumentMatrix with the tm package, I am getting an error. I have used the following code: int_vc <- VCorpus(int_vc) int_vc <- tm_map(int_vc, tolower) int_vc <- tm_map(int_vc, ...

yem

29

asked Oct 20, 2023 at 9:45

0 votes

0 answers

140 views

LDA Topic Modeling Producing Identical/Empty Topics

I am topic modeling on two large text documents (around 500-750 KB) and am asking for ten topics. I keep getting a repeat of two topics. Could this be an issue of the small number of documents? Or ...

Dez Miller

3

asked Oct 14, 2023 at 20:22

0 votes

2 answers

434 views

Python NLTK text dispersion plot has y (vertical) axis in backwards/reversed order

Since last month, NLTK dispersion_plot seems to have its y (vertical) axis in reversed order on my machine. This is likely something about my versions of software (I am on a school virtual machine). ...

drpawelo

2,590

asked Oct 10, 2023 at 0:05
