All Questions
261 questions
0 votes · 0 answers · 38 views
Errors attaching metadata to corpus
I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...
0 votes · 1 answer · 160 views
Is package tm suitable for extracting scores from text data?
I have many cognitive assessment data stored as txt files. Each file looks like this:
patient number xxxxxx
score A (98) (95)ile%
score B (100) (97)ile%
test C
score D (76)
...
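For files laid out like the excerpt above, a minimal base-R sketch can pull out the parenthesised scores with a regular expression. The sample lines and the assumption that every score sits in parentheses are taken from the excerpt; the patient number is a made-up placeholder.

```r
# Sample lines mimicking the layout shown in the question.
lines <- c("patient number 123456",
           "score A (98) (95)ile%",
           "score B (100) (97)ile%")

# Extract every parenthesised number on each line, then strip the parens.
scores <- regmatches(lines, gregexpr("\\(([0-9]+)\\)", lines))
scores <- lapply(scores, function(x) as.numeric(gsub("[()]", "", x)))
scores[[2]]  # c(98, 95)
```

This needs no extra packages; tm only becomes useful once the scores are out and the free-text parts need tokenizing.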
1 vote · 1 answer · 184 views
Remove Words Shorter than a Certain Character Length, plus Noise Reduction, before Tokenization
I have the following data frame
report <- data.frame(Text = c("unit 1 crosses the street",
"driver 2 was speeding and saw driver# 1",
"year 2019 was the ...
2 votes · 1 answer · 250 views
Remove Numbers, Punctuation, and White Space before Tokenization
I have the following data frame
report <- data.frame(Text = c("unit 1 crosses the street",
"driver 2 was speeding and saw driver# 1",
"year 2019 was the ...
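The three cleaning steps named in the title can be sketched in base R with `gsub` and POSIX character classes; the `Text` column and its first two values are taken from the excerpt above (the truncated third row is omitted).

```r
report <- data.frame(Text = c("unit 1 crosses the street",
                              "driver 2 was speeding and saw driver# 1"),
                     stringsAsFactors = FALSE)

clean <- report$Text
clean <- gsub("[[:digit:]]+", "", clean)   # remove numbers
clean <- gsub("[[:punct:]]+", "", clean)   # remove punctuation
clean <- gsub("\\s+", " ", trimws(clean))  # collapse/trim white space
clean
```

The tm package offers the same steps as `removeNumbers`, `removePunctuation`, and `stripWhitespace`, applied via `tm_map`; the base-R version above just avoids building a corpus first.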
2 votes · 1 answer · 83 views
Some words won't be stemmed using tm ("easier" or "easiest")
I have a large questionnaire dataset where some of the features need to be stemmed, with the goal being to assign a topic to each response. However, I'm having trouble stemming some words using the ...
0 votes · 1 answer · 1k views
Cosine Similarity Matrix in R
I have a document term matrix, "mydtm" that I have created in R, using the 'tm' package. I am attempting to depict the similarities between each of the 557 documents contained within the dtm/...
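Given a document-term matrix, pairwise cosine similarity is the dot-product matrix divided by the outer product of the row norms. A sketch on a tiny numeric matrix; for a tm object like the `mydtm` mentioned above, the same code applies after `m <- as.matrix(mydtm)` (at 557 documents the result is a 557×557 matrix).

```r
# Toy document-term matrix: 2 documents x 3 terms.
m <- matrix(c(1, 0, 2,
              0, 1, 1), nrow = 2, byrow = TRUE)

norms  <- sqrt(rowSums(m^2))        # Euclidean length of each document vector
cosine <- (m %*% t(m)) / (norms %o% norms)
cosine  # diagonal is 1; off-diagonal entries are the pairwise similarities
```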
0 votes · 1 answer · 191 views
Dealing with several text columns in a labeled data set while running NLP in R
Hope all of you guys are healthy and well.
I am new to the world of NLP and my question may sound stupid, so I apologize in advance. I would like to perform NLP on some text data which is labeled and ...
0 votes · 0 answers · 149 views
Failing to create DTM for n-grams in R
I've tried to apply the answer to this question, but it doesn't work. I've used VCorpus to get the docs_es corpus.
docs_es<-readRDS("docs_es.rds")
tokenitzador<-function(x){
unlist(...
1 vote · 2 answers · 118 views
Mapping the topic of the review in R
I have two data sets: Review Data and Topic Data.
The dput output of my Review Data:
structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved",
"Sports and physical ...
1 vote · 1 answer · 77 views
Store multiple corpus via for loop by different names
I have multiple text documents per ticker which I want to store as an individual corpus.
I've read about creating "lists in lists", but this doesn't work for me. For example, "text mining and ...
1 vote · 1 answer · 1k views
R text mining: grouping similar words using stemDocuments in tm package
I am doing text mining of around 30000 tweets. Now the problem is: to make the results more reliable, I want to convert "synonyms" to similar words; for example, some users write "girl", some write "girls", some ...
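Folding inflected forms like "girl"/"girls" together is what stemming does; a hedged sketch using the SnowballC stemmer, which is the back end that tm's `stemDocument` (note: singular, not `stemDocuments`) wraps. True synonym mapping (different words, same meaning) would need a lookup table instead.

```r
library(SnowballC)

words <- c("girl", "girls", "running", "runs")
wordStem(words, language = "english")  # "girl" "girl" "run" "run"
```

Inside a tm pipeline the equivalent is `tm_map(corpus, stemDocument)`.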
0 votes · 1 answer · 215 views
How to create dataframeSource in R? Unable to create a corpus that fits my needs
A beginner here.
I have a dataset of 4 columns, basically news articles, containing columns with names: date, author, title and body (which contains text).
I want to create a corpus, but I don't ...
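For a data frame like the one described above, tm's `DataframeSource` expects the first two columns to be named `doc_id` and `text`; remaining columns become document-level metadata. A sketch with made-up rows (the column names `title` etc. follow the question; the values are placeholders):

```r
library(tm)

articles <- data.frame(doc_id = c("a1", "a2"),
                       text   = c("First article body.", "Second article body."),
                       title  = c("One", "Two"),
                       stringsAsFactors = FALSE)

corp <- VCorpus(DataframeSource(articles))
meta(corp[[1]], "title")  # extra columns are attached as metadata
```

So the fix is usually just renaming the body column to `text` and adding a `doc_id` column before calling `DataframeSource`.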
1 vote · 1 answer · 332 views
How to remove common word endings from a non-English corpus using the tm package?
I am trying to do some text mining, using tm package, on reviews that Italian users of a certain website wrote there. I scraped the texts, stored them on a corpus, did some sort of cleaning, but when ...
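Both `stemDocument` and the underlying `SnowballC::wordStem` take a `language` argument, and Italian is among the supported Snowball languages. A sketch (the example words are placeholders, not from the question):

```r
library(SnowballC)

# Stem Italian words; inflected forms of the same lemma share a stem.
stems <- wordStem(c("bellissimo", "bellissima"), language = "italian")
stems
```

In a tm pipeline the corresponding call would be `tm_map(corpus, stemDocument, language = "italian")`.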
2 votes · 1 answer · 151 views
R text mining with TM: Does a document contain words that are rare
Using the TM package in R, how can I score a document in terms of its uniqueness? I want to somehow separate documents with very unique words from documents that contain often-used words.
I know how to ...
2 votes · 2 answers · 195 views
Using tm() to mine PDFs for two and three word phrases
I'm trying to mine a set of PDFs for specific two- and three-word phrases.
I know this question has been asked under various circumstances, and this solution partly works. However, the list does not ...