All Questions
Tagged with text-mining tm
275 questions
0 votes, 0 answers, 38 views
Errors attaching metadata to corpus
I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...
-1 votes, 1 answer, 26 views
Error while creating the TDM - "No applicable method for 'meta' applied to an object of class "character""
While creating a TermDocumentMatrix with the tm package, I am getting an error. I used the following code:
int_vc <- VCorpus(int_vc)
int_vc <- tm_map(int_vc, tolower)
int_vc <- tm_map(int_vc, ...
0 votes, 1 answer, 160 views
is package tm suitable for extracting scores from text data?
I have many cognitive assessment data stored as txt files. Each file looks like this:
patient number xxxxxx
score A (98) (95)ile%
score B (100) (97)ile%
test C
score D (76)
...
1 vote, 1 answer, 184 views
Remove Words with less than Certain Character Lengths plus Noise Reduction before Tokenization
I have the following data frame
report <- data.frame(Text = c("unit 1 crosses the street",
"driver 2 was speeding and saw driver# 1",
"year 2019 was the ...
2 votes, 1 answer, 250 views
Remove Numbers, Punctuations, White Spaces before Tokenization
I have the following data frame
report <- data.frame(Text = c("unit 1 crosses the street",
"driver 2 was speeding and saw driver# 1",
"year 2019 was the ...
1 vote, 1 answer, 271 views
How can I extract bigrams from text without removing the hash symbol?
I am using the following function (based on https://rpubs.com/sprishi/twitterIBM) to extract bigrams from text. However, I want to keep the hash symbol for analysis purposes. The function to clean ...
1 vote, 0 answers, 26 views
Why does the clean.text() function change word frequencies?
I am doing text analysis and reading articles into R. When I use the clean.text() function from TextReg to clean the text of a corpus and then look up word frequencies using term_stats() from tm, the ...
2 votes, 1 answer, 83 views
Some words won't be stemmed using tm ("easier" or "easiest")
I have a large questionnaire dataset where some of the features need to be stemmed, with the goal being to assign a topic to each response. However, I'm having trouble stemming some words using the ...
0 votes, 1 answer, 1k views
Cosine Similarity Matrix in R
I have a document term matrix, "mydtm" that I have created in R, using the 'tm' package. I am attempting to depict the similarities between each of the 557 documents contained within the dtm/...
0 votes, 1 answer, 191 views
Dealing with several text columns in a labeled data set while running NLP in R
Hope all of you guys are healthy and well.
I am new to the world of NLP and my question may sound stupid, so I apologize in advance. I would like to perform NLP on some text data which is labeled and ...
0 votes, 0 answers, 149 views
Failing to create DTM for n-grams in R
I've tried to apply the answer to this question, but it doesn't work. I've used VCorpus to get the docs_es corpus.
docs_es<-readRDS("docs_es.rds")
tokenitzador<-function(x){
unlist(...
1 vote, 2 answers, 118 views
Mapping the topic of the review in R
I have two data sets, Review Data & Topic Data
Dput code of my Review Data
structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved",
"Sports and physical ...
1 vote, 1 answer, 77 views
Store multiple corpus via for loop by different names
I have multiple text documents per ticker which I want to store as an individual corpus.
I've read about creating "lists in lists", but this doesn't work for me. For example, "text mining and ...
1 vote, 1 answer, 1k views
R text mining: grouping similar words using stemDocuments in tm package
I am doing text mining of around 30000 tweets. To make the results more reliable, I want to convert "synonyms" to similar words. For example, some users use the word "girl", some use "girls", some ...
0 votes, 1 answer, 215 views
How to create a DataframeSource in R? Unable to create a corpus that fits my needs
A beginner here.
I have a dataset of 4 columns, basically news articles, containing columns with names: date, author, title and body (which contains text).
I want to create a corpus, but I don't ...