All Questions
3 questions
87 votes, 1 answer, 1k views
Inconsistent behaviour with tm_map transformation functions when using multiple cores
Another potential title for this post could be "When parallel processing in R, does the ratio between the number of cores, loop chunk size, and object size matter?"
I have a corpus I am ...
0 votes, 1 answer, 270 views
Clean subtitle files with 'tm' package and parallel processing
I have 150,000 subtitle files in the "File" format (because I forgot to add .txt to the end of each one when converting from .srt) for which I want to remove everything that isn't text in order to ...
1 vote, 1 answer, 2k views
Scaling and parallel processing 'tm' package Term-Document Matrix calculations in RStudio?
I need some help making the calculation of cosine similarity scores between vectors in a term-document matrix much faster. I have a matrix of strings and I need to get the word similarity scores between the ...