
All Questions

2 votes
1 answer
354 views

How do I solve AttributeError: 'float' object has no attribute 'encode'?

This is the code: import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns plt.style.use('ggplot') import nltk df = pd.read_csv('/kaggle/input/starbucks-review-...
Aristo Lie
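
The usual culprit for this error is missing values: pandas stores an empty review cell as the float NaN, so any per-row call that expects a string (such as .encode()) fails on that row. A minimal sketch of both common fixes, using a made-up DataFrame rather than the Kaggle file:

import pandas as pd
import numpy as np

# A missing review is stored as NaN, which is a float, so calls like
# s.encode() raise AttributeError on that row.
df = pd.DataFrame({"Review": ["Great coffee!", np.nan, "Too sweet."]})

# Fix 1: drop the rows with no text.
clean = df.dropna(subset=["Review"])

# Fix 2: keep the rows but coerce everything to str first.
df["Review"] = df["Review"].fillna("").astype(str)
df["encoded"] = df["Review"].apply(lambda s: s.encode("utf-8"))
print(df)
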
0 votes
0 answers
254 views

Tokenizing Strings without Punctuation in Python and putting punctuation back subsequently

After reading here for a while, I have decided to make a post because I am not getting anywhere with my problem. Unfortunately, I am just a "finance guy" and need some help with coding ...
Moelis Hardo
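
The question text is cut off, but a common pattern for this task is to keep the punctuation marks as their own tokens, process only the word tokens, and glue the punctuation back on when joining. A minimal sketch with a hypothetical process() step standing in for the real per-word work:

import re

text = "Hello, world! This is a test."

# Split into word tokens and single-character punctuation tokens.
tokens = re.findall(r"\w+|[^\w\s]", text)

def process(word):
    # Hypothetical per-word step (here it just uppercases).
    return word.upper()

out = [t if re.fullmatch(r"[^\w\s]", t) else process(t) for t in tokens]

# Re-join, re-attaching each punctuation mark to the preceding word.
restored = re.sub(r"\s+([^\w\s])", r"\1", " ".join(out))
print(restored)   # HELLO, WORLD! THIS IS A TEST.
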
-1 votes
4 answers
605 views

Remove punctuation marks from tokenized text using for loop

I'm trying to remove punctuation from tokenized text in Python like so: word_tokens = nltk.tokenize(text) w = word_tokens for e in word_tokens: if e in punctuation_marks: w.remove(e) ...
Hal
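
The excerpt shows the classic pitfall: w = word_tokens only creates an alias, and calling remove() on a list while iterating over it makes the loop skip elements. A sketch of the usual fix, with the tokens hard-coded for illustration:

from string import punctuation

word_tokens = ["Hello", ",", "world", "!", "it", "'s", "me", "."]
punctuation_marks = set(punctuation)

# Build a new list instead of mutating the one being iterated over.
words = [tok for tok in word_tokens if tok not in punctuation_marks]
print(words)   # ['Hello', 'world', 'it', "'s", 'me']
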
1 vote
0 answers
270 views

Removing punctuation with an exception in Python

I am trying to remove punctuation from a given string in Python. It works well; however, the data I am using includes lots of ":D" or ":)" or ":(". Therefore when I ...
oakca
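
One way to keep emoticons such as :D, :) and :( while stripping other punctuation is to match them explicitly and only delete punctuation that is not part of one. A regex sketch; the emoticon pattern here is deliberately tiny and would need widening for real data:

import re

text = "Great job :D but the report , was late :( right ?"

emoticon = r"[:;=][)(DP]"                      # toy emoticon pattern
pattern = re.compile(rf"({emoticon})|[^\w\s]")

# Keep whatever the emoticon group matched, drop any other punctuation,
# then collapse the leftover double spaces.
cleaned = " ".join(pattern.sub(lambda m: m.group(1) or "", text).split())
print(cleaned)   # Great job :D but the report was late :( right
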
0 votes
1 answer
1k views

Error when using string.punctuation to remove punctuation from a string

Quick question: I'm using string and nltk.stopwords to strip a block of text of all its punctuation and stopwords as part of data pre-processing before feeding it into some natural language ...
Isaac Nikolai Fox
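
The actual traceback is not shown in the excerpt, but a frequent cause here is the Python 2 vs 3 change in str.translate. A minimal Python 3 sketch of the stripping step, assuming the NLTK stopwords corpus has already been downloaded with nltk.download('stopwords'):

import string
from nltk.corpus import stopwords

text = "Hello, world! This is just a small block of text."

# Python 3's str.translate takes a mapping table, not two string arguments.
no_punct = text.translate(str.maketrans("", "", string.punctuation))

stop = set(stopwords.words("english"))
tokens = [w for w in no_punct.lower().split() if w not in stop]
print(tokens)
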
-2 votes
2 answers
2k views

Python program to put proper punctuation in a given string

I want to put proper punctuation marks in a given paragraph that contains many unpunctuated sentences. E.g. input: hey how are you can you come today; output: hey, how are you? can you come today? I just ...
Mohit
1 vote
3 answers
14k views

Removing stop words and string.punctuation

I can't figure out why this doesn't work: import nltk from nltk.corpus import stopwords import string with open('moby.txt', 'r') as f: moby_raw = f.read() stop = set(stopwords.words('...
Giacomo Ciampoli
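
The excerpt is cut off before the failing part, but a common surprise with this exact setup is that a membership test against string.punctuation only removes single-character tokens, so things like '--' or a word with an attached quote slip through. A small sketch of the difference:

import string

tokens = ["call", "me", "ishmael", "--", '"whale', "ago,"]

# Substring test: multi-character tokens are not "in" string.punctuation.
kept_naive = [t for t in tokens if t not in string.punctuation]
print(kept_naive)   # '--', '"whale' and 'ago,' all survive

# Stripping punctuation characters out of each token catches them all.
table = str.maketrans("", "", string.punctuation)
kept = [t.translate(table) for t in tokens]
print([t for t in kept if t])   # ['call', 'me', 'ishmael', 'whale', 'ago']
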
6 votes
1 answer
7k views

How to preserve punctuation marks in Scikit-Learn text CountVectorizer or TfidfVectorizer?

Is there any way to preserve the punctuation marks !, ?, " and ' from my text documents using the CountVectorizer or TfidfVectorizer parameters in scikit-learn?
Suhairi Suhaimin
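
Both vectorizers accept a custom token_pattern (or a full tokenizer callable), so punctuation can be preserved by widening the pattern. A minimal sketch, assuming a recent scikit-learn (get_feature_names_out):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["Is it good?", "It's great!", "No way..."]

# The default pattern drops punctuation; this one also emits !, ?, " and '
# as their own tokens (adjust the character class as needed).
vec = CountVectorizer(token_pattern=r"""\w+|[!?"']""")
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())

Passing tokenizer=nltk.word_tokenize (with token_pattern=None) is the other common route; TfidfVectorizer takes the same parameters.
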
0 votes
0 answers
112 views

Double quote is not recognised as punctuation in Python 2.7? [duplicate]

I have a question about string.punctuation. I'm using NLTK and I need to clear my text of punctuation (the text is already split into tokens with word_tokenize(my_str)). I wrote a simple ...
Kyrol
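
What usually causes this is that NLTK's word_tokenize follows the Penn Treebank convention and rewrites a double quote as `` or '', and those two-character tokens are not members of string.punctuation. A sketch of one workaround, assuming the punkt tokenizer data is installed:

import string
from nltk.tokenize import word_tokenize

tokens = word_tokenize('He said "hello" and left.')
# -> ['He', 'said', '``', 'hello', "''", 'and', 'left', '.']

exclude = set(string.punctuation) | {"``", "''", "--"}
words = [t for t in tokens if t not in exclude]
print(words)   # ['He', 'said', 'hello', 'and', 'left']
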
0 votes
1 answer
593 views

Python NLTK not taking out punctuation correctly

I have defined the following code: exclude = set(string.punctuation) lmtzr = nltk.stem.wordnet.WordNetLemmatizer() wordList = ['"the'] answer = [lmtzr.lemmatize(word.lower()) for word in list(set(...
carebear
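
The excerpt suggests the whole token '"the' is being tested against the exclude set, and since it is not itself a punctuation character it passes through. Stripping punctuation characters out of each token first is the usual fix; a sketch, assuming the WordNet corpus is downloaded:

import string
import nltk

exclude = set(string.punctuation)
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
wordList = ['"the', 'cats,', 'ran.']

# Remove punctuation characters inside each token before lemmatizing,
# rather than asking whether the whole token is a punctuation mark.
cleaned = ["".join(ch for ch in w if ch not in exclude) for w in wordList]
answer = [lmtzr.lemmatize(w.lower()) for w in cleaned if w]
print(answer)   # ['the', 'cat', 'ran']
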
0 votes
1 answer
132 views

Remove selected punctuation from list of sentences

I have a list of sentences like: [' no , 2nd main 4th a cross, uas layout, near ganesha temple/ bsnl exchange, sanjaynagar, bangalore', ' grihalakshmi apartments flat , southend road basavangudi ...
Hypothetical Ninja
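
To drop only selected marks (say commas) while keeping characters that carry meaning in these addresses (such as '/'), build the translation table from just the unwanted characters. A sketch over two of the example strings:

sentences = [
    " no , 2nd main 4th a cross, uas layout, near ganesha temple/ bsnl exchange",
    " grihalakshmi apartments flat , southend road basavangudi",
]

unwanted = ",;:"                        # only these characters get removed
table = str.maketrans("", "", unwanted)

# translate() drops the unwanted marks; split/join tidies the spacing.
cleaned = [" ".join(s.translate(table).split()) for s in sentences]
print(cleaned)
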
0 votes
2 answers
162 views

Insert spaces next to punctuation when writing to .txt file

I have written a function that uses an nltk tokenizer to preprocess .txt files. Basically, the function takes a .txt file, modifies it so that each sentence appears on a separate line, and overwrites ...
alexponline
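
A straightforward way to end up with a space on both sides of every punctuation mark is to tokenize each sentence and join the tokens back with single spaces before writing the line. A sketch, assuming the punkt data is available and writing to a hypothetical out.txt:

import nltk

text = "Dr. Smith arrived. He was late, very late!"

with open("out.txt", "w", encoding="utf-8") as f:
    for sent in nltk.sent_tokenize(text):
        # word_tokenize splits punctuation into separate tokens, so joining
        # with spaces leaves a space around every mark.
        f.write(" ".join(nltk.word_tokenize(sent)) + "\n")
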
0 votes
1 answer
580 views

Splitting a string after punctuation while including punctuation

I'm trying to split a string of words into a list of words via regex. I'm still a bit of a beginner with regular expressions. I'm using nltk.regex_tokenize, which is yielding results that are close, ...
ktflghm
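
The excerpt mentions nltk.regex_tokenize, but the plain re module already covers both readings of the question: keep the punctuation attached to the word it follows, or split the string after sentence-ending marks without discarding them. A small sketch of both:

import re

text = "Wait... really? Yes! Fine, go."

# Word-level: each token keeps any punctuation glued to its end.
tokens = re.findall(r"\w+[^\w\s]*|[^\w\s]+", text)
print(tokens)   # ['Wait...', 'really?', 'Yes!', 'Fine,', 'go.']

# Sentence-level: split after ., ! or ? while keeping the mark.
parts = re.split(r"(?<=[.!?])\s+", text)
print(parts)    # ['Wait...', 'really?', 'Yes!', 'Fine, go.']
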