topic-modeling
Here are 741 public repositories matching this topic...
This is an awesome library, thanks @ddbourgin!!
Users might not know the best way to install this package and try it out. (I didn't, so I eventually just copied the source files.)
Neither the readme nor readthedocs have install instructions.
I couldn't find it on PyPI or Anaconda, and there doesn't appear to be a pyproject.toml, setup.cfg, setup.py, or conda recipe.
Moreover, the t
-
Updated
Mar 28, 2020 - C++
This is basically a shameless spin off of https://stackoverflow.com/questions/57330300/how-to-reproduce-hypertools-clusters-identified-from-hypertools-plot
I am trying to take the results of using hypertools.plot(...), but my attempts to replicate them by using other parts of hypertools are yielding surprisingly different results.
I would like some guidance on this, but I also feel like ha
Hi there,
I think there might be a mistake in the documentation. The "Understanding Scaled F-Score" section says:
The F-Score of these two values is defined as:
$$ \mathcal{F}_\beta(\mbox{prec}, \mbox{freq}) = (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}}. $$
$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta
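The formula quoted above can be evaluated directly. The following is a minimal sketch, not the library's own implementation: the function name f_beta and its argument names are hypothetical, and it only transcribes the expression exactly as the issue quotes it.

```python
def f_beta(prec, freq, beta=1.0):
    """Evaluate the F-score expression as quoted in the issue.

    prec, freq : the two values being combined (assumed non-negative)
    beta       : positive scaling factor
    """
    denominator = beta ** 2 * prec + freq
    if denominator == 0:
        # Both inputs are zero; return 0.0 rather than divide by zero.
        return 0.0
    return (1 + beta ** 2) * prec * freq / denominator


# With beta = 1 the expression reduces to the harmonic mean of the two values.
balanced = f_beta(0.5, 0.5, beta=1.0)
# Changing beta shifts how strongly each argument pulls the result.
skewed = f_beta(1.0, 0.5, beta=2.0)
```

Evaluating the expression at a few values of beta is a quick way to check which of the two arguments a given beta actually favors.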
-
Updated
May 12, 2020 - OCaml
I would like to know what all the abbreviations mean. Some I can guess, like "PUNCT", but I have no idea what "X" might be. I want to retain contractions, but it is hard to choose options without documentation.
Thanks. Great performance code!
When using artm.SmoothSparseThetaRegularizer(tau=tau_val) with tau_val < 0, some columns of the \Theta matrix are filled entirely with zeros. Judging by the perplexity score, the optimization converges. The number of documents with all-zero \Theta columns grows as tau_val $\to -\infty$.
How is it possible that the optimization constraint on the \Theta columns is violated?
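One possible explanation can be sketched in a few lines. The sketch assumes the regularized M-step commonly described for ARTM, in which each \Theta column is the normalized positive part of the expected topic counts plus a term proportional to tau. This is an assumption about the update rule, not code taken from the library, and the function name and the uniform alpha are hypothetical.

```python
def regularized_theta_column(n_td, tau, alpha=1.0):
    """One assumed M-step update for a single document's topic column.

    n_td  : expected topic counts for the document (list of floats)
    tau   : regularization coefficient (negative values sparsify)
    alpha : per-topic weight, assumed uniform here
    """
    # Positive-part truncation: values pushed below zero are clipped to zero.
    shifted = [max(n + tau * alpha, 0.0) for n in n_td]
    total = sum(shifted)
    if total == 0.0:
        # Nothing survives the truncation, so the column cannot be normalized.
        return [0.0] * len(n_td)
    return [x / total for x in shifted]


counts = [3.0, 1.0, 0.5]
mild = regularized_theta_column(counts, tau=-0.8)    # one topic is zeroed out
strong = regularized_theta_column(counts, tau=-5.0)  # every topic is zeroed out
```

Under this assumption, a sufficiently negative tau drives every entry of a short document's column below zero before normalization, which would leave the whole column at zero and would affect more documents as tau decreases.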
-
Updated
May 25, 2020
Hi,
I used to have a previous version of LDAvis (2014) installed with devtools.
In the version I had of LDAvis I would call createJSON as:
json <- createJSON(K, phi, term.frequency, vocab, topic.proportions)
Today I updated my R packages and have a newer version of LDAvis (from CRAN) which uses createJSON as:
json <- createJSON(phi, theta, doc.length, vocab, term.frequency)
I'm using MALLET for t
-
Updated
May 21, 2020 - Python
Hello
I have 200k documents and I create 100 topics. I look at the terms and see that the topics are good.
But when I want to look at examples for each topic, I run probs, _ = topic_model.transform(count_matrix, details=True). Then I create a new column for each topic, for example dataframe['topic=0'] = pd.Series(probs[:, 0]). Then I sort the dataframe by probability in decreasing order and I see that about 1/3 of the
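The inspection step described above, ranking documents by their probability for one topic, can be sketched without any library. The probs values below are made-up illustration data and top_documents is a hypothetical helper; the sketch assumes probs is indexed as probs[document][topic], matching the probs[:, 0] usage in the issue.

```python
def top_documents(probs, topic, k=3):
    """Return indices of the k documents with the highest probability for `topic`."""
    ranked = sorted(range(len(probs)), key=lambda d: probs[d][topic], reverse=True)
    return ranked[:k]


# Illustrative document-by-topic probabilities (4 documents, 2 topics).
probs = [
    [0.70, 0.30],
    [0.10, 0.90],
    [0.55, 0.45],
    [0.95, 0.05],
]

best_for_topic_0 = top_documents(probs, topic=0, k=2)
```

Ranking by index rather than by a sorted copy of the values keeps the link back to the original documents, so the top examples can be looked up in the source dataframe.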
Hi @vi3k6i5 ,
I'm trying guided LDA on data from six reviews, initializing with a seed confidence of 0.15, but the seed words are not moving up the list as expected.
code below:
df = pd.DataFrame(corpus,columns=['Review'])
import spacy
nlp = spacy.load("en_core_web_sm")
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
import string
from unidecode import uni
-
Updated
May 4, 2020
Running on Google Colab with Python 3 + GPU:
Issue in preprocess.py on line 26, in the call to nlp():
nlp = spacy.load('en')
text = nlp(text, tag=True, parse=False, entity=False)
nlp() does not recognize these arguments (e.g. tag).
I changed the call to this:
text = nlp(text)
-
Updated
Jun 28, 2017 - R
Is there a way to get the topic mixture of each document back out from a hierarchical model? I am training a HLDAModel:
h_mdl = tp.HLDAModel(depth=4, corpus=corpus, seed=1)
for i in range(0, 100, 10):  # Train the model using Gibbs sampling
    h_mdl.train(10)
    print('Iteration: {}\tLog-likelihood: {}'.format(i, h_mdl.ll_per_word))
I am using the Document class to access insta
-
Updated
Mar 2, 2018 - PureScript
Example (from TfidfTransformer)
This method expects a list of tuples, instead of an iterable. This means that the entire corpus has to be stored as a lis