topic-modeling

Example (from TfidfTransformer)

if isinstance(docs[0], tuple):
    docs = [docs]
return [self.gensim_model[doc] for doc in docs]

This method expects a list of tuples, instead of an iterable. This means that the entire corpus has to be stored as a lis

This is an awesome library, thanks @ddbourgin!!

Users might not know the best way to install this package and try it out. (I didn't, so I eventually just copied the source files.)
Neither the readme nor readthedocs have install instructions.

I couldn't find it on PyPI or Anaconda, and there doesn't appear to be a pyproject.toml, setup.cfg, setup.py, or conda recipe.

Moreover, the t

This is basically a shameless spin off of https://stackoverflow.com/questions/57330300/how-to-reproduce-hypertools-clusters-identified-from-hypertools-plot

I am trying to reproduce the results of hypertools.plot(...), but my attempts to replicate them using other parts of hypertools yield surprisingly different results.

I would like some guidance on this, but I also feel like ha

Hi there,

I think there might be a mistake in the documentation. The Understanding Scaled F-Score section says

The F-Score of these two values is defined as:

$$ \mathcal{F}_\beta(\mbox{prec}, \mbox{freq}) = (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}}. $$

$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta
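A quick numeric check can show which argument a given $\beta$ favors in the formula as quoted. The sketch below implements only that expression; the function name `f_beta` and the sample values 0.9 and 0.1 are illustrative assumptions, not Scattertext code.

```python
def f_beta(prec, freq, beta):
    """F-score of precision and frequency, exactly as in the quoted formula."""
    denominator = beta ** 2 * prec + freq
    if denominator == 0:
        return 0.0  # both inputs are zero; avoid dividing by zero
    return (1 + beta ** 2) * prec * freq / denominator


# Compare a high-precision/low-frequency term with the opposite case.
for beta in (0.5, 1.0, 2.0):
    high_prec = f_beta(0.9, 0.1, beta)
    high_freq = f_beta(0.1, 0.9, beta)
    print(f"beta={beta}: high-prec={high_prec:.3f}, high-freq={high_freq:.3f}")
```

With these sample values the high-frequency term scores higher when beta is 2.0 and the high-precision term scores higher when beta is 0.5.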

What do all the abbreviations mean? I can guess some, like "PUNCT", but I have no idea what "X" might be. I want to retain contractions, but it is hard to choose options without documentation.

Thanks. Great performance code!

When using artm.SmoothSparseThetaRegularizer(tau=tau_val) with tau_val<0, we get some \Theta matrix columns filled entirely with zeros. Judging by the perplexity score, the optimization converges. The number of documents with all zeros in their \Theta columns grows as tau_val tends to $-\infty$.
How is it possible that the optimization constraint on the \Theta columns is violated?
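One possible explanation, assuming the regularized M-step has the form $\theta_{td} \propto \max(n_{td} + \tau \alpha_t, 0)$ described in the ARTM literature: a sufficiently negative tau can clip every entry of a document's column to zero, leaving nothing to normalize. The sketch below is a toy illustration in plain Python; `update_theta_column`, the counts, and alpha = 1 are assumptions, not BigARTM internals.

```python
def update_theta_column(counts, tau, alpha=1.0):
    """Toy M-step for one document: clip negatives to zero, then normalize.

    counts: expected topic counts n_td for a single document (assumed values).
    tau:    regularization coefficient; negative values sparsify the column.
    """
    clipped = [max(n + tau * alpha, 0.0) for n in counts]
    total = sum(clipped)
    if total == 0.0:
        # Every topic was clipped, so the column stays all zeros.
        return [0.0] * len(counts)
    return [value / total for value in clipped]


counts = [3.0, 1.0, 0.5]
for tau in (0.0, -0.8, -5.0):
    print(tau, update_theta_column(counts, tau))
```

Under this assumption, the column sums to one only while at least one topic survives the clipping, which would be consistent with more all-zero columns appearing as tau_val decreases.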

Hi,
I used to have a previous version of LDAvis (2014) installed with devtools.
In the version I had of LDAvis I would call createJSON as:
json <- createJSON(K, phi, term.frequency, vocab, topic.proportions)

Today I updated my R packages and now have a newer version of LDAvis (from CRAN), which uses createJSON as:
json <- createJSON(phi, theta, doc.length, vocab, term.frequency)

I'm using MALLET for t

Hello

I have 200k documents and I create 100 topics. I look at the terms and see that the topics are good.
But when I want to look at examples for each topic, I run probs, _ = topic_model.transform(count_matrix, details=True). Then I create a new column for each topic, for example dataframe['topic=0']=pd.Series(probs[:, 0]). Then I sort the dataframe by decreasing probability and I see that about 1/3 of the

Hi @vi3k6i5 ,

I'm trying guided LDA on data from six reviews, initializing with a seed confidence of 0.15, but the seed words are not moving up the list as expected.

code below:

df = pd.DataFrame(corpus,columns=['Review'])

import spacy

nlp = spacy.load("en_core_web_sm")

from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
import string
from unidecode import uni

Running on Google Colab with Python 3 + GPU:
Issue in preprocess.py on line #26 for method nlp():

nlp = spacy.load('en')
text = nlp(text, tag=True, parse=False, entity=False)

nlp() does not recognize these arguments (e.g. tag), so I changed the call to this:
text = nlp(text)

Is there a way to get the topic mixture of each document back out from a hierarchical model? I am training a HLDAModel:

h_mdl = tp.HLDAModel(depth=4,corpus=corpus,seed=1)
    
for i in range(0, 100, 10): #Train the model using Gibbs-sampling
    h_mdl.train(10)
    print('Iteration: {}\tLog-likelihood: {}'.format(i, h_mdl.ll_per_word))

I am using the Document class to access insta

Here are 741 public repositories matching this topic...

RaRe-Technologies / gensim

ddbourgin / numpy-ml

baidu / Familia

ContextLab / hypertools

JasonKessler / scattertext

owlbarn / owl

dselivanov / text2vec

bigartm / bigartm

iwangjian / Paper-Reading

cpsievert / LDAvis

stephenhky / PyShortTextCategorization

gregversteeg / corex_topic

vi3k6i5 / GuidedLDA

stepthom / text_mining_resources

jmartinezheras / 2018-MachineLearning-Lectures-ESA

primaryobjects / lda

ruidan / Unsupervised-Aspect-Extraction

yangliuy / LDAGibbsSampling

datquocnguyen / LFTM

dongrixinyu / chinese_keyphrase_extractor

hugochan / KATE

dice-group / Palmetto

WZBSocialScienceCenter / tmtoolkit

TropComplique / lda2vec-pytorch

dipanjanS / learning-social-media-analytics-with-r

AdrienGuille / TOM

bab2min / tomotopy

lettier / lda-topic-modeling

yuewang-cuhk / TAKG

lmcinnes / enstop
