|Title:||“For the times they are a-changin’”|
|Abstract:||This paper deals with the problem of deriving consistent time series from topic models based on newspaper content. In the first part, we recapitulate a few of our own failed attempts; in the second, we show some results using a twin strategy that we call prototyping and seeding. Given the popularity that news-based indicators have gained in econometric analyses in recent years, this seems to be a valuable exercise for researchers working on related issues. Building on earlier work in which we use the topic modelling approach Latent Dirichlet Allocation (LDA) to gauge the perception of economic uncertainty, we show the difficulties that arise when a number of one-shot LDAs, performed at different points in time, are used to produce something akin to a time series. The models’ topic structures differ considerably from computation to computation. Neither parameter variations nor the aggregation of several topics into broader categories of related content is able to solve the problem of incompatibility. The cause is not just the content that is added at each observation point, but the very properties of LDA itself: since it uses random initializations and conditional reassignments within the iterative process, fundamentally different models can emerge when the algorithm is executed several times, even if the data and the parameter settings are identical. To tame LDA’s randomness, we apply a relatively new “prototyping” approach to the corpus upon which our Uncertainty Perception Indicator (UPI) is built. Still, the outcomes vary considerably over time. To get closer to our goal, we drop the notion that LDA models should be allowed to take various forms freely at each run. Instead, the topic structure is fixed, using a “seeding” technique that distributes incoming new data to our model’s existing topic structure. This approach seems to work quite well, as our consistent and plausible results show, but it too is bound to run into difficulties over time.|
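The two sources of run-to-run variation named in the abstract (random initialization and conditional reassignment), and the idea of distributing new data onto a fixed topic structure, can be illustrated with a minimal collapsed Gibbs sampler. This is a sketch under our own assumptions, not the authors' implementation: `train_lda` and `fold_in` are hypothetical helper names, documents are lists of integer word ids, and the “seeding” step is approximated here by standard fold-in inference in which the topic-word distributions are held fixed. Prototyping (selecting a representative model among many runs) is not shown. Changing `seed` in `train_lda` can produce differently ordered or differently composed topics on the same corpus.

```python
import random


def train_lda(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling LDA; returns topic-word distributions (phi)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                                 # tokens per topic
    z = []
    # Random initialization: the first source of run-to-run variation.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional reassignment: sample a topic from the full conditional.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]


def fold_in(phi, new_docs, alpha=0.1, iters=50, seed=0):
    """Assign tokens of new documents to a FIXED topic structure (phi is never updated).

    Assumption: a simple stand-in for 'seeding'; returns one topic share vector per document.
    """
    rng = random.Random(seed)
    n_topics = len(phi)
    thetas = []
    for doc in new_docs:
        nd = [0] * n_topics
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            nd[k] += 1
        for _ in range(iters):
            for i, w in enumerate(doc):
                nd[zd[i]] -= 1
                weights = [(nd[t] + alpha) * phi[t][w] for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                zd[i] = k
                nd[k] += 1
        total = len(doc) + n_topics * alpha
        thetas.append([(nd[t] + alpha) / total for t in range(n_topics)])
    return thetas


if __name__ == "__main__":
    # Toy corpus: word ids 0-2 and 3-5 form two clearly separated themes.
    docs = [[0, 1, 2, 0, 1] for _ in range(5)] + [[3, 4, 5, 3, 4] for _ in range(5)]
    phi = train_lda(docs, vocab_size=6, n_topics=2, seed=1)
    # A new batch is mapped onto the same two topics instead of re-estimating them.
    print(fold_in(phi, [[0, 1, 1, 2], [4, 5, 3]]))
```

Keeping `phi` fixed means that topic indices keep their meaning from one observation point to the next, which is what a time series of topic shares requires; the trade-off, as the abstract notes, is that genuinely new themes in incoming data have no topic of their own to go to.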
latent Dirichlet allocation
|Appears in Collections:||DoCMA Working Papers|
This item is protected by original copyright