These days one frequently reads about wordclouds created with R, a trend started by the release of the wordcloud package by Ian Fellows on July 23rd. So here are my two cents.

I wanted to create a wordcloud of a complete blog history, so I built a script that connects to a MySQL database and grabs all published posts and pages. All articles are combined into one huge text which, once stripped of tags and special characters, is visualized as a wordcloud:

[cc lang=”rsplus” lines=”-1” file=”pipapo/R/wordpress-wordcloud.R”][/cc]
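
The embedded script is not reproduced here, so the following is only a minimal sketch of the steps described above, not the original wordpress-wordcloud.R. It assumes the DBI, RMySQL and wordcloud packages are installed and that the blog uses the default WordPress table name `wp_posts`; the helper names `clean_text`, `word_freq` and `blog_wordcloud` and all connection parameters are made up for illustration.

```r
# Sketch only: not the original wordpress-wordcloud.R.
# Helper names, credentials and the wp_posts table name are assumptions.

# Strip HTML tags and special characters, then lower-case the text.
clean_text <- function(txt) {
  txt <- gsub("<[^>]*>", " ", txt)          # drop HTML tags
  txt <- gsub("&[#A-Za-z0-9]+;", " ", txt)  # drop HTML entities such as &amp;
  txt <- gsub("[^[:alpha:]]+", " ", txt)    # keep letters only
  tolower(trimws(txt))
}

# Count how often each word occurs, most frequent first.
word_freq <- function(txt, min_chars = 3) {
  words <- unlist(strsplit(clean_text(txt), " +"))
  words <- words[nchar(words) >= min_chars]  # skip very short fragments
  sort(table(words), decreasing = TRUE)
}

# Fetch all published posts and pages and draw the cloud.
blog_wordcloud <- function(dbname, user, password, host = "localhost") {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, user = user,
                        password = password, host = host)
  on.exit(DBI::dbDisconnect(con))
  posts <- DBI::dbGetQuery(con, paste(
    "SELECT post_content FROM wp_posts",
    "WHERE post_status = 'publish' AND post_type IN ('post', 'page')"))
  freq <- word_freq(paste(posts$post_content, collapse = " "))
  wordcloud::wordcloud(names(freq), as.vector(freq), min.freq = 2,
                       random.order = FALSE)
}
```

Calling `blog_wordcloud()` with real database credentials would draw the cloud on the current graphics device; the two text helpers use base R only and can be tried without a database.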

Enough code, here is the result for my small blog:

Nice image, isn’t it? Unfortunately, it takes about 30 seconds to generate; otherwise it would be cool to create such a cloud live, for example using rApache.

Download: R: wordpress-wordcloud.R (Please take a look at the man-page. Browse bugs and feature requests.)

Martin Scharm



Tal Galili | Permalink | 2011-08-03 22:11:07

VERY cool - thank you for this post!

BTW, it might be worth removing some of the “that” “this” etc words, using the tm package…


[…] last, I came across an interesting post by Martin Scharm on how to make a wordcloud in R on one of the blogs I read frequently, the R […]

Ray | Permalink | 2014-05-20 16:53:15

Very cool. 2 followups: How did you shape the cloud to be round? When I generate my cloud, the words are very spread out, any tips?

Martin Scharm | Permalink | 2014-05-20 18:34:09

Hi Ray, thanks for your interest :)
Unfortunately, I have no idea what might be wrong with your code, especially since you have not published it. Moreover, it has been some time since I developed this small piece of code, so the R packages may well have changed their behavior.
I hope someone else is able to help you.
