These days one frequently reads about wordclouds created with R, a trend started by the release of the wordcloud package by Ian Fellows on July 23rd. So here are my two cents.

I thought about creating a wordcloud of a complete blog history, so I built a script that connects to a MySQL database and grabs all published posts and pages. All articles are combined into one huge text which, once purged of tags and special characters, is visualized as a wordcloud:

[cc lang=”rsplus” lines=”-1” file=”pipapo/R/wordpress-wordcloud.R”][/cc]
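In case the embedded file does not show up, the steps described above could look roughly like the following sketch. It is not the original script: the function names, the regular expressions, the minimum word length and the WordPress table layout (`wp_posts` with `post_content`, `post_status`, `post_type`) are assumptions made for illustration, and the database and plotting parts need the DBI, RMySQL and wordcloud packages.

```r
# Sketch only: all names and the table layout below are assumptions,
# not taken from wordpress-wordcloud.R.

# Fetch published posts and pages (assumes a default WordPress schema).
fetch_posts <- function(dbname, host, user, password) {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        user = user, password = password)
  on.exit(DBI::dbDisconnect(con))
  sql <- paste("SELECT post_content FROM wp_posts",
               "WHERE post_status = 'publish'",
               "AND post_type IN ('post', 'page')")
  DBI::dbGetQuery(con, sql)$post_content
}

# Combine all articles into one text and purge tags and special chars.
clean_text <- function(html) {
  text <- paste(html, collapse = " ")
  text <- gsub("<[^>]*>", " ", text)          # drop HTML tags
  text <- gsub("&[#[:alnum:]]+;", " ", text)  # drop entities such as &amp;
  text <- tolower(text)
  text <- gsub("[^[:alpha:]]+", " ", text)    # keep letters only
  trimws(text)
}

# Count how often each word occurs, most frequent first.
word_frequencies <- function(text, min_chars = 3) {
  words <- strsplit(text, " +")[[1]]
  words <- words[nchar(words) >= min_chars]   # skip very short words
  sort(table(words), decreasing = TRUE)
}

# Draw the cloud (requires the wordcloud package).
plot_cloud <- function(freq, max_words = 200) {
  wordcloud::wordcloud(names(freq), as.integer(freq),
                       max.words = max_words, random.order = FALSE)
}

# Small demo without a database:
posts <- c("<p>Hello <b>world</b> &amp; hello R!</p>",
           "<a href='x'>World</a> of words, hello")
freq <- word_frequencies(clean_text(posts))
print(freq)
```

With a real database one would call `plot_cloud(word_frequencies(clean_text(fetch_posts(...))))`, filling in one's own credentials.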

Enough code, here is the result for my small blog:

Nice image, isn’t it? Unfortunately it takes about 30 seconds to generate; otherwise it would be cool to create such a cloud live, for example using rApache.

Download: R: wordpress-wordcloud.R (Please take a look at the man-page. Browse bugs and feature requests.)

Martin Scharm

stuff. just for the records.


4 comments

Tal Galili | Permalink |

VERY cool - thank you for this post!

BTW, it might be worth removing some of the “that” “this” etc words, using the tm package…

Cheers, Tal

[…] last week I came across an interesting post by Martin Scharm about how to make a wordcloud in R on one of the blogs I read frequently, the R […]

Ray | Permalink |

Very cool. 2 followups: How did you shape the cloud to be round? When I generate my cloud, the words are very spread out, any tips?

Martin Scharm | Permalink |

Hi Ray, thanks for your interest :) Unfortunately, I have no idea what might be wrong with your code, especially if you do not publish it. Moreover, it’s been some time since I developed this small piece of code, so the R packages may have changed their behavior. I hope someone else is able to help you.
