News Clouds at The Daily Anvil

Posted by Danny on Saturday, August 22, 2009 at 2:17 am

A Wordle cloud of text from The Toronto Sun, by Byron at blog.thedailyanvil.com.

Because Byron’s the cool sort of cat what wonders about things and then fiddles with them until he has answers, he’s been playing with Wordle’s graphic representation of word frequency to see whether he can capture and distinguish the flavour of diverse news publications by feeding in a week’s worth of copy. It’s a neat smudging of quantification and art. The data could be presented with more rigour and reliability if he’d used a more basic frequency calculator and set rules about what text gets included (advertisements, classifieds, etc., which might be irregularly available or only partially OCR’d across the publications he surveyed). The benefit of this approach, though, is that the presentation distils the visceral impact without wholly discarding the medium: you’re still sort of looking at the emotional impact of the newspaper, just without all the syntax and filler and communication that honestly we probably can’t afford in this economic climate anyway. His write-up and gallery are available here. I really, really like the decision to match the colour schemes of the Wordle clouds to those of the papers.
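For comparison, the “more basic frequency calculator” really can be tiny. Below is a rough Python sketch of one, assuming that a “word” is a run of letters with optional internal apostrophes and that case doesn’t matter. Those tokenising rules are my own assumptions, not how Wordle (or Byron) actually did it.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with internal apostrophes,
# so "Byron's" stays one token. Wordle's own tokenizer may well differ.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_frequencies(text: str) -> Counter:
    """Count case-folded word occurrences in a block of newspaper copy."""
    # Normalise curly apostrophes so "Sun’s" and "Sun's" count as the same word.
    normalised = text.lower().replace("\u2019", "'")
    return Counter(WORD_RE.findall(normalised))


if __name__ == "__main__":
    sample = "Council votes on budget. Budget vote delayed; council to meet again."
    # Print the three most frequent words with their counts.
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

Unlike a cloud, the output is just a ranked list of numbers, which is exactly what makes it easier to compare rigorously and less fun to look at.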

Over brunch a few days ago, we talked about expanding the scale and rigour of this project so that the newspapers would be monitored over a period of perhaps a year. The benefit would be a normalisation of the subject matter reported: relatively isolated events that snag media attention briefly but totally, like Michael Jackson’s death, would tend toward a more proportionate representation in the cloud. Based on Wordle’s current customisation options, here are my suggestions for such an enterprise:

  • Collect a snapshot of each publication’s content from a uniform position (like the front page, or from each article featured on the front page) at strictly regular intervals, as close to simultaneously as is practical.
  • Aim to use a capture technique that extracts information from images as well as text.  For example, generating a PDF of the webpages and then running the same OCR software over them would tend to liberate information from static image advertisements.
  • Begin the project only after deciding upon a universal “ignore list” of common words to filter out interface noise (if we’re not interested in that data).  Looking at Byron’s gallery, some words I might filter would be “news,” “am/pm,” “video,” “home,” “articles,” etc.  Possibly include the names of the newspapers themselves in the list.
  • Ensure that the final Wordle clouds are all generated according to the same rules and with the same settings (except colour scheme). That way, tags at a particular size could be more reliably correlated with their frequency across publications.
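To make the last two suggestions a bit more concrete, here’s a rough Python sketch of applying one shared ignore list to every publication, adding up counts over many snapshots (say, a year of daily front pages), and then mapping those counts to font sizes on a single scale shared by all the papers. The ignore-list words come from the list above; the linear scale and the point sizes MIN_PT and MAX_PT are my own assumptions, since I don’t know exactly how Wordle sizes its words.

```python
import re
from collections import Counter

# Shared "ignore list" of interface noise, taken from the suggestions above.
# Hypothetical starting point: add the papers' own names if you decide to filter
# them, and fix the list before the project begins rather than midway through.
IGNORE = {"news", "am", "pm", "video", "home", "articles"}

WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

# Assumed font-size range in points; Wordle's real scaling may differ.
MIN_PT, MAX_PT = 10.0, 80.0


def filtered_counts(snapshots: list[str]) -> Counter:
    """Aggregate word counts over many snapshots of one paper, minus ignored words."""
    totals = Counter()
    for text in snapshots:
        words = WORD_RE.findall(text.lower().replace("\u2019", "'"))
        totals.update(w for w in words if w not in IGNORE)
    return totals


def shared_sizes(per_paper: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Map counts to font sizes using ONE scale for every publication.

    Counts become shares of each paper's total (so a wordier paper isn't
    favoured), and the single largest share across all papers sets MAX_PT.
    Equal sizes in different clouds therefore mean equal relative frequency.
    """
    shares = {}
    for paper, counts in per_paper.items():
        total = sum(counts.values())
        shares[paper] = {w: c / total for w, c in counts.items()} if total else {}
    # If every paper is empty, top is 0 but there are no words to divide below.
    top = max((s for words in shares.values() for s in words.values()), default=0.0)
    return {
        paper: {w: MIN_PT + (MAX_PT - MIN_PT) * s / top for w, s in words.items()}
        for paper, words in shares.items()
    }
```

For example, shared_sizes({"Sun": filtered_counts(sun_pages), "Star": filtered_counts(star_pages)}) would give two clouds drawn on the same ruler, where sun_pages and star_pages are hypothetical lists of captured front-page text.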

If anyone’s interested in working with us on this (and assuming that Byron’s still gung-ho after the one-week trial run), it might also be fun to collect a bunch of hypotheses from you folks about what sort of trends you expect to see in terms of register, emphasis, reading level, etc. This isn’t science, but it could still be an entertaining experiment.

Free eBooks and Ways to Find Them

Posted by Danny on Monday, July 20, 2009 at 11:06 am

If anyone’s still out there, you’ve probably had enough Blueberry music. I’m going to be posting a longer article shortly, but I wanted to draw your attention to some handy ebook resources I’ve just found. First, there’s MobileRead, which I haven’t just found, but which contains this directory of free ebook lists and author websites. It’s geared mainly toward non-academic reading in multiple formats (PDF, Mobipocket, EPUB).

They seem to have a small selection of individually published open ebooks, though I’ve added links to Lawrence Lessig’s author page and Matt Mason’s The Pirate’s Dilemma, so there may well be other open books out there that have yet to be catalogued here. It’s a wiki, so if you know of something tasty, you should add it to the directory.

In other news, I’m gearing up for thesis research and adding resources to my daunting pile of “oughtta-read-this-before-you-go-talking-to-other-people-dammit” books. It turns out that the MIT Press publishes a lot of CC-licensed books and article series, among them the following, which are of interest to me and maybe you:

Iiyoshi, T., & Kumar, M. S. V. (Eds.). (2008). Opening up education: The collective advancement of education through open technology, open content, and open knowledge. Cambridge, Massachusetts: The MIT Press. (Link) I’ve been using articles from this collection since my first project in 2008; it’s full of tasty goodness.

Willinsky, J. (2006). The access principle: The case for open access to research and scholarship. Cambridge, Massachusetts: The MIT Press. (Link) I haven’t read this yet, but it looks pretty good if you’re into the OA thang. Leslie Chan gets a mention.

Also, the MacArthur Series on Digital Media and Learning is available from the MIT Press, but start here for information and links. Not surprisingly, I’m particularly excited to look at Davidson and Goldberg’s The Future of Learning Institutions in a Digital Age.

Wordle and Tag Clouds in the Classroom

Posted by Danny on Thursday, June 18, 2009 at 11:13 pm

Wordle cloud of my undergraduate research project

Dad was a devotee of computers probably within minutes of discovering them as an undergraduate at Waterloo in the ’70s. He repeatedly tried to instil the same wonder and excitement in me, groping for ways to connect the nature of computing machines to my own interests and probably disappointingly artsy foci. Anything, at least, to extend their significance beyond the video games I was playing. One of the early uses, I learned, was for academics studying literature to compute word frequencies in texts, an idea that struck me as at once completely novel and spectacularly boring and pointless.[1]

I hadn’t thought much about it until “tag clouds” started popping up on popular sites and the possibilities of this kind of data visualisation began to reveal themselves, despite my benightedness. Recently, Geoff brought Wordle to my attention.