Friday, November 7, 2014

Climate blog index again

About a year ago, I described a JavaScript exercise I began in mid-2013, when Google Reader was discontinued. I thought I might write my own RSS reader, with indexing capability. I found that feedly was a good replacement for Reader, so that didn't continue. However, I thought a more limited RSS index of climate blogs would be handy. A big motivation was just to have an index of my own comments (to avoid boring the public with repetition).

So I set up a page, and set my computer to reading the RSS outputs every hour. The good news is that that has happened more or less continuously. The bad news was that junk accumulated, and downloading was slow.

So I've done two new main things:
  • Pruned the initial download. I had already reduced the initial offering to just two days of comments. But I still downloaded details of all threads and commenters. More than half the commenters listed had only ever made one comment. They include, of course, various spammers and typos. So I removed them, unless their comment was in the last month. I also divided the threads into current and dormant (no activity in two months). Current threads are downloaded at start; dormant ones can be added (button), or will come automatically if data from more than two months ago is requested. It's faster, if not fast.
  • I've added a facility where a string is shown that you can add to the URL to get it to go to the current state. That includes selected index items (commenter etc) and months. The main idea is that you can store a URL which will go straight to a list of your own comments over some period (remembering that each month takes a while to download). Examples below.
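The pruning rules in the first item can be sketched roughly as follows. This is only an illustration in Python, not the code behind the page: the function names, the record layout (a dict with "author" and "date") and the 30-day and 60-day cutoffs standing in for "a month" and "two months" are all assumptions.

```python
from datetime import datetime, timedelta

def prune_commenters(comments, now, keep_days=30):
    """Return commenters to keep: more than one comment, or one recent comment.

    `comments` is a list of dicts with hypothetical keys "author" and "date".
    """
    by_author = {}
    for c in comments:
        by_author.setdefault(c["author"], []).append(c["date"])
    cutoff = now - timedelta(days=keep_days)
    return {
        author
        for author, dates in by_author.items()
        if len(dates) > 1 or max(dates) >= cutoff
    }

def split_threads(last_activity, now, dormant_days=60):
    """Split thread ids into (current, dormant) by date of last activity.

    `last_activity` maps a thread id to the datetime of its latest comment.
    """
    cutoff = now - timedelta(days=dormant_days)
    current = {t for t, d in last_activity.items() if d >= cutoff}
    dormant = set(last_activity) - current
    return current, dormant
```

Only the kept commenters and the current threads would then go into the initial download; the dormant set stays on the server until it is asked for.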
I've added Sou's blog HotWhopper - of course it currently has a short history. Explanation in detail is at the page.

As with my blogroll, I've included blogs with broad readership; not necessarily the ones I recommend.

Here are some examples of selections:
Stoat, last two months (two months takes a few seconds to load)
My comments at WUWT, last two months
Posts by Bob Tisdale at WUWT in November
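
For anyone curious how selections like these can be carried in a URL, here is a minimal sketch in Python. The parameter names "sel" and "month" and the suffix layout are hypothetical; the actual string the page shows may look quite different.

```python
from urllib.parse import urlencode, parse_qs

def encode_state(selections, months):
    """Build a URL suffix from selected index items and months.

    "sel" and "month" are made-up parameter names, used for illustration.
    """
    params = [("sel", s) for s in selections] + [("month", m) for m in months]
    return "?" + urlencode(params)

def decode_state(suffix):
    """Recover (selections, months) from a suffix made by encode_state."""
    parsed = parse_qs(suffix.lstrip("?"))
    return parsed.get("sel", []), parsed.get("month", [])
```

Storing the suffix as a bookmark means the page can rebuild the same selection on load, at the cost of downloading each requested month again.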


  1. Um. Your index is full of WUWT comments. If I want to wade through their trash, I'll go straight to the source. But I don't. So I think, to be useful, you need a convenient way to filter out the high-volume junk.

    1. William,
      "So I think, to be useful, you need a convenient way to filter out the high-volume junk"
      That's exactly what you can do. If you want the real stuff, go to the selection panel and enter "Stoat_". Everything else disappears. For even more wisdom, enter Moyhu_. Then you get both. If you never want to see WUWT again, keep the URL suffix, and use that.

      OK, here is that URL. It's just the last two days, but you can ask for more months.

    2. "Everything else disappears."
      Actually it doesn't immediately. But it does when you next reorder a column.

      Here is the corresponding list for Oct-Nov. Takes a few seconds to load.

    3. The RSS feeds are triple-store. My earth sciences server is a semantic web application, so it is very simple to add these kinds of feeds to it.

      The semantic web, or something like it, is the future even though we may not realize it. Something like that will be needed to create order out of information chaos. Or maybe not :)

    4. Hi Web,
      You got me reading about the semantic web. Interesting stuff.

      I'm probably doing this the hard way. I use R to parse the XML, then just pick out stuff. There are differences - HotWhopper has the posts RSS in a different place to the comments, so I get a different URL (I use the URL to attach comments to threads). There are probably more semantic clues I should be looking for.
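
The idea of parsing the XML and using the URL to attach comments to threads might look something like the sketch below. It is in Python rather than R, and it assumes an RSS 2.0 feed whose comment links are the post URL plus a query string or fragment; the sample feed, the example.com URLs and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def thread_url(link):
    """Strip query string and fragment so a comment link maps to its post."""
    parts = urlsplit(link)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def comments_by_thread(rss_text):
    """Group comment titles from an RSS 2.0 feed by the thread they belong to."""
    root = ET.fromstring(rss_text)
    threads = defaultdict(list)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        title = item.findtext("title", default="")
        threads[thread_url(link)].append(title)
    return dict(threads)

# Invented sample feed; real feeds differ from blog to blog.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Comment one</title><link>http://example.com/2014/11/post.html?showComment=1#c1</link></item>
<item><title>Comment two</title><link>http://example.com/2014/11/post.html?showComment=2#c2</link></item>
<item><title>Other</title><link>http://example.com/2014/10/old.html#c3</link></item>
</channel></rss>"""

threads = comments_by_thread(SAMPLE)
```

A blog that keeps its posts feed and its comments feed in different places would need its own rule in `thread_url`, which is where the differences mentioned above come in.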

    5. Nick,
      You are not alone. RSS is an RDF format, and RDF is synonymous with triple-store. The beauty of triple-store is how extensible the linking and association of data sets can become. A variety of query languages such as SPARQL are available which allow you to avoid the XML parsing and concentrate on the semantic content.

      So don't worry. Very few people use it the way it is intended. Papers are written on expediency versus formality when it comes to adopting semantic rigor.

  2. Nick,
    The relative heights of the two frames mean that the selection boxes and the inner frame are almost, but not quite, fully visible in the outer frame. Reducing the height of the inner frame a little (from 600 to 510, if I interpret the page code correctly) might make the index more convenient to use, as that would make it unnecessary to scroll the outer frame at all (its scroll bar is, in addition, invisible unless the screen is wide, about 1650 pixels or more).

    1. Pekka,
      Thanks, I've changed to 510. The outer frame is an ongoing problem. I've expanded the outer frame, but now it can disappear. I may resort to absolute coordinates, but there are ways that can go wrong too.