Mark Higginson

staccato signals of constant information 
Filed under

web


Moving to Posterous...

I've had a self-hosted WordPress blog for a while but my post frequency has dropped right off. I think this may in part be due to a feeling that, however straightforward it is to use, writing a post is an 'event' I need to set aside time for... hence I never make the time, as there is always something more important I should be doing. I want to see if trying something new gets some spontaneity back into my blogging, so I'm going to try Posterous, a service where you post by email (note that you can do this with WordPress too).

Swapping to Posterous has been pretty straightforward. There's an option to import your entire blog, which works pretty well. The only thing that doesn't work is that comments don't come across. That's not a big deal in my case, as there weren't many on my old blog, so I'll probably include them as addendums to the original posts for posterity. They are apparently working on this.

As I intend to ditch the hosting I pay for, I need to make sure the links to the photos in my old posts are changed, as these point to my current webspace. I'm only going to use free services, so I think I'll use a combination of links to Flickr for the better photos and Picasa for storing other images I want to use.
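Finding the image links that still point at the old webspace is easy to automate. The Python sketch below is one possible approach, not Posterous's import tool: it assumes each exported post is available as an HTML string, and the hosts `www.example.com` (the old webspace) and `photos.example.net` (a replacement image host) are hypothetical placeholders. It lists every `<img>` still pointing at the old host, then swaps in replacements from a hand-built mapping.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

OLD_HOST = "www.example.com"  # hypothetical: the self-hosted domain being given up


class ImageFinder(HTMLParser):
    """Collect the src of every <img> tag that still points at OLD_HOST."""

    def __init__(self):
        super().__init__()
        self.stale = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""  # attributes without a value come through as None
        if urlparse(src).netloc == OLD_HOST:
            self.stale.append(src)


def stale_images(post_html):
    """Return the image URLs in one post that still live on the old host."""
    finder = ImageFinder()
    finder.feed(post_html)
    return finder.stale


def relink(post_html, new_urls):
    """Swap each stale URL for its replacement from a hand-built mapping."""
    for old, new in new_urls.items():
        post_html = post_html.replace(old, new)
    return post_html


# Hypothetical post body with one stale image and one already moved.
post = ('<p>Holiday</p><img src="http://www.example.com/photos/beach.jpg">'
        '<img src="https://photos.example.net/sunset.jpg">')
print(stale_images(post))  # ['http://www.example.com/photos/beach.jpg']
print(relink(post, {"http://www.example.com/photos/beach.jpg":
                    "https://photos.example.net/beach.jpg"}))
```

Running `stale_images` again after relinking is a quick check that nothing on the old host has been missed before the hosting is switched off.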

My next step is making sure everything is formatted properly then switching my own domain to point here.

One nice thing I found was that if you email help@posterous.com, it's Sachin Agarwal, one of the co-founders of Posterous, who answers your question... and he does so really promptly too!

Update: there's a good interview with Sachin on Download Squad introducing Posterous. They came through Y Combinator last year. Note how the first question in the comments gets a reply from Sachin straightaway. It's so good to see someone 'on it' like this. Same goes for the other founder, Garry Tan, on this article.

Filed under  //   web  


I'm sorry Dave. I'm afraid I can't do that...

In the post Dashboards, scorecards and sentiment I wrote about why I don't think computers can accurately assess the emotional meaning of a sentence. This article from The New York Times entitled Mining the Web for Feelings, Not Facts touches on how performing exactly this function is a "growing business". What's interesting to me is how often I hear it repeated that these algorithms are "70 to 80 percent accurate", often with the addendum that people only agree on the meaning of something 70 to 80 percent of the time. You are being invited to commit a logical fallacy, inasmuch as the implicit suggestion is that the algorithm is about as accurate as human-assessed sentiment. This isn't the case, as the two figures measure different things. The article does touch on this, highlighting the following:

"A quick search on Tweetfeel, for example, reveals that 77 percent of recent tweeters liked the movie 'Julie & Julia'. But the same search on Twitrratr reveals a few misfires. The site assigned a negative score to a tweet reading 'julie and julia was truly delightful!!' That same message ended with 'we all felt very hungry afterwards' — and the system took the word 'hungry' to indicate a negative sentiment."

In my experience, when the computer gets it wrong it gets it wrong in a way a human wouldn't. These monitoring companies are effectively saying that 20 to 30 percent of their data cannot be relied upon. Were the data to be assessed by people, the areas of disagreement would actually be highly useful, as there are probably interesting reasons why the disagreement was occurring, specifically to do with context that cannot be assessed by looking at one sentence.
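To see how a misfire like the 'hungry' example can happen, here is a deliberately naive Python sketch of word-list scoring forced into three buckets. The word list, the doubling for 'very' and the thresholds are all hypothetical; this is not how Twitrratr or any particular vendor works, only an illustration of how context-free word scores can flip a clearly positive tweet.

```python
import re

# Hypothetical toy lexicon: not any vendor's real word list.
LEXICON = {"delightful": 1, "great": 1, "love": 1,
           "hungry": -1, "awful": -1, "hate": -1}
# Hypothetical intensifiers that multiply the score of the following word.
INTENSIFIERS = {"very": 2, "really": 2}


def score(text):
    """Sum word scores, multiplying a scored word that follows an intensifier."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in INTENSIFIERS:
            value *= INTENSIFIERS[words[i - 1]]
        total += value
    return total


def bucket(text):
    """Force the score into one of three buckets."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


tweet = "julie and julia was truly delightful!! we all felt very hungry afterwards"
print(score(tweet), bucket(tweet))  # -1 negative
```

'delightful' scores +1, but 'very hungry' scores -2, so a tweet any person would read as praise lands in the negative bucket. Nothing in the scoring knows that feeling hungry after a film about food is a compliment.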

Although this figure of 70 to 80 percent accuracy gets thrown around, I have yet to see a monitoring company supply a dataset that has been assessed by both its algorithm and a team of humans to prove that this measure of 'accuracy' is one that can be relied on. A set of results like this would also allow us to see where the computer and the human assessors disagree, which, given it's people's opinions we are actually trying to quantify, is something worth testing.
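The comparison asked for above is simple to compute once the same mentions have been labelled by people and by the machine. A minimal Python sketch with made-up labels: it reports raw agreement, Cohen's kappa (a standard chance-corrected agreement measure, chosen here as one reasonable option rather than anything a vendor has said it uses) and the list of mentions where the two disagree, which are the ones worth reading.

```python
from collections import Counter


def agreement(a, b):
    """Share of mentions where two sets of labels match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for what two raters would match by chance.

    Undefined (division by zero) if both raters use one identical label throughout."""
    n = len(a)
    observed = agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    return (observed - expected) / (1 - expected)


def disagreements(mentions, human, machine):
    """The mentions worth reading: where the algorithm and the person differ."""
    return [(m, h, c) for m, h, c in zip(mentions, human, machine) if h != c]


# Hypothetical sample, not real monitoring data.
mentions = ["mention-1", "mention-2", "mention-3", "mention-4", "mention-5"]
human = ["positive", "neutral", "negative", "positive", "neutral"]
machine = ["positive", "neutral", "positive", "negative", "neutral"]

print(agreement(human, machine))  # 0.6
print(round(cohens_kappa(human, machine), 3))
print(disagreements(mentions, human, machine))
```

Kappa comes out well below raw agreement because some matches would happen by chance with only three buckets, which is one reason a bare "percent accurate" figure says little on its own.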

Most sentiment analysis systems place opinion in one of three buckets: positive, neutral or negative. This sounds superficially plausible, but if you've ever looked at hundreds of mentions around a keyword or topic you quickly realise that this doesn't really fit with how people express opinions or have conversations. Combined with the lack of accuracy, the lack of nuance in these assessments reduces the value of these tools.

Many dashboards I've seen expect you to take the figures they provide as a given. If you go behind the percentages into the data, you start to realise that you do not have a system on which to make reliable judgements, which is, after all, where the supposed value of these tools lies.

Scout Labs have a good post entitled How does sentiment work? And how accurate is it, anyway? that is worth reading, as they try to address these issues, which few other companies offering these services have even touched on. They mention the use of Mechanical Turk as a way of assessing sentiment using humans and point to a good paper on the problems with this way of doing things. My feeling is that the issue for most of these companies is that they are offering a volume service, and the only realistic way to process the vast amounts of data generated is to use a computer, which for now is an imperfect way of trying to provide what would be very useful information.
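For the human route, one common and simple way to combine several crowd workers' labels is a majority vote that also records how split the vote was. The Python sketch below is an illustration under stated assumptions (three raters per mention, a hypothetical 60% threshold for calling a mention contested); it is not Scout Labs' method. The contested mentions are exactly the disagreements argued above to be worth a closer look.

```python
from collections import Counter


def aggregate(labels_per_mention, min_share=0.6):
    """Majority-vote each mention's human labels and flag split decisions.

    Returns (mention_id, winning_label, share_of_votes, contested) tuples.
    Ties go to the label seen first, but a tie is always below min_share,
    so it is flagged as contested anyway."""
    results = []
    for mention_id, labels in labels_per_mention.items():
        label, votes = Counter(labels).most_common(1)[0]
        share = votes / len(labels)
        results.append((mention_id, label, round(share, 2), share < min_share))
    return results


# Hypothetical ratings from three workers per mention.
ratings = {
    "tweet-1": ["positive", "positive", "positive"],
    "tweet-2": ["positive", "negative", "neutral"],
    "tweet-3": ["negative", "negative", "neutral"],
}
for row in aggregate(ratings):
    print(row)
```

Rather than discarding the split votes as noise, sending the contested mentions to someone who can read them in context is where the useful detail is likely to be.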

Filed under  //   web  


Panopticon singularity...

I had been looking for a way into discussing what is defined as ‘social media’ when I encountered this funny post in which the author alludes to its panoptical nature. It provoked a lot of rattling of cell bars in the comments, though no one recognised that the problem lies in the definition itself. In this shared reality people ‘go and do stuff on the web’. In the parallel world of marketing these people may or may not be described as ‘participating in social media’. It has been defined as:

"... a fusion of sociology and technology, transforming monologues into dialogues... the democratization of information, transforming people from content readers into publishers. Businesses also refer to social media as user-generated content or consumer-generated media."

Similar to this argument that the web is neither subject nor object, we should question our acceptance of the idea that ‘social’ is part of the fabric of the ‘media’ itself rather than an outcome of the discourse it provokes. The definition is based on the presupposition that, in the context described, conversation ("monologues", "dialogues") is part of a power relationship ("democratization") in which currently the discussion is owned ("consumer-generated"). The implication is that this social technology places the means of production in the hands of content creators, the only other form being that of professional operations ("publishers"). This is a capitalist fantasy of how the social dynamics of the web function. Before the term was used by a group of experts, was contributing to forums, newsgroups, using IRC or sending group emails social media? Was participating on Plastic back in the distant days of 2001 social media? How about chatting with friends? Or painting on the wall of a cave? Here’s my alternate definition:

"The monitoring of comment and opinion on the web by power elites for the purposes of reporting and response with the goal of altering the perception surrounding the interests of the organisation concerned."

Why the difference? Well, if you consider what the first definition is trying to describe then nothing has actually changed apart from the fact that what is loosely called ‘conversation’ can now be interrogated through the use of technology as part of a permanent and ever-expanding dossier on people’s opinions. Evidence for justification of this use of social technology is to be found in the language used; definitions are being created by self-appointed gatekeepers to knowledge who understand that this perception management can be geared towards attaining ‘competitive advantage’ for their clients and themselves and that in this context this is being driven by the profit motive. Take this chap, a self-described “evangelist of social media”, and his identification of this issue:

"How do these corporations intend to use these vast records of our behavior... corporations whose main motivation is not in service of 'customer empowerment' but on the traditional goals of manipulating behavior to grow their share of wallet."

Customer empowerment is one and the same as market share in the larger scheme of things. He can’t see beyond this, thinking our defining role in society is as consumers, confused as to which side of the bars he is actually on. This artificial distinction that is called social media by its proponents could more accurately be seen and described as an attempt to form a social technocracy via the co-option of ideas that define a framework that already exists, that people are widely aware of and that is determined to be suitable for manipulation, i.e. the web. To further this goal, it is useful to question the 'nature of the thing', not to develop the 'thing' per se but to give it a discrete integrity that then allows the testing of the boundaries that presently define it; referencing The Enlightenment, for example, is an attempt to give it both a historicity and a validity for the purposes of advancing the overarching agenda.

Update: I am sobbing quietly. Responsibility rests with this post entitled: Social Media is the New Punk. There is a hideous video with sound and everything. I rest my case.

2 original comments:

Terrif post. I think you’re talking along the lines of the extraction of surplus value and the harnessing of mass intellect Marky Marxist.
I recommend that you have a look at the work of Adam Arvidsson. Quite a few of his papers are online.
He doesn’t write explicitly about social media, but I think much of his argument about branding is very relevant here.

Comment by Chloe — 1 July, 2009 @ 2:08 pm

Thank you for the comment; it was your fantastic Bruno Latour post that set me thinking about this. I will go and look up Adam Arvidsson immediately while I’m still occupied with these ideas.
Comment by Mark — 1 July, 2009 @ 2:15 pm

Filed under  //   dissonance   technology   web  


Oh so quiet...

I start with a clean white Google search page; I have Adblock Plus installed and use Readability to bring the content I find to the fore without any distractions. Add quietube and I'm all set:

"... watch web videos without the comments and crap..."

Filed under  //   recommendations   web  


Google's dominance of search...

The excellent Kottke.org has a post on Google's search dominance. When working on projects we need data to support our conclusions and rely on sources such as Hitwise, Comscore, Nielsen, etc. I often have serious doubts about the accuracy of the information supplied. Kottke references Comscore reporting search breaking down like so:

  • 64% Google
  • 20% Yahoo!
  • 8% Microsoft

His own stats come out as follows:

  • 94% Google
  • 3% Microsoft
  • 1% Yahoo!

It could be that visitors to Kottke.org fit a certain profile, perhaps not the kind of person who defaults to a portal when they open their browser. With this in mind I took a look at the analytics for a few sites I have access to, including blogs and commercial operations. The results came out as follows:

  • 91% Google
  • 4% Yahoo!
  • 4% Microsoft

I'd be very interested to see figures from other people, as this makes it look like Comscore's data is wildly off. I wonder what the explanation could be. From my perspective Google have search completely stitched up and no-one is in much of a position to challenge their dominance. The figures show that recent articles about Wolfram Alpha or Bing are written either by people with no understanding of search or by people who are simply keen on attention-grabbing headlines over substance and inquiry.
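For anyone wanting to compare their own figures, the calculation is pooled search referrals per engine divided by the total. A minimal Python sketch, assuming you have already exported search-engine referral counts per site from your analytics package; the counts below are made up and are not the data behind the figures above.

```python
from collections import Counter


def search_share(sites):
    """Pool search-referral counts across sites; return each engine's share in percent."""
    pooled = Counter()
    for referrals in sites:
        pooled.update(referrals)
    total = sum(pooled.values())
    # most_common() orders engines from the largest share to the smallest.
    return {engine: round(100 * n / total, 1) for engine, n in pooled.most_common()}


# Hypothetical referral counts for two sites.
sites = [
    {"Google": 900, "Yahoo!": 40, "Microsoft": 30},
    {"Google": 450, "Yahoo!": 20, "Microsoft": 25},
]
print(search_share(sites))
```

Pooling like this weights busy sites more heavily. Averaging each site's percentages instead would give every site an equal say, and the two methods can differ noticeably, so it is worth stating which one was used when sharing figures.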

Update: TorrentFreak illustrates the point I make above in this post: Nielsen Hugely Underestimates BitTorrent Traffic. The inaccuracy of the data from these measurement companies is highlighted by a recent news story from The Age that quotes Nielsen data for visits by Australians to BitTorrent search engines. Apparently Mininova provided their own statistics so the figures could be checked. Nielsen's data turned out to be wildly inaccurate, with traffic to Mininova alone being 600% higher than the figure Nielsen reported for a number of BitTorrent search engines. In conclusion:

Data from measurement companies isn't worth much beyond a vague guide to broad trends

I can't actually find Nielsen saying anything about this directly, but if this is true my question to them would be: "Why should we trust any other conclusions you produce if your data is apparently so unreliable?"

Update: I'm honour-bound to point out these results from Hitwise that show Google taking a 90%+ share of search.

Update: please see the comment below from Mark Higginson (not me, another one!) who is Director of Analytics at Nielsen in Australia. He points out that the figures they supplied were not a measure of visits, as the TorrentFreak article stated, but instead measured 'unique audience', a metric I'm not familiar with. This goes some way towards clearing up the confusion but also shows how easy it is to make assumptions about what is being claimed from audience figures that may be calculated in completely different ways. I really appreciate Mark taking the time to comment on this post and clarify this matter.
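A toy example helps show why the same traffic can produce very different headline numbers. The Python sketch below counts one hypothetical log three ways: visits (sessions), unique browsers (cookie IDs, roughly what a site's own logs can see) and unique people (what a panel tries to estimate). These simplified definitions are assumptions for illustration only, not Nielsen's actual methodology for 'unique audience'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    person: str   # who is actually at the keyboard (known to a panel, not to the site)
    cookie: str   # browser cookie ID as seen in the site's own logs
    session: int  # hypothetical session number; one session counts as one visit


# Hypothetical log: alice clears her cookies between sessions,
# bob uses two browsers and comes back in one of them.
log = [
    Hit("alice", "c1", 1), Hit("alice", "c1", 1),
    Hit("alice", "c2", 2),
    Hit("bob", "c3", 3), Hit("bob", "c4", 4), Hit("bob", "c3", 5),
]

visits = len({h.session for h in log})
unique_browsers = len({h.cookie for h in log})
unique_people = len({h.person for h in log})
print(visits, unique_browsers, unique_people)  # 5 4 2
```

Two people generate four "unique" browsers and five visits, so comparing one company's visit count with another's audience figure can easily produce a gap of several hundred percent without either being wrong on its own terms.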

4 original comments:

The difference between those figures could be explained by the simple fact that Hitwise, Comscore, Nielsen etc. are based on page views, yeah? Whereas your data and Kottke.org’s data are based on referrals. MSN is the default home page for people with Internet Explorer. A lot of people can’t be bothered, or don’t know how, to change their home page. So that means every time they open up IE, the MSN home page comes up. They may not use it for search; in fact they may even search for Google in Yahoo or MSN/Live/Bing.
Additionally, people may use Yahoo or Bing, but may not find what they’re looking for and resort to Google.
Numbers are only ever a snapshot of a particular viewpoint, so they could potentially explain the discrepancy.

Comment by Scott — 5 June, 2009 @ 10:03 am

That’s a very good point, because don’t Comscore et al. collect data from a small sample of individuals with an app installed on their computers? I wonder if this means a ‘view’ is being confused with ‘use of’, as you point out?
Comment by Mark — 5 June, 2009 @ 10:07 am

Hi Mark,
Mark Higginson in Australia here. Long time between meetings. :-)
I was the one who supplied the figures to The Age in my role as Director of Analytics here at Nielsen Online Australia and part of the issue with the article mentioned was the journalist using the term “visits”. The numbers we supplied were actually Unique Audience, not visits, not even Unique Browsers – and comes from a panel methodology.
So whilst TorrentFreak didn’t know it, the comment in their article “This may sound like a lot of traffic, but since Nielsen reports the number of visits and not the unique visitors we expected it to be much higher” was incorrect. The comparison to the figures that Mininova report isn’t apples with apples – plus Mininova aren’t able to report on Unique Audience – the best they would be able to do is Unique Browsers, which with cookie deletion etc. can be widely overestimated as an indication of actual user numbers.

Comment by Mark Higginson (the other one, in Australia) — 30 July, 2009 @ 7:19 am

Hi Mark,
Many thanks for the comment and apologies for the delayed response, I’ve had a busy, busy August getting married.
I think what this shows is the confusion around the different measures we employ. I’d favour ‘daily unique visitors’ as a metric I can compare across sites that gives me a good idea of how popular they are in terms of a sustained readership, kind of equivalent to print ABC figures.
What does ‘unique audience’ measure and does anyone else use this? If this is a Nielsen-only metric, doesn’t this make these kinds of comparisons impossible outside Nielsen’s own system of measurement? I’d be interested to hear your thoughts.
All the best,
Mark.

Comment by Mark — 25 August, 2009 @ 5:22 pm

Filed under  //   web  


Dashboards, scorecards and sentiment...

More of my time is being spent preparing reports on what people are talking about on the web. There are a number of companies offering tools that do this kind of thing. They work by identifying keywords in a dataset and pulling out pertinent information around the word(s), such as the date of the mention, where it occurred and the recognised format of the page it appeared on. This data is then presented in the form of a 'dashboard', i.e. a few charts, possibly with some sort of 'score' attached. I prefer to work with the actual data retrieved by a crawler for particular keywords rather than an automated summary, as I want to be able to check the accuracy of the underlying information. There doesn't seem to be an offering out there that doesn't provide some sort of bell or whistle that tracks 'influencers' or 'emerging trends' or promises the dreaded ability to analyse sentiment... however:

Algorithm-based sentiment analysis doesn't work accurately

If it were possible, natural language processing would allow me to have a friendly chat with Google when I wanted something rather than having to parse my requests into a few pithy search terms. The reason sentiment analysis is a key part of tracking is that most of us who use these tools would like to believe the promise that they can discover when people are saying good or bad things about the topic we're interested in. Unfortunately this knowledge is not perceived as valuable enough to have a real live human read and assess every mention that has been discovered, so inaccurate methods are employed in an attempt to achieve useful results. Conversations on the web are human conversations, with all the nuance and multiple meanings afforded by the language used and the context in which the conversation occurs; correctly identifying sarcasm, for example, is at present an impossible challenge for a computer. If you're looking into using one of these tools, ask the supplier these questions:

  • Can I export the data to CSV, XML, etc.?
  • How do you identify and remove spam?
  • On average, what percentage of the mentions identified constitute spam?
  • How accurate is your sentiment analysis?
  • Please may I see the human-assessed sample of mentions versus machine-assessed sentiment that you used to produce that figure?
  • Which academic / research papers would you suggest I read to find out more about the fields of natural language processing and sentiment analysis?

Dashboards and scorecards are only as good as the data that lies behind them, so if you can't see the actual data, or easily compare 'scores' across multiple keywords and understand what the differences mean, you should run a mile. I've been through, and am still going through, the process of trying to make monitoring work effectively, and am currently working on an efficient way of working out sentiment that is not subject to the flaws outlined above.
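Working from the raw data rather than a summary can be as simple as pulling every keyword mention, with its date, source and surrounding context, into a CSV you can read and assess yourself. A minimal Python sketch, assuming crawled pages are already available as (url, date, text) records; the records, the 40-character context window and the field names are hypothetical choices, not any vendor's export format.

```python
import csv
import io
import re
from urllib.parse import urlparse

FIELDS = ["date", "domain", "url", "context"]


def find_mentions(pages, keyword, window=40):
    """Yield one row per keyword mention with its date, source and nearby text."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for url, date, text in pages:
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            yield {
                "date": date,
                "domain": urlparse(url).netloc,
                "url": url,
                "context": text[start:end].strip(),
            }


def to_csv(rows):
    """Serialise mention rows to CSV text for checking by hand or in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical crawl output, not real data.
pages = [
    ("http://blog.example.org/post-1", "2009-05-01",
     "Tried the new Widget today. The Widget is fine."),
    ("http://forum.example.net/t/42", "2009-05-02",
     "Nobody here has mentioned it yet."),
]
rows = list(find_mentions(pages, "widget"))
print(len(rows))  # 2
print(to_csv(rows))
```

Keeping the context next to each mention is what makes it possible to check any 'score' against what was actually said.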

Filed under  //   recommendations   technology   web  


Readability

I picked up on this bookmarklet via a post on Matt Haughey's blog. I read a lot of webpages daily and this tool transforms that experience into something almost pleasant. It was created by Arc90; it pulls out the text of whatever page you are viewing and formats it so it's easier to read. Being able to get rid of all the clutter to get to the content I actually want to see makes me very happy. The image below compares a page from Wired.com against the Readability version so you can see what I mean:

[Image: a Wired.com page alongside its Readability version]

You can tweak the style to suit yourself when you create the bookmarklet. Perfect! It reminds me of the layout Suck.com used back in the days of low-resolution displays. Also, if you use Safari and sync your bookmarks with your iPhone, you can use it on that device too:

'Readability' in Safari on an iPhone

Go treat yourself right now. Arc90 assures us it works in most modern browsers.

Filed under  //   recommendations   web  


Fourteen years on...

This month sees the launch of a UK edition of Wired, a second attempt after the version that first appeared in 1995 flamed out. Personally speaking, and as I say here, Wired piqued my interest in the web and the implications that a new communications technology held for society at large.

Wired 1.01
Looking inside Wired 1.01 from 1995

So what's changed?

  • I had no internet access
  • I didn't have an email address
  • I certainly didn't have a laptop... but we did own a shared family computer
  • I didn't have a mobile phone
  • I'd never purchased anything from a website
  • The job I do now didn't exist
  • The company I work for didn't exist
  • Google didn't exist

... I'm sure there's a lot more to add to this list, but just having a quick think makes me realise how much has shifted in those intervening years.

2 original comments:

Using and sharing digital media – photos, videos et al – is another thing you probably weren’t doing much of in 1995.
Comment by Simon Mustoe — 6 April, 2009 @ 11:25 am

I was a subscriber to the UK Wired first time round. Coincidentally doing some spring cleaning over the weekend, I unearthed the whole lot and read Edition 1 last night. It kicked off with the following statement from Marshall McLuhan:
“The medium, or process, of our time – electric technology – is reshaping and restructuring patterns of social interdependence and every aspect of our personal life. It is forcing us to reconsider and re-evaluate practically every thought, every action, and every institution formerly taken for granted. Everything is changing… you, your family, your education, your neighbourhood, your government, your job, your relation to “the others”. And they’re changing dramatically.”
One thing that struck me was that the idea of free online content was virtually unthinkable – people were preparing themselves for the inevitable subscription models once traffic hit a critical point. What do we have to lose by Douglas Adams is worth reading to give a broader perspective about this issue, from that time.
Another thing we didn’t have back then was content subscription or RSS: I think Pattie Maes got it wrong with her view of software agents being necessary to handle the unthinkable complexity. We just needed free RSS subscription.

Comment by Jason Ryan — 6 April, 2009 @ 1:35 pm

Filed under  //   photos   technology   web  


Daisy, daisy...

Funny haha, scary terrifying or an attempt to soften us up given I assume Google are working on this stuff already?

"Will CADIE herself at some point connect her own electromagnetic dots in some idiosyncratic manner which turns her into something we are no longer capable of understanding in any sort of productive way, much as that aforementioned toddler, waving at herself in the mirror, leaves primates forever behind in their own tragically limited world? We don't know. Did you really think we possibly could?"
The CADIE Team, 31st March 2009

One day.

LEGO Google logo
Dan and Caroline get building

Filed under  //   photos   technology   web  


Good for a bump...

One of the challenges I face in my job is the paucity of decent data to support my theories of how people behave out there on the web. I like to be able to challenge my assumptions, and this can only be done with the right tools and a decent set of results. Recently I did pick up a good dataset that proved a long-held suspicion:

Links from popular sites do not deliver a sustained increase in visitor numbers

I overhear people talking about links 'driving traffic' to clients' websites all the time, so I have often wondered what a link is worth in terms of additional visitors. Given that a principle of my current work is that attracting attention from popular sites is a way to become part of a 'network neighbourhood', I've wanted to put this to the test. I spotted this comment ages ago on a fairly high-profile UK political blog that said:

"... we all know that such linkage doesn't do that much for traffic (Guardian, BBC and Telegraph all worth a spike of an extra c.200-300 visitors, if that)."

A link from a high profile domain is good for your natural search rankings as Google likes it when a high authority site links to you. Is it good for your visitor numbers though? Here's another post that highlights what actually happens:

"After sitting dormant for 9 months, suddenly someone found the site. And not just someone but a very popular code blog called Ajaxian. In one day the site’s visitors leapt from 0 to 400. The next day the site was picked up by a reddit user. At the end of the day, we had about 35,000 visitors."

This is an extreme case. reddit is all about aggregating content that people will then go on to visit directly; a post there that is voted up will attract high volumes of visits. The point is that after the bump from these referrers, traffic settled back down to a low level. This is all pretty obvious stuff: if your site is not a regularly updated content destination, then people who found it via a link are probably not going to come back day after day. What about a fairly standard case where a link comes to you from a post on a highly popular site? Are you going to receive thousands of visitors from a site that has tens of thousands of visitors a day? In a word: no. Below is the bump taken from the Google Analytics of a recent project I worked on that received coverage from several very popular sites.

A referral bump

So, unless you experience slashdotting, not only will a link from a popular site not provide a sustained increase in visits, it will also not deliver many additional visitors. Referrals are but a distant echo of the attention the post that linked to you received. You have absolutely no way of telling how many people read that post: although you may have a vague idea of the daily visits to the referring site, you cannot know how many people actually viewed that post. Once the post has dropped off the front page and disappeared into the mass of content that forms a popular site, the attention disappears with it. This could be called the content 'decay rate' of a given site, and it will vary depending on how rapidly new content is added.

I think most people dedicate most of their attention to certain familiar sites when they're online. If they read a post on a favourite blog they read it in situ and rarely follow a link out. Check out the site statistics for a few of the very popular Gawker blogs:

  • Gizmodo: average 1.1 pageviews a visit
  • Jalopnik: average 1.6 pageviews a visit
  • io9: average 1.5 pageviews a visit

People's attention is extremely limited, even on popular sites. Short of anonymised browsing data becoming available to really figure out what's happening, I'd take promises of attracting attention from referrals very lightly.
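The 'decay rate' idea can be made a little more concrete. The Python sketch below assumes that the extra visits from a link fall away roughly exponentially, takes a hypothetical series of daily visits with a known baseline, and estimates both the total extra visitors the link delivered and the average fraction of the bump lost each day. The numbers are invented for illustration and are not the Google Analytics data from the chart above.

```python
def excess_visits(daily, baseline):
    """Visits above the site's normal level on each day (never below zero)."""
    return [max(0, visits - baseline) for visits in daily]


def daily_decay(excess):
    """Average fraction of the extra traffic lost per day after the peak.

    Assumes exponential decay and compares the peak day with the last day
    that still shows any excess traffic."""
    peak = max(range(len(excess)), key=lambda day: excess[day])
    tail = [day for day in range(peak + 1, len(excess)) if excess[day] > 0]
    if not tail:
        raise ValueError("need at least one day of excess traffic after the peak")
    last = tail[-1]
    return 1 - (excess[last] / excess[peak]) ** (1 / (last - peak))


# Hypothetical daily visits: a baseline of about 50, and a link lands on day 2.
daily = [50, 48, 650, 290, 130, 80, 60, 52, 50]
excess = excess_visits(daily, baseline=50)
print(sum(excess))  # 962
print(round(daily_decay(excess), 2))
```

In this made-up series the bump loses roughly two thirds of its remaining extra traffic each day, so nearly all of the value of the link arrives in the first couple of days. A site that adds content more slowly would show a smaller daily decay and a longer tail.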

Update: I've been doing a little more reading and found this post from December 2006 entitled Sharecropping the longtail which makes the following point:

"... web traffic appears to be growing more concentrated in a few sites, not less... what's being concentrated... is not content but the economic value of content. MySpace, Facebook, and many other businesses have realized that they can give away the tools of production but maintain ownership over the resulting products..."

The popularity of social networking sites has meant that increasing numbers of pageviews are concentrated on these domains (though these views are scattered across several million profile pages). I think this bears out what I'm driving at in my post above.

Update: the traffic patterns discussed in this post are indicative of what I'm talking about.

Filed under  //   web  
