More of my time is being spent preparing reports on what people are talking about on the web. A number of companies offer tools that do this kind of thing. They work by identifying keywords in a dataset and pulling out pertinent information around each mention, such as the date, where it occurred, what recognised format the webpage fits, etc. This data is then presented in the form of a 'dashboard', i.e. a few charts, possibly with some sort of 'score' attached. I prefer to work with the actual data retrieved by a crawler for particular keywords rather than use an automated summary, as I want to be able to check the accuracy of the underlying information. Every offering I have come across seems to include some bell or whistle that tracks 'influencers' or 'emerging trends' or promises the dreaded ability to analyse sentiment... however:
Algorithm-based sentiment analysis doesn't work accurately
If it were possible, natural language processing would allow me to have a friendly chat with Google when I wanted something, rather than having to boil my requests down to a few pithy search terms. Sentiment analysis is a key part of tracking because most of us who use these tools would like to believe the promise that they can discover when people are saying good or bad things about the topic we're interested in. Unfortunately this knowledge is not perceived as valuable enough to have a real live human read and assess every mention that has been discovered, so inaccurate methods are employed in an attempt to achieve useful results. Conversations on the web are human conversations, with all the nuance and multiple meanings afforded by the language used and the context in which the conversation occurs. Correctly identifying sarcasm, for example, is at present an impossible challenge for a computer. If you're looking into using one of these tools then ask these questions of the supplier:
- Can I export the data to CSV, XML, etc.?
- How do you identify and remove spam?
- On average what percentage of mentions identified constitute spam?
- How accurate is your sentiment analysis?
- Please may I see the human-assessed sample of mentions versus machine-assessed sentiment that you used to produce that figure?
- Which academic / research papers would you suggest I read to find out more about the fields of natural language and sentiment analysis?
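If a supplier does hand over a human-assessed sample, checking their accuracy figure yourself is straightforward. Below is a minimal sketch in Python, assuming each mention has been exported with a human label and a machine label. The field names ("human", "machine"), the label values and the sample mentions are all made up for illustration and do not come from any particular tool.

```python
from collections import Counter


def sentiment_agreement(mentions):
    """Compare human and machine sentiment labels for the same mentions.

    Returns the share of mentions where both labels agree, plus a count
    of every (human, machine) label pair so the disagreements can be
    inspected rather than hidden behind a single score.
    """
    if not mentions:
        raise ValueError("need at least one labelled mention")
    pairs = Counter((m["human"], m["machine"]) for m in mentions)
    agreed = sum(n for (human, machine), n in pairs.items() if human == machine)
    return agreed / len(mentions), pairs


# Hypothetical sample: in practice this would be read from the CSV export.
sample = [
    {"text": "Really pleased with the new release", "human": "positive", "machine": "positive"},
    {"text": "Oh great, another outage. Love it.", "human": "negative", "machine": "positive"},
    {"text": "Support never replied to my email", "human": "negative", "machine": "negative"},
    {"text": "They have an office in Leeds", "human": "neutral", "machine": "neutral"},
    {"text": "Not bad at all, actually", "human": "positive", "machine": "negative"},
]

accuracy, pairs = sentiment_agreement(sample)
print(f"Agreement with human assessment: {accuracy:.0%}")
for (human, machine), count in sorted(pairs.items()):
    if human != machine:
        print(f"human={human} machine={machine}: {count}")
```

Listing the disagreeing pairs alongside the headline figure matters more than the figure itself, since it shows which kinds of mention (the sarcastic one here, for instance) the machine gets wrong.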
Dashboards and scorecards are only as good as the data that lies behind them, so if you can't see the actual data, or can't easily compare 'scores' across multiple keywords and understand what the differences mean, you should run a mile. I've been through the process of trying to make monitoring work effectively, and am still going through it. I am currently working on an efficient way of working out sentiment that is not subject to the flaws outlined above.