All posts by Mark

I’m the founder of The Hawaii Project, a new book discovery engine. Previously I was responsible for Product Strategy and Product Management at Telenav, after Telenav acquired Goby, the company I had led as CEO. Before that I did time at Endeca, PTC, Netezza, and Evans & Sutherland in a variety of R&D, professional services, and business development roles. When I’m not obsessing over work, I’m a proud husband and father of two great kids; I love to play tennis, am a compulsive reader and book collector, and am really into way too many different kinds of music. (What’s with the Viking, you might ask? While the Vikings were known to split a skull or two, I mean the verb more than the noun, as in “to go adventuring,” in the sense of the Old Norse fara í víking. I’ve always been interested in the Vikings, started using viking2917 as a handle way back when to avoid spammers, and have just kept using it.)

The future of Blog Search

Does Blog Search have a future?

Blogs are one of the richest sources of information on certain topics. Yet they are frustratingly hard to find or extract information from, and the state of the art (Google, Technorati) feels 100% stagnant. Here are a few of my own use cases that aren’t well served by existing tools:

1. Recruiting. When I recruit for a particular role, I’m looking for thought leaders or people with insight and passion. Usually these people have blogs. If I could see, for example, a list of everyone in the Boston area who blogs about web development, I’d probably find some rock star developers. There’s no easy way to do this today.
2. Travel planning. I’m thinking about a stay at a Bed & Breakfast in southern Utah. Who has blogged about trips there and might have some good perspective for me?
3. Music discovery. Blogs like Aurgasm, Quietcolor or TheMusicSlut are great ways to find music. But how many others like them are out there?

The current serious choices are pretty much limited to normal search (Google, Bing, etc.) or a blog search engine like Google Blog Search or Technorati. Google Blog Search is pretty much a toned-down version of Google: a search box with 10 results. You can’t really search for *blogs*, only for *posts*, with relevance ranking determined by some version of PageRank. There’s no real sense of a blog’s authority (other than PageRank), and no real opportunity for discovery: just punch in your keywords and hope for the best.

With Technorati, you do get some extra power. You can search for blogs as entities distinct from individual posts, and blogs are assigned an authority score. But the experience seems to fail as often as it succeeds. A search for “boston web developer” blogs on Technorati returns three blogs, all with an authority score of 1 (the minimum); I’m pretty sure there are more than three blogs like that in Boston! And there’s no way to sort the blogs by authority score, at least that I can see. The Technorati blog directory also seems to be mostly limited to “authoritative” blogs; personal blogs (for example, my own) have little or no representation. But on long-tail topics (say, reviews of music by obscure artists), blogs by “real people” are often the only place to find this kind of commentary. Most importantly, there seems to be little innovation happening in Google Blog Search, Technorati, or blog search more generally: the field is stagnant.

What would the characteristics of a good blog search tool be? What’s lacking in today’s approaches?
1. Comprehensiveness. There are hundreds of millions of blogs (billions?), yet Technorati doesn’t seem to find many of them. (Google is more comprehensive, but limited by its “search box + 10 results” interface.)
2. Ranking of blogs by relevance to the query and/or the authority of the author. (Of course this ranking problem is non-trivial. There are some interesting ideas on authority for Twitter accounts which could perhaps be leveraged, e.g. http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/)
3. A faceted, searchable directory of blogs that supports discovery. Categorization technology has come a long way. It ought to be possible to categorize every blog against a reasonably detailed taxonomy or facet set (say, the Open Directory categories, or something better) with 80% accuracy, across some common facets: topics, location, age of blog, date of last post, and so forth. Even at 80% accuracy, this asset would be quite effective. And with a little user-generated content (UGC) thrown in, the 20% that are wrong, and that matter, will quickly get corrected. Using a microformat like hCard, blog authors could document their blog’s metadata quickly and accurately. Blogs also typically share some consistent elements, such as an “About” page or a blogroll, that could be mined for interesting metadata. This kind of experience would power a new way to discover fresh and interesting blogs and content.
4. Recommend other, similar blogs. Powered by the facets above, by blogroll analysis, or by something similar, a recommendation feature for similar blogs could be built from information that is readily available in an “almost standard” format.
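To make items 2 through 4 a bit more concrete, here is a minimal Python sketch under heavily simplified assumptions. The `Blog` record, its fields, and all sample URLs are hypothetical. Authority is computed with plain PageRank over blogroll links (one possible choice, not what Technorati or Google actually do); the faceted query filters on topic and location and then sorts by authority, the sort I couldn’t find in Technorati; and similar blogs are recommended by the Jaccard overlap of their blogrolls.

```python
from dataclasses import dataclass, field


@dataclass
class Blog:
    """Hypothetical directory entry: one record per blog, not per post."""
    url: str
    topics: set[str]   # topic facet values (e.g. from a classifier or hCard)
    location: str      # location facet
    blogroll: set[str] = field(default_factory=set)  # URLs this blog links to


def authority(blogs: list[Blog], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """PageRank-style authority over the blogroll link graph, by power iteration."""
    urls = [b.url for b in blogs]
    n = len(urls)
    rank = {u: 1.0 / n for u in urls}
    # Ignore blogroll links that point outside the directory.
    out_links = {b.url: [t for t in b.blogroll if t in rank] for b in blogs}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in urls}
        for u in urls:
            targets = out_links[u] or urls  # a blog with no blogroll spreads its rank evenly
            share = damping * rank[u] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def search(blogs: list[Blog], topic: str, location: str, scores: dict[str, float]) -> list[Blog]:
    """Faceted query: filter on the topic and location facets, then sort by authority."""
    hits = [b for b in blogs if topic in b.topics and b.location == location]
    return sorted(hits, key=lambda b: scores[b.url], reverse=True)


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap |a & b| / |a | b|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(blogs: list[Blog], target: Blog, k: int = 3) -> list[Blog]:
    """Recommend the k blogs whose blogrolls overlap most with the target's."""
    others = [b for b in blogs if b.url != target.url]
    return sorted(others, key=lambda b: jaccard(target.blogroll, b.blogroll), reverse=True)[:k]


# Tiny made-up directory for illustration.
blogs = [
    Blog("a.example", {"web development"}, "Boston", {"b.example", "c.example"}),
    Blog("b.example", {"web development"}, "Boston", {"c.example"}),
    Blog("c.example", {"web development", "music"}, "Boston", {"a.example"}),
    Blog("d.example", {"music"}, "Austin", {"c.example"}),
]
scores = authority(blogs)
print([b.url for b in search(blogs, "web development", "Boston", scores)])  # ['c.example', 'a.example', 'b.example']
print([b.url for b in similar(blogs, blogs[0], k=2)])  # ['b.example', 'd.example']
```

In a real system the hard parts are the ones this sketch assumes away: crawling, classifying blogs into facets, and resisting link spam in the authority score.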

Vertical blog search engines might also provide some task-centric capabilities. As I’ve written before, the future of search is about providing task-centric capabilities. In music, for example, The Hype Machine supports some very interesting behaviors simply by virtue of being focused on music.

The obvious question: what business or investment model would support this kind of vertical search engine? In Goby’s world of travel and entertainment, there’s a long history of ways to monetize that kind of content. In the “pure content” world of blog search, it’s less clear; a pure page-view-based CPM ad model isn’t likely to work. If the New York Times can’t make that kind of model work, a startup probably can’t either. Perhaps some form of interest-based, downstream ad retargeting could get enough leverage to reach critical mass. Alternatively, in some domains a “freemium” model might work, with additional paid tools (say, for recruiters or brand managers looking for a competitive edge). Given the scale of the problem, it’s not clear a bootstrapped company could take this on: the infrastructure requirements (bandwidth for crawling, servers, etc.) probably demand a non-trivial level of investment.

What blog search tool do you use? Do you use one at all, or just Google? Is anyone innovating in this area?

An African music sampler

I’ve been on a binge recently of listening to contemporary African music. As I mentioned in my post on seeing Angélique Kidjo in concert, African music is really fun because it’s completely accessible, yet constantly challenges your ear by moving in unexpected directions. Spurred on by a few recent conversations, I thought I would use this post to share some of my recent favorites, and also try out the Yahoo Media Player JavaScript tool, which works really well.

A number of the tracks are from the Festival in the Desert (http://www.festival-au-desert.org/), a festival held out in the desert in Mali, which is on my list of trips to take before I die. Most of the performers are African musicians, but Robert Plant did a cameo there once (track below). I also included a track from Toubab Krewe, who are, strictly speaking, from North Carolina, but who have spent a significant amount of time in Mali. Enjoy!

I Ka Barra by Habib Koité & Bamada
Aldachan Manin by Tinariwen
Fafa by Vieux Farka Touré
Soweto Daal by Wasis Diop
Teme by Alpha Yaya Diallo
Politique by Adama Yalomba
Win My Train Fare Home (If I Ever Get Lucky) by Robert Plant & Justin Adams
Devil Woman by Toubab Krewe

Some thoughts on the convergence of Search, Travel, Local & Social

There’s a convergence coming between the worlds of search, travel, local, and social. It used to be that if you were traveling, you used a guidebook and a map and talked to the concierge; then you graduated to TripAdvisor and Expedia (and, if you were adventurous, Kayak). People’s use of search engines tended not to intersect with their travel planning. In recent years, of course, Google has become a de facto part of the travel planning experience, although by no means a perfect one. And some search engines have introduced travel products (notably Bing Travel). For planning your weekend, though, search engines have historically not been of much use at all: they don’t understand time or location very well (“this weekend” is just a couple of keywords to them), and they don’t understand your task (when I search for beaches on Cape Cod, why do I get back restaurants with the word “beach” in their names?). Robert Scoble has written some thoughts on this subject. Google appears to be moving in this direction with its rumored acquisition of ITA, whose software powers many airfare metasearch sites, including Kayak. Its abortive attempt to acquire Yelp shows how search and local are converging as well.

But there’s a new game in town: social/local gaming, in particular services like Foursquare and Gowalla that combine social gaming with local-search-like results, letting people broadcast where they are and what they’re doing. There’s an evolving “stack” of technologies, including location databases and engagement tools, nicely summarized by Chris Dixon. (I disagree with his assertion that location databases will become commoditized: the information is too hard to come by, and companies like InfoUSA make hundreds of millions in revenue providing this kind of data. Not to mention startups like SimpleGeo and Locationary, and for that matter Goby, that are tackling the problem. But I digress.)

This kind of engagement is going to have a profound impact on how people plan travel and figure out their weekends. DeepDish Creative (http://deepdishcreative.com/wordpress/2010/02/foursquare-for-tourism/) is talking about how destination marketing organizations can leverage these tools to promote their destinations. But I see two problems with this generation of tools when it comes to planning:

  1. They are after the fact. I tend to engage with Foursquare once I’m already AT someplace; Foursquare isn’t really involved in my decision process, it simply records what I’ve already decided. As a result, it has limited use (not no use, just limited use) in making decisions.
  2. These tools only recognize a limited set of entities, primarily businesses (in fact, primarily restaurants). It’s hard to check in at a U2 concert, because it’s an event, and it’s hard to check in at the Grand Canyon, because it’s not really a business; it’s a generalized (and off-the-beaten-track) place. God help you if you want to check in on a hiking trail!

Addressing those two shortcomings would create a resource that would not only appeal to my vanity and let me broadcast what I’m doing, but, more importantly, help me decide.

The key need here is a semantically meaningful database of things to key all your features off of, plus a search tool to find and organize them, not just a pile of URLs. The system needs to know that Yo La Tengo is a band playing at the Fillmore on the 23rd of April, with a date and a location, not just a pile of keywords without any meaning. Any system like this needs to cover hotels and restaurants as well as non-business entities like hiking trails or concerts, and once you leave hotels and restaurants behind, this information is hard to come by. Once you have the database of entities, it is straightforward to build a platform for people to engage with their networks in the context of that content. With a strongly categorized, rich database of things to do, plus a strong network of people telling you what they are interested in, you can provide compelling recommendations as well as support discovery. And, strangely enough 8), that’s where we’re headed with Goby: we plan to be right at the intersection of this convergence.
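The “database of things” idea can be sketched in a few lines of Python, assuming a hypothetical `Entity` record with a kind, coordinates, and an optional date; every name, coordinate, and date in the sample data is made up. Because time and place are structured fields rather than keywords, a query can resolve “this weekend” to concrete dates and filter by distance, while undated places such as beaches or trails still qualify.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Entity:
    """Hypothetical 'thing to do': typed, located, and (for events) dated."""
    name: str
    kind: str                  # e.g. "event", "trail", "beach", "restaurant"
    lat: float
    lon: float
    on: Optional[date] = None  # only events carry a date


def this_weekend(today: date) -> tuple[date, date]:
    """Resolve "this weekend" to a (Saturday, Sunday) pair of dates."""
    if today.weekday() == 6:  # Sunday: the weekend is yesterday plus today
        return today - timedelta(days=1), today
    saturday = today + timedelta(days=(5 - today.weekday()) % 7)
    return saturday, saturday + timedelta(days=1)


def km_between(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance via the haversine formula (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def plan_weekend(entities: list[Entity], today: date, lat: float, lon: float,
                 radius_km: float) -> list[Entity]:
    """Nearby things to do: undated places always qualify, events only if this weekend."""
    start, end = this_weekend(today)
    return [
        e for e in entities
        if km_between(lat, lon, e.lat, e.lon) <= radius_km
        and (e.on is None or start <= e.on <= end)
    ]


# Made-up sample data; 2010-04-21 is a Wednesday.
things = [
    Entity("Beach walk", "beach", 41.67, -69.95),
    Entity("Saturday concert", "event", 41.66, -69.96, date(2010, 4, 24)),
    Entity("Concert next month", "event", 41.66, -69.96, date(2010, 5, 22)),
    Entity("Canyon rim trail", "trail", 36.06, -112.14),
]
picks = plan_weekend(things, date(2010, 4, 21), 41.67, -69.96, radius_km=25)
print([e.name for e in picks])  # ['Beach walk', 'Saturday concert']
```

Note that the beach search no longer depends on the word “beach” appearing in a business name: the `kind` field carries that meaning directly.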

Apple buys Siri – quick thoughts on how Google is being flanked….

Following rapidly on my recent blog post about what Apple’s search strategy might be (https://www.viking2917.com/is-apple-building-a-search-engine-should-they/), Apple has bought Siri. A full frontal assault on Google is futile, but Facebook and Apple are both finding ways to flank Google. Apple is doing it through entertainment, apps, rich advertising (the Quattro Wireless acquisition), and task-centric, information-rich applications (Siri). Facebook is plastering “Like” buttons all over the internet, gaining very valuable information: a “Like” is essentially a human-driven relevance signal, much more direct and personalized than Google’s algorithmic PageRank, which relies on computing links and link values. Facebook’s “Like” button gives them a simple popularity metric; how long before they use it to build a new kind of search and information access tool? Certainly such a metric is gameable, but at scale, it may not matter…
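As a rough illustration of “Like” as a relevance signal, here is a hypothetical Python ranking function; it is not how Facebook or Google actually rank anything. It keeps pages that match the query, then boosts them by total likes (log-damped, so piling on more likes has diminishing returns, one crude defense against gaming) and, more strongly, by likes from the searcher’s own friends, which is where the personalization comes from. The `Page` record, the weights, and the sample data are all assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical indexed page plus the IDs of users who clicked "Like" on it."""
    url: str
    text: str
    liked_by: set[str] = field(default_factory=set)


def text_score(query: str, text: str) -> float:
    """Crude keyword relevance: the fraction of query terms that appear in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def like_score(page: Page, friends: set[str], friend_weight: float = 2.0) -> float:
    """log(1 + total likes), plus a bonus for each like from the searcher's friends."""
    return math.log1p(len(page.liked_by)) + friend_weight * len(page.liked_by & friends)


def rank_pages(pages: list[Page], query: str, friends: set[str], weight: float = 0.5) -> list[Page]:
    """Keep pages matching the query, then sort by text match plus weighted like signal."""
    matches = [p for p in pages if text_score(query, p.text) > 0]
    return sorted(
        matches,
        key=lambda p: text_score(query, p.text) + weight * like_score(p, friends),
        reverse=True,
    )


pages = [
    Page("a.example", "boston web developer blog", {"u1"}),
    Page("b.example", "boston web developer portfolio", {"u2", "u3", "u4", "u5"}),
    Page("c.example", "cape cod beaches", {"u1", "u2"}),
]
query = "boston web developer"
print([p.url for p in rank_pages(pages, query, friends=set())])   # ['b.example', 'a.example']
print([p.url for p in rank_pages(pages, query, friends={"u1"})])  # ['a.example', 'b.example']
```

The same two pages swap places depending on who is asking, which is exactly the kind of personalization a link-based score alone can’t provide.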

TV

Recorded today in NY for Shelly Palmer’s Digital Life, an NBC show focused on consumer tech. My first time doing TV. Makeup and the whole bit. First observation: these people work early! The makeup artist told me she started work around 3:30am. Ick. Good news for her: home by 10:30am.

Recording was fun. Three minutes (my segment) goes by SO fast when the lights go on. You’ve got to get your message down to so few words to fit it in crisply. Watching Shelly was fun: he matter-of-factly produces so many facial expressions and rarely makes a mistake. When he looks at you on camera, it’s with such focus that I had a hard time not feeling like a deer in the headlights…

They had these cool robotic cameras (for some reason my picture of them didn’t come out). One of the other guests told me that those cameras replace 3 or 4 camera operators, driving costs way down. They seem very similar to the robots in many factories and warehouses these days.

I came down the night before and caught Los Campesinos, a Welsh punk band, at the Fillmore. Those guys rock. I was going to mention them on the show, but NBC doesn’t really seem like a “Welsh punk” kind of show 8)