Just recently I’ve had a similar conversation with three different people – people planning on setting up some kind of online business. The interesting thing about online vs offline business is that with an online business, success can be more of a problem than failure. If you fail, you simply slip away, unnoticed. If you succeed, you bring the server down (and smile while you’re doing it).
One of the problems with online is that scaling doesn’t work particularly well. It’s hard to find stuff – good stuff, that is. We have more search engines than we know what to do with, but as the web expands their value steadily diminishes. Google, once the darling of the geek world, has drifted into the shoals, sandbars, and shallows, overloaded with bloat.
What went wrong? Nothing exceptional – Technorati has gone the same way – nowadays they let anyone use the web. Anyone with any kind of access can set up any kind of blog and/or web site and go for it. And very often people feel sufficiently fired up about something that they burst into print for – ooh, sometimes at least two postings. They link up to a couple of other blogs. Add a little frisson with a site meter and AdSense, and before you know it, the web is a slurry of shallowness. Then add some splogs (blogs set up purely for spam purposes) and the search engine spiders are running around in ever diminishing circles.
It is extremely hard to find good writing online on any subject. This is for two reasons. First, good writing is hard to find. It’s a rare gift. Second, there is so much slurry that the odd piece of good writing is buried deep within the ooze of the rest.
The challenge then is to develop smarter search engines – and if I knew how to do that I’d be away making billions rather than writing here. But in the same way that I don’t know much about art (but I do know what I like), I figure this has been addressed before. The issue isn’t one of storage, the issue is of retrieval. As an example, if the issue was simply storing books, libraries would simply become giant freezers – we’d wheel in the books, dry them carefully, freeze them as low as possible; and as long as they stayed frozen they’d last a long time. Of course this would make finding and using them difficult. The web mimics this – storage has become ever cheaper, but finding quality information has become ever harder.
The problem is that automated processes have not yet achieved the artificial intelligence to be able to interpret value.
Google has attempted to emulate value on the basis that the more links a site has pointing to it, the higher its value must be, and therefore the higher up the listings it should appear. On the face of it that seems a good idea, but it would be possible to take a ‘beowulf’ approach – create a master site, and then create several thousand (million?) notional sites all with links to the master site. Which is what the splogs are all about. Of course, Google doesn’t like that kind of behaviour and endeavours to suppress such things.
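A rough sketch of why the link farm trick works, in Python. This is a deliberately naive inbound-link count, not Google's actual algorithm (PageRank also weights each link by the standing of the page it comes from), and the site names are made up for illustration.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count distinct inbound links per target site, ignoring self-links."""
    counts = Counter()
    for source, target in set(links):  # set() drops duplicate links
        if source != target:
            counts[target] += 1
    return counts


# Hypothetical toy web: each pair is (linking site, linked-to site).
organic = [
    ("blog-a", "good-site"),
    ("blog-b", "good-site"),
    ("blog-c", "good-site"),
]

# The link farm: a thousand notional sites, all pointing at one master site.
farm = [(f"splog-{i}", "master-site") for i in range(1000)]

counts = inbound_link_counts(organic + farm)

# Rank sites purely by how many other sites link to them.
ranking = [site for site, _ in counts.most_common()]
```

On a raw count the master site wins by a mile, which is why anything ranking on links has to discount links from worthless pages somehow.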
In 2005, I realised that people were using my aquaculture blog as a focused search engine. I act as an editor – I find good content online – often material that is hidden simply because of its limited interest circle, and then take the link, review it, and put it into a searchable database. Over time the resource has increased in value, primarily because of the rules I apply to the inclusion of material in the resource – i.e. no commercial sites, no shallow content, sound research-based material, and freely available; coupled with filing by categories and the search being limited to within the database – not the entire web. Each month the number of visits to the site increases, but, interestingly (pleasingly) the number of links to the site does not. I expect that people whose interest in aquaculture brings them back a second time might bookmark the site. I say a second time because a casual researcher, or a kid using the site for a school project, wouldn’t visit again. And someone interested in a specific topic might be better off linking directly to the site or page I’ve pointed out to them. I have mentioned on the site that if people have a research paper published they can contact me for a review and inclusion.
A focused search engine is a repository that combines the old catalogue or directory approach to building value, as seen on static web pages, with the more accessible and useful tools we have today – RSS, database-driven searches, updating from anywhere, time and date stamping, and the ability to add meta search terms – categories or tags.
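To make the idea concrete, here is a minimal Python sketch of such a repository. It is not the code behind my site; the field names, the rule flags and the example URLs are all assumptions made for illustration. The "no shallow content" rule is an editor's judgement call, so the sketch only covers the rules that can be written down as yes/no flags.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    url: str
    review: str                    # the editor's short write-up of the link
    categories: set                # tags the entry is filed under
    added: date                    # date stamp
    commercial: bool = False
    research_based: bool = True
    freely_available: bool = True


def passes_editorial_rules(entry):
    """Hypothetical inclusion rules: non-commercial, research based, free."""
    return (not entry.commercial
            and entry.research_based
            and entry.freely_available)


class FocusedIndex:
    """A curated list of reviewed links; search never leaves this list."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        """Store the entry only if it passes the rules; report the outcome."""
        if not passes_editorial_rules(entry):
            return False
        self.entries.append(entry)
        return True

    def search(self, term, category=None):
        """Case-insensitive match on the review text, optionally by category."""
        term = term.lower()
        return [
            e for e in self.entries
            if term in e.review.lower()
            and (category is None or category in e.categories)
        ]


index = FocusedIndex()
accepted = index.add(Entry(
    url="https://example.org/oyster-growth",
    review="Field trial of oyster growth rates",
    categories={"shellfish"},
    added=date(2006, 1, 15),
))
rejected = index.add(Entry(
    url="https://example.com/buy-feed",
    review="Oyster feed for sale",
    categories={"shellfish"},
    added=date(2006, 1, 16),
    commercial=True,
))
hits = index.search("oyster", category="shellfish")
```

The point of the design is that the filtering happens on the way in, so the search itself can stay simple.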
The weaknesses in this system are that I get distracted by my job, life, or most anything. I only search and write in English. I might (hopefully not) have an agenda and only put forward some filtered material – that’s the spin doctors’ job of course. I could be not overly discerning about the content I include. The solution to all of this is to have more than one person adding to the repository.
I’ve spent some time this year (2006) building swicki – search wikis – see the side bar for links. I believe these are a great supplement to the idea of a focused search engine. What a swicki allows is for a search focus to be built, and then seeded with key words. Users either use the existing word swarm, or search using their own key words. When the administrator (me, in this instance) notes a new word I think is useful, I can add it to the key word swarm. In my case, if a word is used more than a couple of times then I’m inclined to add it to the swarm. This approach keeps the search engine current, but never entirely loses the less frequently used words – a kind of longest tail of search terms. It also diminishes the issue of the limits of my knowledge – I work on the premise that all of us are smarter than one of us – and if we work together then the swicki – the tool – becomes increasingly valuable. The swicki system is not without its bugs and weaknesses, but this is more about the implementation than the idea in itself.
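The key word swarm idea can be sketched in a few lines of Python. This is my own toy model, not how the swicki service is actually implemented; the class name, the methods and the threshold of three uses (my reading of "more than a couple of times") are all assumptions.

```python
from collections import Counter


class KeywordSwarm:
    """Toy model of a seeded key word swarm with administrator promotion."""

    def __init__(self, seed_words, promote_after=3):
        self.swarm = {w.lower() for w in seed_words}
        self.query_counts = Counter()       # every word users have searched
        self.promote_after = promote_after  # assumed threshold

    def record_search(self, query):
        """Tally each word of a user's search, whether in the swarm or not."""
        for word in query.lower().split():
            self.query_counts[word] += 1

    def candidates(self):
        """Words used often enough to suggest to the administrator."""
        return sorted(
            word for word, n in self.query_counts.items()
            if n >= self.promote_after and word not in self.swarm
        )

    def promote(self, word):
        """The administrator decides; nothing joins the swarm automatically."""
        self.swarm.add(word.lower())


swarm = KeywordSwarm(["salmon", "tilapia"])
for query in ["abalone feed", "abalone hatchery", "abalone", "seaweed"]:
    swarm.record_search(query)

suggested = swarm.candidates()
swarm.promote("abalone")
```

Rarely used words such as "seaweed" are never thrown away; they stay in the tally, which is the tail of search terms mentioned above.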
The idea does appear to be catching on with some of the bigger fish. Wikipedia founder Jimmy Wales is in the process of wrapping Wikipedia technology around search engines, according to Information Week.
The search engine, code-named Wikiasari, would combine open source technology and human intervention to deliver more relevant results than the algorithm-based systems used today, Wales said Tuesday. “Human intelligence is still the best thing we have, so let’s let humans do what they do best, and computers do what they do best.” Wikiasari combines the Hawaiian word for quick, “wiki”, with the Japanese word “asari”, which means “rummaging search”.
Yo! Jimmy!! Over here! I’ve been thinking and quietly working on this for years. There’s a zillion to one chance Jimmy will ever see this writing, of course, because the search engines are so encrusted… oh wait, that’s where this writing started. All I’m trying to find is a couple of gardening blogs, well written, frequently updated, nicely photographed etc. I tried to get some sense out of Google, tried some blogrolls – but in the end, while I’m more than prepared to fight for your right to have another kitten blog, it’s not what I want right now, and I just haven’t found one that delivers the goods yet. I think ultimately I’ll just write my own…