It seems Ask Jeeves may release their desktop search application as open source. On meeting with Mozilla.org folks, Ask says:
Ask Jeeves Blog: Mozilla’s On Fire: “We discussed Ask Jeeves desktop search and the notion of open-sourcing it. We’re open at two levels. Contributing just the core desktop indexing technology or possibly the entire desktop search application. They discussed how/what they would evaluate before accepting a major piece of code/product contribution: code size, internationalization, etc. Whether or not we partner with Mozilla on this effort, Chris and team thought it was a good idea for us to pursue overall.”
Lots of folks think good open-source desktop search can already be easily implemented with tools like Lucene. But desktop search has exacting requirements.
- Download size should be small, which rules out Java and C#, since you can’t afford to require a large runtime environment.
- Lots of document formats must be supported. Yet many document format conversion tools are quite large, too large for inclusion. So, a good desktop search application might need to implement its own format converters, no small burden.
- The performance requirements of desktop search are not too demanding, since the number of documents is unlikely to exceed a few million, but indexing must be unobtrusive. It needs to run in the background when the user is idle. Ideally it shouldn’t greatly disrupt the virtual memory working set, or else, when the user returns, the system will be sluggish. This probably requires platform-dependent code.
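To make the last requirement concrete, here is a minimal Python sketch of an indexing loop that only works while the user is idle. It is an illustration under stated assumptions, not how any of the products mentioned here do it: `index_when_idle`, `lower_priority`, `index_one`, and `user_is_idle` are hypothetical names, and a real idle detector would have to be written per platform, which is exactly the platform-dependent part.

```python
import os
import time
from typing import Callable, Iterable


def lower_priority(increment: int = 10) -> bool:
    """Ask the OS to deprioritize this process; return False where unsupported."""
    if hasattr(os, "nice"):  # os.nice is only available on Unix
        os.nice(increment)
        return True
    return False  # other platforms would need their own platform-specific call


def index_when_idle(
    paths: Iterable[str],
    index_one: Callable[[str], None],   # hypothetical: adds one document to the index
    user_is_idle: Callable[[], bool],   # hypothetical: platform-specific idle check
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Index documents one at a time, pausing whenever the user is active."""
    indexed = 0
    for path in paths:
        # Back off until the user is idle again before touching the next file.
        while not user_is_idle():
            sleep(poll_seconds)
        index_one(path)
        indexed += 1
    return indexed
```

Checking for idleness before every document keeps each unit of work small, so the indexer can get out of the way quickly when the user comes back. It does nothing about the working-set problem, which would need further OS-specific handling.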
In the end, the core search and indexing code (like Lucene provides) is only a small part of the application, and Java, while cross-platform, requires a runtime that’s too big for convenient download, and doesn’t give easy access to platform-specific scheduling features.
The Beagle folks have defied these odds, albeit for a not-yet-mainstream platform.
There’s still hope for mass-market Lucene-based desktop search: GCJ is cross-platform, makes it easy to invoke platform-specifics, and may soon have a tiny runtime. A C++ port exists and a C port of Lucene is underway. Machines and networks keep getting faster; scheduling and download-size issues will diminish. In the meantime, perhaps Ask Jeeves will fill this gap.
February 14, 2005 at 9:48 pm
My comment was that GCJ could do this. I’ve been trying to get some time to work on a native app written in GCJ but just haven’t had the time.
I have a trivial implementation of a desktop search based on Lucene but haven’t had time to release it! I’m such a bad OSS developer!
http://www.peerfear.org/rss/permalink/2004/10/28/LotsOfInterestInLuceneDesktop/
April 20, 2005 at 3:51 am
I prefer the small download size in the long run because I believe the browser requirements will disappear pretty soon.
Mike
October 9, 2005 at 1:09 am
Where can I download the source of desktop Lucene? My email is ddong0524@yahoo.com.cn
February 24, 2006 at 2:04 pm
One of the considerations has always been to keep Beagle portable to let it move to other platforms.
The problem is that there is little demand for such an engine today on Windows or MacOS X, considering that the space is fairly well served today and is likely going to improve. Anyways, Firefox is a perfect example that I might be wrong.
Now, regarding large runtime downloads: Mono can be cut in pieces; this is routinely done by folks distributing Mono-based applications on MacOS X. They only ship the libraries that they need, which usually amounts to four to six megabytes uncompressed.
There are two other bits of good news: we have been working on a “linker” for .NET libraries which will help people in shipping only the bits they actually need: today the granularity is at the library level, in the future we will make this happen at the function level.
The last bit of good news is that Mono provides a mechanism to bundle the runtime, the libraries and the application into a single binary, if developers want to.
Anyways, I am a big fan of all your work.
February 27, 2006 at 4:29 am
We do have a few open source desktop search applications which I find are on their way to becoming stable and providing robust search features. Though they are nascent, we should soon be seeing some action here. I have found two of them and mentioned my experience with them here.
April 14, 2006 at 5:05 am
I have to admit that I’m not up to date on the desktop search scene. For example, I don’t know why we call it “desktop search” when it can be run on laptops and doesn’t connect to a computer’s wallpaper. And I don’t know what all the OSS options are. But anyway…
I recently downloaded Windows Desktop Search that comes as an add-in for the MSN toolbar, and my complaints are two-fold. 1) It doesn’t let me limit my search to particular directories. 2) It doesn’t read enough meta-data in XML/HTML file formats. It is, however, a vast improvement over the AltaVista desktop search I downloaded about six years ago; that thing wouldn’t find the documents I was looking for and at the same time gave me oodles of false positives.
I don’t know what Google’s search is like because I haven’t tried it. But I hope the open source technologies can provide good competition for the proprietary offerings.
BTW, just my two cents: a small download size would be advantageous but shouldn’t be a requirement in desktop search. Like the other guy said, download time really doesn’t matter much.
September 25, 2006 at 10:56 pm
Hello,
Can I get any information regarding the security features provided by Lucene desktop search?
Apart from that, can I implement any other security feature in it, like EFS, to make it more powerful?
Waiting for an early response.
Yours Truly,
Vivek Singh.
September 3, 2007 at 6:37 am
I’ve found those two on SourceForge.net:
http://sourceforge.net/projects/docfetcher/
http://sourceforge.net/projects/docsearcher/
The first one is quite easy to use, but only runs on Windows at the moment (though the project site states it is platform-independent) and it doesn’t support Microsoft Word documents (not yet?), while the second one does, but its GUI really needs some cleanup.
October 2, 2007 at 1:51 am
Desktop search, these I’ve found:
Red-Piranha Search and Knowledge - Community Edition - Java J2EE Tomcat Lucene Xml Rdf
http://red-piranha.sourceforge.net/
SourceForge.net: Lucene desktop index
http://sourceforge.net/projects/lucenedesktop/
SourceForge.net: DocSearcher
http://sourceforge.net/projects/docsearcher/
SourceForge.net: DocFetcher
http://sourceforge.net/projects/docfetcher/
Main Page - Beagle
http://beagle-project.org/Main_Page
Personally I’m using http://sourceforge.net/projects/lucenedesktop/
mostly because I found it early and like its spartan interface
(umm, I assume that regexp file inclusion/exclusion masks can appeal only to programmers … that’s a plus for me, not necessarily for the average Joe :)
September 4, 2008 at 2:10 pm
Informatively written. Is this all based on personal experience? If I may ask :)