Submitted: Dec 8 2004
Updated: Dec 5 2005
KrawlSite is a web crawler/spider, offline browser, and download manager. It is a KPart component with its own shell, so it can run standalone in that shell or be embedded in KPart-aware applications such as Konqueror.
To integrate it with Konqueror, open the file associations page in the configuration dialog, select the text/html MIME type, and choose KrawlSite_Part in the embedded viewers list. When you then right-click a web page in Konqueror, KrawlSite appears in the "Preview In" menu. Selecting it embeds the component in Konqueror, as in the second screenshot. The first screenshot shows the shell in which the component runs, and the third shows the configuration dialog.
If you like it, please rate it as good :)
Feel free to send in your bug reports and comments. I'll look into them when I have some spare time.
Also, I am lousy at creating icons, so if someone out there likes this application (a lot), please make an icon for it. I'll include your name in the credits. :)
To use this app to download tutorials, turn offline mode on and start crawling from the first page of the tutorial. If the start page is the table of contents (TOC), set the crawl depth to 1. If the start page has the TOC along with the first chapter, set the crawl depth to 0. If each chapter page only has next and previous links, set the crawl depth to the number of chapters.
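The rule of thumb above can be written down as a small lookup. This is only a sketch in Python, not part of KrawlSite: the `TutorialLayout` names and the `suggest_crawl_depth` function are hypothetical, and the returned values simply restate the three cases in the paragraph above.

```python
from enum import Enum
from typing import Optional


class TutorialLayout(Enum):
    """Hypothetical labels for the three tutorial layouts described above."""
    TOC_ONLY = "toc_only"                              # start page is just the TOC
    TOC_WITH_FIRST_CHAPTER = "toc_with_first_chapter"  # TOC and chapter 1 share a page
    NEXT_PREVIOUS_ONLY = "next_previous_only"          # chapters chained by next/previous


def suggest_crawl_depth(layout: TutorialLayout,
                        chapter_count: Optional[int] = None) -> int:
    """Return the crawl depth suggested in the text for a given layout."""
    if layout is TutorialLayout.TOC_ONLY:
        return 1
    if layout is TutorialLayout.TOC_WITH_FIRST_CHAPTER:
        return 0
    if layout is TutorialLayout.NEXT_PREVIOUS_ONLY:
        # The text ties the depth to the number of chapters in this case.
        if chapter_count is None or chapter_count < 1:
            raise ValueError("chapter_count is required for next/previous layouts")
        return chapter_count
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, a twelve-chapter tutorial whose pages only link to their neighbours would be crawled with `suggest_crawl_depth(TutorialLayout.NEXT_PREVIOUS_ONLY, 12)`.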
I'd like to put all this information in the handbook, but due to lack of time I have not been able to do so. If someone understands the functionality and is willing to write the handbook, please contact me.
If someone builds an RPM for this, please contact me so that I can link to your RPM from this page. Many thanks!
* crash-free (as far as I know!), especially since KDE 3.4 came around.
* support for HTML frames
patch to v 0.6
* fixes a bug that crashed the app.
* fixes a bug in multiple job mode
This one took a long time to come out, but it removes almost all of the bugs that caused the app to crash intermittently, apparently without any reason! There's one KNOWN BUG:
* If icon thumbnail previews are generated in real time as files are created or deleted, the app crashes. This has something to do with the internal implementation of the file browser (a KDE component), so either I'll have to write my own component (a lot of work) to fix it, or I am doing something wrong with it (will look into it). Thumbnail previews are disabled by default (but can be enabled from the context menu).
* almost crash-proof :) (see above)
* new file browser, much cleaner to use.
* more work on the leech mode, so it's easier to use as a download manager.
If you use this app with some regularity, I strongly suggest that you upgrade from 0.5.1, not because of any major new features but for a much smoother, crash-free experience. :)
Last of all, thanks for bearing with the crashes. I know it must have been exasperating.
* corrected a bug in leech mode
Some more features:
* leech mode finally functional. In leech mode, the app simply parses the HTML file and presents the links and images as checkable items. Select the files to download and save them to disk. Handy when you need to download 20-30 links (files) from a list of 50-100, rather than right-clicking and saving each link 30 times.
* multiple job support with a drop target window. Click on the drop target window and drop URLs on it. You can then configure each URL with different crawl settings; for example, crawl the first URL to depth 1 in offline mode and the second URL to depth 2 in simple mode, and so on. By default each URL takes the current main settings.
* notification window. Notifies you when all jobs have completed.
* you can jump to the next link (in case the current link is unresponsive) or to the next dropped URL, and pause and restart crawling.
* UI improvements(hopefully!) :-)
* corrected a bug in downloading external links.
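The leech mode described above (parse a page, then list its links and images so the user can pick which to download) can be sketched roughly as follows. This is an illustrative Python sketch using the standard library's `html.parser`, not KrawlSite's own code; the names `LinkCollector` and `list_downloadable_items` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects link targets and image sources from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Resolve relative references against the page URL so every item
        # is a complete URL that could be downloaded on its own.
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_url, attributes["src"]))


def list_downloadable_items(html, base_url):
    """Return (links, images) found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links, collector.images
```

A front end would then show the two lists as checkable items and download only the checked entries.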
0.4 is a huge jump from 0.3. Almost everything has been spruced up, and some new features added, though Leech mode is still unimplemented.
* total rework of offline mode browsing. Links are now correctly cross-linked.
* handles dynamic content correctly.
* tar file support fully functional. It turned out tougher to implement than I initially thought, thanks to the tar:/ protocol. The archive tool in Konqueror is really simplistic and doesn't do the job right. My version does. :-)
* regular expression parsing to correctly parse HTML pages. Can parse through almost 12000 links (in one page) in no time. :-)
* a proper file manager with drag-support.
* spruced up URL list view.
* quick set options available on the page
* UI improvements.
* offline browser mode added. Crawl through a site with this setting on, and the app modifies the links in the parsed files to point to local files if they exist on local disk.
* improved error reporting. Errors encountered are reported in a separate window in real time.
* file types can be excluded (don't download these file types) or exclusive (only download these file types besides text/html)
* UI improvements in main window & config dialog.
* web archive support - not working completely. More complicated than I initially thought. Right now, it only creates a compressed tarball.
* leech mode - not implemented yet.
* more code cleanup.
* major code cleanup.
* ugly Qt event loop hack replaced with elegant threaded model
* ugly crashes due to ugly Qt event loop hack removed.
* minor UI improvements
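The offline browser mode mentioned in the changelog above rewrites links in downloaded pages so that they point to local copies. A minimal Python sketch of that idea follows. It is not KrawlSite's implementation: the `rewrite_links` function is hypothetical, and a plain dictionary of already-downloaded URLs stands in for checking the local disk.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' attributes with a quoted value.
_LINK_ATTRIBUTE = re.compile(r'\b(href|src)\s*=\s*(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html, base_url, local_copies):
    """Point links at local files where a local copy is known.

    local_copies maps absolute URLs to local paths (an assumption made
    for this sketch). Links without a local copy are left untouched.
    """
    def replace(match):
        attribute, quote, target = match.groups()
        absolute = urljoin(base_url, target)
        local_path = local_copies.get(absolute)
        if local_path is None:
            return match.group(0)
        return f"{attribute}={quote}{local_path}{quote}"

    return _LINK_ATTRIBUTE.sub(replace, html)
```

Leaving unknown links unchanged means pages outside the crawl still resolve to their original online location.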