Wikipedia Dump Reader


KDE Education

Minimum required   PyKDE/PyQt
Downloads:  3458
Submitted:  Aug 30 2007
Updated:  Aug 16 2009


This simple program displays the text-only compressed Wikipedia dumps currently available at http://download.wikimedia.org/backup-index.html, generally named something like pages-articles.xml.bz2.

It is fairly usable now, although many rendering issues remain.

Features include a Qt viewer with basic text markup, link following, the ability to read directly from the .bz2 compressed file (although an index creation step is needed on the first run), a tab-like list of articles that load in the background by default, a simple but useful keyword search, very light source code, and optional LaTeX rendering.
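To illustrate what reading directly from the .bz2 file can look like, here is a minimal sketch in current Python 3 (not the Python 2 of the program itself). It streams a compressed dump and yields article titles without unpacking the file to disk. The function name iter_titles is hypothetical, and this is not the program's own indexing scheme, which is not described on this page.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(fileobj):
    """Yield article titles from a bz2-compressed MediaWiki XML dump.

    `fileobj` may be a file name or a binary file object (assumption:
    the dump follows the usual <page><title>...</title></page> layout).
    """
    with bz2.open(fileobj, "rb") as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            # Dump elements are namespaced, so compare only the local tag name.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name == "title":
                yield elem.text
            elif local_name == "page":
                # Drop the finished page's content to limit memory use.
                elem.clear()
```

A real reader would also record where each article sits in the compressed file so it can jump to it later; that is presumably the purpose of the index creation step, but the details here are only a guess.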

The code requires PyQt4

Older versions have been tested on Fedora Core 4 and Kubuntu with PyQt4.1 (Python 2.4, Qt 4.2), and on Ubuntu Gutsy.

See the included README.

Note that the development tree is now hosted on launchpad. See https://launchpad.net/wikipediadumpreader/

Any comment is welcome.


Updated to 0.2.10:
- Use a new indexing scheme for the entrylist - articles load faster now
- Upgrade path for old indexing scheme
- UTF-8 fixes for non-ASCII pathnames
- Experimental RPM package - feedback welcome at the project website: https://launchpad.net/wikipediadumpreader

(Jul 09: updated the Ubuntu package for compatibility with Jaunty's Python 2.6)

Updated to 0.2.9:
- Can now load Wiktionary words that do not start with an uppercase letter
- Ability to load a 64-bit module - Thanks to Michael Heide
- Added a small UI layout - Thanks to GreenReaper
- Better handling of corrupted files

Updated to 0.2.8:
- Sorry: no program changes, but a much friendlier opening dialog
- Built a rough Ubuntu package to ease installation for inexperienced users running Ubuntu Gutsy or Hardy

Updated to 0.2.7:
- minor rendering fixes
- a few more macros

Updated to 0.2.6:
- better wikisyntax parsing
- minor bugfixes

Updated to 0.2.5:
- Bugfixes and improvements in rendering
- Moved the development tree to Launchpad
- Optional font size

Updated to 0.2.4:
- Optional LaTeX/texvc call to render math. Thanks to Mathieu Beliveau

Updated to 0.2.3:
- Fixed an obvious overflow bug in the index creation code.
Rebuilding the index is necessary, sorry. To force it, delete the two *idx files before running the program, and be patient (index creation for English dumps takes several dozen minutes)
- basic table and footnotes support
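
The forced index rebuild mentioned for 0.2.3 could be scripted along these lines. This is only a sketch in current Python 3: the helper name remove_index_files is hypothetical, and it assumes the two *idx files sit in a directory you pass in, since their actual location is not stated here.

```python
from pathlib import Path


def remove_index_files(directory):
    """Delete files whose names end in 'idx' and return the removed names."""
    removed = []
    for path in sorted(Path(directory).glob("*idx")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

After the files are gone, starting the program again should trigger the index creation step, which takes a while for large dumps.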

Updated to 0.2.2:
- improved wiki rendering for lists and definitions

Updated to 0.2.1:
- fixed a bug when reading articles on block boundaries

(Extractable Program (with source))
Ubuntu(Ubuntu debian package)
other(experimental, alien-converted RPM package)
 nice to see this is

 by REMF on: Jan 3 2008
Score 50%

still in active development. congrats and my thanks.



 where next?

 by REMF on: Feb 16 2008
Score 50%

are you working on any further improvements you can tell us about?




 Re: where next?

 by slyfoot on: May 4 2008
Score 50%

I like this, but can anyone tell me what code I need to add in order to increase the size of the fonts? I'm visually impaired and it's too difficult to see!



 Re: Re: where next?

 by benji2 on: May 18 2008
Score 50%

Hi !
I just uploaded version 0.2.5, which makes changing the font size easier. From the README:

Q. Can I change the text size?
A. The font size can now be changed, although you will have to modify
the program manually: edit the "dumpReader.py" file, go to the line which says
"fontSize = 9" and change "9" to whatever point size suits you best.
This will only change the font size of the text area.

Note that I haven't put any "preferences" dialog in the application itself, as I don't feel it's needed yet.
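
For anyone who prefers not to edit the file by hand, the README step above could be scripted roughly as follows. This is a sketch in current Python 3 that assumes the line really reads "fontSize = 9" (or another number) at the start of a line; the helper name set_font_size is hypothetical.

```python
import re


def set_font_size(source, new_size):
    """Return `source` with the first 'fontSize = N' line set to `new_size`."""
    pattern = re.compile(r"^([ \t]*fontSize[ \t]*=[ \t]*)\d+", re.MULTILINE)
    # A function replacement avoids ambiguity between the group and the digits.
    updated, count = pattern.subn(
        lambda match: match.group(1) + str(int(new_size)), source, count=1
    )
    if count == 0:
        raise ValueError("no 'fontSize = <number>' line found")
    return updated
```

To apply it, read dumpReader.py into a string, pass it through set_font_size with the point size you want, and write the result back (keeping a backup copy first is a good idea).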



 Re: where next?

 by benji2 on: May 18 2008
Score 50%

Sorry for the delay. As I didn't have much time to work on it, I only made minor updates. I guess I may occasionally hack on it, but not very actively. I moved the source code to Launchpad code hosting for people who may be interested.



 Possible BUG

 by applegrew on: Feb 19 2008
Score 50%

I recently tried to run dumpReader on a dump from en.wiktionary. It gets into an infinite loop whenever there is a redirect. For example, whenever I try to open the article Garbage, I get the message (in the console) "Garbage" redirects to "garbage", and this message repeats forever and the application hangs. (Even when I try to open garbage, starting with a small 'g', I get the exact same output and the application hangs again.)

Another note: When I start dumpReader.py
I get the following errors in the console.

dumpReader.py:11: RuntimeWarning: Python C API version mismatch for module bz2: This Python has API version 1013, module bz2 has version 1012.
import bz2
Error while loading math parser

I have 2.5.1 running in Kubuntu Gutsy Gibbon.
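
Whatever the actual cause in dumpReader turns out to be, a redirect loop like the one reported above can be guarded against in general by remembering which titles have already been visited. The sketch below is in current Python 3 and is not taken from the program; resolve_redirect and the redirects mapping are hypothetical names used only for illustration.

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow redirects from `title`.

    `redirects` maps a redirecting title to its target. Returns the final
    title, or None if the chain loops or exceeds `max_hops`.
    """
    seen = {title}
    current = title
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current  # not a redirect: this is the article to show
        if target in seen:
            return None  # redirect cycle detected, stop instead of hanging
        seen.add(target)
        current = target
    return None  # chain longer than max_hops
```

With a mapping such as {"Garbage": "garbage"} the lookup ends at "garbage". If title lookup were case-insensitive, so that "garbage" led back to "Garbage", the function would return None rather than repeat forever.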



 Re: Possible BUG

 by benji2 on: May 18 2008
Score 50%

Thanks for the report. I first need to get a fresher English dump to trigger the bug; I hope to have time to fix it soon.
Regarding the Python error, it's pretty safe to ignore it. If it bothers you, see the included README for why it happens and how to fix it.



 further development?

 by REMF on: Jul 28 2008
Score 50%

Hi there,

is there any further news on what will happen next with this excellent program?

forgive my ignorance, but will it work on KDE 4, specifically openSUSE 11.1 using KDE 4.1.2?




 Re: further development?

 by benji2 on: Aug 11 2008
Score 50%

Hi again,
Wikipedia Dump Reader doesn't use any "KDE" features, only PyQt4. Therefore, it should work the same either on KDE 3, 4, or any non-KDE-based environment, as long as PyQt4 is installed.

Regarding future development, I don't have clear plans currently, as it already does what I intended it to do (+ I'm lazy).

Do you think some major feature is missing for convenient use? Maybe the suggested cleanup of unreachable links should be on my todo list...



 Re: Re: further development?

 by REMF on: Aug 14 2008
Score 50%

that would be an awesome start, i will give it some more thought and see what i come up with in addition to the link clean up.

many thanks



 on maemo of Nokia N800

 by sinosure on: Aug 12 2008
Score 50%

Can this reader run under Maemo on the Nokia N800?

It seems that Maemo doesn't have PyQt4 :(



 Delighted to see more developm

 by REMF on: Sep 22 2008
Score 50%

This is the only real Linux competitor to the Windows-based WikiTaxi application.

I note you mention something about Ubuntu packages. Is there any chance you could provide the same convenience for openSUSE?

Many thanks


