User:Improv/PWPD

Intent
Create a subset of articles, with or without images, for use on a particular set of embedded devices, along with software to browse them.

Issues

 * What content should be included?
 * There are constraints on size and professionalism.
 * There are other projects that attempt to do the job, but they have a poor idea of what should even be on Wikipedia, much less on a static dump of it.
 * Is WP:V0.5 suitable?
 * Getting unvandalised versions of content is important. If "vetted" versions can be made, that would be even better.
 * This is an ongoing project: I expect people to update the version on their devices every so often.
 * Featured and Frontpaged articles have completely dropped the ball when it comes to keeping things encyclopedic. They're not at all useful for this project.
 * My first intuition is to start with Portal:List, cutting out portals that are less encyclopedic (like Television, Video Games, Pokémon, Nudity, and Pornography) and using the other portals as pathways to appropriate, well-done content. I need to find a good way to do this. Perl will probably be my friend.
 * I will probably make two versions of the content dumps: one with images and one without.
 * I intend to exclude fair-use images.
 * How should that content be acquired from Wikipedia?
 * My initial thought is that combining database dumps with wget (for images) will be appropriate.
 * I should research better ways to get images. Automagically handling license issues is important.
 * What format should content be on the devices?
 * Database? I need to see what (if any) databases are available on the system when the prototype hardware is ready. Postgres would be ideal.
 * I need to write software to browse whatever format I choose. I don't think this will be too hard, since I can probably reuse the Wikiparser from POUND, modifying it slightly to be more compliant with MediaWiki's syntax.
 * What formalities need to be observed to keep this legal?
 * Avoiding fair use gets me part of the way there.
 * Do I just need a list of contributors from the history pages, or do I need more?
 * Getting the disclaimer right is important. I don't want to put the company giving me the prototype at legal risk either.
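
As a rough illustration of the portal-filtering idea in the content notes above, the step of cutting less encyclopedic portals out of Portal:List might look something like the sketch below. It is written in Python purely for illustration (the notes suggest Perl may be the eventual tool). The exclusion set simply repeats the examples named above, the function names `portal_name` and `filter_portals` are hypothetical, and the input is assumed to be a plain list of portal titles that has already been extracted from the portal listing by some other means.

```python
# Hypothetical exclusion list: just the examples named in the notes,
# not a vetted or complete policy.
EXCLUDED_PORTALS = {"Television", "Video Games", "Pokémon", "Nudity", "Pornography"}

PREFIX = "Portal:"


def portal_name(title):
    """Return the portal name with any leading 'Portal:' prefix removed."""
    title = title.strip()
    # Compare the prefix case-insensitively so 'portal:Foo' is handled too.
    if title[:len(PREFIX)].casefold() == PREFIX.casefold():
        title = title[len(PREFIX):]
    return title.strip()


def filter_portals(titles, excluded=EXCLUDED_PORTALS):
    """Keep portals that are not excluded, preserving order and dropping duplicates."""
    blocked = {name.casefold() for name in excluded}
    seen = set()
    kept = []
    for title in titles:
        name = portal_name(title)
        key = name.casefold()
        if not name or key in blocked or key in seen:
            continue
        seen.add(key)
        kept.append(name)
    return kept


if __name__ == "__main__":
    # Sample titles are made up for the demonstration.
    sample = ["Portal:Mathematics", "Portal:Television", "Portal:History"]
    print(filter_portals(sample))
```

The surviving portal names would then serve as starting points for collecting articles; how to follow a portal to its "appropriate, well-done content" is left open here, as it is in the notes.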

Interested People
Feel free to add yourself. Much of what I'm doing may be of use on other devices, and once the device I'm targeting is released (or if you have prototype hardware too), I'll be glad to have company. I hope that interested parties will have something useful to contribute to the project; if you can help with any of the above, that would be fantastic.
 * Improv (Obviously)