User talk:Clutch/mod wiki

If the rewrite is going to be this fundamental, Pike 7.4 + Caudium 1.3 ought to be seriously considered as a development environment. I've been looking into that particular combination lately, because I'm trying to decide on the development platform for a big project I'm startin' work on. Anyways, I'm being consistently blown away by its strengths and capabilities--I can hardly believe Pike has gotten so little attention over the years. A few years back, Perl was the buzzword; today, Python seems the buzzword; I wonder if the future buzzword will be Pike. It's definitely something that should be seriously looked into as an alternative, at the very least. An additional benefit: both Pike and Caudium are GPL. So it's arguable that they're more "free software" than Apache is. Regardless of "freeness", Caudium actually appears to be faster and better than Apache in some regards (especially in relation to writing modules and extensions). --Jizzbug

Some things worth considering:


 * Revision history - Wikipedia stores complete copies of every revision. A more efficient format can certainly be found.
 * Case sensitivity - Having article titles case sensitive is not very helpful.
 * Disambiguation - multiple pages with the same name. Wikipedia currently solves this manually; perhaps the wiki software should support it somehow.
 * Redirects - avoid endless redirect loops (A: redirect B - B: redirect A). Use existing syntax? It's a bit ugly, but people are used to it.

I should go into more detail on what I have in mind for doing redirects. Will do once I wake up :) But having only a single level of redirection prevents any loops; also doing it that way lets us use a constraint in the database itself (look ma, no code!) to ensure that no loops happen. --Eloquence
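A minimal sketch of the single-level idea, assuming a schema that has not been spelled out in this thread: if a redirect's target must reference a table that holds only real articles, a redirect can never point at another redirect, so no chain or loop can form. The table and column names are hypothetical, and SQLite (through Python's `sqlite3`) stands in for Postgres.

```python
import sqlite3

def make_db():
    """In-memory database with hypothetical articles/redirects tables."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    db.executescript("""
        CREATE TABLE articles (
            title TEXT PRIMARY KEY,
            body  TEXT NOT NULL
        );
        -- A redirect target must be a real article, never another redirect.
        CREATE TABLE redirects (
            title  TEXT PRIMARY KEY,
            target TEXT NOT NULL REFERENCES articles(title)
        );
    """)
    return db

def add_redirect(db, title, target):
    """Insert a redirect; the foreign key rejects targets that are not articles."""
    db.execute("INSERT INTO redirects (title, target) VALUES (?, ?)", (title, target))

def resolve(db, title):
    """Follow at most one redirect hop and return the final title."""
    row = db.execute(
        "SELECT target FROM redirects WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else title
```

With this layout, trying to redirect C to A when A is itself only a redirect fails with an integrity error in the database, without any checking code in the application.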


 * Case insensitivity: I agree with you.


 * Revision history: I'm not sure that a more space-efficient format is that important right now; it is time-efficiency I am worried about. We could definitely store older articles in gzip format though, and uncompress them when we want to look at them; Postgres handles binary data just fine. It will take study and experimentation; I'm interested in hearing your thoughts on the matter.


 * Disambiguation: I'm not sure what you mean by that. Are you talking about articles in different namespaces having the same names as each other? --Clutch 17:31 Nov 27, 2002 (UTC)


 * I think a better revisioning system is /very/ important (especially with concern for the future of Wikipedia). And if there's going to be work on such a fundamental redesign, it's silly not to implement a better revisioning system as it would be so terribly easy.  I mean, you could even just employ existing tools and/or systems.  Keep only fulltext of most recent versions and diffs of previous versions, or use cvs directly (cvs is smart enough to resolve edit conflicts), etc. --Jizzbug
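As a rough illustration of "full text of the most recent version, diffs of previous versions", here is a sketch using Python's standard `difflib`. The delta format (a list of copy and literal-line operations) is made up for this example and is not the format cvs or RCS uses.

```python
from difflib import SequenceMatcher

def make_delta(new_lines, old_lines):
    """Describe old_lines as operations against new_lines (a reverse delta)."""
    delta = []
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Lines shared with the newer revision: store only the range.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # 'replace' or 'insert': lines that exist only in the older revision.
            delta.append(("lines", old_lines[j1:j2]))
        # 'delete': lines only in the newer revision, nothing to record.
    return delta

def apply_delta(new_lines, delta):
    """Rebuild the older revision from the newer one plus its delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```

Only the newest revision is kept whole; each older revision costs roughly the size of the lines that changed.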

Just did an experiment; even articles as small as 600 bytes can save 1/3 of their size with gzip; I was surprised to find that at these small sizes bzip2 actually performs worse than gzip. Gzip it will be then :) --Clutch 17:34 Nov 27, 2002 (UTC)

bzip2 works by exploiting long-range coherence: short articles don't really have enough long-range structure, and a simpler model will do better -- Anon.
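For anyone who wants to repeat the comparison, a small sketch using Python's standard `gzip` and `bz2` modules. The function name is made up, and the numbers will depend on the article text you feed it.

```python
import bz2
import gzip

def compressed_sizes(text):
    """Return (raw, gzip, bzip2) sizes in bytes for one article text."""
    raw = text.encode("utf-8")
    return (
        len(raw),
        len(gzip.compress(raw, compresslevel=9)),
        len(bz2.compress(raw, compresslevel=9)),
    )
```

Running it over a sample of real short articles, rather than synthetic text, is the fair way to check the result reported above.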

Turns out Postgres compresses text fields anyway, so we don't need to do it explicitly; it's all transparent. --Clutch


 * Yes, but I would presume this only applies to individual cells -- what about articles that have been edited 20 or 200 times -- do we want to store 20 or 200 versions, even if individually compressed? Perhaps older versions should be stored in a single row, with some separator between the versions, to make the best use of Postgres' compression. --Eloquence
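A sketch of why packing versions together can help, using `zlib` as a stand-in for whatever Postgres does internally (its actual algorithm is not assumed here). The NUL separator is an assumption; it must never occur in article text.

```python
import zlib

SEPARATOR = b"\x00"  # assumed never to appear in article text

def pack_versions(versions):
    """Compress all versions of one article as a single blob."""
    return zlib.compress(SEPARATOR.join(v.encode("utf-8") for v in versions))

def unpack_versions(blob):
    """Recover the list of versions from a packed blob."""
    return [v.decode("utf-8") for v in zlib.decompress(blob).split(SEPARATOR)]

def size_comparison(versions):
    """Return (bytes when each version is compressed alone, bytes when packed)."""
    separate = sum(len(zlib.compress(v.encode("utf-8"))) for v in versions)
    return separate, len(pack_versions(versions))
```

Successive versions of an article mostly repeat each other, so the compressor can reuse earlier text when they share one stream. The trade-off is that reading any one old version means decompressing the whole blob.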

Check out http://www.wakkawiki.com/WakkaWiki, it has a very simple and easy to use access control list based permission scheme. You may want to limit the ability of users to change permissions to a) certain users, b) certain namespaces (ha, there they are again, the namespaces you detest so much!). --Eloquence

I changed my mind. Once someone explained that namespaces are used to render different types of pages differently, I saw how useful they were. In fact, we should use more of them for things like TeX support, and integrating binary files into the same tables as the regular articles. --Clutch


 * See my reply on the list -- and would you please stop referring to yourself in the third person there? That's annoying. --Eloquence

Ok, I looked at the WakkaWiki page, and their ACL scheme is the same as the first one I came up with. I don't like it, however. It involves client-side parsing, instead of allowing a single SQL query to determine whether the user can read, write, or delete the page. --Clutch

Client-side parsing? Their ACL table is not properly normalized (stores list as text), but if it was, it would be fairly trivial to query access rights by a specific user. --Eloquence
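To make the normalization point concrete, a sketch with one row per (page, user, privilege) instead of a list stored as text. The table layout is hypothetical and is not WakkaWiki's actual schema; SQLite stands in for Postgres.

```python
import sqlite3

def make_acl_db():
    """In-memory database holding a hypothetical normalized ACL table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE acl (
            page      TEXT    NOT NULL,
            uid       INTEGER NOT NULL,
            privilege TEXT    NOT NULL
                      CHECK (privilege IN ('read', 'write', 'delete')),
            PRIMARY KEY (page, uid, privilege)
        );
    """)
    return db

def privileges(db, page, uid):
    """One query returns everything this user may do on this page."""
    rows = db.execute(
        "SELECT privilege FROM acl WHERE page = ? AND uid = ?", (page, uid)
    )
    return {row[0] for row in rows}
```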

That's right. But I think it would be more Wiki-ish to have a concept of user groups, and group permissions. But there arises the problem of hierarchy and privilege; how do we keep this as flat as possible while still allowing working groups to prepare their documents in private before unleashing them to the public? --Clutch

How about group templates?

 MEMBERS                 GROUPS

 GID  UID                GID  UID   NAME
 1    323                1    323   infoAnarchists
 1    521                2    444   Unification Church
 2    444
 2    666

etc. Any UID can create a group template for themself, and any UID that has the right to create ACL can add a group template to an ACL instead of an individual user. So I could go to page Foo and add the group infoAnarchists to those who can read and exclude everyone else. --Eloquence
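A sketch of how the group template idea might look as tables, loaded with the sample rows from the example above. The `group_acl` table and all names are assumptions for illustration (the groups table is called `group_templates` here to stay clear of SQL keywords); SQLite stands in for Postgres.

```python
import sqlite3

def make_group_db():
    """In-memory database with group templates, members, and a group ACL."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE group_templates (
            gid  INTEGER PRIMARY KEY,
            uid  INTEGER NOT NULL,   -- the user who created the template
            name TEXT    NOT NULL
        );
        CREATE TABLE members (
            gid INTEGER NOT NULL REFERENCES group_templates(gid),
            uid INTEGER NOT NULL,
            PRIMARY KEY (gid, uid)
        );
        -- Hypothetical: which group holds which privilege on which page.
        CREATE TABLE group_acl (
            page      TEXT    NOT NULL,
            gid       INTEGER NOT NULL REFERENCES group_templates(gid),
            privilege TEXT    NOT NULL,
            PRIMARY KEY (page, gid, privilege)
        );
        INSERT INTO group_templates VALUES
            (1, 323, 'infoAnarchists'), (2, 444, 'Unification Church');
        INSERT INTO members VALUES (1, 323), (1, 521), (2, 444), (2, 666);
    """)
    return db

def can(db, uid, privilege, page):
    """True if any group the user belongs to holds the privilege on the page."""
    row = db.execute(
        """
        SELECT 1
        FROM group_acl
        JOIN members ON members.gid = group_acl.gid
        WHERE group_acl.page = ? AND group_acl.privilege = ? AND members.uid = ?
        LIMIT 1
        """,
        (page, privilege, uid),
    ).fetchone()
    return row is not None
```

Granting read on Foo to infoAnarchists is then a single row in `group_acl`, and the check for any user stays a single query.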

The whole point of wikis, their magic as it were, is that no one "owns" a particular page. Even group permissions don't sit right with me, but I think we could make do with groups; if only one individual should have permissions on a page, then the page can have a default policy of no access to everyone, and then we make a custom group, put that user in it, and give that page permissions for that group. I'm still thinking this over. People need to be able to do things for themselves, but an accountable, transparent sysop/root class is needed too. --Clutch

I tend to agree with you, pages shouldn't be owned except in certain cases. But many sites that have an editorial staff will want to have sections that only said staff can edit.

The solution seems to be to have an optional owner attribute for every page. Any owner could add/remove individuals/groups to and from the access control lists of an owned page. It might also be desirable to set the default ACLs for newly created owned pages. But not everyone could become an owner. There should be separate user rights to


 * change the ACLs / ownership of any existing page (admin rights)
 * create owned pages, with ownership transferable.

So a typical site could have 2-3 sysops with general chown/chmod rights, 6-8 staff members with the right to create new pages owned by them (being able to add individuals and groups to the access list) and hundreds of users being able to edit only the open pages.

It may make sense to allow multiple owners, so that several people can administrate the access control lists. --Eloquence
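The ownership rules above could be sketched roughly as follows. The class and field names are hypothetical; the two flags correspond to the two separate user rights, and a set of owners allows for the multiple-owner case.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    uid: int
    is_admin: bool = False          # may change ACLs/ownership of any page
    can_create_owned: bool = False  # may create pages that have an owner

@dataclass
class Page:
    title: str
    owners: set = field(default_factory=set)  # empty set means an open page

def can_change_acl(user, page):
    """Admins may always change ACLs; otherwise only the page's owners may."""
    return user.is_admin or user.uid in page.owners

def create_owned_page(user, title):
    """Create a page owned by its creator, if the user holds that right."""
    if not user.can_create_owned:
        raise PermissionError("user may not create owned pages")
    return Page(title, owners={user.uid})

def transfer_ownership(actor, page, new_owner_uid):
    """Hand the page to a new owner; allowed for owners and admins."""
    if not can_change_acl(actor, page):
        raise PermissionError("only owners or admins may transfer ownership")
    page.owners = {new_owner_uid}
```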

Maybe we can figure out a permissions scheme more easily if we have a complete list of what operations one can do on a Wiki; each operation will need its own permissions policy, and potentially, a way of configuring that policy. Here is my initial list; please add to it, or discuss your thoughts on any particular item; who should be able to do it, when and why. --Clutch


 * read the page
 * edit the page
 * create a page
 * delete a page
 * rename a page
 * redirect a page
 * see that a page has been edited on recent changes
 * find the page when doing a search
 * block an ip (with expiry date)
 * block a user account (with expiry date)
 * create a user account
 * delete a user account
 * create a group
 * delete a group
 * add a user to a group
 * remove a user from a group
 * alter a page's policy with regard to a particular group
 * alter a page's policy with regard to logged in users
 * alter a page's policy with regard to anonymous IPs

Well, I must say the table design is looking pretty nice! Much cleaner naming than our current code, too. :)

Another thing to think about is whether to store data from multiple language sections and meta in a single database. The primary advantages to this are better tracking & integration of interlanguage linking (for a page that doesn't exist but is lang-linked to, we can still show the links back to the other languages; likewise with links to/from meta-pages etc) and integration of the user account space, so a single logon (name/password) can be used on all languages without dealing with a dozen separate account creations, password changes, and cookie madness. Less important, but still potentially fun and handy, is showing optionally combined lists for eg Recentchanges, and a more integrated multi-language search tool than the current hacked-up one.

This could be implemented by adding another field to the titles table, similar to the namespace field... alternately, namespaces could be grouped; ie, the namespace id/name list might include 0->"en:", 1->"en:Talk", ... 8->"eo:", 9->"eo:Diskuto" etc, and the parsing would internally tack on the current section's language code if there's not one specified.
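A sketch of the grouped-namespace variant, using only the four ids given in the example; the function name and the list of known language codes are assumptions.

```python
# Namespace ids from the example above; a real list would be longer.
NAMESPACES = {0: "en:", 1: "en:Talk", 8: "eo:", 9: "eo:Diskuto"}
NAME_TO_ID = {name: ns_id for ns_id, name in NAMESPACES.items()}
LANGS = ("en", "eo")

def resolve_link(link, current_lang):
    """Return (namespace id, title), defaulting to the current section's language."""
    parts = link.split(":")
    lang = current_lang
    # An explicit language prefix overrides the current section.
    if len(parts) > 1 and parts[0] in LANGS:
        lang = parts.pop(0)
    namespace = ""
    # Treat the next piece as a namespace only if that language defines it.
    if len(parts) > 1 and f"{lang}:{parts[0]}" in NAME_TO_ID:
        namespace = parts.pop(0)
    title = ":".join(parts)  # remaining colons belong to the title itself
    return NAME_TO_ID[f"{lang}:{namespace}"], title
```

So `Talk:Foo` written in the English section resolves to namespace 1, while `eo:Diskuto:Foo` resolves to namespace 9 from anywhere.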

PostgreSQL supports UTF-8, I believe, so keeping everything in a single consistent character set is quite doable. For browser compatibility we can run a simple convert to/from Latin1 where necessary.
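One way the Latin1 conversion could go, sketched in Python. Falling back to numeric character references for characters Latin1 cannot represent is an assumption of this sketch, not something settled here.

```python
import re

def to_latin1(text):
    """Encode for a Latin-1 browser; unrepresentable characters become &#NNN;."""
    return text.encode("latin-1", errors="xmlcharrefreplace")

def from_latin1(data):
    """Decode Latin-1 form input and turn numeric references back into characters.

    Note: a reference the user typed literally is converted too.
    """
    text = data.decode("latin-1")
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)
```

For example, the Esperanto word "ĉapelo" goes out as `&#265;apelo` and comes back unchanged after a round trip.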

The 'wikipedias' table or similar would still serve well for InterWiki-like links that go to other sites; other wikis, but also other reference sites. --Brion 10:25 Dec 5, 2002 (UTC)


 * Might it alienate other language groups who have gotten used to their autonomy? I have my eye on scalability issues, and worry that adding in all languages to the one database could slow things down.  I also worry that maintaining the code so we could have some languages be autonomous, and others be in the same database together, could get very crufty.  And organizing a grand migration of every language to the main database could involve some serious re-processing and delicate tool-making to make sure all articles get properly renumbered, and all links get transferred intact.


 * So, I'm not saying I'm opposed, but I am wondering if we are going to roll everything into one database, or keep it somewhat distributed as it is right now. --Clutch 10:55 Dec 5, 2002 (UTC)


 * As far as scalability goes, the English wiki is larger than all the others put together; practically speaking, this will probably continue to be true for some time. Since the smaller wikis are limited by the much greater pressure of the English wiki on CPU time, database connections, and Apache connections, I see little difference in putting them all into one package. If we can get acceptable performance out of the English wiki and a number of smaller ones as separate databases, we should be able to do about the same with them all together.


 * Autonomy is a tricky thing to measure (Autonomy is just another word for being ignored. ;). Requests for common login have come principally from the French wiki. Sysophood is currently per-wiki, but it could be made per-section easily enough; the developers who actually maintain the server and software work on all of them, so there's practically speaking no autonomy there except at the personal level -- if you want more control over the software beyond the response you get from the existing developers, you take it by getting involved and joining the development team. (That's how I got here -- a year ago I just wanted to add support for Esperanto's special letters; now I spend all my free time keeping the damn server running. ;P) There are default options which can be set differently per-language, and that could again continue to be done with a central database just by loading different defaults for non-logged in users and new accounts depending on which section you're viewing. Renumbering existing entries would take a new conversion script, but it's certainly doable.


 * Practical improvements in autonomy over what little we have now might include sysop promotion by other sysops, or even automatic promotion of some sort, so people don't have to bug a foreign developer to manage their own language sections; a live-editable interface translations file (by sysops only; and obviously messages and code need to be separated for this, can't have people breaking the whole wiki with a misplaced comma, or inserting malicious code).


 * I'd looooove to hear some actual feedback on Thoughts on language integration, which has been largely ignored for the couple of months it's been up. The only requests I've ever heard for a separate server were from some of the Enciclopedia Libre folks, who probably would have been served just as well by having a developer in the main-server loop. --Brion 11:23 Dec 5, 2002 (UTC)


 * Ok Brion, I'm convinced. Give me a few days to think over how this will affect table structure; could we make the languages and meta be namespaces in their own right, and have them all on one server?  That would make the design simpler I think. Then the User, User_talk, and Image namespaces could all be shared. --Clutch 11:30 Dec 5, 2002 (UTC)


 * That last I've been undecided on how best to deal with... Wikipedians who are actually active on multiple languages will probably want separate user pages in each language, as they have now; and of course the names of these namespaces should remain localized. Images similarly could have localized description pages. But of course, if we know that they are in some sense equivalent through all languages (same name, same referent), then we can automatically link (or even automatically load?) the page in another language where it does exist if there isn't a page in the current language. Eg, en:User:Anthere and fr:Utilisateur:Anthere both exist, so if I click on Anthere's name on one of her edits here, I'll see her English page; on one of her French edits, her French page. But, say, there's an en:User:maveric149 but mav has no French user page, so if he made an edit to a French page (links or formatting, say), and I click on his user link there, I should be in some way directed to his English page. --Brion 11:41 Dec 5, 2002 (UTC)
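The fallback described here might look something like this sketch. The function name, the fallback order, and the set of existing pages are all assumptions; a real version would query the titles table.

```python
# Localized names of the user namespace, from the examples above.
USER_NAMESPACE = {"en": "User", "fr": "Utilisateur"}

def user_page_link(username, current_lang, existing_pages, fallback_order=("en", "fr")):
    """Prefer the user page in the current language, else one that exists elsewhere."""
    candidates = [current_lang] + [l for l in fallback_order if l != current_lang]
    for lang in candidates:
        page = f"{lang}:{USER_NAMESPACE[lang]}:{username}"
        if page in existing_pages:
            return page
    # Nothing exists anywhere: point at the (missing) page in the current language.
    return f"{current_lang}:{USER_NAMESPACE[current_lang]}:{username}"
```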


 * I've gone ahead and added a "lang" field to the appropriate tables; now it acts as a set of namespaces orthogonal to the namespaces proper. I'm going to need to define the index on three fields now: ns, id, lang. I'll probably have to make another index on the user and lang columns too. What does Jimmy think of having all the encyclopedias in one database? --Clutch 15:33 Dec 5, 2002 (UTC)