User:Jim Grisham/RFC Drafts/Notability scoring

Wikipedia - Article notability

RFC for addressing notability and deletion concerns
URL (initial): https://en.wikipedia.com/wiki/User:Jim_Grisham/RFC_Drafts/Notability_scoring

Other storage locations: GitHub: https://github.com/jgrisham/wikipedia-rfc-issues (proposed; that repository does not yet exist)

Version: 0.1

Created: 2022-07-03 by Jim Grisham

Modified:

This proposal implies both technical _and_ policy considerations. I imagine that many proposals for tackling long-term issues approach them from just one of those dimensions, for understandable logistical reasons: finding a time when both the editor communities and the developers have available resources is probably rare, and hybrid solutions will likely take more time to enact (especially if they are to be integrated into MediaWiki proper), which makes them unsuitable for resolving more urgent concerns.

Description
‘Notability score’ for each article
 * … or words: never / maybe / basic / standard / full / absolute
 * Blank = maybe brand-new articles, until a UI button (e.g. “Publish”, “Go Live”, or “It’s ready”) is pressed by any editor during a page edit.
   * But we already have the ‘Preview’ button!
     * This would allow refining, over multiple edits, before the article is generally available (if for ~no other reason~ than to minimize the effect of browser crashes / tab reloads, or the fear thereof)
       * (which are notoriously common on mobile devices - yes, I do many of my edits on an iPad or iPhone, and some potential editors may not even have ready access to desktop/laptop computers… at least not during the free time they may be willing to spend on Wikipedia/Wikimedia projects)
     * This could also aid in loose collaboration between multiple editors who may decide (consciously or not) to create a new article together.
     * This could reduce the complexity (especially cognitively) for newer (or much older) editors of understanding ‘User:’, ‘Sandbox:’, and ‘Draft:’ namespaces and the complexity of moving articles from one namespace to another
   * Compare and contrast ease and breadth of collaboration, for both novice and ‘power’ users, with other systems, e.g.:
     * Analog systems: chalkboards / whiteboards, physical card catalogs, etc.
     * Traditional e-mail (individual and discussion lists) / Usenet / Lotus Notes / Forum software (e.g. phpBB, traditional pre-WWW BBS systems)
     * GitHub, StackExchange, etc.
     * Social media
     * Office365 / Google Docs / etc.
     * Evernote / OneNote / etc.
     * Slack / MS Teams / etc.
   * Compare and contrast ease of non-collaborative authorship
     * Analog systems (handwritten): notecards, notepads, mindmaps, etc.
     * Analog systems (typed): card catalogs, typewritten documents
     * Word processors, ‘office suites’
     * Movable Type / WordPress
 * Never = user page / sandbox / ‘talk’ page
   * Always excluded from Special:Random
 * Maybe = ‘draft’-style articles
   * Might be visible / searchable based on dynamic notability metrics
   * Always excluded from Special:Random
 * Basic = ‘stubs’, auto-patrolled users, articles under a minimum age
 * Standard = well-established articles with (near?) unanimous consensus of notability
   * These pages are excluded from many dynamic notability metrics
   * Would be suitable (at least in terms of notability) to be included in a hypothetical CD-ROM or downloadable ‘offline’ version of Wikipedia
 * Full = well-established, nearly ‘iron-clad’ notability
   * ‘Override’ - these pages are excluded from ~nearly~ all (or all for now) dynamic notability metrics
   * Would be suitable (at least in terms of notability) to be included in a hypothetical printed version of Wikipedia
   * e.g. United Kingdom, Money
   * includes, for example, all topics that were notable enough to appear in catalogued paper legacy encyclopedias (e.g. List of articles from the 1903 Encyclopedia Britannica)
   * articles from a community-maintained allowlist?
 * Gold = special pages
   * A ‘master override’ - these pages will never be affected by dynamic notability metrics
   * e.g. Wikipedia
 * Existing NPP process would ‘promote’ a new article to ‘basic’
 * New articles should probably ~provisionally~ be shown for a period of time to allow for visibility of ‘current-events’ articles, such as an article written
   * An article written by an editor with the ‘auto-patrol’ permission, or an article where an editor with that permission makes some (e.g. minimum of 1000 characters of non-template visible text
 * Some sort of (long-term, to minimize gamesmanship) voting process, à la StackExchange, could also promote or demote articles within the ‘intermediate’ notability levels
 * Notability score could also be weighted by various long-term usage analytics; e.g.:
   * number of article views
   * number of unique editors who edit a particular article at least n times on at least m different days / months
   * number of incoming wikilinks
   * presence of active companion articles on other language sites
   * incoming links from a selection of trusted / allowlisted (whitelisted) external sites, such as government domains
   * site search requests over time
     * e.g. if a significant number of searches for a particular term such as “AMC Gremlin” or “Star Wars Episode XV” appear over an extended period (to reduce abuse / bot potential), then that term would be marked for automatic ‘Basic’ notability status if and when it is created (a perfunctory NPP check could still be required, perhaps by a larger group, to guard against arbitrary abuse).
     * This type of anonymous analytics data could also help editors and WikiProjects decide which topics to prioritize for new article creation.
 * That effectively ‘mutes’ or ‘shadow bans’ low-notability pages
   * Draft namespace articles already do this on redirect pages, if a ‘promising’ tag is present on that redirect
 * Link color for wikilinks targeting the _lowest_ level could be different, too
   * (e.g. orange or black)
 * Articles with the …?
 * The same sort of search ranking can be applied to article ‘quality ratings’ or other future metrics
 * More granularity and transparency, for both editors and readers
 * auto-expiring ‘Draft namespace’ could be retired
 * Existing ‘notability’ / deletion debates may become much less contentious and resource-intensive
   * More editor time would then be available for other projects without risking site quality
   * Editor retention and acquisition may increase
   * Debates could be closed faster by requiring a ‘strong’ or ‘nearly unanimous’ level of consensus
   * They could be much longer, too, allowing infrequent editors or even just regular readers to weigh in over months or years
     * most of the current needs for urgency (including editor workload) would likely disappear
     * this would result in many more opportunities for articles to be improved, or for references to be found, or even for readers to decide to create accounts and become editors
 * Competitive review
   * Social media sites are nearly the most permissive, only restricting or removing obvious bad behavior (copyright violations, gross misinformation, calls to violence, etc.)
   * Some sites, like StackOverflow, do sometimes close questions as off-topic
     * Those closed questions are blocked from further answers (and perhaps demoted in search results?), but any user can see them, and edit/improve them to cause re-opening.
     * All content (and history) remains
 * Deletion would only be required for ‘speedy delete’ topics, such as
   * obviously fake or joke topic, obviously trivial, obviously abuse; e.g.
     * List of types of cheese forming the mantle and core of the Moon
     * mass creation of an article for every stoplight in a particular town
 * Articles deemed off-topic, egregiously non-notable, etc. that are not ‘speedily deleted’ could be temporarily locked from new edits (e.g. quality and notability scores cleared; auto-protected for 30 days), but the article text (and, importantly, the full edit history, useful not just for revisions but also for the ‘edit summaries’) would remain accessible to those who still had a link to the page.
 * An issue with deletion that seems to get less attention is that (usually?) the corresponding ‘Talk’ page, including its edit history and any archive subpages, is also deleted.
   * These ‘Talk’ pages could contain useful references or other information, and also may provide more context to why the article was created and deleted than might exist in a publicly viewable ‘AfD’ discussion archive.
 * Requires up-front effort to design policy and technical framework
 * Potentially less ‘manual’ control
 * May be seen as less ‘arbitrary’
 * Reduce admin time spent on some bureaucratic matters and policy disputes
 * Limits opportunities for bad-faith admin action - ‘level playing field’
 * Easier to tune than fixed policy
   * easier to address systemic issues that occur
   * e.g. combatting new methods of policy abuse or circumvention
 * Compare to:
   * SpamAssassin scores and e-mail filters
   * Google Search vs. the original Yahoo Directory
   * A CMS-based web site vs. a Gopher site
   * A relational database vs. a simple spreadsheet
 * e.g. ‘stars’, number (1-3 or 1-5), percentage (0-100%, or just 0-100), etc.
 * New articles start at lowest score
 * Articles at less than minimum notability level could be excluded from search engine indexing, and would only appear last (or if no other results) on local site search
 * Per-user settings (with an appropriate default) could hide all articles below a selectable notability level, much like how the ‘age filter’ functions in streaming video platforms
 * Benefits
 * Drawbacks
 * Why a dynamic system
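
The level names and visibility rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed MediaWiki implementation: the integer values, the function names, and the default minimum of ‘Basic’ are all hypothetical, and the ‘Blank’ state for unpublished new articles is left out for brevity. The names follow the per-level definitions (so ‘Gold’ rather than ‘absolute’).

```python
from enum import IntEnum


class Notability(IntEnum):
    """Ordered levels from the Description; the integer values are illustrative only."""
    NEVER = 0     # user page / sandbox / talk page
    MAYBE = 1     # draft-style articles
    BASIC = 2     # stubs, auto-patrolled users, articles under a minimum age
    STANDARD = 3  # well-established, (near) unanimous consensus
    FULL = 4      # nearly iron-clad notability
    GOLD = 5      # special pages, master override


def in_special_random(level: Notability) -> bool:
    # "Never" and "Maybe" are always excluded from Special:Random.
    return level >= Notability.BASIC


def search_indexable(level: Notability, minimum: Notability = Notability.BASIC) -> bool:
    # Articles below the minimum level could be kept out of search engine indexes.
    # The default minimum of BASIC is an assumption; the proposal does not fix one.
    return level >= minimum


def visible_to_reader(level: Notability, reader_minimum: Notability) -> bool:
    # Per-user setting: hide every article below the reader's chosen level.
    return level >= reader_minimum


def rank_results(results: list[tuple[str, Notability]],
                 minimum: Notability = Notability.BASIC) -> list[str]:
    # Local site search: below-minimum articles are moved to the end.
    # sorted() is stable, so the incoming relevance order is kept within each group.
    return [title for title, level in sorted(results, key=lambda r: r[1] < minimum)]
```

Because the levels are ordered integers, every rule reduces to a single comparison, which is what would make thresholds easy to adjust later.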

Further Details
This kind of notability ‘rating’ would *not* replace article quality metrics.
 * Article quality is much more likely to

 * Taking into account: (ed: illustrate concept with a ‘spider’ graph??)
   * Notability score
   * Quality score
     * If quality score is up-to-date and above a certain level, ‘notability score’ can likely be ignored for those articles (at least until everything else is transitioned and well-proven both technically and policy-wise)
   * Percentage of edits
   * Completeness / stability score
     * Frequency of edit reversions vs overall edit frequency
     * Talk page (and Talk archive subpage) existence / number of contributors /
   * Popularity score
     * Already exists in terms of ‘number of recent views’
     * Could be combined with in-bound internal, interwiki, or external links as mentioned above, or could be a separate parameter
       * (i.e. similar to the original Google PageRank system… is that out of patent yet? If not, can the Foundation get a free and perpetual license to use something similar in MediaWiki and any derivative works, perhaps using the established contacts used to set up the 2022-era ‘Wikimedia Enterprise’ program?)
   * Category or WikiProject inclusion
   * ‘Locked’ articles
   * User preferences
     * i.e. only show articles with an ‘x’ minimum notability rating and a ‘y’ quality rating, or edits with a minimum age of n, a minimum number of edits of ‘m’, or a minimum number of editors of ‘o’
     * e.g. Jane only wants to see articles that are both High Quality ~and~ High Notability
   * Automatic (and private) determinations using an offshoot of existing ‘CheckUser’ technology
 * Once the framework exists for more granular control of article visibility (crawling, search visibility, and, e.g. wikilink color):
   * additions or changes to the above parameters (and associated ‘policy’-based ‘business rules’) can become trivial from a technical standpoint, and therefore nimble.
   * For example, a contentious policy change can safely be trialed for a short period of time.
 * Both granular, number-line-style parameters (e.g. the proposed notability rating) and boolean parameters (e.g. “article locked”, “search-visibility-override”, etc.)
 * For an analogue to such a parametric rating / decision system, think of the operation and logic of e-mail spam rating systems (is it SpamAssassin that does that?)
 * There’s likely published ‘Information Science’ research on areas applicable to this proposal and its potential implications and implementation, possibly in other languages (ask other language Wikipedia staff / editors for input & leads on finding non-English-language research)
 * Library Science and CS schools may be willing to provide input and help brainstorm Info Architecture, policy (including archiving / retention), and technical ideas and model implementations, or review our ideas and concerns
 * The technical methods discussed above could control article visibility in a highly parametric way:
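
To make the ‘parametric’ idea concrete, here is a minimal Python sketch of a spam-filter-style decision: each usage signal adds a weighted amount to a running score, and a boolean parameter can override the numeric result. Everything specific in it is an assumption for illustration: the signal names, the weights, the threshold of 10.0, and the `Article` structure are hypothetical and would in practice be set by policy.

```python
from dataclasses import dataclass

# Hypothetical weights per signal; real values would be set by policy and tuned over time.
WEIGHTS = {
    "article_views": 0.002,
    "repeat_editors": 1.5,
    "incoming_wikilinks": 0.5,
    "other_language_articles": 2.0,
    "trusted_external_links": 3.0,
}
VISIBILITY_THRESHOLD = 10.0  # assumed cut-off, analogous to a spam filter's threshold


@dataclass
class Article:
    title: str
    signals: dict[str, float]
    search_visibility_override: bool = False  # boolean parameter named in the text


def dynamic_score(article: Article, weights: dict[str, float] = WEIGHTS) -> float:
    # Each signal contributes weight * value; signals without a weight are ignored.
    return sum(weights.get(name, 0.0) * value for name, value in article.signals.items())


def search_visible(article: Article, threshold: float = VISIBILITY_THRESHOLD) -> bool:
    # A boolean override short-circuits the numeric score entirely.
    if article.search_visibility_override:
        return True
    return dynamic_score(article) >= threshold
```

Trialing a contentious policy change for a short period would then amount to passing a different `weights` table or `threshold`, with no change to the surrounding logic.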

Additionally...
It doesn’t all have to be done at once!
 * Not actually used, nor any policy changed, during a trial period
 * Roll out in stages to existing articles, before a ‘real-time’ rating system is fully developed & deployed
 * Automatic assignments for the transition:
   * All articles that have already cleared NPP process (and older articles that predate it?) can be auto-assigned a notability rating of ‘Basic’
     * … or perhaps be assigned a ‘virtual’ notability rating such as ‘Basic-legacy’
   * Articles with a top quality level could be assigned a (real or ‘virtual’) notability rating such as ‘Full-legacy’
   * Articles with a minimum number of edits and/or editors (and length)
   * …
 * People who have written past improvement proposals
   * [ed: e.g. the one I saw in late June 2022 about animations and other ‘rich content’]
 * Put article in monthly newsletter (name?)
   * Also in Monthly Admin newsletter: Administrators’ newsletter
   * Tech News: Special:MyLanguage/Tech/News/
 * Other communities
   * various StackExchange sites
   * Quora
   * Twitter
 * non-English-language announcements
 * Archivists; e.g. National libraries, Internet Archive, etc.
 * Sitewide banner (in multiple languages)
 * Mass e-mail to inactive, non-banned, editors who
   * have a non-trivial edit history (> 25 lifetime edits?)
   * have ~ever~ commented on a ‘Talk’ page (including their own, perhaps with more than a threshold number of self-edits to it), contributed to a ‘WP:’ namespace article, have other non-article activity, etc.
 * e.g. start with a backend and UI for notability ratings
 * Enable notability filtering for new articles
 * Do what we can, but ~also~ use the discussion as a ‘vision board’ for Wikipedia2025, Wikipedia2040, etc.
 * Reach out creatively to possible stakeholders
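
The automatic assignments for the transition could look roughly like the following Python sketch. It covers only the two rules that are spelled out above (cleared NPP, top quality level); the function name, the boolean inputs, and the choice to check top quality first are assumptions made for illustration.

```python
from typing import Optional


def legacy_rating(cleared_npp: bool, top_quality: bool) -> Optional[str]:
    """Return a 'virtual' transition rating, or None if neither rule applies."""
    # Checking top quality first is an assumption; the proposal does not state precedence.
    if top_quality:
        return "Full-legacy"
    if cleared_npp:
        return "Basic-legacy"
    return None


def assign_all(articles: dict[str, dict[str, bool]]) -> dict[str, Optional[str]]:
    # Batch pass over existing articles, keyed by title.
    return {title: legacy_rating(**flags) for title, flags in articles.items()}
```

Keeping the ‘-legacy’ suffix makes it possible to tell machine-assigned ratings apart from ratings earned under the eventual real-time system.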

Process
Keep a positive attitude
 * Even if a particular idea doesn’t appear to be feasible ~now~, from either a policy or technical standpoint
   * Make a note of why that is, but then ask: “But for” that valid concern, how might this idea be of use?
   * Doing that with each objection shows respect, builds up a ‘FAQ’-type history for latecomers, and perhaps most importantly allows re-visiting the idea far in the future if the conditions predicating the objection no longer exist.
 * Different people’s brains work in all sorts of different ways… sometimes at the end of a ‘ridiculous’ path something of great value, perhaps obvious in retrospect, is found.
 * Improv comedy ‘Yes, and…’ philosophy

Pre-publication Notes
Editor’s notes (remove before publishing final version)
 * Concerns
   * Understandability
     * Executive Summary?
   * Genesis
     * Why did I bother writing this?
     * What problems do I see that need attention?
     * What disciplines / areas does this affect?
     * Are there any proposed solutions?
     * Am I duplicating anyone else’s past or current work?
   * Complexity of proposal
     * Release in stages?
     * Display in multiple sections or pages?
     * Be mindful of research on attention span
 * Change management for the actual proposal documents
 * Wikitext
   * Enhanced with collapsible sections
     * Use HTML or MediaWiki templates for this?
   * Copy at my personal site and/or wiki?
 * Threaded discussion forum?
   * a BB
   * something like Phab / GitHub
 * Twitter
 * PDF
   * full proposal or just a summary
 * Outline
   * Interactive?
   * Static?
     * i.e. HTML or PDF
 * Multimedia
   * Static or animated UI mockups / demo
     * (e.g. Sun’s “StarFire”)
   * Charts, graphics, mind-maps, DB schema charts, process flow diagrams, etc.
   * Spreadsheets, tables, or graphs displaying things like cost over time
 * Some ideas might be better deferred for
   * Future Wiki use
   * Use to enhance existing, or create new, non-Wiki computing / knowledge management systems
 * Keep in mind attention span of both individuals ~and~ groups
 * Provide a fully basic, sample proposal, to aid in the explanation
 * Axis 1: Separate concerns / solutions
   * At least in a non-published outline, as a check on logical and rhetorical consistency
 * Axis 2: Separate policy and technology
 * Technical time and money
 * Policy time
 * Implementation / policing
 * Communication
 * Opportunity costs
   * … of inaction
   * … of action
 * Current policies requiring editor or admin effort:
   * Reviewing pending changes
     * Requests for permissions/Pending changes reviewer
   * Deletion-related
     * Requests for undeletion
     * Deletion review
     * Proposed deletion
     * Criteria for speedy deletion
     * Revision deletion
     * Deletion process
     * Guide to deletion
     * Deletion policy
     * Category:Wikipedia deletion guidelines
   * New-article-related
     * Articles for creation
     * AFC submission
     * Drafts
     * Help:Userspace draft
     * Articles for creation
     * WikiProject Articles for creation
     * Article Wizard
   * Temporary or special content pages for edits
     * Workpages
     * User pages
     * Help:Link
     * Finding subpages
       * Special:PrefixIndex “All pages with prefix” report
       * Special:PrefixIndex/fullpagename/ using the search box
       * Add a ‘Subpages’ link to the user’s ‘Tools’
       * Active list:
         * Use the list subpages template
         * See also:
           * subpages
           * search link
     * Special:SpecialPages
     * Special:Search
 * Share workload
 * Explore different viewpoints - will result in a better product
 * Automated / assisted tools and workflows
   * converter / workflow between WikiText markup and:
     * setext / Markdown
     * HTML / XML
     * OneNote
     * OPML
 * Automatically assign formats
   * wikilinks to words/phrases? with matching articles
   * hyperlinks or ref/cite tags for URLs
   * formatting for ordered lists, etc.
 * Other
   * HTTP proxy to allow pasting wikilinks into browser bar (i.e. like how 12ft.io functions)
     * …or a bookmarklet to do so based on the device clipboard or by manually pasting into a dialog box field
 * Change management / version control
   * MediaWiki history
   * Git / GitHub
 * References with no other home:
   * Shortcut index
 * Random bookmarks of pages in progress:
   * User talk:AmandaNP
 * How to best present this?
 * Keep scope of initial proposal tight
 * Clearly define concerns
 * Estimate costs (both initial and ongoing maintenance)
 * Can I find partners to help with this before publicly announcing the RFC?
 * Tools that can help the authoring process

Scratchpad
(in-article temporary sandbox)