User:TronBot/QA

Quality & Importance Metrics

 * Quality content is the Wikimedia projects' top concern.
 * GRE tests are assessed by machines
 * Systems that have done quality assessment

Information Metrics

 * 1) Entropy/Information (compressed size)
 * 2) Normed Entropy (compressed size/word count)
 * 3) Unique 3-grams (unique triplets)
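
A minimal sketch of the three metrics above, assuming plain article text as input. The function name, the choice of zlib as the compressor, and the reading of "triplets" as word 3-grams (rather than character 3-grams) are assumptions made for illustration.

```python
import re
import zlib


def information_metrics(text: str) -> dict:
    """Rough information metrics for one article's plain text."""
    words = re.findall(r"\w+", text.lower())
    # Compressed size in bytes stands in for entropy/information content.
    compressed_size = len(zlib.compress(text.encode("utf-8"), 9))
    # Normed entropy: compressed bytes per word (0.0 for empty text).
    normed_entropy = compressed_size / len(words) if words else 0.0
    # Unique 3-grams: distinct triplets of consecutive words (assumption).
    trigrams = {tuple(words[i:i + 3]) for i in range(len(words) - 2)}
    return {
        "compressed_size": compressed_size,
        "normed_entropy": normed_entropy,
        "unique_3grams": len(trigrams),
    }
```

Highly repetitive text compresses well, so its normed entropy comes out low; that is the signal the metric is meant to capture.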

Linking Structure

 * 1) WikiLinks
 * 2) Orphan, near orphan
 * 3) Categories
 * 4) Categories Relevance (Kullback-Leibler divergence | cross entropy across the category)
 * 5) Incoming Link Count | Outgoing Link Count (Kullback-Leibler divergence | cross entropy | PLSI - across the category)
 * 6) Incoming Link Quality | Outgoing Link Quality (Kullback-Leibler divergence | cross entropy | PLSI - across the category)
 * 7) Link Rot
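
One possible reading of the Kullback-Leibler items above is to compare an article's term (or link-target) distribution against the pooled distribution of its category. The sketch below assumes that reading; the function name, the additive smoothing, and the use of raw counts are illustrative choices, not something these notes specify.

```python
import math
from collections import Counter


def kl_divergence(article_counts, category_counts, alpha=1.0):
    """KL(article || category) in bits over the joint vocabulary.

    alpha is an additive-smoothing constant (an assumption) that keeps
    every probability non-zero so the logarithm is always defined.
    """
    vocab = set(article_counts) | set(category_counts)
    if not vocab:
        return 0.0
    a_total = sum(article_counts.values()) + alpha * len(vocab)
    c_total = sum(category_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for term in vocab:
        p = (article_counts.get(term, 0) + alpha) / a_total
        q = (category_counts.get(term, 0) + alpha) / c_total
        kl += p * math.log2(p / q)
    return kl


# Hypothetical toy data: an on-topic article should diverge less.
category = Counter("river water flow river bank water".split())
on_topic = Counter("river water flow".split())
off_topic = Counter("guitar chord amp".split())
```

A low divergence suggests the article fits its category; a high one may indicate a miscategorised or off-topic page.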

Stability - Add to Recency
Logic indicates that high-quality articles would become complete and require no more editing. Research indicates that editing can degrade an article's quality.
 * Edit spikes
 * Edit time series model (i.e. how long till next edit is expected based on full history|recent 7 edits)
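
The simplest version of the edit time series model could predict the next edit as the last edit time plus the mean gap between edits, over either the full history or the most recent edits. This is only a sketch; the function name and the mean-gap model are assumptions, and a real model would likely need something less naive.

```python
from datetime import datetime, timedelta
from statistics import mean


def expected_next_edit(timestamps, recent=None):
    """Predict the next edit as last edit + mean gap between edits.

    recent=None uses the full history; recent=7 uses the last 7 edits.
    Returns None when fewer than two timestamps are available.
    """
    ts = sorted(timestamps)
    if recent is not None:
        ts = ts[-recent:]
    if len(ts) < 2:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    return ts[-1] + timedelta(seconds=mean(gaps))


# Hypothetical edit history for illustration.
edits = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 4)]
```

An article whose predicted next edit keeps receding could be treated as stabilising.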

Importance Metrics

 * 1) Cache Traffic Rank.

Verifiability

 * 1) Citation Metrics.
 * 2) Citation Count.
 * 3) Citation Density.
 * 4) Citation Relevance (Coverage)
 * 5) Citation Quality.
 * 6) BLP Citation Density.
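
Citation count and density can be approximated directly from wikitext by counting `<ref>` tags. The sketch below assumes density means citations per 100 words of body text; that definition, the function name, and the regular expressions are illustrative, and citation templates used outside `<ref>` tags are not counted.

```python
import re

REF_TAG = re.compile(r"<ref\b[^>]*>", re.IGNORECASE)
SELF_CLOSING_REF = re.compile(r"<ref\b[^>]*/>", re.IGNORECASE)
PAIRED_REF = re.compile(r"<ref\b[^>]*>.*?</ref>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def citation_metrics(wikitext: str) -> dict:
    """Count <ref> citations and compute citations per 100 body words."""
    # Each opening or self-closing <ref> tag is one citation use.
    count = len(REF_TAG.findall(wikitext))
    # Remove reference content so it does not inflate the word count.
    body = SELF_CLOSING_REF.sub(" ", wikitext)
    body = PAIRED_REF.sub(" ", body)
    body = ANY_TAG.sub(" ", body)
    words = re.findall(r"\w+", body)
    density = 100.0 * count / len(words) if words else 0.0
    return {"citation_count": count, "citation_density": density}
```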

Media

 * 1) Images.
 * 2) Image Count.
 * 3) Image Caption.
 * 4) image - fair use
 * 5) image - commons
 * 6) video
 * 7) audio
 * 8) Quote
 * 9) Poem
 * 10) Source
 * 11) Template Count.
 * 12) Problem Tags (Inline|Section|Article).
 * 13) Non Problem Templates.
 * 14) Table Count and Mass.

Social

 * Words per editor (mean, standard deviation) for (Article;Talk;Both,Only Talk,Only Article).
 * Edits per editor (mean, standard deviation) for (Article;Talk;Both,Only Talk,Only Article).
 * Permission per editor (none, ... ,admin)
 * Edits per editor on Good and Featured articles (mean, standard deviation).
 * Editors Contribs per Namespace (mean, standard deviation).
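
The per-editor means and standard deviations above could be computed from a revision list along these lines. The `(editor, words_added)` input shape and the function name are hypothetical; the population standard deviation is used here as an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_editor_stats(revisions):
    """revisions: iterable of (editor, words_added) pairs (hypothetical shape).

    Returns (mean, standard deviation) for words per editor and edits
    per editor, or None if there are no revisions.
    """
    words = defaultdict(int)
    edits = defaultdict(int)
    for editor, words_added in revisions:
        words[editor] += words_added
        edits[editor] += 1
    if not edits:
        return None
    w, e = list(words.values()), list(edits.values())
    return {
        "words_per_editor": (mean(w), pstdev(w)),
        "edits_per_editor": (mean(e), pstdev(e)),
    }
```

Running it once on article revisions and once on talk page revisions would give the Article and Talk variants listed above.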

Calls for Action (Huggle/Twinkle/common content templates)

 * Total Template Count
 * Total Inline Template Count
 * Red Link Count

Conflict (Researchers believe that Conflict is a marker of Quality)

 * Number of non-vandalism edits reverted, deletions, rollbacks (based on the cosine between the diff vector and the current version)
 * Edit Wars
 * 3Revert Violation
 * Afd nomination
 * Move event
 * Semi/Full Protection events
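
The cosine idea in the first item is terse, so the following is only one possible reading: treat the words an edit added as a bag-of-words "diff vector" and measure its cosine similarity with the current version. A contribution that was later reverted or deleted should score near zero. Function names and the whitespace tokenisation are assumptions.

```python
import math
from collections import Counter


def diff_vector(before: str, after: str) -> Counter:
    """Words added by an edit (Counter subtraction keeps positive counts only)."""
    return Counter(after.split()) - Counter(before.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example: one edit partly survived, the other was reverted.
current = Counter("the river is long and wide".split())
kept = diff_vector("the river is long", "the river is long and very wide")
reverted = diff_vector("the river is long", "the river is long lol spam")
```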

Coordination and Communication

 * Talk page Discussion
 * History Comment length
 * Inline Template count
 * Page/Section Template count

Style
The style guide is full of both quality features and small task sources. Many are context sensitive. Is there research on style guide compliance? I think style compliance separates articles rated C and below from those rated B and above.


 * Title compliance
 * Title is Sentence Case
 * Title starts with A, An, The
 * Title starts with Noun/Noun Phrase
 * Title ends with Punctuation
 * Title formatting
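
Two of the title checks above are mechanical enough to sketch directly. The function name and the punctuation set are assumptions, and the results are flags for review rather than verdicts, since some legitimate titles start with an article or end in punctuation. The sentence-case and noun-phrase checks are left out because they would need a part-of-speech tagger.

```python
import re

LEADING_ARTICLE = re.compile(r"^(a|an|the)\s", re.IGNORECASE)


def title_flags(title: str) -> dict:
    """Flag titles that start with an article or end with punctuation."""
    t = title.strip()
    return {
        "starts_with_article": bool(LEADING_ARTICLE.match(t)),
        # A closing parenthesis (disambiguation) is deliberately not flagged.
        "ends_with_punctuation": bool(t) and t[-1] in ".,;:!?",
    }
```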


 * Hat Notes - must be at start


 * Has Lead
 * Section Titles
 * Section Templates


 * Appendices in this order:
 * Works or Publications
 * See also
 * Notes and/or References
 * Further reading
 * External links
 * Navigation templates (footer navboxes)
 * Geographical coordinates (if not in Infobox) or
 * Persondata template
 * Defaultsort
 * Categories
 * Stub templates
 * Interlanguage links


 * deprecated items
 * horizontal rule

Categories included may be disputed (have infoboxes, etc.)

Lexical

 * word count
 * word count - top       0-   99 most frequent
 * word count - top   100-  999 most frequent
 * word count - top 1,000-9,999 most frequent
 * word count - core
 * word count - 3 times or less in Wikipedia
 * word count - words unique to article
 * sentence length (mean,sd)
 * collocations
 * compound words
 * lead to section seme match
 * typos
 * style errors
 * English variants
 * us
 * uk
 * other
 * non english
 * IPA info

===Sentiment ===
 * Sentiment is the most interesting research topic in this area.
 * It requires sophisticated NLP analysis.
 * It allows similar articles to be compared by the level of sentiment they express.
 * It can be used to discriminate promotional or attack articles from regular ones.


 * classify sentences into facts and opinions.
 * opinion sentences have
 * object
 * opinion (implicit or explicit)(comparative or direct)
 * emotion
 * opinion quintuples $$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$$ in d


 * Opinion sentences -
 * The Primary Emotions
 * love
 * joy, surprise, anger, sadness and fear


 * cool,ok,sucks,lousy
 * too (adj|short|long|expensive)
 * is (good|no good|bad|awful)
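
The cue patterns listed above can be turned into a first-pass fact/opinion splitter. This is a rough sketch using only the literal words from the list; the generic "adj" slot is omitted because it would need a part-of-speech tagger, and the function name and sentence splitting rule are assumptions.

```python
import re

# Literal cue words taken from the list above.
OPINION_PATTERNS = [
    re.compile(r"\b(cool|ok|sucks|lousy)\b", re.IGNORECASE),
    re.compile(r"\btoo\s+(short|long|expensive)\b", re.IGNORECASE),
    re.compile(r"\bis\s+(good|no good|bad|awful)\b", re.IGNORECASE),
]


def split_facts_opinions(text: str):
    """Return (facts, opinions): sentences without and with an opinion cue."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    facts, opinions = [], []
    for sentence in sentences:
        if any(p.search(sentence) for p in OPINION_PATTERNS):
            opinions.append(sentence)
        else:
            facts.append(sentence)
    return facts, opinions
```

The share of opinion sentences per article could then serve as a crude signal for promotional or attack content.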

Semantics

 * redirects to article

Others

 * watch listed by users
 * markup sophistication score
 * bold,
 * italics,
 * templates,
 * math,
 * ref,
 * head1..head5,
 * time line,
 * links: internal, external, inter-wiki, categories, images, pipe trick
 * isbn
 * image,
 * bullets,lists,indents,
 * tables, etc...
 * microformats

Readability
Domain experts are important for supplying the most scarce resource: dependable new content with professional referencing. They should not be penalised for other issues such as style. Readability adds a new dimension to quality: rather than aggregating information metrics, it tells us how well written the text is. Writing readable text is more difficult than writing unreadable text.


 * SMOG (Simple Measure Of Gobbledygook) looks like the recommended metric - requires 30 sentences.
 * Gunning fog index
 * Flesch-Kincaid readability tests:
 * Flesch Reading Ease
 * Flesch-Kincaid Grade Level
 * Accelerated Reader ATOS
 * Automated Readability Index (ARI) - character dependent, fast to calculate, language independent.
 * Coleman-Liau Index - character dependent, fast to calculate, language independent.
 * Dale-Chall Readability Formula
 * Fry Readability Formula
 * Lexile Framework for Reading
 * Linsear Write
 * LIX
 * Raygor Estimate Graph
 * Spache Readability Formula
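
The two character-based formulas in the list are the easiest to try, since they need no syllable counter. The sketch below uses the standard published coefficients for ARI and Coleman-Liau; the tokenisation and sentence splitting are crude assumptions (abbreviations and decimals will be miscounted), and the function names are illustrative.

```python
import re


def _counts(text: str):
    """Return (sentences, words, letters, alphanumeric characters)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    letters = sum(ch.isalpha() for w in words for ch in w)
    chars = sum(ch.isalnum() for w in words for ch in w)
    return len(sentences), len(words), letters, chars


def automated_readability_index(text: str):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    sentences, words, _, chars = _counts(text)
    if not sentences or not words:
        return None
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text: str):
    """CLI = 0.0588 * L - 0.296 * S - 15.8 (L, S are per 100 words)."""
    sentences, words, letters, _ = _counts(text)
    if not sentences or not words:
        return None
    letters_per_100 = 100.0 * letters / words
    sentences_per_100 = 100.0 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Both scores approximate a US school grade level, so higher values mean harder text; very short, simple passages can score below zero.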

Action
It would be great if we could assess, per edit, whether readability has gone up or down, since more readable articles provide greater value for readers. A repeated decrease in readability should also trigger a request for a copy-editing expert to improve the text.

A CSCW system that aims to assist editors to improve readability should be able to highlight the problematic sentences/section using color or other visuals.
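
Both ideas can be sketched in a few lines, assuming a readability score where higher means more readable (Flesch Reading Ease, for example) has already been computed after each edit. The function names, the run length of three, and the 25-word threshold are hypothetical parameters, not recommendations.

```python
import re


def needs_copy_edit(scores, run_length=3):
    """True if readability dropped on run_length consecutive edits.

    scores: readability after each edit, oldest first, higher = easier.
    """
    run = 0
    for previous, current in zip(scores, scores[1:]):
        run = run + 1 if current < previous else 0
        if run >= run_length:
            return True
    return False


def flag_hard_sentences(text: str, max_words: int = 25):
    """Return sentences longer than max_words, as candidates for highlighting."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]
```

Sentence length is only a proxy; a fuller system would score each sentence with one of the readability formulas before highlighting it.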

Integration

 * PHP Readability kit

Issues

 * 1) These metrics are English only.
 * 2) These do not look at syntax or semantic complexity.
 * 3) Most of these are language dependent.
 * 4) To devise a new formula incorporating the above, text for calibrating against different school and university grades would need to be located. Or we could check against other formulas.

Other Facets
=See also=

=References=