Wikipedia:Bots/Requests for approval/WP News 1.0 bot

WP News 1.0 bot
Operator: User:Araesmojo

Time filed: 02:28, Tuesday, October 27, 2020 (UTC)

Function overview: Bot to review news sites, summarize the news contained within them, collect the news into groups based on article categories, and then create summarized link groups for editors to be able to quickly review recent news about a topic.

Automatic, Supervised, or Manual: Automatic, run once per day

Programming language(s): Python

Source code available: Not currently available. Will likely be published in a GitHub repository.

Links to relevant discussions (where appropriate):

Edit period(s): daily, review news sites, build single summary page

Estimated number of pages affected: 1 per day

Namespace(s): Articles

Exclusion compliant (Yes/No): Yes, does not interact with most of Wikipedia. Main use is summarizing external news sites and creating single summary page each day.

Function details:

Reason for Creation
To support rapid, efficient summarization and survey of each day's news stories for use by Portal:Current_events. The Covid-19 epidemic has created an unusual opportunity: editors are now looking at a fairly wide-ranging selection of news sources for current events. However, they focus almost entirely on articles related to the epidemic. With a little support, they could also gather well-linked stories on other topics (the US election, military issues, the recent Muslim issues in Europe, etc.) and draw those stories from a broader range of perspectives. For example, the Muslim issue is covered primarily by European news outlets, yet Arabic news sites often offer their own view.

Belief in Validity of Concept

 * Bots have already been used for external scraping of data sources outside Wikipedia
 * History_of_Wikipedia_bots
 * Bot does not "deep dive" into news sites
 * Bot only skims top level pages
 * Bot does not read articles
 * Bot only touches each site once per day
 * Bot does not "mass edit"
 * Bot creates a single page each day
 * Bot only summarizes links and stories
 * Bot is not on the list of frequently denied bots
 * Bot does not appear to violate any of the known bot policies

Possible Name Conflict

 * A bot named "User:NewsBot" already appears to exist under a similar name
 * Has no user page or information.
 * Appears to have gone nowhere
 * Seems to just be a placeholder

Possible Sockpuppet Issue
Bot was accidentally created without using the user crosslinking application.

General Guidelines

 * Bot is run once each day
 * Bot accesses each news site once
 * Bot explores only the top-level front page, not full article text
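
The guidelines above (one request per site, front page only) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the bot's actual code, which has not been published. The names `fetch_front_page`, `FrontPageLinkParser`, `extract_front_page_links`, and the user-agent string are all hypothetical.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch_front_page(url, user_agent="WPNewsBot-sketch/0.1"):
    """Make a single request for a site's front page and return its HTML text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class FrontPageLinkParser(HTMLParser):
    """Collect (headline text, absolute URL) pairs from one front page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished (text, url) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace; skip image-only or empty links.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None


def extract_front_page_links(html, base_url):
    """Parse front-page HTML only; linked articles are never fetched."""
    parser = FrontPageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

In this sketch a daily run would call `fetch_front_page` exactly once per site and pass the result to `extract_front_page_links`. Images and video are never requested because only the anchor text and `href` values are read.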

Method of operation

 * 1) Explore news sites
 * 2) While (news sites remain unexplored)
 * 3) Open news site from list below in Bots/Requests_for_approval/WP_News_1.0_bot
 * 4) Explore top links for news site
 * 5) Summarize major topics into line items with links to stories
 * 6) Only explore text, no imagery, video, or other high bandwidth information
 * 7) Create error note if news site has changed configuration recently that breaks bot function
 * 8) Perform summary operations
 * 9) Add links to single page on Wikipedia
 * 10) Summarize headlines into word tags or phrases
 * 11) Calculate frequency of word use (e.g., Covid-19 would be a high frequency tag on most days of 2020)
 * 12) Create list of clickable tags to allow editors to explore sites with common stories
 * 13) Compare links to links from prior day
 * 14) Create map with clickable regions and bubbles showing relative increase in "new" stories
 * 15) Finish bot operations and clean up any memory or other resident data as necessary
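
Steps 10, 11, and 13 above (headline tags, tag frequency, and comparison with the prior day) might look something like the sketch below. It assumes each day's result is a list of `(headline, url)` pairs. The stopword list, the tokenizing rule, and the function names are illustrative assumptions rather than the bot's actual design.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real run would likely need a longer one.
STOPWORDS = frozenset(
    {"a", "an", "and", "as", "at", "for", "in", "of", "on", "the", "to", "with"}
)


def headline_tags(headline):
    """Split a headline into lowercase word tags, dropping short and stop words."""
    words = re.findall(r"[a-z0-9][a-z0-9'-]*", headline.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def tag_frequencies(links):
    """Count how many headlines mention each tag (at most once per headline)."""
    counts = Counter()
    for text, _url in links:
        counts.update(set(headline_tags(text)))
    return counts


def new_links(today, yesterday):
    """Return today's links whose URL did not appear in yesterday's list."""
    seen = {url for _text, url in yesterday}
    return [(text, url) for text, url in today if url not in seen]
```

Counting each tag once per headline keeps a single repetitive headline from inflating a tag's rank. Comparing by URL rather than by headline text is a design choice in this sketch, since sites often reword a headline while keeping the same link.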

User Interaction

 * Users do not interact directly with bot
 * Users interact only with objects created by bot
 * Interaction with tags
 * User clicks tag
 * Frame populates showing links to news sites with relevant articles
 * Bot does not explore articles itself
 * Interaction with map
 * User clicks on bubble
 * Frame populates showing links to new articles from "today"
 * Possibly expand into sub-frame with sub-hierarchy based on high frequency tags
 * Bot does not explore articles itself
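
One way the tag interaction above could be backed is with a simple index from each tag to the links that mention it, rendered as a wikitext section of external links. The following is a rough sketch under that assumption. `simple_tags`, `build_tag_index`, and `render_tag_section` are hypothetical names, and the section layout is only one possible presentation.

```python
from collections import defaultdict


def simple_tags(headline):
    """Very rough tagger: lowercase words with surrounding punctuation removed."""
    words = (w.strip(".,:;!?\"'()") for w in headline.lower().split())
    return [w for w in words if w]


def build_tag_index(links, tagger=simple_tags):
    """Map each tag to the (headline, url) pairs whose headline contains it."""
    index = defaultdict(list)
    for text, url in links:
        for tag in sorted(set(tagger(text))):
            index[tag].append((text, url))
    return dict(index)


def render_tag_section(tag, entries):
    """Render one tag's links as a wikitext heading plus a bullet list."""
    lines = [f"=== {tag} ==="]
    for text, url in entries:
        # Escape square brackets so a headline cannot break the link syntax.
        safe = text.replace("[", "&#91;").replace("]", "&#93;")
        lines.append(f"* [{url} {safe}]")
    return "\n".join(lines)
```

Because the index stores only headlines and URLs collected from front pages, showing a tag's links never requires the bot to open the articles themselves.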

Current News Sites of Interest

 * Summarized from Portal:Current_events
 * Other news site suggestions are welcome
 * Sites that need Google Translate to function will likely be lowest priority

https://www.reuters.com/

https://www.aljazeera.com/

https://www.foxbusiness.com/

https://www.bbc.com/

https://www.nst.com.my/ (Malaysia; note that www.nst.com is a security camera company)

https://sea.mashable.com/ (also Malaysia)

https://www.aa.com.tr/en (Turkish)

https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fwww.nrc.nl%2F

https://www.abc.net.au/news/

https://abcnews.go.com/

https://www.rte.ie/ (Ireland - possible issues with cookies)

https://www.ft.com/ (Financial Times - possible issues with cookies)

https://www.politico.eu/

https://www.politico.com/

https://apnews.com/

https://www.washingtonpost.com/

https://en.dailypakistan.com.pk/

https://www.bloomberg.com/

https://www.nbcnews.com/

https://www.cbsnews.com/

https://hungarytoday.hu/

https://www.euronews.com/

https://www.thehindu.com/

https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fnews.kompas.com%2F (Indonesia)

https://www.newindianexpress.com/

https://www.canberratimes.com.au/

https://globalnews.ca/

https://www.thestar.com/ (Canada)

https://www.trtworld.com/ (Turkey)

https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fwww.ilmessaggero.it%2F (Italy)

https://www.cbc.ca/news

https://vancouversun.com/

https://bc.ctvnews.ca/

https://www.stcatharinesstandard.ca/

https://www.france24.com/en/

https://www.goal.com/en (Soccer / Futbol news)

https://www.abc12.com/

https://tolonews.com/ (Afghanistan)

https://www.military.com/

https://www.todayonline.com/ (Singapore)

https://english.radio.cz/ (Czech)

https://bnr.bg/en (Bulgaria)

https://english.alarabiya.net/News (Middle East)

https://today.rtl.lu/ (Luxembourg)

https://www.dw.com/en/ (Deutsche Welle, Germany)

https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fwww.beritasatu.com%2F (Indonesia)

https://www.channelnewsasia.com/ (Singapore)
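
Since several entries in the list above are wrapped in Google Translate URLs and such sites are meant to be lowest priority, the list could be ordered with a small helper like the one below. This is only a sketch: `unwrap_translate_url` and `order_sites` are hypothetical names, and the only assumption about the URLs is the `u` query parameter visible in the entries above.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_translate_url(url):
    """Return (site_url, needs_translation) for one entry in the site list."""
    parts = urlsplit(url)
    if parts.netloc == "translate.google.com":
        # parse_qs percent-decodes the wrapped address held in the "u" parameter.
        target = parse_qs(parts.query).get("u")
        if target:
            return target[0], True
    return url, False


def order_sites(urls):
    """Put directly readable sites first and translated sites last."""
    pairs = [unwrap_translate_url(u) for u in urls]
    # sorted() is stable and False sorts before True, so list order is kept
    # within each group.
    return sorted(pairs, key=lambda pair: pair[1])
```

For example, `order_sites` applied to the nrc.nl entry and the Reuters entry would place Reuters first and report nrc.nl as needing translation.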