User:RossO/sandbox/Current Events Portal Archive Cleanup

The purpose of this project is to implement a layout technique for the Portal:Current events archives and constituent components that improves (or creates) a mobile-friendly layout. The pages impacted are the 'Month pages', the individual 'day pages', the Calendar elements and the Sidebar elements.

Survey 1 - Month Pages (September 2017)
There are a variety of inconsistencies in the Current events portal archive pages. Here's a quick list of what I found on my first survey:

Part 1. Unnecessary TOCs
Most pages do NOT show a TOC. TOCs are being triggered on the pages listed above, and I've noted the reason for each. Most of these additional sections could be moved to a Sidebar box, integrated directly into the listings of the individual days (Film releases), or removed entirely.

Part 2. Migrating contents to day-specific pages
These were the targeted changes for migrating in-page contents to externally included content (day-specific pages).


 * 2003: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
 * 2004: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec

These pages appear to break the assumed configuration, in which day sub-pages are included into the primary month page. This may take a lot of work to complete. The contents of each day for each month will need to be copied into a new page for that day following the strict naming format.

Technique: Saved Tabs and Bookmarklets
 For each month I follow a process of opening tabs, running two bookmarklets on each tab and then copying and pasting the contents of the month page from a text file. When working on June 2004 I use the following steps:


 * 1) Launch a pre-saved set of 32 tabs that are saved in my browser's bookmarks. The bookmarks are the January 2003 archive page, plus 31 day-specific tabs. Chrome allows me to open these up in one click.
 * 2) Edit the first bookmarklet:   and replace the year and month as necessary. It sits on my bookmarks bar for easy access.
 * 3) Place the mouse cursor over the bookmarklet and rapidly alternate between clicking the mouse and keying   to apply the bookmarklet to each tab.
 * 4) Edit the second bookmarklet:   and replace the year and month as necessary. It sits on my bookmarks bar for easy access. This pre-fills the content of the page with the appropriate template calls and sets an appropriate Edit summary.
 * 5) Place the mouse cursor over the bookmarklet and rapidly alternate between clicking the mouse and keying   to apply the bookmarklet to each tab.
 * 6) Cut all of the daily contents from the source of the month page and paste them into a text editor. Apply the following search and replace regex:   and   to clean up the bullet points.
 * 7) Scan the entries for obvious vandalism and do a sanity check. Edit the entries as needed.
 * 8) Cut and paste the daily content into each day's tab's content editing box.
 * 9) Place the mouse cursor over the 'Save' button and rapidly alternate between clicking the mouse and keying   to save each tab.
 * 10) Update the month page edit summary to   and click save.

This process usually takes 10-15 minutes depending mostly on how much time is spent in the editing step.
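
As a rough illustration of the tab set used in steps 1 and 2, the sketch below builds the list of URLs for one month: the month archive page followed by one edit URL per day. It is not the bookmarklet itself. The day-page title pattern ("year Month day") and the `month_tab_urls` helper are assumptions made for this example.

```python
import calendar
from urllib.parse import quote

BASE = "https://en.wikipedia.org/wiki/"


def month_tab_urls(year: int, month: int) -> list:
    """Return the month archive URL followed by one edit URL per day."""
    month_name = calendar.month_name[month]
    days_in_month = calendar.monthrange(year, month)[1]

    # Month page title follows the "Month_Year" pattern seen in the survey URLs.
    month_title = f"Portal:Current events/{month_name} {year}"
    urls = [BASE + quote(month_title.replace(" ", "_"), safe=":/")]

    for day in range(1, days_in_month + 1):
        # ASSUMPTION: day pages are titled "<year> <Month> <day>".
        day_title = f"Portal:Current events/{year} {month_name} {day}"
        urls.append(BASE + quote(day_title.replace(" ", "_"), safe=":/") + "?action=edit")
    return urls


if __name__ == "__main__":
    for url in month_tab_urls(2004, 6):
        print(url)
```

For a 30-day month such as June 2004 this yields 31 URLs, so one of the 32 saved tabs would go unused.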

Day 1
Using a curl command, I pulled all of the Month archive pages into a single text file. While I found an amazing amount of consistency, I also found a few items that were worth cleaning up. I will tackle these manually, in advance of replacing the month page contents with a hyper-consistent template or Lua script that will generate the contents.


 * 1) Most pages have   but some have
 * 2) February 1999 notes "There were no full moons in this month."
 * 3) First and last months of each decade are noted.
 * 4) A minority of months have "(See Holidays and observances, on sidebar at right, below)" or similar which we may want to remove.
 * 5) The following tags (which I don't recognize) are on some pages:  . I will identify these and see if they need to be kept or removed.
 * 6) Pages in 1997-1999 have  . This may be taken out or extended to the rest of the months, including Dec 1996.
 * 7) Some months do not have a call to a sidebar template
 * 8)   was used up through 2013. 2014 and after use  . Will investigate proper usage.
 * 9)   seems to be missing for 2017.
 * 10)   began use in January 2001 and later.
 * 11) A variety of earlier months list holidays. These should be moved to sidebars. They can be removed if judged unnecessary at that time.
 * 12) Standardize the varieties of
 * 13) Standardize the varieties of
 * 14) Some months have preamble text like "The month was marked by…" which will be moved to a sidebar box.

 In order to survey the Month pages, I used the following command in a terminal session to collect all of the contents into a single text file.
 * Process - curl command used for survey


 *  curl "https://en.wikipedia.org/wiki/Portal:Current_events/{January,February,March,April,May,June,July,August,September,October,November,December}_[1996-2016]?action=raw" >> archive.txt 

I then used a variety of techniques (such as sorting the lines alphabetically, removing duplicate lines, and various regular expressions) to identify non-conforming content.
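
One way to approximate the "sort, de-duplicate, and look for outliers" step is to count how often each line occurs across the combined file and list the rare ones. This is a minimal sketch of that idea, not the exact technique used for the survey. The `rare_lines` helper and its threshold are hypothetical.

```python
from collections import Counter


def rare_lines(text: str, max_count: int = 2) -> list:
    """Return (line, count) pairs for lines occurring at most max_count times.

    Lines that appear on nearly every month page are boilerplate; lines that
    appear only once or twice are candidates for non-conforming content.
    """
    counts = Counter(
        line.strip() for line in text.splitlines() if line.strip()
    )
    rare = [(line, n) for line, n in counts.items() if n <= max_count]
    return sorted(rare)  # alphabetical, like sorting the file by line


if __name__ == "__main__":
    # Assumes archive.txt was produced by the curl command above.
    with open("archive.txt", encoding="utf-8") as handle:
        for line, n in rare_lines(handle.read()):
            print(n, line)
```

Daily event text will also show up as rare, so in practice the output still needs a manual pass or a filter on lines that look like template calls.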

Once I have completed these cleanup steps and normalized all of the month pages, I will look at creating a Lua script to generate the contents based on a single month and year variable. This will have the added benefit of creating a single location for the shared code used for the page layout. At that point it should be a simple process to make the layouts work in a mobile-friendly manner.
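
To make the idea of generating a month page from a single month and year concrete, here is a small sketch written in Python rather than Lua. It only shows the shape of such a generator. The day-page title pattern and the use of plain transclusion are assumptions, and `month_page_wikitext` is a hypothetical name, not the module's API.

```python
import calendar


def month_page_wikitext(year: int, month: int) -> str:
    """Build a month page body that transcludes one sub-page per day."""
    month_name = calendar.month_name[month]
    days_in_month = calendar.monthrange(year, month)[1]

    lines = [f"<!-- {month_name} {year}: generated layout -->"]
    for day in range(1, days_in_month + 1):
        # ASSUMPTION: day pages are titled "<year> <Month> <day>" and are
        # pulled in with ordinary {{...}} transclusion.
        lines.append(f"{{{{Portal:Current events/{year} {month_name} {day}}}}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(month_page_wikitext(2011, 9))
```

Because every month goes through the same function, a layout change only has to be made in one place.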

Day 2
240 months surveyed (1997-2016) and cleaned to some degree. The remaining issues are below.


 * 21 months have "International holidays" sections that need to migrate into sidebars.
 * 2007: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
 * 2008: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
 * 2010: Sep
 * Many months list holidays for the following month. These should be migrated to the corresponding month.
 * 2 months (Dec 1999 and Jan 2000) have sidebars (Recent Deaths for December and Holidays for January) but this is prior to the Jan 2001 start. These should be evaluated for removal for consistency.
 * 1 month (Feb 2009) has a preamble about the uniqueness of a 28-day February starting on a Sunday and the Friday the 13ths falling in consecutive months. This should be investigated for removal. (Moved to talk page for further evaluation.)

Day 3
All month pages from 1997-2016 have an identical layout except for the following items. (Survey results from previous days have been moved here and updated.)


 * Months from 1997 through December 2000 do NOT have sidebars. From January 2001 onward, all months have sidebars.
 * 2 months (Dec 1999 and Jan 2000) have sidebars (Recent Deaths for December and Holidays for January) but this is prior to the Jan 2001 start. These should be evaluated for removal for consistency.
 * 2 months (Sep 2005 and Dec 2005) have " " for some reason. These should be investigated for removal.
 * Months from 1990's have
 * Months prior to December 2013 have . January 2014 and after use  . These should be reconciled.
 * 13 months have non-English category tags. These can easily remain on the pages.

Notes for later: Parts to add to a Month page generator script:


 * Sidebars begin in January 2001. The template will need to take these into account.
 * See Also sections:
 * 6 months (June 2004-November 2004) have "See Also" links for Sports and 1 month has a See Also link for Science. These should be removed or folded into the Month page template.
 * " News collections and sources. "
 * " News sources – This has much of the same material organized in a hierarchical manner to help encourage NPOV in our news reporting. "
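
The sidebar rule noted above is simple enough to express as a date comparison. The sketch below is in Python rather than Lua and is only an illustration. The function name and the `keep_early_exceptions` flag are hypothetical, and whether the two early sidebars are kept is still an open question in these notes.

```python
# Sidebars begin in January 2001 according to the survey.
SIDEBAR_START = (2001, 1)

# Dec 1999 and Jan 2000 currently have sidebars; they are under evaluation
# for removal, so they are treated as optional exceptions here.
EARLY_SIDEBAR_MONTHS = {(1999, 12), (2000, 1)}


def has_sidebar(year: int, month: int, keep_early_exceptions: bool = False) -> bool:
    """Decide whether a generated month page should include a sidebar."""
    if (year, month) >= SIDEBAR_START:
        return True
    return keep_early_exceptions and (year, month) in EARLY_SIDEBAR_MONTHS


if __name__ == "__main__":
    for ym in [(1999, 12), (2000, 12), (2001, 1), (2016, 6)]:
        print(ym, has_sidebar(*ym))
```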

Prototype Month page
The prototypical Month page will have the following code. # will be numbers (mostly years) and * will be letters, often month names. 

Once this consistency is in place, we will look at extracting the table-based layout and replacing it with a div-based layout using flexbox attributes.

Day 1: Expressing the Opening Paragraph
I have created the module and it comprises the following pages:


 * Module:Current events monthly archive - The Module and its code
 * Module:Current events monthly archive/doc - Initial docs on how to call it
 * Module:Current events monthly archive/testcases - Test cases as written
 * Module talk:Current events monthly archive/testcases - Test cases as run
 * User:RossO/sandbox/Current events monthly intro - A variety of test cases
 * Portal:Current events/September 2011/Sandbox - A test case in situ, after the text it will replace

Currently this module will only fill in the initial paragraph, but I will add arguments that will allow it to express the parts needed to support the layout of the page contents.

Day 2: Expressing the Page Structure
I have moved the Module forward to express very simple HTML strings that can be used to produce the layout of the page. It uses Flexbox styling in the same way that the Portal:Current events page does now. I would like to run this by the people interested before applying this to all Monthly archive pages. I have not updated the documentation or the testcases yet.


 * Module:Current events monthly archive - The Module and its code
 * Portal:Current events/September 2011/Sandbox - A test case in situ, after the text it will replace
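
For readers unfamiliar with the approach, the sketch below shows what "very simple HTML strings" for a flexbox layout might look like. It is written in Python for illustration and does not reproduce the module's actual output. The inline style values and the `flex_layout` helper are assumptions.

```python
from typing import Optional


def flex_layout(main_html: str, sidebar_html: Optional[str] = None) -> str:
    """Wrap the main content, and an optional sidebar, in a flex container.

    flex-wrap lets the sidebar drop below the main column on narrow screens,
    which is what makes the layout mobile-friendly without a table.
    """
    parts = ['<div style="display:flex; flex-wrap:wrap;">']
    # ASSUMPTION: these flex-basis values are placeholders, not the portal's.
    parts.append(f'<div style="flex:3 1 30em;">{main_html}</div>')
    if sidebar_html is not None:
        parts.append(f'<div style="flex:1 1 15em;">{sidebar_html}</div>')
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(flex_layout("Daily events go here", "Sidebar goes here"))
```

Months before January 2001 would simply call it without a sidebar argument and get a single full-width column.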