Wikipedia:WikiProject Józef Piłsudski Institute of America/Scalable Archive Project

There are 3.5 million categories on the Wikimedia Commons
We will develop and implement categorization practices that can be scaled and adapted, and that will help other GLAM projects grow. Categories on Wikimedia are currently underutilized by GLAM projects that would greatly benefit from them, such as document-based archives. Well-developed categorization practices will help other archive projects organize their digitized collections on Wikimedia and grow their open-source digital collections as more items are digitized. The Piłsudski Institute of America GLAM-Wiki Project will serve as the test bed for a deeper, more meaningful implementation of Wikimedia categories.

More categories, more context
Archives that keep their descriptions in a structured data source (for example, finding aids encoded with the EAD metadata standard) will be able to generate several levels of categorization for their Wikimedia Commons collections. We think that intermediate categories, which give context to the documents presented in our collections, will make GLAM projects more attractive to Wiki users. Generating these categories from data structured according to widely implemented standards will make it easier to create a GLAM collection on Wikimedia.
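To make the idea of generating nested categories from structured metadata more concrete, here is a minimal Python sketch. It assumes a simplified, namespace-free EAD finding aid in which nested numbered components (c01, c02, ...) stand for fonds, series, and folders. The sample XML, the category naming pattern, and the function name build_category_tree are hypothetical illustrations, not the project's actual script.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD fragment. Real EAD files usually declare a
# namespace and carry far more elements; only the nesting of the numbered
# components (<c01>, <c02>, ...) matters for this sketch.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <dsc>
      <c01 level="fonds">
        <did><unitid>1</unitid><unittitle>Example fonds</unittitle></did>
        <c02 level="file">
          <did><unitid>1/1</unitid><unittitle>Example folder</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Numbered EAD component tags: c01 through c12.
COMPONENT_TAGS = {f"c{n:02d}" for n in range(1, 13)}


def build_category_tree(ead_xml, prefix, root_category):
    """Return (category, parent_category) pairs, one per EAD component.

    The naming pattern "<prefix> <unitid> (<unittitle>)" is an assumption
    chosen for illustration only.
    """
    root = ET.fromstring(ead_xml.strip())
    dsc = root.find("./archdesc/dsc")
    if dsc is None:
        return []
    pairs = []

    def walk(element, parent):
        for child in element:
            if child.tag not in COMPONENT_TAGS:
                continue  # skip <did> and other descriptive elements
            unitid = child.findtext("./did/unitid", default="").strip()
            title = child.findtext("./did/unittitle", default="").strip()
            name = f"{prefix} {unitid} ({title})"
            pairs.append((name, parent))
            walk(child, name)  # nested components become subcategories

    walk(dsc, root_category)
    return pairs


if __name__ == "__main__":
    for name, parent in build_category_tree(
        SAMPLE_EAD, "Example archive", "Example archive by fonds"
    ):
        # Each category page would carry a link to its parent category.
        print(f"Category:{name}  ->  [[Category:{parent}]]")
```

Each level of the finding aid becomes one level of categories, so a deeper EAD hierarchy automatically yields more intermediate categories. A real workflow would likely hand these pairs to an upload tool or bot instead of printing them.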

While there are millions of categories available on Wikimedia, the vast majority of current Wikimedia Commons collections use a broad category, such as “paintings,” followed by an alphabetical listing of content. There is no automatic solution for categorization either during or after a batch upload, but we would at least like to develop ways to ease the linking of stand-alone category trees to the millions of categories already available. Intermediate categorization based on already available metadata would allow an archive to contextualize its work in ways that Commons users find easier to navigate.

Making it easier to link stand-alone categorization to existing categories would benefit both Wikimedia and GLAM projects. Instead of appearing as an alphabetical listing, a collection of paintings could be grouped by era, style, or artist, depending on what information the metadata contains and what Wikimedia categories exist. Individual items could appear in multiple categories depending on how they were tagged in the metadata, taking advantage of a key feature of the wiki format.
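The grouping described above can be sketched in Python. This minimal sketch assumes each item's metadata is a dictionary of field values and that a snapshot of existing Commons category names is available as a set; the field-to-category templates and the sample category names are hypothetical. A candidate category is kept only if it already exists, so an item is never placed in a nonexistent category.

```python
def categories_for_item(metadata, field_templates, existing_categories):
    """Map metadata values to existing category names.

    metadata: dict of field -> value or list of values (hypothetical schema).
    field_templates: dict of field -> format string such as "Paintings by {}".
    existing_categories: set of category names assumed to exist on Commons.
    """
    categories = []
    for field, template in field_templates.items():
        values = metadata.get(field, [])
        if isinstance(values, str):
            values = [values]  # allow single values as well as lists
        for value in values:
            candidate = template.format(value)
            # Keep only categories that already exist, without duplicates.
            if candidate in existing_categories and candidate not in categories:
                categories.append(candidate)
    return categories


def category_wikitext(categories):
    """Render the category links that would be appended to a file page."""
    return "\n".join(f"[[Category:{name}]]" for name in categories)


if __name__ == "__main__":
    painting = {
        "era": "19th-century",
        "style": "Impressionism",
        "artist": ["Claude Monet"],
    }
    templates = {
        "era": "{} paintings",
        "style": "{} paintings",
        "artist": "Paintings by {}",
    }
    # Hypothetical snapshot of existing categories.
    existing = {"19th-century paintings", "Paintings by Claude Monet"}
    print(category_wikitext(categories_for_item(painting, templates, existing)))
```

With this sample data the item lands in two categories, while the style value is dropped because "Impressionism paintings" is not in the snapshot. That gap is where a human curator would decide whether the value should map to a differently named category.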

Project goals
To help GLAM projects harness the power of the Wikimedia Commons and make their collections easier for users to browse.

Project plan
So far, our script can create a category tree that mirrors the fonds and folder hierarchy used by the Piłsudski Institute archive. The results of running the script can be seen here:

Collections of Józef Piłsudski Institute of America by fonds

These fonds categories are automatically added to files by [Template:Piłsudski Institute document], while the category descriptions are created by [Template:Józef Piłsudski Institute of America Category Description] and [Module:Józef Piłsudski Institute of America Template:Józef Piłsudski Institute of America Category Description]. The “Accession number” field in [Template:Piłsudski Institute document] adds the links to these new categories.
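The template and module code is not reproduced here, but the idea of deriving category links from an accession number can be sketched in Python. The sketch assumes a hypothetical slash-separated accession number of the form fonds/series/folder (for example 12/3/4) and a hypothetical category prefix; the actual format and category names used by the Institute's templates may differ. The file is placed only in its most specific category, and each category links to the one above it, which reproduces the fonds and folder hierarchy.

```python
def accession_chain(accession, prefix):
    """Return category names from broadest (fonds) to most specific.

    Assumes a hypothetical "fonds/series/folder" accession number format.
    """
    parts = [part.strip() for part in accession.split("/") if part.strip()]
    if not parts:
        raise ValueError(f"unusable accession number: {accession!r}")
    return [f"{prefix} {'/'.join(parts[:depth])}" for depth in range(1, len(parts) + 1)]


def file_category_link(accession, prefix):
    """Category link for the file page: only the most specific category."""
    return f"[[Category:{accession_chain(accession, prefix)[-1]}]]"


def parent_links(accession, prefix):
    """Map each category below fonds level to the link to its parent."""
    chain = accession_chain(accession, prefix)
    return {child: f"[[Category:{parent}]]" for parent, child in zip(chain, chain[1:])}


if __name__ == "__main__":
    print(file_category_link("12/3/4", "Example archive"))
    for child, link in parent_links("12/3/4", "Example archive").items():
        print(f"Category:{child}  ->  {link}")
```

Putting the file only in the deepest category keeps higher-level categories uncluttered, while the parent links let a reader climb from a folder back up to its fonds.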

Step One
We hope to use not only folder numbers but also titles and document tags to enable better browsing. These categories will be generated from metadata, provided the metadata is available and the archive wants them. Collections that use this information will be easier to create and friendlier to browse.

Step Two
We would like to make the process of connecting stand-alone categories generated from metadata to existing Wikimedia categories less manual. The more experience we gain by experimenting with our own Wikimedia Commons collections, the more streamlined we can make the process for others.
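One way to make this step less manual, sketched here in Python as an assumption rather than as the project's chosen approach, is to propose candidate existing categories by fuzzy string matching, using the standard library's difflib.get_close_matches, and to leave the final decision to a person. The normalization rules, the default cutoff of 0.6, and the function names are illustrative choices.

```python
import difflib
import re


def normalize(name):
    """Lower-case a category name, drop a 'Category:' prefix, and treat
    underscores and runs of whitespace as single spaces."""
    name = re.sub(r"^category:", "", name.strip(), flags=re.IGNORECASE)
    return re.sub(r"[_\s]+", " ", name).strip().lower()


def suggest_existing_categories(local_category, existing_categories, n=3, cutoff=0.6):
    """Return up to n existing category names that resemble local_category.

    Suggestions are only candidates; a curator should confirm each link.
    """
    by_normalized = {}
    for original in existing_categories:
        by_normalized.setdefault(normalize(original), original)
    matches = difflib.get_close_matches(
        normalize(local_category), list(by_normalized), n=n, cutoff=cutoff
    )
    return [by_normalized[match] for match in matches]


if __name__ == "__main__":
    # Hypothetical list of existing category names.
    existing = ["Polish-Soviet War", "Cats"]
    print(suggest_existing_categories("Category:Polish-Soviet_War", existing))
```

String similarity will miss categories whose names differ in wording rather than spelling, so a curator still reviews every suggestion; the sketch only narrows the search.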

Step Three
Other projects will only be able to use our work if we document our findings, so we intend to document them thoroughly. Ultimately, we would like to contribute something that other projects will want to adopt.

Step Four
We hope that our project can serve as another venue for discussing categorization, with GLAM projects specifically (but not exclusively) in mind.

Activities
We have already developed code to transfer our archival categories to our Wikimedia Commons collection. As the code is developed further, we also want to focus on creating sensible categorization practices. We will:


 * Apply the code to our entire Wikimedia Commons collection.
 * Continue uploading original documents to the Wikimedia Commons using the script we have developed.
 * Improve our workflow, on the principle that continual human interaction with uploaded content is critical to curating a digital collection.
 * Improve the script and code as we gain experience with them.
 * Create documentation so that other projects can benefit from our experience.

We assume the following improvements will be necessary as we work with the current code:


 * Make the code easy to implement for others.
 * Develop categorization meaningfully, that is, in a way that gives context to collections rather than making them more confusing.

Community engagement
The Piłsudski Institute of America GLAM-Wiki project has a record of engaging volunteers. So far, it has resulted in the donation of about 1,200 documents from the Institute archives to Wikimedia Commons, along with over 50 new Wikipedia articles in both English and Polish. We are working on this project because we would like to add documents to Wikimedia Commons faster and more meaningfully. With a working script, the institutional volunteers we regularly attract to our project will be able to share our collection more efficiently.

After creating a resource page that gathers links to Wiki and non-Wiki resources documenting categorization practices, we hope to create a discussion space where categorization can be brainstormed collaboratively, and to align our project with the practices developed there. We hope that such a discussion might attract collaborators and make our work useful to a broad audience, not just to ourselves. We also hope that other Wiki projects will be able to use our work to streamline their own upload processes and contextualize their collections more meaningfully, making their projects more attractive to Wikimedia users and to their own volunteers.

Potential Collaborations

 * WikiProject Categories
 * Category:Wikipedia Categorization - relevant active category discussion pages on Wikipedia.
 * Commons:GWToolset
 * GWToolset users

Sustainability
The project will continue for as long as we keep adding documents from our collections. Other GLAM projects will be able to use our work and, we hope, improve the code as well.

Quantitative

 * Upload 500 new files to Wikimedia Commons.
 * Complete 50 Wiki articles based on files uploaded to our Wikimedia Commons collections.
 * Engage 5 institutional volunteers for our project.
 * Establish a collaborative relationship with a GLAM or other Wiki project.

Qualitative

 * Outreach in the form of:
    * Blog posts
    * Open online meetups (e.g. Google Hangouts) and information sessions to explain the project
    * Contacting potential collaborators
 * Helping other GLAM projects by:
    * Improving documentation on categorization practices
    * Making our code freely available in an online repository
    * Creating a user's guide with instructions for the code and its implementation

Project Manager
Lukasz Chelminski is a doctoral candidate in the History department at the CUNY Graduate Center and an adjunct instructor at Brooklyn College. He is a proponent of digital pedagogy and has used web resources extensively in the classroom. Lukasz is the Wikipedian-in-residence at the Piłsudski Institute of America. Through his experience at the Institute, he hopes to develop a pedagogical strategy that will introduce his students to Wiki editing through collaborative class projects.

Volunteer Coder
Jarek Tuszynski

Volunteer Metadata Specialist
Marek Zielinski

Community Notification
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
 * Wikimedia NYC (through the "discuss" mailing list)

Endorsements
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).


 * OR drohowa (talk) 15:43, 29 September 2014 (UTC)
 * Marek Zielinski - Categories are very important in GLAM partnerships. The project has a great potential to utilize the GLAM's own metadata in expanding and improving the categorization of uploaded files. 20:06, 29 September 2014 (UTC)
 * Categories in GLAM greatly facilitate research for scholars such as myself. As a doctoral candidate in European history at University of Wisconsin-Madison, I can better organize my sources if my access to the materials at such institutions as the Piłsudski Institute is faster and easier. Piotr Puchalski (talk) 03:16, 30 September 2014 (UTC)