User:MingoBerlingo/Visual content assessment tool

This article describes an initiative to develop a monitoring tool that tracks the visual content of articles within WikiProjects: the Visual Content Assessment Tool (or simply VCAT).

A working version of the tool is available at: https://aquets.github.io/VCAT-dashboard/

Why a tool?
To monitor the progress of their articles, many WikiProjects use the Content assessment system, whose tables are generated by the WP 1.0 bot and accessible at wp1.openzim.org. These tables help identify incomplete articles that need focused editing and effort to enhance their quality.

The presence of images is a defining characteristic of high-quality encyclopedia articles (as stated in the Featured article criteria). However, as a member of the Graphics Lab, I have noticed that it can be difficult to identify the articles where images could be inserted, modified, or created.

This tool could bridge this gap and assist in identifying articles that need improvement in terms of visual content, thereby increasing the overall quality of the articles. Specifically, it could be used to create to-do lists or identify requests for the Graphics Lab.

The project
This tool is an interactive dashboard which, through filters and visualizations, gives an overview of the visual content gaps in a WikiProject and helps identify articles where new images are needed.

Some of the actions that can be done with it are:
 * Monitoring the visual content coverage in a Wikiproject
 * Identifying articles needing images (e.g. articles without images)
 * Identifying low-resolution images to improve (e.g. raster diagrams to be vectorized)
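As a minimal sketch of the second action, the snippet below picks out articles with no images from extracted data. The record structure here is purely illustrative; the actual export schema is whatever the Extraction tool emits.

```python
# Hypothetical VCAT-style records: one entry per article with its image count.
articles = [
    {"title": "Benzene", "images": 4},
    {"title": "Hydroxide", "images": 0},
    {"title": "Alkane", "images": 0},
]

# Articles needing images are simply those whose image count is zero.
needs_images = [a["title"] for a in articles if a["images"] == 0]
print(needs_images)  # ['Hydroxide', 'Alkane']
```

The same filter-and-list idea underlies the dashboard's interactive views.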

I produced a first version of the tool using React.js and the Ant Design React UI kit, hosting it on GitHub.

VCAT is available at: https://aquets.github.io/VCAT-dashboard/

Functionalities
VCAT is made of two parts: the Extraction tool and the Dashboard.

Extraction tool


The Extraction tool is a command-line interface designed to extract data from WikiProjects, articles, and images. It generates files that can be explored through the Dashboard. Data can be extracted either from a WikiProject or from a custom list of articles.

The data extracted includes:
 * Number of images (in each article)
 * Categories
 * Assessment metrics (quality and importance)
 * File type (JPG, PNG, SVG, or GIF)
 * Image resolution (in pixels)
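Two of these fields, file type and resolution, are available through the MediaWiki Action API's `imageinfo` property. The sketch below only constructs the query URL; the file title is an arbitrary example, and this is not necessarily how the tool itself forms its requests.

```python
from urllib.parse import urlencode

# Build an Action API query asking for an image's pixel size and MIME type,
# which correspond to the "resolution" and "file type" fields listed above.
API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "titles": "File:Example.png",   # arbitrary example file
    "prop": "imageinfo",
    "iiprop": "size|mime",          # width/height in pixels + MIME type
    "format": "json",
}
url = API + "?" + urlencode(params)
print(url)
```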

The source code of the Extraction tool is written in Python and is available on GitHub.

Dashboard


An interactive Dashboard enables the exploration of images within a Wikiproject and the detection of gaps in the visual content.


 * Overview: a panoramic view of the WikiProject (e.g. how many articles lack images) and a selection of articles and images that could be improved.


 * Articles: the list of all the articles in the WikiProject can be explored using filters and sort orders. An alternative view of the section, focused on visual style, shows only the images.


 * Images: similar to the Articles section, but lists images, which can be filtered and sorted by resolution, file type, and so on.


 * Change data: a data file (extracted using the Extraction tool) can be uploaded. Alternatively, a data sample already collected by me (MingoBerlingo) can be used.

How to use it

 * 1) Download the Extraction Tool: Select and download the correct version for your operating system from the download area and extract the zipped files.
 * 2) Run extraction_tool.exe: This opens a command-line interface with a menu.
 * 3) Extract the data: You can either extract data from a Wikiproject or from a custom list. Follow the instructions in the extraction tool. (With big Wikiprojects or long lists the extraction could take more than 1 hour. Data are saved gradually, allowing you to stop and resume the extraction.)
 * 4) Get the output file: You can find all the extracted data in the output folder, named after the WikiProject or the list's file.
 * 5) Load the data file: Upload the .JSON file in the dashboard section of the website and explore the dataset.
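The gradual saving mentioned in step 3 can be illustrated with a small Python sketch. The file name and record structure below are assumptions for illustration, not the tool's actual schema.

```python
import json
import os

# Illustrative progress file; the real tool's output layout may differ.
OUT = "vcat_progress.json"

def load_progress():
    """Return previously saved results, or an empty dict on a fresh run."""
    if os.path.exists(OUT):
        with open(OUT, encoding="utf-8") as f:
            return json.load(f)
    return {}

def save_progress(done):
    """Persist results after every article so an interrupted run can resume."""
    with open(OUT, "w", encoding="utf-8") as f:
        json.dump(done, f)

done = load_progress()
for title in ["Benzene", "Ethanol", "Methane"]:
    if title in done:
        continue  # already extracted in an earlier, interrupted run
    done[title] = {"images": 0}  # placeholder for the real extraction step
    save_progress(done)

print(sorted(done))
```

Because results are written after each article, stopping and restarting the loop skips everything already extracted.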

How it works
To extract data about the articles in a WikiProject I used the WP 1.0 API, while to extract data about the images in each article I used the MediaWiki Action API. The extraction is performed by the Python script that powers the Extraction tool.
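For the per-article image lists, the Action API's `images` property can be queried. This sketch only builds the request URL; the article title is an example, and the script's actual requests may differ in detail.

```python
from urllib.parse import urlencode

# Ask the MediaWiki Action API for every image file used in one article.
API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "titles": "Benzene",   # example article title
    "prop": "images",      # lists the File: pages embedded in the article
    "imlimit": "max",      # return as many results per request as allowed
    "format": "json",
}
url = API + "?" + urlencode(params)
print(url)
```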

The dashboard is a website hosted on GitHub, built using React.js and the Ant Design React UI kit.

Further explorations
To analyze the visual content from a broader perspective, I used PixPlot to create an interactive cluster visualization of all the images, organized by similarity.

This visualization is available at https://aquets.github.io/Wikiproject-Chemistry-images/

Feedback
I really appreciate any kind of feedback: Do you think this tool can be useful? Is there anything to improve?

If you have any ideas on how to improve this tool with new features, new data, or bug fixes, please contact me. Leave a message on this page's talk page or on my talk page (User:MingoBerlingo).