User:Chicagohil/sandbox

Call Details

 * Date: 2023-03-21
 * Topic: LD4P3 (Linked data for production: closing the loop) team presenting on their integration of Wikidata information around musical works into a Cornell library catalog prototype with a focus on the Wikidata aspects of work
 * Presenters: Huda Khan, Cornell University; Steven Folsom, Cornell University; Kevin Kishimoto, Stanford University; Astrid Usong, Stanford University

Presentation Materials

 * Slides
 * LD4 Discovery Affinity Group Notes and Recording on discovery aspects of the work
 * Demo video of the prototype
 * LD4 Discovery Affinity Group presentation on discovery aspects of the work
 * LD4P3 work

Background

 * LD4P3: Closing the loop: aims to create a working model of a complete cycle for library metadata creation, sharing & reuse.
 * Discovery:
 * Using linked data to support & enhance discovery
 * Work on integrating into production
 * BAM-WOW: Trying to leverage work of MLA Linked Data Working Group
 * Capturing thematic catalog identifiers in Wikidata: information not usually found in catalogs.

Music Library Association Linked Data Working Group (LDWG):

 * Emphasis on exploring & experimenting with linked data
 * 15 members, mostly MLA music catalogers
 * Goals include
 * build practical skills
 * learn & understand concepts & theory
 * build connections
 * Initially focused on BIBFRAME but then branched out into Wikidata
 * Projects focused on individual or institutional interest

One project: thematic catalog number concordance in Wikidata

 * A thematic catalog is a music reference book that aims to be a comprehensive list of a composer’s works, like catalogues raisonnés in art
 * Each work includes other info, such as historical information, musical characteristics
 * Most important: thematic authors often assign a number (identifier) to each work
 * Different authors assign numbers that do not line up
 * Antonio Vivaldi
 * Wrote more than 800 compositions, most of them instrumental (500 concertos titled “concerto” and 100 sonatas titled “sonata”)
 * Different catalogs for Vivaldi have different numbering depending on who published
 * Example of one work: Vivaldi, Antonio, $d 1678-1741. $t Estro armonico. $n N. 6 (uniform title). Known by many designations! (op. 3, no. 6 ; RV 356, etc.)
 * Title pages for the same work show differing numbering systems–a frustration of many music catalogers!
 * Needed a way to look these up as structured data so they can be reused easily
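
The lookup problem described above can be sketched as a small concordance table. This is only an illustration, not the working group's data model: the `Work` class and the `normalize` and `build_index` helpers are hypothetical names. The two designations and the Wikidata item ID are the ones mentioned in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """One musical work and the designations it is known by."""
    wikidata_id: str
    title: str
    # Maps a numbering system (e.g. "RV") to the number in that system.
    designations: dict[str, str] = field(default_factory=dict)


def normalize(catalog: str, number: str) -> tuple[str, str]:
    """Build a case- and whitespace-insensitive key for a designation."""
    return (catalog.strip().lower(), " ".join(number.split()).lower())


def build_index(works: list[Work]) -> dict[tuple[str, str], Work]:
    """Map every known designation to its work, so any numbering finds it."""
    index = {}
    for work in works:
        for catalog, number in work.designations.items():
            index[normalize(catalog, number)] = work
    return index


works = [
    Work(
        wikidata_id="Q116050394",
        title="Violin concerto in A minor",
        designations={"RV": "356", "op.": "3, no. 6"},
    )
]
index = build_index(works)

# Either designation leads to the same work.
by_rv = index[normalize("RV", "356")]
by_opus = index[normalize("Op.", "3,  no. 6")]
print(by_rv.title, by_rv.designations)
```

Keying the index on normalized (catalog, number) pairs is one possible design choice; it keeps lookups tolerant of capitalization and spacing differences between title pages.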

Data Harvesting & conversion to Wikidata

 * Found list of Vivaldi works by RV on Internet: copy & paste into spreadsheet #1
 * Searched id.loc.gov for works by Vivaldi and exported results (Atom to CSV)
 * Batch searched LCCNs in OCLC Connexion & exported authority records
 * Imported batch NAR file into MarcEdit & converted fields/subfields into spreadsheet (spreadsheet #2)
 * Queried Wikidata to find what items existed for Vivaldi works and exported to a spreadsheet (spreadsheet #3)
 * Imported the three spreadsheets into OpenRefine to join data on various matchpoints
 * Humans verified/corrected data in final spreadsheet
 * Created/edited Wikidata items using OpenRefine Wikidata extension (most of them didn’t have items)
 * Wikidata item for Violin concerto in A minor (RV 356): https://www.wikidata.org/wiki/Q116050394
 * Started working from spreadsheet from Kevin
 * Screenshots from catalog grabbing some information from Wikidata and supplementing the catalog
 * Included works have work info buttons next to them that open a knowledge panel with some data; from there you can click through to the author/work browse page
 * How does this work?
 * Solr index that sits behind the catalog has a field where you can obtain the author title headings associated with an item.
 * Then query id.loc.gov with LCCN to find Wikidata entity
 * On Wikidata works, have catalog codes number and which edition it is in
 * Path to choose number: LCCN → Qnumber → P528 (catalog code) → PS: P528 catalog code (414)
 * Path to choose label: LCCN → Qnumber → P972 (catalog) → Label
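
The last two steps of the paths above (Qnumber → P528 catalog code, with its P972 catalog qualifier) can be sketched as follows. This is a minimal illustration, not the prototype's client-side code: `catalog_codes` is a hypothetical helper, the sample dictionary is hand-written to mimic the shape of Wikidata's entity JSON, and the catalog item ID, code value, and label in it are placeholders. The earlier LCCN → Qnumber step is not shown.

```python
from typing import Optional

CATALOG_CODE = "P528"  # Wikidata property: catalog code
CATALOG = "P972"       # Wikidata property: catalog (used here as a qualifier)


def catalog_codes(entity: dict) -> list[tuple[str, Optional[str]]]:
    """Return (catalog code, catalog item ID or None) for each P528 statement."""
    results = []
    for statement in entity.get("claims", {}).get(CATALOG_CODE, []):
        mainsnak = statement.get("mainsnak", {})
        if mainsnak.get("snaktype") != "value":
            continue  # skip "unknown value" / "no value" statements
        code = mainsnak["datavalue"]["value"]
        catalog_id = None
        for qualifier in statement.get("qualifiers", {}).get(CATALOG, []):
            if qualifier.get("snaktype") == "value":
                catalog_id = qualifier["datavalue"]["value"]["id"]
                break
        results.append((code, catalog_id))
    return results


# Hand-written sample in the shape of Wikidata entity JSON (placeholder values).
sample_entity = {
    "id": "Q116050394",
    "claims": {
        "P528": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {"value": "356", "type": "string"},
                },
                "qualifiers": {
                    "P972": [
                        {
                            "snaktype": "value",
                            "datavalue": {
                                "value": {"id": "Q_CATALOG_PLACEHOLDER"},
                                "type": "wikibase-entityid",
                            },
                        }
                    ]
                },
            }
        ]
    },
}

# Placeholder label table; a real page would look the label up from Wikidata.
catalog_labels = {"Q_CATALOG_PLACEHOLDER": "Example thematic catalog"}

for code, catalog_id in catalog_codes(sample_entity):
    print(catalog_labels.get(catalog_id, "unknown catalog"), code)
```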

Usability testing: Process

 * Goals:
 * Design for incorporating information
 * Usefulness of properties
 * Participants: 5 total (3 grad students, 2 staff members)
 * Timeline: December 2022
 * Think alouds with feedback questions
 * Virtual
 * Given tasks to find specific information

Usability testing: Outcomes

 * Very much appreciated & well received
 * Easy to find included work knowledge panels
 * Wanted identifying info & labels higher up

Usability testing: Wikidata properties

 * Which would you find interesting?
 * Catalog numbers (all)
 * Instrumentation (three)
 * Librettist (three)
 * Tonality (two)
 * Opus (one)
 * Caveat: this was not a survey and relied on participants’ memories, but the results provide starting points for asking more questions

Lessons & questions: Design

 * Generating use cases: Would be great to display information, but what about search?
 * Would require more focus and work: entire design/indexing work cycle on its own
 * Generic (catalog numbers very helpful) vs distinctive title (incorporating multiple languages would be useful)
 * Typical design: bio panel and then holdings to the right
 * Existing author buttons are driven by presence of an authority record, then they can look for equivalencies in external data
 * Did design brainstorming
 * Add expandable section for each included work
 * Inline option?
 * Landed on work info button

Lessons & questions: Models

 * Prototype makes it look easy, but is jumping across multiple sources of information.
 * Library catalog item
 * LC item
 * Wikidata item
 * There’s often not a one-to-one correspondence while jumping to multiple data sources: how to deal with discrepancies
 * Tag your WEMI levels: Follow the yellow brick road…To Where?

Lessons & questions: Data

 * Data connections are like yellow brick road
 * “Selections” in music uniform title is often a catch-all: a bucket that doesn’t always map to LC heading.
 * Goal: Find points of connections
 * Somewhere over the rainbow…

BAMWOW into production: models

 * Everything seen here is in prototype and the next step is to bring it into production
 * Is there ever a need for a work info button for the main entry or is inline integration preferable?
 * If in-line is preferred, should we commit to sorting fields into a WEMI-like order?
 * What will happen when expanding to non-music works (such as Wizard of Oz uniform title example)?

BAMWOW into production: Data Quality

 * Catalogs are built in ways to disallow connections
 * Wikidata has qualifiers on many statements that we may want to take advantage of
 * Trying to figure out how to drop questionable statements if there are constraint violations

More possibilities and questions:

 * Adding Identifiers in MARC: Cornell catalog
 * FOLIO working group for entities
 * Would thematic catalogs provide additional context, such as musical incipits?

Questions

 * Is it accessing Wikidata dynamically?
 * Yes, it is accessing Wikidata dynamically. No Wikidata is stored in Cornell’s catalog; when page is rendered, there’s a call to Wikidata and it’s brought into the page
 * Have you determined the core set of statements for sheet music, in this case, Vivaldi? Every work item should/must have? Have you considered when you have the physical sheet in the collection to add a statement to highlight Cornell library’s collection? Or is this not a focus of the project?
 * We’re not focusing on published manifestations, but rather the intellectual works themselves
 * Have you run up against inconsistently used qualifiers or properties in Wikidata? Did that create challenges for querying?
 * Yes! Some of the inconsistencies we ‘correct’ and others we just leave and add our own data. Often the inconsistencies are misinterpretations of the property constraints
 * Regarding the catalog numbers, the most inconsistent thing is which prefix is used. Our group is trying to use a language-agnostic form–take the number that is in the book (in most cases)
 * Is the question of useful information solely based on the question of searching/identifying for the work? I find all of the information useful, though not always necessary for searching
 * Regarding Wikidata properties that were chosen, we basically had a group discussion in one of our LDWG meetings: “Which Wikidata properties would be cool to add to a MARC-based catalog?” These choices were based on properties we were using in our own project
 * Don’t intend to be any sort of authority on topic, but are what they came up with
 * Can you show again how your catalog displayed the Wikidata info retrieved by the query?
 * How was the info button inserted in your catalog? Is it on the discovery layer only?
 * It’s client-side code in the prototype, so yes. It checks for information in the Solr index and then matches with the included works list and places it appropriately
 * Did the question arise of what people want a library catalog to do? Some of these examples suggest to me that a kind of mini-Wikipedia page would be more useful. (And is my question constrained by the relatively primitive nature of current library catalogs?)
 * A lot of what we focused on is identifying/disambiguation use cases, but there are some properties that lean more toward a broader context for the work
 * Some of these attributes are also in MARC authority records. Are you just ignoring them if they are there? What if they are there and not in Wikidata?
 * Prototype just displays everything right now, but we considered similar questions for Discogs. In that case, the catalog actually checks for equivalent fields that are already populated, and only shows supplemental Discogs info when those fields are not populated
 * There’s also Syndetics info in the Cornell catalog, it’s interesting how much smarter we can be with the Discogs vs. that
 * Would love to know how you got Discogs in there!
 * Documentation!!
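
The Discogs behaviour described in the answers above (show supplemental data only where the equivalent catalog field is empty) can be sketched as a small merge rule. This is an assumption-laden illustration rather than the Cornell catalog's actual code: `fields_to_display` and all field names and values below are hypothetical.

```python
def fields_to_display(catalog_record: dict, supplemental: dict) -> dict:
    """Keep populated catalog values; add supplemental values only for gaps."""
    # Start from the catalog fields that actually have content.
    merged = {name: value for name, value in catalog_record.items() if value}
    for name, value in supplemental.items():
        # Supplemental data never overrides a populated catalog field.
        if value and name not in merged:
            merged[name] = value
    return merged


# Hypothetical records: the catalog has a publisher but an empty track list.
catalog_record = {"publisher": "Example Records", "track_list": []}
supplemental = {
    "publisher": "Another Label",
    "track_list": ["Track 1", "Track 2"],
    "genre": "Classical",
}

print(fields_to_display(catalog_record, supplemental))
```

The same rule could in principle be applied to Wikidata statements and MARC authority fields, which is the question raised in the discussion; the notes do not say whether the prototype will do so.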

Call Details

 * Date: 2021-02-23
 * Topic: Adding Bibliographic Data to Wikidata
 * Presenters: Jason Evans, Wikimedian in Residence at the National Library of Wales

Presentation Materials

 * Library data as linked open data (Article)
 * Slides
 * Some queries used for visualisations -
 * https://w.wiki/4u6 - subject by genre (Peniarth MSS collection)
 * https://w.wiki/4u7 - scribes connections (Peniarth MSS collection)
 * http://tinyurl.com/y74vkfuw - University of Wales press books on Wikidata
 * https://w.wiki/FbG - Publisher works count

Questions

 * Saving time in MARC to Wikidata workflows
 * People want a programmatic way to do this, but creating mapping for authors without unique identifiers or works can be difficult. Make the initial cataloging as clean as possible (example: adding ISNI identifiers)
 * Is modelling the manuscript extensively (slide) labor intensive?
 * Were able to semi-automatically take out names and match them to people, many already in Wikidata. Fairly labor intensive but there may be ways to automate much of the work to a good degree of accuracy. Would be tricky for a giant collection of books.
 * Any plans to apply process to materials beyond books?
 * Always trying new datasets, discussing musical scores. Would love to do sound recordings or video, but you can’t share the actual recordings (likely to have copyright issues) which takes away some benefits of sharing data.
 * Any advice on thesis and subject headings in Wikidata?
 * University of Edinburgh has shared a thesis collection (Ewan McAndrew)
 * Adam: Looking at converting ETDs into Wikidata, both proposing new subjects to Library of Congress and creating Wikidata items (items often already exist, but creating them when needed). Trying to figure out if LCSH headings can be mapped, especially free-floating subdivisions (use main part, use entire field). If Wikidata URIs can go in MARC fields, that would document the exact Wikidata item.
 * Library of Congress has done mapping work, some have links some don’t. So many subject headings don’t have items that could be created.
 * National Library of Wales has volunteers tagging photographs, would be cleaner and easier with identifiers.
 * When do you switch to Wikibase for things you can’t describe with Wikidata? Is this all scalable? Think about what you’re trying to achieve, are Wikibase or Wikidata the best option?
 * Advice for mapping books not in English?
 * Tried to make sure the language was there, and labels were correct (English versus Welsh)
 * MARC to Wikidata mappings?
 * Will do a lot of heavy lifting when it’s done, people are working on it. Universal mapping would be very useful, take care of basic stuff.
 * Any pushback from Wikidata folks?
 * No pushback, asked in several areas if it would be acceptable to add that much information. No pushback or complaints, but uploads will get bigger and bigger and changes may be needed. One of the reasons this was done was to advocate for structured data generally, and Wikidata can be used to take a sample and show what can be possible for the future at a larger scale. That doesn’t mean everything in Wikidata, but it’s a fantastic showcase.
 * Are some of the visualizations online?
 * Shared slides in the agenda, a couple may be on commons
 * Some charts could be interpreted as music
 * Did a hackathon where someone turned data into music
 * Has anyone considered putting preferred terms (over problematic subject headings) in Wikidata?
 * Not something Jason has had to deal with on Wikidata
 * Jim: would be interested in collaborating on how to open up these preferred labels. It seems the lists for preferred labels are closed or internally managed currently.

Working on Alison Turnbull:

She studied from 1975-1977 at the Academia Arjona in Madrid, from 1977-1978 at the West Surrey College of Art and Design, and from 1978-1981 at the Bath Academy of Art in Corsham.