
Intro to Lexicographical data on Wikidata

 * Relevant to the Abstract Wikipedia project (which builds on a corpus of lexemes)
 * Working with lexicographical data
 * Wikidata items and linguistic information
 * Termboxes include labels, descriptions, aliases
 * Depending on linguistic practice, pronouns can differ
 * Differentiating everything would require a lot of statements and become unwieldy
 * Termbox itself won’t cut it
 * Naming schemes are not always consistent across languages, which can make statements unwieldy
 * Demonyms: can apply to masculine/feminine subjects depending on language
 * Need a separately constructed namespace for linguistic information that can accommodate different language paradigms. This is where Lexemes come in.
 * What are lexemes
 * “I made myself house my house in the House’s house.”
 * How many tokens are there?
 * 10 (each component separated by spaces)
 * How many words are there?
 * 8 (I, made, myself, house, my, in, the, House’s)
 * How many lexemes are there?
 * 7 (I, make, house (verb), house (common noun), in, the, House (proper noun))
 * A lexeme groups a word form with its related forms (primarily through inflection), along with its grammatical function and the specific meanings it takes on
 * I, myself, and my belong to the same lexeme (all express the first-person singular)
 * house (verb), house (common noun), and House (proper noun) are separate lexemes
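The token/word distinction above can be checked with a short script. The lexeme count (7) cannot be derived by simple string operations, since grouping made with make, or the two uses of house, requires lemmatization and part-of-speech knowledge; the sketch below only reproduces the token and word counts.

```python
# Token vs. distinct-word counts for the example sentence from the talk.
sentence = "I made myself house my house in the House's house."

tokens = sentence.split()  # components separated by spaces
assert len(tokens) == 10

# Distinct surface words, ignoring case and the final period.
words = {t.lower().rstrip(".") for t in tokens}
assert len(words) == 8  # i, made, myself, house, my, in, the, house's

# Grouping these 8 words into 7 lexemes (made -> make; I/myself/my
# together; verb "house" vs. noun "house" apart) needs linguistic
# analysis, not string matching.
print(sorted(words))
```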
 * Lexemes can show differences in meaning (i.e. senses, like different definitions in a dictionary)
 * Lexemes have Lxxx numbers (example: L512)
 * Lexemes can have descriptions in other languages to clarify meaning
 * Lexemes can also show differences in inflection (forms)
 * If the meaning changes with inflection, it is usually worth having a different lexeme (house versus houze) (L23571)
 * Each use that changes the form of the word needs a separate form on the lexeme
 * No minimum requirement for number of forms on a lexeme, it will depend on the needs of the language
 * Differences in written forms can also be recorded on lexemes (languages written in multiple scripts, American vs. British spelling, etc.)
 * Differences elsewhere: We can also add etymologies, usage examples, etc. to lexemes
 * We can make direct connections between individual senses, add images, etc.
 * Example: English adjective blue, L3269
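The pieces described above (lemma, forms, senses) can be sketched as a data structure. This is a simplified, hypothetical rendering of a lexeme like L3269; the real Wikibase serialization nests language-tagged values and uses item IDs for grammatical features, so treat the field names and glosses here as illustrative only.

```python
# Simplified sketch of the lexeme data model: one lemma, plus forms
# (inflections) and senses (meanings), each with its own identifier.
lexeme = {
    "id": "L3269",
    "lemma": {"en": "blue"},
    "lexicalCategory": "adjective",
    "forms": [
        {"id": "L3269-F1", "representation": "blue", "features": ["positive"]},
        {"id": "L3269-F2", "representation": "bluer", "features": ["comparative"]},
        {"id": "L3269-F3", "representation": "bluest", "features": ["superlative"]},
    ],
    "senses": [
        {"id": "L3269-S1", "gloss": {"en": "of the colour blue"}},
        {"id": "L3269-S2", "gloss": {"en": "sad, melancholic"}},
    ],
}

# Forms carry the inflectional variation; senses carry the meanings.
for form in lexeme["forms"]:
    print(form["representation"], form["features"])
for sense in lexeme["senses"]:
    print(sense["id"], sense["gloss"]["en"])
```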
 * Current language landscape
 * Where can we find Lexemes?
 * Ordia: view of individual lexemes ( https://ordia.toolforge.org )
 * Hauki: ( https://hauki.toolforge.org )
 * Query Service ( https://query.wikidata.org )
 * d:Special:AllPages/Lexeme
 * Translation networks (see materials above for links)
 * Etymology paths
 * Sense usage locations
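The Query Service exposes lexemes through SPARQL. As a sketch, the query below asks for ten English lexemes and their lemmas using the standard Wikidata RDF vocabulary (Q1860 is the item for the English language); the code only builds the request URL rather than sending it, so you can paste the query into https://query.wikidata.org to run it interactively.

```python
import urllib.parse

# SPARQL query for ten English lexemes and their lemmas. The dct:,
# wd:, and wikibase: prefixes are predefined at the Query Service.
query = """
SELECT ?lexeme ?lemma WHERE {
  ?lexeme dct:language wd:Q1860 ;
          wikibase:lemma ?lemma .
} LIMIT 10
"""

# Build (but do not send) the request URL for the SPARQL endpoint.
url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
    {"query": query, "format": "json"}
)
print(url)
```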
 * Annotating sentences (for future reference)
 * You can give it a sentence, and it annotates each word with the lexeme that word describes
 * This is one way to explore generating raw content from text
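A very naive version of such annotation can be sketched as a lookup from surface forms to lexeme IDs. The IDs below are illustrative placeholders, not verified Wikidata lexemes; a real annotator would also have to resolve ambiguity (e.g. house as noun vs. verb) from context.

```python
# Naive sentence annotation: map each token to a candidate lexeme ID
# via a hand-built table. IDs are hypothetical placeholders.
lookup = {
    "i": "L-PRON-1SG",
    "made": "L-MAKE",
    "house": "L-HOUSE",  # ambiguous (noun or verb) without context
    "the": "L-THE",
}

def annotate(sentence):
    """Pair each token with a candidate lexeme ID (None if unknown)."""
    return [(t, lookup.get(t.lower().rstrip("."))) for t in sentence.split()]

print(annotate("I made the house."))
```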
 * Constructing sentences by importing lexemes in the future? Project in progress.
 * How might we add lexemes to Wikidata?
 * Wikidata Lexeme Forms tool
 * The tool continues to evolve
 * Bulk mode is available
 * Developing library to create and manipulate lexemes
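As a sketch of what such a library might produce, the payload below follows the general shape of the lexeme JSON accepted by the Wikibase API when creating a lexeme (Q1860 is English, Q1084 is the item for "noun"); the exact fields a given library exposes are an assumption here, so check its documentation before relying on this shape.

```python
import json

# Sketch of the JSON a client library might build to create a new
# lexeme: a lemma per spelling variant, plus the language and lexical
# category given as Wikidata item IDs.
new_lexeme = {
    "type": "lexeme",
    "lemmas": {"en": {"language": "en", "value": "house"}},
    "language": "Q1860",        # English
    "lexicalCategory": "Q1084", # noun
}

payload = json.dumps(new_lexeme)
print(payload)
```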
 * Q&A
 * How is “ ’s ” handled? ’s is another form of the lexeme and should probably be included
 * How many lexemes and relationships to Wikidata items exist? 62,866 relationships between lexemes and Wikidata items, and 487,239 lexemes. Projects are generally language-specific, and some projects are more active in connecting lexemes to senses.