Template talk:ISO 15924 script codes and related Unicode data/Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Bad move

I object to the move from template space into article space. Maintenance is easier in template space. Also, some quirks are introduced. I asked Eldizzino. -DePiep (talk) 08:34, 14 July 2015 (UTC)

Very bad move which has caused havoc with pages that transcluded the template. I have moved it back to template namespace. BabelStone (talk) 22:34, 24 July 2015 (UTC)

Subpages are lost by now (/doc, 5-column tablerow, ....). -DePiep (talk) 00:10, 25 July 2015 (UTC)

Adlam

There should be a better way rather than reverting changes to the statement "Not in Unicode". --Shervinafshar (talk) 20:17, 20 May 2015 (UTC)

I already wrote about this on your talkpage. I maintain (as U:Babelstone does, IMO well familiar with Unicode) that we should only publish defined not proposed Unicode versions. All these "proposed/projected/to_come/in_the_pipeline" characters do not describe reality (unless in a subsection like Unicode#Future). -DePiep (talk) 20:33, 20 May 2015 (UTC)

Reordering ISO columns

Split off from previous ACCESS-section. Treat as separate topic. -DePiep (talk) 21:01, 21 June 2015 (UTC)

thx. (As an unrelated sidenote, wouldn't it be better & nicer to open a row (lefthand) with the readable ISO name, followed by code and number? Might be even bold/rowheader...). -DePiep (talk) 07:31, 18 June 2015 (UTC)

You're welcome; that said, I'll wait until there's some more response and consensus on the general discussion before I push to promote the sandbox here (and I agree with you on the sidenote; however, the first five columns in each row of the table actually are created by a template {{ISO 15924 script codes and Unicode/5 cells by ISO code}}, so that template would need to be changed. I tried creating a sandbox for that one but it doesn't seem to work. And of course, any table using that template would have to have the column headings rearranged, which suggests replacing the manual headings with their own heading template.) Thisisnotatest (talk) 08:18, 20 June 2015 (UTC)

Yes that's needed. Can do that. But I don't want to interfere with the major sandbox topic now open, so I won't make this edit now. Unless you say it's OK and won't confuse. -DePiep (talk) 21:01, 21 June 2015 (UTC)

Time for a proposal. -DePiep (talk) 20:12, 25 July 2015 (UTC)

Proposal

I propose to change the column order into: 1. Name 2. code 3. ISO number 4+. unchanged. It should read: "code assigned to a named script". Technically I see no issues. -DePiep (talk) 20:12, 25 July 2015 (UTC)

Proposal: Remove columns Version and Chars

Proposal withdrawn, but discussion kept for future development -DePiep (talk) 16:57, 3 February 2021 (UTC)

I propose to remove the columns "Version" and "Characters" from this table.

They are Unicode details: 'Version' = when introduced into Unicode, and 'Characters' = # of chars in Unicode.

Since the table aims to give an overview of ISO 15924 Scripts and their relation to Unicode (and their enwiki link), the U-version is not of interest. This overview is not about script's history. While "# Characters" is not historical I agree, I do not see what this adds to the ISO-relation nor to the table information goal.

-DePiep (talk) 22:15, 20 January 2021 (UTC) @Drmccreedy and BabelStone: pinged.

I mildly oppose the removal of these columns. The name of the template after all is "ISO 15924 script codes and related Unicode data". I'm not sure if there's another place I can clearly see when each script was adopted into Unicode or how many characters it garners. That being said, maybe a compromise is to add the version info to Template:Unicode blocks before deleting these two columns from this template. DRMcCreedy (talk) 02:43, 22 January 2021 (UTC)

{{Unicode blocks}} doesn't work, since blocks don't necessarily cover a single script - often mixing common in with others - and most scripts are covered by more than a single block. If {{ISO 15924/overview}} doesn't work for purpose, then the solution might be a new ISO 15924 template. Another option might be to enable column hiding using a parameter in this template - and quite frankly, that might be a good long-term alternative for the plethora of ISO 15924 templates in general. Van Isaac_WS^cont 05:41, 22 January 2021 (UTC)

Now you've made me doubt ;-)

Original thought: the table is about the connection ISO-id and Unicode-id (+enwiki article link). Some script properties should be in all right (say, script-defining in ISO and Unicode). But essentially, no single-sided secondary facts in here (a 'side' being ISO or Unicode).

However, since you have other perceptions, maybe we should think first about the purpose of this overview in mainspace (=ISO 15924, Script (Unicode), Unicode character property). Maybe we need a second, more dedicated table (e.g., for Script (Unicode) with its 159/200 script only, and more U+ data?).

More extended data (like: 'this script is in Unicode blocks: ...') could be added in {{Infobox writing system}}. This infobox is underused btw (79 articles only?).

I will keep chewing on this.

A technical overview is {{ISO 15924/overview-templates}}. I am working in that area. -DePiep (talk) 18:39, 22 January 2021 (UTC)

Personally, I much preferred the simple table of Unicode scripts we had before DePiep started meddling with it, turning it into the over-complicated hybrid table of ISO 15924 codes and corresponding Unicode data that we now have. Now DePiep wants to remove the important Unicode data from the table, completing its ugly metamorphosis from a list of Unicode scripts to a list of ISO 15924 codes. My preference is to have two separate tables (templates if you will, but I see no great advantages of templatizing data that is only used on a single page), one a list of Unicode scripts like we originally had on the Script (Unicode) page, and one a list of ISO 15924 codes. BabelStone (talk) 12:13, 23 January 2021 (UTC)

Yeah, "meddling" is a nice way to start a conversation. -DePiep (talk) 12:15, 23 January 2021 (UTC)

Why can't you just state your preferences & thoughts without making personal jabs? Makes it easier for others to incorporate your ideas. -DePiep (talk) 12:17, 23 January 2021 (UTC)

I withdraw this proposal. But I will use this Talk for further development, in a different direction :-) -DePiep (talk) 16:57, 3 February 2021 (UTC)

Geok issue

Geok = Khutsuri (Asomtavruli and Nuskhuri). More to follow. -DePiep (talk) 21:27, 16 June 2014 (UTC)

ISO 15924 is published on the Unicode site. Unicode adds a "Property Value Alias" (PVA) to script codes, for scripts in Unicode. The PVA is usually a short name for the script (see the template list for differences).

ISO 15924 is published at ISO 15924 Code Lists

1, ISO: There is a link "Table 5. Alphabetical list of four-letter script names (normative plain-text data file)" (filename: iso15924.txt.zip; datafile unzipped is named iso15924-utf8-20131012.txt)

2, PVA: And "The Property Value Alias is defined as part of the Unicode Standard".

The ISO file contains these rows:

Geor;240;Georgian (Mkhedruli);géorgien (mkhédrouli);Georgian;2004-05-29
Geok;241;Khutsuri (Asomtavruli and Nuskhuri);khoutsouri (assomtavrouli et nouskhouri);Georgian;2012-10-16

The pre-last data position is the "PVA" value, being "Georgian" for both.
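As a minimal sketch of how the two quoted rows can be read, the snippet below splits each row on semicolons and picks out the second-to-last field, which the comment above identifies as the PVA. The field names in `FIELDS` are labels assumed here for illustration; they are not taken from the data file.

```python
# Rows copied verbatim from the discussion above.
ISO_ROWS = """\
Geor;240;Georgian (Mkhedruli);géorgien (mkhédrouli);Georgian;2004-05-29
Geok;241;Khutsuri (Asomtavruli and Nuskhuri);khoutsouri (assomtavrouli et nouskhouri);Georgian;2012-10-16
"""

# Hypothetical field labels, assumed for readability only.
FIELDS = ("code", "number", "english_name", "french_name", "pva", "date")


def parse_iso15924(text):
    """Turn semicolon-separated rows into dicts, skipping blanks and '#' comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        records.append(dict(zip(FIELDS, line.split(";"))))
    return records


records = parse_iso15924(ISO_ROWS)
pva_by_code = {record["code"]: record["pva"] for record in records}
print(pva_by_code)
```

Both codes map to the same PVA value, which is the observation the rest of this thread discusses.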

The PVA file says:

# Script (sc)
sc ; Ethi                             ; Ethiopic
sc ; Geor                             ; Georgian
sc ; Glag                             ; Glagolitic

(so, no Geok script data present)
This appears to be a contradiction. For now, I have added "Geok" (PVA: "Georgian" too) to the (PVA/ISO 15924) Alias list, and so it shows in this template table. -DePiep (talk) 22:09, 16 June 2014 (UTC).

(in reverse, the ISO file is not updated for new PVAs (e.g., Bass has no PVA in there). However, this does not contradict.

I don't get why the normative, defining file is not updated, while its definitions are used in a published version.) -DePiep (talk) 08:36, 17 June 2014 (UTC)
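The comparison described above can be sketched roughly as follows: read the `sc` lines quoted from the PVA file, then list the ISO codes that carry a PVA in the ISO file but have no `sc` entry. Only the three quoted lines are used, and `iso_pva` simply restates the two ISO rows quoted earlier, so this illustrates the check rather than running it against the full files.

```python
# The "sc" lines quoted in the discussion above.
PVA_LINES = """\
# Script (sc)
sc ; Ethi                             ; Ethiopic
sc ; Geor                             ; Georgian
sc ; Glag                             ; Glagolitic
"""


def parse_script_aliases(text):
    """Map short script code -> long alias for every 'sc' line."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


unicode_aliases = parse_script_aliases(PVA_LINES)

# PVA values as given in the two ISO rows quoted earlier in this thread.
iso_pva = {"Geor": "Georgian", "Geok": "Georgian"}

# ISO codes that have a PVA in the ISO file but no 'sc' line in the PVA file.
missing = sorted(code for code in iso_pva if code not in unicode_aliases)
print(missing)
```

Whether such a mismatch is a contradiction or just a one-to-many mapping is exactly what the following comments disagree on; the sketch only makes the mismatch visible.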

I don't see a problem here. Just because the ISO 15924 Registration Authority is hosted on the Unicode site does not imply that the Unicode Consortium is responsible for ISO 15924 (the actual ISO 15924 standard is not "published on the Unicode site" but on the ISO site), or that there is necessarily a one-to-one relationship between ISO 15924 script codes and Unicode script property value aliases. ISO 15924 recognises two varieties of Georgian script, Geor and Geok, but the Unicode standard only recognises a single Georgian script with PVA=Geor; therefore there is no "Geok" script code in Unicode, and the Unicode alias column of the template should be left blank. The situation is analogous to that of Latin, Gaelic and Fraktur: ISO 15924 defines Latn, Latg and Latf codes, but Unicode only defines a single Latin script (PVA=Latn). There is no contradiction. BabelStone (talk) 11:13, 17 June 2014 (UTC)

PVA is defined by Unicode, not by ISO 15924. Even in the ISO file. So this way Unicode defines and publishes two different definition lists. That is a contradiction by Unicode. End of story. Whichever Unicode definition list one chooses, it introduces an error. More so in automated applications. -DePiep (talk) 11:56, 17 June 2014 (UTC)

Accessibility of table

Including the title of a table as a heading row is not appropriate, as it is not part of the tabular data. Adding a grouping row makes the table a complex table. Wikipedia does not provide WCAG 2.0 accessible markup for complex tables, therefore the table needs to be converted to a simple table, which I did.

I think the accessibility of the table would be further improved if the ISO code and ISO name columns were to be swapped, since the name is more meaningful as a row descriptor. However, if the name cannot be assumed to be unique, then perhaps the code is the most appropriate row descriptor. Since this is potentially controversial I did not include this in my accessibility cleanup.

Also, as a side effect, the VTE links (navbar template) seem to float above the template. I didn't put it into the caption because semantically, the template edit links do not describe the table.

Thisisnotatest (talk) 21:55, 6 June 2015 (UTC)

I reverted. Maybe in minor points you are right, but the general move is wrong.

1. w3c you link to says about that 'every cell must have a row & column header'. That was served, nothing wrong. I note that the T-heading structure exactly and clearly reflects the two offices involved.

2. 'Title should be external'. - We can make that. Use the sandbox.

3. Concluding to 'convert to a simple table' is bad, exactly because we need two top headers.

4. Swapping ISO name and ISO code: no opinion. Yes this is up for discussion/improvement, but this does not matter to the bad edit I reverted :-).

5. v-t-e links must & will always follow any table outcome. No issue.

6. The editsummary with my reversal should read '... ill-concluded'.

-DePiep (talk) 22:21, 6 June 2015 (UTC)

DePiep, thank you for taking this seriously. I disagree on whether the point is minor and whether the table is compliant, so I am adding an {{Accessibility Dispute}} template to this template (point 1 of discussion). I am linking to my accessibility edit of this template for reference. I will also post a link to this discussion at the Village Pump.

1. That every cell must have a row and column header is necessary but not sufficient. The W3C page I linked to also requires that each header cell in multilayer headings have an id attribute and each data cell have a headers attribute. Actually, it appears that there is an alternate way to make complex tables accessible, that of the colgroup. However, Wikipedia doesn't seem to support the colgroup tag.

2. Thank you. I'll do that.

3. I'm not clear why we need two top headers as opposed to distinguishing the ISO headings by adding "ISO" as I did in the reverted table.

4. Awaiting comment from others.

5. No disagreement. — Preceding unsigned comment added by Thisisnotatest (talk • contribs) 23:29, 6 June 2015 (UTC)

About #3: Below the title, the main essential and core point of this table is that column values are defined by ISO or defined by Unicode. So that is what the table must show. IMO W3C exactly allows or wants this. The rest is mice meat for now. -DePiep (talk) 23:40, 6 June 2015 (UTC)

I've made a sandbox version of {{ISO_15924_script_codes_and_Unicode/sandbox}} dealing with a couple pieces of mouse meat. I agree that W3C allows for complex tables. Where we disagree is whether Wikipedia's ability to code complex tables meet the requirements of WCAG 2.0. IMO, if it does not, then it needs to be replaced with a simple table, that is, without column groupings. Anyway, I'll go post at the Village Pump so we can get others to weigh in. Thisisnotatest (talk) 23:50, 6 June 2015 (UTC)

Sandbox looks good. (I require the T-lining to group columns L-R). v-t-e box will end up OK. Unless W3C or WP:ACCESS proves 'unacceptable' (so far, you did not), I'll agree. (Will not / can not respond fast from now). -DePiep (talk) 00:31, 7 June 2015 (UTC)

See the sandbox, I edited. Simply: this is how wiki/w3c does show a table. -DePiep (talk) 00:40, 7 June 2015 (UTC)

Restored accessibility dispute template to sandbox and awaiting input from others. The sandbox no longer reflects my intent as mentioned in my previous comment. I know that a sandbox is a sandbox but now it is once again structurally the same as the original template and now pointless as a display of our differences. (Although it might also not reflect your intent either. Your comment on your last revision reads "how wiki-w3c *does* a title (namely, by |+))" but your sandbox revision itself does not reflect the |+code. I'm guessing this is an accidental omission; I'm still unsure what your remedy was as reflected in your revision comment. And even if the code reflected the change your comment intended, I still dispute the accessibility of the table in the sandbox, your and my versions both.) Thisisnotatest (talk) 01:15, 7 June 2015 (UTC)

I don't get the impression you are interested in w3c/access improvement at all. Happy fly catching. -DePiep (talk) 01:54, 7 June 2015 (UTC)

I think we're miscommunicating. Who was lol'ing about the sandbox in their edit comment? It's hard to tell tone in print, and I, rightly or wrongly, took it as a lack of seriousness about the issue. I suppose it's refreshing to be disagreeing over whether something is accessible rather than over whether it should be accessible, but I'm not feeling particularly refreshed right now.

It is true that W3C accessibility allows for complex tables. It's just that implementing complex tables requires additional tagging (id attributes on th tags, headers attributes on td tags) that is not easy to do or keep maintained and may or may not be supported by Wikicode. Therefore it is better to use a simple table. Anyway, only two of us have commented on this issue so far. It would be helpful to have others whose views are on various sides of the argument. Thisisnotatest (talk) 07:03, 7 June 2015 (UTC)

What tagging do you mean? A colspan makes a complete and correct header. -DePiep (talk) 10:11, 7 June 2015 (UTC)

This has already gotten too back-and-forth to follow. I liked the numbered points. Can we start again with those, based on the current sandbox? What issues remain? — SMcCandlish ☺ ☏ ¢ ≽^ʌⱷ҅_ᴥⱷ^ʌ≼ 15:56, 14 June 2015 (UTC)

No, the current sandbox [1] is unacceptable. A pity you could not follow this thread, but IMO the line is: current live version is OK, objections are making the table worse. And most of all: those objection sources are not clear or even present. In this thread I've written why the current version is better. -DePiep (talk) 20:50, 14 June 2015 (UTC)

@SMcCandlish, at this point my position is the numbered points stand as I last numbered them, plus that the sandboxed version would be accessible. I believe DePiep's position is the numbered points are where he last numbered them, plus a negative opinion on the sandboxed version.

@DePiep: Please explain how the sandbox version is making the situation worse, aside from aesthetics? Is the Direction column mislabeled? That is, would it more correctly read "Unicode direction"? (If "Unicode direction" is not more correct, then the original table is wrong by applying the Unicode grouping header to the direction column.) Thisisnotatest (talk) 04:45, 15 June 2015 (UTC)

Because it has removed those column headers that are over multiple columns. Some columns are ISO, some are Unicode. That is what columns are about, that is what we want to convey. (I still have not gotten why accessibility would prohibit such colspans). -DePiep (talk) 20:26, 17 June 2015 (UTC)

I've now added the ISO or Unicode to every column so that the sandbox version is now conveying the same info as the current version. Accessibility does not prohibit colspans, but WCAG 2.0 AA requires that they be accompanied by id attributes in the headers and headers attributes in the data cells per WCAG 2.0 accessible markup for complex tables. The larger issue is that Wikipedia doesn't support such tables, and if it did it would have to do so in a way it was reasonable to expect editors to follow. I posted the larger discussion at the WikiProject Accessibility talk page. Thisisnotatest (talk) 05:58, 18 June 2015 (UTC)
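To make the markup under discussion concrete, here is a rough sketch that generates a two-level header table in which every header cell has an `id` and every data cell lists its headers in the HTML `headers` attribute. It emits plain HTML, not wikitext, and makes no claim about what MediaWiki accepts. The id scheme, the `slug` helper and the column grouping are assumptions for illustration; the single data row reuses the Geor values quoted elsewhere on this page.

```python
from html import escape

# Column groups (assumed layout): group header -> its column headers.
GROUPS = [("ISO 15924", ["Code", "Number", "Name"]), ("Unicode", ["Alias"])]
ROWS = [["Geor", "240", "Georgian (Mkhedruli)", "Georgian"]]


def slug(text):
    """Hypothetical helper: lowercase id fragment, non-alphanumerics become '-'."""
    return "".join(ch.lower() if ch.isalnum() else "-" for ch in text)


def build_table(groups, rows):
    top, sub, header_ids = [], [], []
    for group, columns in groups:
        gid = "g-" + slug(group)
        top.append(
            f'<th id="{gid}" colspan="{len(columns)}" scope="colgroup">{escape(group)}</th>'
        )
        for column in columns:
            cid = f"{gid}-{slug(column)}"
            sub.append(f'<th id="{cid}" headers="{gid}" scope="col">{escape(column)}</th>')
            # Each data cell in this column points at both header levels.
            header_ids.append(f"{gid} {cid}")
    lines = ["<table>", "<tr>" + "".join(top) + "</tr>", "<tr>" + "".join(sub) + "</tr>"]
    for row in rows:
        cells = "".join(
            f'<td headers="{ids}">{escape(value)}</td>' for ids, value in zip(header_ids, row)
        )
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_table = build_table(GROUPS, ROWS)
print(html_table)
```

The point of generating the attributes rather than typing them is the maintenance concern raised above: the ids and `headers` lists have to stay in sync with the column layout on every edit.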

Zzzz count

@BabelStone: asked "Why exclude the 66 noncharacters from Zzzz? It is clear from http://unicode.org/Public/UNIDATA/Scripts.txt that noncharacters are also Zzzz". The answer is that I excluded them for the v9.0 update of this page because they were excluded in the v8.0 update of this page. I'm not sure why we've been excluding them but the Scripts.txt verbiage makes me agree that we shouldn't: "All code points not explicitly listed for Script have the value Unknown (Zzzz). @missing: 0000..10FFFF; Unknown". I'm happy to include the noncharacters in Zzzz. DRMcCreedy (talk) 15:01, 23 June 2016 (UTC)
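For reference, the figure of 66 noncharacters can be reproduced with a short sketch: the contiguous range U+FDD0..U+FDEF contributes 32 code points, and the last two code points of each of the 17 planes contribute another 34. The function name is made up for this example.

```python
def noncharacters():
    """Return the Unicode noncharacter code points as a list of ints."""
    points = list(range(0xFDD0, 0xFDF0))  # U+FDD0..U+FDEF, 32 code points
    for plane in range(17):  # planes 0 through 16
        base = plane << 16
        points.extend((base | 0xFFFE, base | 0xFFFF))  # U+nFFFE and U+nFFFF
    return points


nonchars = noncharacters()
print(len(nonchars))
```

Whether these 66 belong in the Zzzz count is the editorial question here; the sketch only confirms where the number comes from.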

This is about unassigned code points, right? ISO 15924 defines a script as a "set of graphic characters used for the written form of one or more languages". To me that reads: no character, no script. And this should take priority over the "All code points not explicitly listed for Script ..." quoted above, correctly, from Scripts.txt. Surely Unicode cannot include a "character" into a script that is not a character (by their own definition). And Zzzz is a ~~regular~~ script (say, a list of graphic characters), the only issue being that Unicode(! not ISO 15924) has not encoded them into a Unicode-covered script. -DePiep (talk) 07:13, 24 June 2016 (UTC)

I cannot source this right now, but I am quite convinced that Unicode adheres to the ISO 15924 definition of script. Unicode does not re-define "script", there is no definition "Unicode scripts" (so bad WP article title. I advocate naming it "Scripts in Unicode"). And ISO-scripts are defined to consist of graphic characters.

Then, Unicode normatively defines this quality of any code point in General Category. So by their own rules, Unicode should exclude these other code points from any script (like Zzzz). That's mainly Control characters, formatting characters. Some individual border issues might exist (SHY is not a graphic character ...).

The blanket script text in Scripts.txt is not normative, and should be corrected. -DePiep (talk) 08:09, 24 June 2016 (UTC)

ISO 15924 does not define what characters belong to any given script; only the Unicode Standard does that, and as the character statistics in the Wikipedia table are derived from data in the Unicode Standard, I think we should be consistent and use the implicit count for Zzzz given in the Unicode Standard. BabelStone (talk) 17:59, 24 June 2016 (UTC)

Not disputed. Still, ISO requires that they are (readable) characters, not formatting or control etc stuff. So Unicode cannot add non-characters to a script (any script) that by ISO definition can only have graphic characters. -DePiep (talk) 00:24, 25 June 2016 (UTC)

Reordering of the table

I have performed an info-reordering of this table. -DePiep (talk) 20:36, 3 February 2021 (UTC)

Changes

- Reduce distinction "ISO" and "Unicode". Removed the bold grey vertical border. Some properties are script generic (directionality, Anc/Hist).

- By parameter: |version=, |chars=, |note=; can be entered by parameter, not by || construct.

- New parameter: |unicode-status=, distinct from |note=. Will report only when |alias=<blank>.

Report "Not in Unicode": now in same columns as "Alias, Version, Chars". So: either "In Unicode" or |unicode-status=.

- |note= kept for general note (independent of status). btw, appears that Anc/Hist status is provided by Unicode.

- ISO codes like Axxx are anchored. So one can link to them, from inside & outside. Like § Hani.

Can do (content): + Use internal links like § Hani more often.; + Clarify mixed & merged scripts like "Georgian", and 'See ...' notes.; + Rename the template, see § Redesign thoughts; + ref's can be added right into this page (|note=, |unicode-status=).

-DePiep (talk) 21:16, 3 February 2021 (UTC)
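The reporting rule in the change list above (show the Unicode columns when |alias= is set, otherwise show |unicode-status=, with |note= independent of both) can be sketched in a few lines. This is not the template's code; the function name, the return shape and the whitespace handling are all assumptions made for illustration.

```python
def unicode_cells(alias="", version="", chars="", unicode_status="", note=""):
    """Hypothetical model of one row: what lands in the Unicode columns and the note."""
    alias = alias.strip()
    if alias:
        # Script is in Unicode: Alias, Version and Chars fill the three columns.
        cells = [alias, version.strip(), chars.strip()]
    else:
        # |alias= is blank: the status text takes the place of those columns.
        cells = [unicode_status.strip()]
    # |note= is a general note, shown regardless of Unicode status.
    return {"unicode": cells, "note": note.strip()}


in_unicode = unicode_cells(alias="Georgian", version="x", chars="y", unicode_status="ignored")
not_in_unicode = unicode_cells(unicode_status="Not in Unicode", note="See Geor")
print(in_unicode["unicode"], not_in_unicode["unicode"])
```

The values "x", "y" and "See Geor" are placeholders, not data from the table.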

Done: Removed column "ISO ID number" from the table. For article ISO 15924, we will re-add but for now it is not needed. (+hoping to gain some friends here) -DePiep (talk) 21:24, 3 February 2021 (UTC)

Redesign thoughts

1. Aim of this template: This template was created in 2010 [2]. Originally, its core was to connect ISO-ID Xxxx with Unicode-ID Alias. As Unicode does, and as the current name "ISO 15924 script codes and related Unicode data" says. However. Since 2010 new developments have arrived at this encyclopedia (like wikidata), and likely this "ISO-definition first"-approach is not good enough for an encyclopedia.

2. Checks & Improvements possible: Meanwhile, info-consistency has made me research & check & align four ID's wrt their definitions and their data: ISO_15924, Wikidata, Unicode, enwiki. That is: "How is this 'script' present in Unicode?, in ISO?, ... ". Immediate next question: and how are properties represented, like "R-to-L"? Then there is RL too, think 'scripts' conceived as such but not in scope of these four; could be in some lang-wiki? For those checks, esp. cross-ID checks, see {{ISO 15924/overview-4id}} (& {{data sets}}).

3. Refine target and rename: My idea is to move this Template (table, list, overview) into a more generic "List of scripts". Then, when used in Unicode-related pages, a subset can be presented (say, only those In Unicode), similar for ISO 15924. -DePiep (talk) 18:56, 3 February 2021 (UTC)

4. TL;DR: Can you agree with the wider concept of this template being "List of scripts", with local adjustments (say, table has all script-data and is fine-tuned when in article Unicode). -DePiep (talk) 18:56, 3 February 2021 (UTC)

Rename ideas

Frivolous suggestions for now please, to reflect design ideas. -DePiep (talk) 18:56, 3 February 2021 (UTC)

~~Template:Script list~~ says it well, for me. -DePiep (talk) 19:00, 3 February 2021 (UTC)
Template:List of scripts, better (and no name errors!). -DePiep (talk) 21:42, 3 February 2021 (UTC)

Nomination for deletion of Template:ISO 15924/footer

Template:ISO 15924/footer has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page.