Template talk:Charmap

From Wikipedia, the free encyclopedia

Before rollout

Comments on the charmap template before rollout can be left here. VanIsaacWScontribs 10:34, 15 June 2012 (UTC)[reply]

Unicode position and SGML/XML/HTML numeric character reference are redundant, even more so because you don’t use U+… notation. It’s better to leave out the latter. People who use &#…; and &#x…; should know where the number comes from anyway. Concerning the example, Cyrillic Che, I don’t know where you took the named character reference from, but it’s not in the HTML/XML/SGML list that the row header links to. Providing PUA locations also doesn’t make much sense, since font developers can allocate characters there anywhere they want, so you would at least have to provide a semi-standard body, such as MUFI. — Christoph Päper 11:51, 17 June 2012 (UTC)[reply]

I had thought about adding the "U+" to the Unicode hex field, but it's not present on the tables in letter articles. As for HTML numeric character reference, I think that this is probably a good thing to have, as users who don't really know about these things can still do a copy/paste with that field to use them. That field is found in the mapping tables for many character articles, so I can't really justify removing it either.
The example is just an example to demonstrate template features, and is not intended to be encyclopedic. It is based on the entry for the Cyrillic letter Che, but is not identical to it. The named character reference, image instead of plain-text character, upper case input, lower case input, mixed case input, and PUA mapping for only the second character are all included in order to show the full functionality of the template, namely that you can replace the plain text character header with an image, that it supports named character references, that inputs are case insensitive, and that it will properly align character mappings even when not all characters map to a particular encoding. VanIsaacWScontribs 08:50, 21 June 2012 (UTC)[reply]
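For readers following the discussion of the Unicode hex field and the numeric character references, here is a minimal Python sketch of how the different notations relate to one code point. It is an illustration only, not the template's code; the function name `charmap_cells` and the dictionary keys are made up for this example. It also shows why upper, lower, and mixed case inputs can be treated the same.

```python
def charmap_cells(hex_code: str) -> dict:
    """Return several notations for one code point given as a hex string."""
    # int(..., 16) accepts upper, lower and mixed case hex digits alike.
    cp = int(hex_code.strip(), 16)
    return {
        "decimal": str(cp),
        "unicode": f"U+{cp:04X}",      # "U+" notation, at least four hex digits
        "ncr_decimal": f"&#{cp};",     # decimal numeric character reference
        "ncr_hex": f"&#x{cp:X};",      # hexadecimal numeric character reference
    }


if __name__ == "__main__":
    print(charmap_cells("0427"))  # Cyrillic capital letter Che
    print(charmap_cells("3b8"))   # Greek small letter theta
```

Both numeric character references carry the same number as the Unicode position, which is the redundancy raised above; the copy/paste convenience is the argument for keeping them.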

I have rolled out this template at template:Charmap. This talk page has been moved and redirected to template talk:Charmap. I won't actually start converting any articles for a few days, but if anyone wants to use the template, it is live now, with the caveat that it might change slightly in functionality for a couple weeks. VanIsaacWScontribs 01:10, 22 June 2012 (UTC)[reply]

10 chars?

What's the use case for having up to 10 characters supported by this template? Aren't most uses of such tables comprised of two characters only (upper and lowercase)? --Waldir talk 15:22, 27 July 2012 (UTC)[reply]

Yes. Most use cases will have two characters, an upper and lower case. However, there are exceptions: Georgian characters have an upper case, lower case, and ecclesiastic lower case; Japanese kana articles may have as many as 7 - a hiragana, small hiragana, archaic hiragana, katakana, small katakana, archaic katakana, and half-width katakana for "e", not to mention any manyōgana. I can't say for sure that there isn't a use case for arbitrarily large instantiations of this template, especially considering that Unicode has started looking at encoding hentaigana, so I limited it based on page width; 10 characters with short names will essentially fill the page. I'm currently working on Braille articles and templates, but one of the tasks on my list is to start converting character articles to the template. VanIsaacWScontribs 21:03, 27 July 2012 (UTC)
The problem is that this template gets quite complex when we use that many possible combinations, and 10 possibly won't even cover all possibilities, if I'm reading you correctly. How about making a family of templates, the main one having two characters only, then others for the specific cases? That would also allow parameters to have more intuitive and specific names, and therefore allow passerby editors to contribute improvements (to the articles) more easily. --Waldir talk 11:34, 28 July 2012 (UTC)[reply]
What, exactly, are you trying to do that you can't pull off? It's actually a pretty simple template, if you can wade through all the nested brackets and pipes to parse the conditionals, but it is a bit daunting when you see that wall of verticals. VanIsaacWScontribs 12:30, 28 July 2012 (UTC)[reply]

I haven't tried anything that I couldn't pull off, but that's beside the point, especially since I've been working with templates for a while now. What I meant was that for casual users (and sometimes even for template developers, as exemplified in the next section), a simpler template would be much easier to understand and use. Its internal mechanism would be more straightforward and anyone editing it would be able to load its structure into working memory with less effort, since it would be shorter. For example, I believe you can agree that this is more intuitive:

{{charmap 
| uc_unicode     = 0398
| uc_name        = Greek Capital Letter Theta 
| lc_unicode     = 3b8
| lc_name        = Greek Small Letter Theta
| lc_image       = [[File:Greek lc theta.png|10px]]
| map1           = [[ISO 8859-7]]
  | uc_map1      = c8
  | lc_map1      = E8 
| map2           = [[Code page 737|CP 737]]
  | uc_map2      = 87
  | lc_map2      = 9F 
| map3           = [[Code page 860|CP 860]], [[Code page 861|861]], [[Code page 862|862]], [[Code page 863|863]], [[Code page 865|865]]
  | uc_map3      = E9
| namedref1      = [[TeX]]
  | lc_namedref1 = \theta
}}

than this:

{{charmap 
| 0398
| name1       = Greek Capital Letter Theta 
| 3b8
| name2       = Greek Small Letter Theta
| image2      = [[File:Greek lc theta.png|10px]]
| map1        = [[ISO 8859-7]]
  | map1char1 = c8
  | map1char2 = E8 
| map2        = [[Code page 737|CP 737]]
  | map2char1 = 87
  | map2char2 = 9F 
| map3        = [[Code page 860|CP 860]], [[Code page 861|861]], [[Code page 862|862]], [[Code page 863|863]], [[Code page 865|865]]
  | map3char1 = E9 
| namedref1   = [[TeX]]
  | ref1char2 = \theta
}}

(Note that I expanded the examples above to use one parameter per line, which might look more verbose than the example currently in the documentation, but the idea here is to demonstrate the clarity of the parameter names, not of the whole template itself.) I could even quote your own words: "if you can wade through all the nested brackets and pipes to parse the conditionals" and "it is a bit daunting when you see that wall of verticals". In fact, I'd go as far as to suggest the examples I give above aren't simplified enough: we could have parameters for common charmaps rather than including all the links manually. Then we'd have something like

| uc_860_1_2_3_5 = E9

instead of

| map3      = [[Code page 860|CP 860]], [[Code page 861|861]], [[Code page 862|862]], [[Code page 863|863]], [[Code page 865|865]]
| map3char1 = E9

And the links would be produced by the template itself. That makes using the template easier, and we could still have some generic |extramap1=, |extramap2=, etc for rarer maps that the template doesn't include natively. Additionally, if the links were to be changed (say, several code page articles merged into one) we would only have to fix the links in one place. --Waldir talk 09:46, 29 July 2012 (UTC)[reply]

Well, if you really think that a dedicated double-case version is worth having, we can certainly do something like that at a place like {{uclcmap}} - it can even just pass its arguments to this template. I would certainly be happy if you had any suggestions for making this version's naming scheme easier, but a lot of the world's scripts are uncased, so uc/lc only works for a small fraction of the possible usage cases. Heck, even regular Latin letters have superscript, subscript, and half-width forms that can be included. As for the "uc_860_1_2_3_5" suggestion, I don't even know how you would pull that off without bloating this guy into having thousands or even tens of thousands of conditionals. VanIsaacWScontribs 22:17, 29 July 2012 (UTC)[reply]
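To make the "just pass its arguments to this template" idea concrete, here is a rough Python sketch of the renaming such a wrapper would have to do, mapping the proposed uc_/lc_ names onto the existing positional and numbered parameters. This is only an illustration under assumptions: the function name `to_charmap_args` is invented, and the uc = character 1, lc = character 2 convention is taken from the two theta examples above rather than from any existing template.

```python
import re

# Assumed convention from the examples above: "uc" is character 1, "lc" is character 2.
POSITION = {"uc": "1", "lc": "2"}

# Matches keys such as uc_unicode, lc_name, lc_image, uc_map3, lc_namedref1.
PATTERN = re.compile(r"(uc|lc)_(unicode|name|image|map(\d+)|namedref(\d+))")


def to_charmap_args(args: dict) -> dict:
    """Translate uc_/lc_ style parameters into charmap style parameters."""
    out = {}
    for key, value in args.items():
        m = PATTERN.fullmatch(key)
        if m is None:
            out[key] = value              # map1, namedref1, ... pass through unchanged
            continue
        n = POSITION[m.group(1)]
        field = m.group(2)
        if field == "unicode":
            out[n] = value                # code points are positional parameters
        elif field in ("name", "image"):
            out[field + n] = value        # name1, image2, ...
        elif m.group(3):
            out[f"map{m.group(3)}char{n}"] = value
        else:
            out[f"ref{m.group(4)}char{n}"] = value
    return out


if __name__ == "__main__":
    print(to_charmap_args({"uc_unicode": "0398", "lc_unicode": "3b8", "uc_map1": "c8"}))
```

The sketch also shows the limitation raised above: the uc/lc prefixes only name two positions, so uncased scripts and the three-to-seven character cases would still need the numbered parameters.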
I assumed that upper/lowercase would be the most common variations, which would justify a dedicated, simpler template (but under another name than uclcmap, please :P). If you say that's not the case (please confirm), then I guess we can keep the current system. The documentation will need to be fleshed out to make it more readily understandable, though. I'll take a stab at that once the main functionality is stabilized. (By the way, can you list all the variations you know of —upper/lower-case, sub/super-script, full/half-width, etc.—, or is the list so long as to be impracticable?)
I'll take a stab at all the Unicode variants of the regular latin letters I can find. We have the regular capital letter/small letter, small capital letter, and dotless forms of "i" and "j"; both capital and small letters can also be a modifier letter (ie, superscript), subscript, Black letter, Gaelic, Insular, Script, combining, double struck, circled, parenthesized, full width, and in the Mathematical alphabets: bold, italic, bold italic, script, bold script, Fraktur, double struck, bold Fraktur, sans-serif bold, sans-serif italic, sans-serif bold italic, and monospace. VanIsaacWScontribs 22:56, 30 July 2012 (UTC)[reply]
I'll say I'm convinced, but one last question: do all those variants correspond to different code points in the charmaps? --Waldir talk 16:30, 31 July 2012 (UTC)[reply]
To answer your question: yes, each of those variants has its own Unicode code point(s) for at least one basic Latin character. Black letter, insular, and Gaelic (the Gaelic forms might actually be the same as the insular variants) represent variants of very few basic Latin characters, but most of the other ones are nearly complete in their coverage (no modifier letter q as of yet for some reason). I wasn't trying to convince you of anything, just trying to answer your question. I personally think that an upper case / lower case version is probably a good idea - actually, now that I think about it, we could simply do it as an alternate set of parameters within this template. I may work on that later on, but I've got a bit of a busy day. VanIsaacWScontribs 18:16, 31 July 2012 (UTC)
Sorry, I didn't mean to imply we were arguing over a point. I did have a different idea of the charmaps landscape and you've shown me it was incorrect. That's what I meant about "being convinced" :)
I only suggested the uppercase/lowercase because I initially thought it would cover the vast majority of uses of this template, but since that's not the case, I'm not sure of the benefits of having separate templates for this. Dedicated parameters would probably make sense, though. --Waldir talk 20:58, 31 July 2012 (UTC)[reply]
Secondly, the suggestion to have dedicated parameters for specific maps was never intended to provide complete coverage of all possibilities, only the most common ones (like the namedref parameters we currently have). I assume the distribution of charmaps would be very uneven, with a few taking a large percentage of common uses. If that's not the case, we should probably drop the idea. Again, please confirm. (Regardless, |unicode= as an alias for the [first] unnamed parameter should IMO still be done).
The most common encodings are probably going to be the 8859 and Windows 125x families. There are 16 and 9 separate encodings, any of which can have the same or different encodings for a particular character - although the Greek and Cyrillic code pages are probably going to only have characters that are completely unique, or shared completely with the whole family - so let's say that there are 21 code pages across which a character can freely vary: that would be 2^21-1+3+3 (no worry about a character in none of them, plus the three greek and three cyrillic possibilities) or 2,097,157 combinations that we need to hard code into the template - obviously not going to happen. We might be able to get away with it by parsing 8859 = and WinCP = parameters, but we'd probably want to have a couple separate instantiations of each, seeing as some characters may have different encodings in different 8859 or Win code pages. VanIsaacWScontribs 22:56, 30 July 2012 (UTC)[reply]
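The count quoted above can be checked with a couple of lines of Python. This only reproduces the arithmetic as stated in the comment (21 freely varying code pages, minus the empty subset, plus three Greek and three Cyrillic cases); it takes no position on whether 21 is the right number of code pages.

```python
free_code_pages = 21

# Every non-empty subset of the 21 code pages, plus the 3 Greek and 3 Cyrillic cases.
combinations = 2**free_code_pages - 1 + 3 + 3

print(f"{combinations:,}")
```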
Thanks for the thorough explanation. Let's leave it at that then :) --Waldir talk 16:30, 31 July 2012 (UTC)[reply]
Finally, whether we're keeping the current structure or not, the template code definitely needs to be formatted with use of more whitespace, html comments for newlines/indentation, etc. I would do it but I think you're the most qualified person to take the first stab at it. I'd gladly help polish any rough edges afterwards, more confident that I wouldn't break anything. --Waldir talk 13:57, 30 July 2012 (UTC)[reply]
But if I put comments in there and make the code human readable, then other people will think that they might be able to understand it, and they'll defile my perfect template with their dirty hands and exotic dependencies! VanIsaacWScontribs 22:56, 30 July 2012 (UTC)[reply]
Is that any better? VanIsaacWScontribs 03:00, 31 July 2012 (UTC)[reply]
That's awesome!! Much more readable. I might look at it later more carefully and do a few minor whitespace changes (I'll wash my hands before ;)), but overall it looks great as it is now. --Waldir talk 16:30, 31 July 2012 (UTC)[reply]

Named references

The HTML/XML named character references are not the only ones out there - thanks for finding the automatic template for those, Waldir. I'm wondering if we shouldn't allow for others, like TeX. VanIsaacWScontribs 21:28, 27 July 2012 (UTC)

Definitely! But in a different row of the table, right? --Waldir talk 11:30, 28 July 2012 (UTC)[reply]
 Done Absolutely. It should be pretty simple to repurpose the HTML/XML code that you superseded to pull it off. I was thinking that they should go below the alternate mappings, and I'd like to check out your new code anyway. VanIsaacWScontribs 12:40, 28 July 2012 (UTC)[reply]

I'm actually running into a problem with the HTML named ref. I changed the example to Greek theta, but the HTML ref isn't showing. Did you test to see if your code worked before? I gave it the code to build the table correctly if some characters don't have a named ref, but that really shouldn't affect anything - if it worked before, it should at least still show the encoding name. VanIsaacWScontribs 14:23, 28 July 2012 (UTC)

Yeah, I've tried to fix this a couple ways. The #if: condition isn't working right with the results of the {{numcr2namecr}} template - it's sending us to the else results, when given the theta code points, even though the template returns the proper Θ and θ results when I test it. If you have any thoughts, I'd be glad to hear them, otherwise, I'm going to take it to WP:WikiProject Templates. VanIsaacWScontribs 14:49, 28 July 2012 (UTC)[reply]
I can try to look at it (not right now, I have only a few free minutes), but these days I'll be quite busy, so perhaps requesting help from others will be a good idea. By the way, this is one of the benefits that a smaller, simpler template would provide :) Such puzzling bugs would be easier to track down. --Waldir talk 09:21, 29 July 2012 (UTC)[reply]
I figured out that the conditional is unnecessary if we send it to the /named subtemplate, but it's still not working. Somehow the {{numcr2namecr}} template isn't returning its values. VanIsaacWScontribs 10:56, 29 July 2012 (UTC)[reply]

Yay!!! User:DePiep figured it out! the {{numcr2namecr}} template doesn't sanitize inputs for whitespace. We're all good now. Thanks Waldir and DePiep for your help. VanIsaacWScontribs 21:51, 29 July 2012 (UTC)[reply]

Gol dang, that template actually does sanitize for whitespace, it just wasn't doing it for our input. That's really weird. Well, it works now anyway. VanIsaacWScontribs 23:45, 29 July 2012 (UTC)[reply]
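For anyone hitting the same symptom, the following Python sketch shows how stray whitespace can silently defeat a lookup keyed on the literal parameter text. It is an analogy, not the code of {{numcr2namecr}}: the string-keyed table and the function `named_reference` are assumptions made for this example, and the entity names come from Python's standard `html.entities` table of HTML 4 entities.

```python
from html.entities import codepoint2name

# String-keyed table, standing in for a switch on the literal parameter text.
NAMES_BY_HEX = {f"{cp:X}": name for cp, name in codepoint2name.items()}


def named_reference(hex_code: str, sanitize: bool = True) -> str:
    """Return the named character reference for a hex code point, or ''."""
    if sanitize:
        # Trim whitespace, normalise case, and drop leading zeros before the lookup.
        key = hex_code.strip().upper().lstrip("0")
    else:
        key = hex_code                    # raw text: any stray space breaks the match
    name = NAMES_BY_HEX.get(key)
    return f"&{name};" if name else ""


if __name__ == "__main__":
    print(repr(named_reference(" 0398 ")))                # sanitized input
    print(repr(named_reference(" 398", sanitize=False)))  # unsanitized input
```

With unsanitized input the lookup simply falls through to the empty result, which matches the "else" behaviour described earlier in this section.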

Unicode is not an encoding: it's an encoding standard

Why is Unicode listed as an encoding? Unicode is not an encoding. Unicode can be implemented by different character encodings ([1]). Sclaes (talk) 19:14, 25 November 2014 (UTC) — Preceding unsigned comment added by Sclaes (talkcontribs) 23:25, 29 October 2014 (UTC)[reply]

It would be really good to be able to call Template:Script for the character display in this template's output. For example, if you look at Tsade#Character encodings, the Samaritan, Ugaritic, Imperial Aramaic and Phoenician characters do not display, but they can be rendered for many users:

  • {{script|Samr|‏ࠀࠁࠂ‎}} renders as ࠀࠁࠂ
  • {{script|Ugar|𐎀𐎁𐎂}} renders as 𐎀𐎁𐎂
  • {{script|Armi|𐡀𐡁𐡂}} renders as 𐡀𐡁𐡂
  • {{script|Phnx|𐤀𐤁𐤂}} renders as 𐤀𐤁𐤂

It doesn't need to be automated — just being able to pass through an ISO 15924 code, like with {{script}}, would solve the issue. — OwenBlacker (Talk) 01:25, 11 December 2016 (UTC)[reply]

"ifeq"?[edit]

There's something broken! -- Polluks 17:25, 16 May 2019 (UTC)[reply]

Please link to an example of a broken instance of this template. – Jonesey95 (talk) 19:30, 16 May 2019 (UTC)[reply]

Adding a feature to show different styles

 – * Pppery * it has begun... 03:15, 19 October 2021 (UTC)[reply]

@GKFX: I am working on a project to rebuild the templates that present information on category:Indic letters. I'd like to piggyback off of {{charmap}}, but I have an issue that also presents an opportunity for the development of this template. Basically, with the Brahmi script and other early writing systems, Unicode encodings end up encompassing a superset of all the different eras and styles of an early writing system, and it can become necessary to show example glyphs from multiple styles to truly show a character and its encoding. This also presents an opportunity to extend the idea to modern scripts, where there are very often several quite distinctive writing styles that are legible to readers of a given language - e.g. modern Cyrillic has several letters with distinct cursive forms in different languages, or Fraktur and "Old English" styles of the Latin script are legible to modern German and English readers. What I'd like to do is introduce parameters to elicit a sub-table in the "Preview" cell in charmap that would allow for up to five different styles of each letter to be introduced.

Parameters and how they would be implemented in a cell of the subtable could be:

Parameter                            Implementation                                  Extent
styleNlabel (N = 1, 2, 3, 4, 5)      ''{{{styleNlabel|}}}''                          characters 1-10
styleN (N = 1-5)                     <span style="{{{styleN|}}}">&#xnnnn;</span>     characters 1-10
styleX-Nlabel (X = 1-10, N = 1-5)    Same as styleNlabel                             character X only
styleX-N (X = 1-10, N = 1-5)         Same as styleN                                  character X only
imageX-N (X = 1-10, N = 1-5)         [[File:{{{imageX-N}}}]]                         character X, style N

Example

A hardcoded example, showing just the top few rows. When embedding tables, the inside table has to use the <table>...</table>, <tr>...</tr>, and <td>...</td> tags directly, instead of the standard wikimarkup for tables, so if you look at the wikicode, you can see how that's done.

Character information
Preview         Ashoka  Gujarat  Gupta       Ashoka  Kushan  Gupta
                                             𑀓
Unicode name    brahmi letter gha            brahmi letter ka
Encodings       decimal    hex               decimal    hex
Unicode         69654      U+11016           69651      U+11013

{{charmap
| 11016
| 11013
| name1 = Brahmi letter Gha
| name2 = Brahmi letter Ka
| style1label = Ashoka
| style1-2label = Gujarat
| style2label = Kushan
| style3label = Gupta
| style1 = font-family:Noto Sans Brahmi; font-style:italic; font-size:24pt;
| image1-1 = [[File:Brah gh.svg|20px]]
| image1-2 = [[File:Gupta gujarat gh.svg|20px]]
| image1-3 = [[File:Gupta allahabad gh.svg|20px]]
| image2-2 = [[File:Gupta ashoka k.svg|20px]]
| image2-3 = [[File:Gupta allahabad k.svg|20px]]
}}

Unfortunately, although I am a template programmer, I don't know Lua at all. I've looked over the code for this module, but can't see how to implement something like this, so I will need help from you or somebody who knows how to program modules to get this off the ground. Much thanks for any help, VanIsaac, MPLL contWpWS 06:43, 1 October 2021 (UTC)

@Vanisaac OK, I have been on a bit of a Wikipedia break but I will try to make time for this this weekend or sooner. (It’s worth pointing out that, in general, you don’t have to specify hard limits for numbers in parameter names in Lua modules, as you can just loop over as many parameters as get provided.) User:GKFXtalk 07:15, 1 October 2021 (UTC)
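The point about not needing hard limits can be sketched as follows. The example is in Python rather than Lua and is not taken from the module; the helper name `collect_numbered` is invented for the illustration. It gathers name1, name2, ... for as long as they are supplied.

```python
from itertools import count


def collect_numbered(args: dict, prefix: str) -> list:
    """Collect prefix1, prefix2, ... in order, stopping at the first missing number."""
    values = []
    for i in count(1):                  # 1, 2, 3, ... with no fixed upper limit
        key = f"{prefix}{i}"
        if key not in args:
            break                       # a gap in the numbering ends the sequence
        values.append(args[key])
    return values


if __name__ == "__main__":
    args = {"name1": "Brahmi letter Gha", "name2": "Brahmi letter Ka"}
    print(collect_numbered(args, "name"))
```

One design consequence of this approach is that a skipped number ends the loop early, so anything numbered after the gap is ignored.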
Oh, nice to know about the lack of limits on numbered parameters. As you can tell, I am very keyed in to template syntax limitations and logistics. And no worries about the timeline on this. There is an acceptable median state while I am trying to transition everything over to the new implementation of this information, and I just kind of realized a couple days ago that my old friend {{charmap}} was probably a really good solution, but it will probably be another month or so before I fully roll out the whole new setup. I had steeled myself for a hefty bunch of template programming when I came to find out the old kludge I made eight years ago had gotten a nice new makeover as a module. VanIsaac, MPLL contWpWS 08:05, 1 October 2021 (UTC)
With the testcases I've set up, I'm noticing some mechanisms that are looking nice, but also an issue with prioritization that I would hope to get changed. If I'm reading the code correctly, it looks like you are conditioning the stylized output on the presence of a matching style label, which seems like a really user-friendly control mechanism to me. I may have further feedback as I try different testcases, but it's looking good for now. However, when it comes to choosing what gets displayed, I feel like some adjustment should happen to ensure the specific overrides the general. So imageX-N, where the user has specified a particular image for a character in a style, should be the first choice for being displayed. Next would be the character with styling specified for that character with styleX-N. Ideally the character stylized with styleN would be next, followed by imageX, with unstyled plain text as the last resort. But if it's easier to program, it's probably fine for styleN to be lumped in with the plain text as the last option. I hope this doesn't throw too much of a monkey wrench into the module logic structure, but let me know if there are good reasons why it is better to go with something else. It's actually looking really good, and thank you for all of your hard work! VanIsaac, MPLL contWpWS 21:21, 1 October 2021 (UTC)[reply]
Yes, it's all conditional on the StyleNlabel or StyleX-Nlabel. I personally think that the two CSS attributes should always apply if provided, as it can always be helpful to be able to just add some CSS, even potentially to an image, and there are enough parameters provided to control every character individually. On the subject of images, the priority order is currently ImageX-N, ImageX, styled character. I'm wondering if there is any point in allowing ImageX to ever appear when splitting into multiple styles, as it makes far more sense to use ImageX-N to actually put the image in the right place. User:GKFXtalk 13:44, 2 October 2021 (UTC)[reply]
I think there are possible use cases for imageX to appear several times, with a few exceptions, but it might be best to pop up an error message if you have both imageX and imageX-N with the same X due to the likelihood that there's something amiss. In terms of the behavior, right now my testcase has |image2-3=[[File:Greek lc theta var.svg|15px]] and |image2=[[File:Greek lc theta.png|10px]], but File:Greek lc theta.png shows up in all three styles, so that's what I was hoping to get fixed. VanIsaac, MPLL contWpWS 07:38, 3 October 2021 (UTC)
  • @GKFX Okay, now that I've figured out that I had mismatching capitalization in parameter names, the new module seems to be working consistently and predictably as far as I can see from my tests. I'd be willing to run this as the main module now, if that's alright with you. VanIsaac, MPLL contWpWS 03:38, 7 October 2021 (UTC)
    OK, I've copied the sandbox version across into the main module. Seems to have worked OK. User:GKFXtalk 16:04, 9 October 2021 (UTC)[reply]
If you want to see what I'm doing with it, check out the bottom of Ka (Indic). VanIsaac, MPLL contWpWS 22:02, 9 October 2021 (UTC)[reply]
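The "specific overrides general" order proposed in this section (imageX-N, then styleX-N, then styleN, then imageX, then plain text) can be written out as a simple fallback chain. This is a Python sketch of that proposal only; it is not the module's code, and the order the module used at the time was described above as ImageX-N, ImageX, styled character. The function name `preview_cell` is invented for the example.

```python
def preview_cell(args: dict, x: int, n: int, char: str) -> str:
    """Pick what to show for character x in style n, most specific source first."""
    image_xn = args.get(f"image{x}-{n}")
    if image_xn:
        return image_xn                                   # image for this character and style
    style_xn = args.get(f"style{x}-{n}")
    if style_xn:
        return f'<span style="{style_xn}">{char}</span>'  # CSS for this character and style
    style_n = args.get(f"style{n}")
    if style_n:
        return f'<span style="{style_n}">{char}</span>'   # CSS for this style, all characters
    image_x = args.get(f"image{x}")
    if image_x:
        return image_x                                    # general image for this character
    return char                                           # unstyled plain text as last resort


if __name__ == "__main__":
    args = {
        "image2": "[[File:Greek lc theta.png|10px]]",
        "image2-3": "[[File:Greek lc theta var.svg|15px]]",
    }
    for style in (1, 2, 3):
        print(style, preview_cell(args, 2, style, "θ"))
```

With the theta testcase quoted above, this order shows the general image2 in styles 1 and 2 and the specific image2-3 only in style 3.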

Unnamed Sequences

I'm trying to use Charmap to tabulate codes for user-perceived characters. Several are not named sequences, so I would like the descriptive names I supply to appear as I supply them in |name1= etc, for contrast with official names, for which uppercasing makes sense. {{charmap}} is upper casing them. How do I stop it uppercasing them? I've looked at the module code, but can't see where it is uppercasing them. I would favour having two levels of inhibition - inhibit all and inhibit a particular character. --RichardW57m (talk) 15:14, 29 October 2021 (UTC)[reply]

  • How are you doing named character sequences? It wasn't originally designed for that purpose, so I never documented it when it was originally built as a template. Now that it is a Lua module, I have no idea how to parse an extended functionality like that, but I'd like to see that added to the documentation. VanIsaac, MPLL contWpWS 20:03, 29 October 2021 (UTC)
@Vanisaac: The documentation says, "Currently, each field (including the Unicode code point) can contain up to four hexadecimal numbers to be converted to decimal, separated by spaces". There are several examples in Ā (Indic).--RichardW57 (talk) 15:37, 30 October 2021 (UTC)[reply]
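The documented behaviour quoted above (up to four space-separated hexadecimal numbers per field, converted to decimal) could look roughly like this. It is a Python sketch based only on that one sentence of documentation, not on the module's source; the function name `parse_sequence` and the choice to raise an error past the limit are assumptions.

```python
def parse_sequence(field: str, limit: int = 4) -> list:
    """Split a field into hex numbers and convert each one to a decimal integer."""
    parts = field.split()               # split on any run of whitespace
    if len(parts) > limit:
        # Assumption: reject over-long fields; the real module may behave differently.
        raise ValueError(f"at most {limit} code points per field, got {len(parts)}")
    return [int(part, 16) for part in parts]


if __name__ == "__main__":
    print(parse_sequence("0905 093E"))  # a two-code-point sequence
```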
@RichardW57: The uppercasing is done by names[1 + #names] = frame:callParserFunction('uc', args['name' .. i]), which you may be able to see is the normal {{uc}} parser function. If this is just for a couple of articles, you can defeat that with nowiki tags: |name1=<nowiki>This won't get uppercased.</nowiki>. See Special:Permalink/1060161100 for a demo. If there are lots ping me and I can add a specific option. User:GKFXtalk 20:35, 13 December 2021 (UTC)[reply]
@GKFX: Thanks, I've implemented it in the entries that needed it. --RichardW57 (talk) 00:08, 16 December 2021 (UTC)[reply]
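The two levels of inhibition asked for at the top of this section (inhibit all, or inhibit one character) might look like the sketch below if a dedicated option were ever added. It is written in Python for illustration, and the parameter names `nouc` and `noucN` are purely hypothetical; no such parameters are described anywhere in this discussion.

```python
def display_name(args: dict, i: int) -> str:
    """Return name i, upper-cased unless upper-casing is inhibited."""
    name = args.get(f"name{i}", "")
    # Hypothetical switches: "nouc" inhibits all names, "nouc<i>" inhibits only name i.
    if args.get("nouc") or args.get(f"nouc{i}"):
        return name
    return name.upper()


if __name__ == "__main__":
    args = {"name1": "Latin letter A", "name2": "my descriptive name", "nouc2": "yes"}
    print(display_name(args, 1))
    print(display_name(args, 2))
```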

Alternative encodings

I feel that if deemed worthy of inclusion for completeness, deprecated, discouraged, obsolete and non-NFC encodings should be tagged within the table in some way, rather than having their status buried in the following text. But where can I add a footnote indicator? There are deprecated Tibetan vowels for Sanskrit, discouraged Khmer letters, obsolete Malayalam and Bengali false half-forms and reasonably common Vietnamese incomplete decompositions. --RichardW57m (talk) 15:33, 5 November 2021 (UTC)[reply]

(ping) Could you post a mockup of what you want to see? If you paste something like {{charmap|41|name1=Latin letter A|42|name2=Latin letter B}} into Special:ExpandTemplates and press OK, the "Result" box will contain the generated wikimarkup (should be fairly tidy) which you can edit to make a mockup. User:GKFXtalk 20:56, 13 December 2021 (UTC)[reply]
The problem was that I'd forgotten the trick of using {{efn}} and friends to get round the problem that <ref> doesn't work well in template parameters. I can now produce examples like:


Character information
Preview                        A
Unicode name                   LATIN LETTER A[α]
Encodings                      decimal    hex
Unicode                        65         U+0041
UTF-8                          65         41
Numeric character reference    &#65;      &#x41;

Notes

  1. ^ Deprecated

@GKFX: That reduces the question to one of style and the chore of referencing the right part of the Unicode Standard. I'm not quite sure how I'd reference not being in form NFC - sometimes you have to execute the normalisation algorithm. --RichardW57m (talk) 13:14, 16 December 2021 (UTC)[reply]