Sunday, September 11, 2016

Unicode gadget

I'm planning a new page of gadgets that I use for blog writing etc. The first entry is likely to be a Unicode writer; here is a draft you might like to try. Unicode is the massive collection of characters, many of them special, which you can probably access using your system character generator - on Windows it is a Windows accessory called Character Map.

Unicode chars come in groupings; a good listing is here. Each char has a number, and you can render them in HTML using the scheme &#2022; for character 2022 (in decimal). But they are widely rendered and usable as characters, in browsers and editors. The first 128 chars are just ASCII, and the range up to 255 matches Latin-1. A particular aspect of usability is that you can use them in blog comments, where most fancy HTML is not allowed.
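As a rough illustration of that numbering scheme, here is a minimal Python sketch that converts between a character and its decimal long form. The function names are my own invention for this example; `html.unescape` is the standard library call that decodes such references.

```python
import html


def to_numeric_reference(ch: str) -> str:
    """Return the decimal HTML long form for a single character."""
    return f"&#{ord(ch)};"


def from_numeric_reference(ref: str) -> str:
    """Decode an HTML long form back into the character it names."""
    return html.unescape(ref)


# The integral sign is U+222B, which is 8747 in decimal.
print(to_numeric_reference("\u222b"))

# Decimal 2022 decodes to the character with that number.
print(from_numeric_reference("&#2022;") == chr(2022))
```

Note that the scheme uses the decimal number; character charts usually list the hex value, so a conversion step like `ord` is often needed.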

I find them very useful for maths, chemical formulæ etc. You can use them where latex is unavailable, and there is much less overhead than latex. Here are some examples:
∫₀¹uⁱ⁻¹(1-u)ⁿ⁻ⁱ⁻¹du = β(i,n) = Γ(i)Γ(n)/Γ(i+n)
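
An expression like the one above can also be assembled programmatically. The following is only a sketch: the translation tables cover just the digits, signs and the letters i and n that have Unicode superscript or subscript forms used here, and the helper names `sup` and `sub` are made up for the example.

```python
# Map plain characters to their superscript forms (code points written out
# so the table does not depend on how this page renders them).
SUPERSCRIPT = str.maketrans(
    "0123456789+-in",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
    "\u207a\u207b\u2071\u207f",
)

# Subscript digits run contiguously from U+2080; plus and minus follow.
SUBSCRIPT = str.maketrans(
    "0123456789+-",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
    "\u208a\u208b",
)


def sup(text: str) -> str:
    """Replace each supported character with its superscript form."""
    return text.translate(SUPERSCRIPT)


def sub(text: str) -> str:
    """Replace each supported character with its subscript form."""
    return text.translate(SUBSCRIPT)


# Left-hand side of the integral shown above.
integral = f"\u222b{sub('0')}{sup('1')}u{sup('i-1')}(1-u){sup('n-i-1')}du"
print(integral)
```

Characters with no entry in a table pass through unchanged, so anything without a superscript or subscript form simply stays as it was typed.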

You can cut and paste from a char map as in Windows, or write the long form HTML as above. But that is tedious, as they come in lists of thousands, many for all the various language scripts. So I've collected a manageable table which I think has most that I'll ever need, and added an editable phrase generator, below the jump.

Here is the table. The shades indicate the groupings. Click on any character to select. I'll explain the controls below:

Once you start selecting, you'll see things appear in the four cells of the top row (below the char table). On the left is the chosen char. Next is the HTML long form, which includes the decimal number. Then there is a text box with a string, selected, of the "phrase" of all chars selected (and not erased). The text is selected, so you can just use Ctrl-C to copy for pasting. Finally, there is the same set of chars, but with a dotted line that functions as a cursor. This is what you edit.

The edit controls are the five buttons below. The one with < is a back erase (1 char), from the cursor position. > is forward erase. X erases the whole phrase (careful). The arrows ← and → move the cursor. If the cursor is not at the end (default), then the next char selected will go in at the cursor position. Eventually you'll want to paste the expression somewhere. You can probably further edit it there.
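
For anyone curious how such a phrase editor might work, here is a small Python model of the behaviour just described. It is a sketch under my own assumptions, not the gadget's actual code; the class name, method names and the `|` cursor marker are all hypothetical.

```python
class PhraseEditor:
    """Toy model of the phrase plus cursor described above."""

    def __init__(self):
        self.chars = []   # the phrase, one entry per selected char
        self.cursor = 0   # insertion index; len(chars) means "at the end"

    def select(self, ch):
        """Insert a chosen char at the cursor and step past it."""
        self.chars.insert(self.cursor, ch)
        self.cursor += 1

    def back_erase(self):
        """The < button: remove the char just before the cursor."""
        if self.cursor > 0:
            self.cursor -= 1
            del self.chars[self.cursor]

    def forward_erase(self):
        """The > button: remove the char just after the cursor."""
        if self.cursor < len(self.chars):
            del self.chars[self.cursor]

    def erase_all(self):
        """The X button: drop the whole phrase."""
        self.chars.clear()
        self.cursor = 0

    def move_left(self):
        self.cursor = max(0, self.cursor - 1)

    def move_right(self):
        self.cursor = min(len(self.chars), self.cursor + 1)

    def phrase(self):
        """The plain string, as it would appear in the copy box."""
        return "".join(self.chars)

    def with_cursor(self, marker="|"):
        """The phrase with a marker standing in for the dotted cursor."""
        left = "".join(self.chars[:self.cursor])
        right = "".join(self.chars[self.cursor:])
        return left + marker + right


editor = PhraseEditor()
for ch in "H2O":
    editor.select(ch)
editor.move_left()        # cursor now sits between "2" and "O"
editor.back_erase()       # removes the "2"
editor.select("\u2082")   # SUBSCRIPT TWO goes in at the cursor
print(editor.phrase())
print(editor.with_cursor())
```

Keeping the cursor as a plain index means insertion in the middle and at the end are the same operation, which matches the default of the cursor sitting at the end.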

You may wonder why the phrase appears in both the box and the edit location. I found that Notepad++, which I use, does not show chars beyond 8192, and that includes a lot of subscripts, superscripts and math symbols. So for those I was writing the long form in the text box, which works as well, and will go through any environment. But then I found that Notepad++ does show a marker char, and the information is still there, so when I eventually paste to a comment or to HTML, it shows. So I allowed all chars to appear in the text box as themselves. Still, if there turns out to be some environment where pasting those chars would actually lose the information, I could reinstate the long form.
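
The earlier behaviour, where only the high chars were written in long form, could be sketched roughly as below. This is an illustration rather than the gadget's code; the function name is made up, and treating 8192 itself as "high" is my assumption about where the cutoff falls.

```python
import html


def escape_high_chars(phrase: str, threshold: int = 8192) -> str:
    """Write chars numbered at or above the threshold in HTML long form.

    Chars below the threshold are left as they are.
    """
    return "".join(
        ch if ord(ch) < threshold else f"&#{ord(ch)};"
        for ch in phrase
    )


# Superscript two (178) stays as a char; subscript zero (8320) is escaped.
mixed = "x\u00b2\u2080"
escaped = escape_high_chars(mixed)
print(escaped)

# Decoding the long forms recovers the original phrase.
print(html.unescape(escaped) == mixed)
```

The escaped string contains only low-numbered chars, so it should survive an editor that cannot display the high ones, at the cost of being readable only after HTML rendering.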

Obviously there may be chars that some would like that I have not selected, and suggestions are welcome. I may ask for a list of numbers, since lookup is tedious. I may also prune the existing list. I included the whole Latin-1 block 160-255, which has a lot of rarely used accents, for example.
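
If you have the chars themselves but not their numbers, a few lines of Python can do the lookup. This is just a convenience sketch; the function name is hypothetical, while `unicodedata.name` is the standard library lookup for a char's official name.

```python
import unicodedata


def describe(chars: str):
    """Return (decimal number, hex code, Unicode name) for each char."""
    return [
        (ord(ch), f"{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in chars
    ]


# Paste the chars of interest into the string below.
for number, hex_code, name in describe("\u222b\u2022"):
    print(number, hex_code, name)
```

The decimal number is what the HTML long form needs, and the hex code is what most character charts list.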


  1. Ooops Nick...

    ∫₀¹uⁱ⁻¹(1-u)ⁿ⁻ⁱ⁻¹du = β(i,n) = Γ(i)Γ(n)/Γ(i+n)

    I don't know if you can manage to see that at your site, but here, neither Firefox nor Chrome displays two of the characters. They seem to be hex 209B and 2071...

    Regards from Germany

    1. Bindi,
      Thanks. There is always a possibility that local implementations of browsers won't show everything right. 209B (the subscript s) is a tricky one - for me it shows, but very small. I can see 2071 (super i), though it is smaller than the others. Numbers beyond hex 2000 sometimes seem to be a challenge - my Notepad++ won't show them, though it knows they are there.