Fittingly, I am going to restart the blog on a bit of an old chestnut. I’m considering throwing in the towel on those Unicode characters that look identical (or are functionally identical in the minds of typists).
A bit of history. When I first started making keyboard layouts for indigenous languages, I quickly found that I needed to make a decision on apostrophes. Lots of languages use them for a whole variety of sounds: glottal stop, aspiration, ejectives, reduced vowels, etymological deletions, and so on. The first thing I did was just use U+0027 '. This started long before I was using Unicode, so in a standard ASCII font, U+0027 made sense.
Once I began to look at Unicode, I wanted to follow the Unicode directive and use the spacing modifier letter U+02BC ʼ, because that is what I’m supposed to do according to the Standard, whose notes on this character read:
- "glottal stop, glottalization, ejective"
- "many languages use this as a letter of one of their alphabets"
- "2019 is the preferred character for a punctuation apostrophe"
So when she was typing in English using the Tohono ’O’odham keyboard, words like “wasnʼt” were showing up as incorrect, because the apostrophe was typed as U+02BC. How on Earth is a typist going to know that a Tohono ’O’odham apostrophe is one entity while an English apostrophe is a different entity when they look exactly the same? They’re not going to know, so forcing the distinction is unfair. This is especially true for languages like Anishinaabemowin or Blackfoot, whose writing systems use no letters outside the English alphabet, only an apostrophe. So the Blackfoot typist can’t use the US English keyboard layout because they need a special apostrophe which, by the way, looks exactly like the English one? U+02BC doesn’t make sense and it’s out of my life.
All right then, I went through all the keyboards I had already made public, and redid the lot of them (well over 100). I changed them all to U+2019. This way I got the curly apostrophe I wanted, without the hassle of two different apostrophes underneath.
But this has had its own problems, primarily with line breaking. Because U+2019 is “punctuation”, software treats it as a place where a line can break. Let’s look at a couple of examples:
- Blackfoot: omatsini’katsitsai
- Cheyenne: mostanotseevavo’hoveotse̊hevohe
At the end of a line, these get broken like this:
- omatsini’
  katsitsai
- mostanotseevavo’
  hoveotse̊hevohe
The software won’t automatically insert a hyphen because, as far as it is concerned, no word is being broken: the apostrophe is a word boundary. A Mohawk word like kotitakhehnóntie’s is going to word-break between the apostrophe and the final s, which is very wrong.
- kotitakhehnóntie’
  s
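To make the difference between the two characters concrete, here is a minimal Python sketch. It looks up the Unicode general category of U+02BC and U+2019, then runs a deliberately naive tokenizer that treats any run of "word characters" as one word. The helper `naive_words` is hypothetical and only a rough proxy: it is not the line-breaking logic of any particular word processor, but it shows how a character classed as punctuation can end up being treated as the edge of a word.

```python
import re
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"  # U+02BC, the spacing modifier letter
RIGHT_SINGLE_QUOTE = "\u2019"   # U+2019, the "punctuation apostrophe"


def naive_words(text: str) -> list[str]:
    """Split text into runs of word characters (hypothetical, naive tokenizer)."""
    return re.findall(r"\w+", text)


for ch in (MODIFIER_APOSTROPHE, RIGHT_SINGLE_QUOTE):
    # General category: "Lm" is a modifier letter, "Pf" is final-quote punctuation.
    category = unicodedata.category(ch)
    blackfoot = f"omatsini{ch}katsitsai"
    mohawk = f"kotitakhehn\u00f3ntie{ch}s"
    print(f"U+{ord(ch):04X}", category, naive_words(blackfoot), naive_words(mohawk))
```

With U+02BC each word stays in one piece, because the apostrophe counts as a letter. With U+2019 the same words fall apart at the apostrophe, which is the same kind of split that shows up in the line breaks above.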
What to do then? I’m going back to plain ol’ U+0027 APOSTROPHE. Yes it is vague as to what shape it ought to be. Yes it can be mutated into a curly apostrophe ‘ or ’ if smart quotes are turned on in your word processor. Yes the standard says that “2019 ’ is preferred for apostrophe.”
But in the end, U+0027 works. Software recognizes it for what it is, an apostrophe. Speakers recognize it for what it is, an apostrophe (English, French, and Native-language speakers alike). Drawbacks?
- With smart quotes turned on, any word-initial apostrophes (like ’O’odham) are going to be 6-curled: ‘O’odham. Ugly and wrong. But I see this all the time in English too, like “Hits from the ‘60’s”. Maybe one day we’ll have smart smart quotes.
- No apostrophes in website domain names. Sure, this is no fun, but languages have other punctuation letters, like : for length, which can’t even go in file names. We manage without apostrophes in domain names in English just fine.
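The first drawback is easy to reproduce. The sketch below assumes a very simple smart-quote rule: a straight apostrophe at the start of the text or right after whitespace becomes an opening quote (the 6-curl), and every other one becomes a closing quote. The function `naive_smart_quotes` is made up for illustration and is not the algorithm of any actual word processor, though the results look a lot like what shows up on screen.

```python
import re

LEFT_QUOTE = "\u2018"   # ‘ the 6-shaped opening quote
RIGHT_QUOTE = "\u2019"  # ’ the 9-shaped closing quote / apostrophe


def naive_smart_quotes(text: str) -> str:
    """Curl every U+0027 using a simplistic, position-based rule (illustrative only)."""

    def curl(match: re.Match) -> str:
        start = match.start()
        # At the start of the text or after whitespace: assume an opening quote.
        at_word_start = start == 0 or text[start - 1].isspace()
        return LEFT_QUOTE if at_word_start else RIGHT_QUOTE

    return re.sub("'", curl, text)


print(naive_smart_quotes("Tohono 'O'odham"))
print(naive_smart_quotes("Hits from the '60's"))
```

The word-initial apostrophe in 'O'odham gets the 6-curl while the one in the middle of the word comes out right. A smarter rule would need to know that the mark is a letter of the word and not an opening quote, and the rule cannot tell that from position alone.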
I had this problem after I transcribed several hours of text, about half of which was done using the Coast Tsimshian keyboard. The problem came when the program I was using for coding turned all the punctuation into little clusters of random symbols in the output! It made for a lot of search and replace fun.