IBM Z and LinuxONE - Languages

Intention to deprecate trigraphs in the next C++ Standard

  

Do you use trigraphs? If so, you may be interested to know that the C++ Standard Committee will vote to deprecate trigraphs (moving them to Annex D) at its next meeting, July 12-18 in Frankfurt, as I discussed in my last trip report.


The trigraph sequences are:

??=   #   pound sign
??(   [   left bracket
??)   ]   right bracket
??<   {   left brace
??>   }   right brace
??/   \   backslash
??'   ^   caret
??!   |   vertical bar
??-   ~   tilde


One reason trigraphs were invented is that characters such as '#' have code points that differ across EBCDIC code pages.


By contrast, '?' and '=' are safe to use because they share the same code point across all EBCDIC code pages.

Another reason is that some international keyboards have no key for these characters, so they must be typed as trigraph sequences. Trigraphs are replaced even inside quotes because they are processed in translation phase 1, before the source is split into tokens. A trigraph is therefore replaced even when it appears inside another token, such as a string literal or a character constant.
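To make the phase 1 behavior concrete, here is a minimal sketch of a hypothetical function `replace_trigraphs` that performs the nine substitutions on raw source text. It is only a model, not how any real compiler is implemented. Because it runs before anything knows where a string literal starts or ends, it rewrites a trigraph inside quotes just as readily as one in ordinary code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a simplified model of translation phase 1 trigraph
// replacement. It scans raw source text and replaces every trigraph,
// whether it sits in code, a string literal, a character constant, or a
// comment, because phase 1 runs before tokenization.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            char replacement = '\0';
            switch (src[i + 2]) {
                case '=':  replacement = '#';  break;
                case '(':  replacement = '[';  break;
                case ')':  replacement = ']';  break;
                case '<':  replacement = '{';  break;
                case '>':  replacement = '}';  break;
                case '/':  replacement = '\\'; break;
                case '\'': replacement = '^';  break;
                case '!':  replacement = '|';  break;
                case '-':  replacement = '~';  break;
                default:   break;  // not a trigraph: copy the '?' unchanged
            }
            if (replacement != '\0') {
                out += replacement;
                i += 2;  // skip the rest of the three-character sequence
                continue;
            }
        }
        out += src[i];
    }
    return out;
}
```

For example, the source text `puts("What??!");` comes back as `puts("What|");`, which is exactly the kind of surprise discussed later in this post.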


There are some possible alternatives. Digraphs cover a subset of the trigraphs:


%: or %%       #    number sign
<:             [    left bracket
:>             ]    right bracket
<%             {    left brace
%>             }    right brace
%:%: or %%%%   ##   token pasting


But digraphs do not represent these trigraphs, so they cannot be a complete replacement:

??!   |   vertical bar
??'   ^   caret
??/   \   backslash
??-   ~   tilde


Digraphs are also handled during tokenization, and each digraph must always form a complete token by itself. This means a digraph is never replaced inside quotes.
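As a small illustration, the sketch below is ordinary standard C++ written with digraphs in place of `#`, `##`, `[`, `]`, `{`, and `}`. The names `STRINGIZE`, `PASTE`, `sum3`, and `digraph_text` are hypothetical. It uses the `%:` and `%:%:` spellings defined by the C++ standard. The last function shows that a digraph sequence inside a string literal is left as plain text.

```cpp
%:include <cassert>
%:include <string>

// Hypothetical macros: %: acts as # (stringize) and %:%: acts as ## (paste).
%:define STRINGIZE(x) %:x
%:define PASTE(a, b) a %:%: b

// <% %> act as braces and <: :> act as square brackets.
int sum3(const int* v)
<%
    return v<:0:> + v<:1:> + v<:2:>;
%>

// Digraphs are recognized only as whole tokens, so inside a string
// literal these five characters stay exactly as written.
std::string digraph_text()
<%
    return "<% %>";
%>
```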


You can, however, use the alternative spellings of operators (alternative tokens) to cover three of those four cases:

bitor    |
or       ||
or_eq    |=
xor      ^
xor_eq   ^=
compl    ~
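A minimal sketch of these alternative tokens in use: the hypothetical flag helpers below express bitwise OR, logical OR, OR-assignment, XOR, XOR-assignment, and complement without typing the corresponding symbol characters. In C++ these names are built into the language; in C they are macros provided by `<iso646.h>`.

```cpp
#include <cassert>

// Hypothetical flag helpers written only with alternative tokens
// for the operators that the missing trigraph characters would need.
unsigned add_flags(unsigned value, unsigned mask) { return value bitor mask; }
unsigned add_flags_in_place(unsigned value, unsigned mask) { value or_eq mask; return value; }
unsigned toggle_flags(unsigned value, unsigned mask) { return value xor mask; }
unsigned toggle_in_place(unsigned value, unsigned mask) { value xor_eq mask; return value; }
unsigned clear_flags(unsigned value, unsigned mask) { return value & compl mask; }
bool either(bool a, bool b) { return a or b; }
```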


But one of those four, '\', truly has no possible replacement.


Even in the ASCII world, losing trigraphs can be a problem if a keyboard has no backslash key. Does this happen? I would like to know.

The backslash problem also extends to other variant characters, including the half-width backslash in Japanese and the yen sign in Shift-JIS.


So why is there enthusiasm for this deprecation?


It is precisely because trigraphs are replaced even inside quotes. A trigraph can look like ordinary question marks, and in some cases the programmer really did mean question marks, especially inside a quoted string. This is a real cause for confusion. Specifically, one National Body comment urged the removal of trigraphs because:

"Trigraphs are a complicated solution to an old problem, that cause more problems than they solve in the modern environment. Unexpected trigraphs in string literals and occasionally in comments can be very confusing for the non-expert."
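If you need literal question marks such as "What??!" in a string, two standard techniques keep them from forming a trigraph even under a compiler that still performs trigraph replacement: the `\?` escape sequence, and splitting the text into adjacent string literals, which are concatenated only after phase 1 has already run. A minimal sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <cstring>

// 1. The \? escape sequence breaks up the pair of question marks,
//    so phase 1 never sees a three-character trigraph.
const char* with_escape() { return "What?\?!"; }

// 2. Adjacent string literals are joined later in translation,
//    after trigraph replacement in phase 1 has already happened.
const char* with_concatenation() { return "What?" "?!"; }
```

Both functions return the same seven-character string, ending in two question marks and an exclamation point.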


Are there other problems? I would like to know. Some folks have mentioned that this can be a problem in modern Perl searches.
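One plausible example, assuming the Perl remark refers to Perl-style regular expressions: the lazy optional quantifier ?? can appear directly before a character such as ')', and ??) is the trigraph for ']'. Written naively in a string literal, the pattern (ab??) would become (ab] under trigraph replacement. The sketch below uses `std::regex` with its default ECMAScript grammar, which supports lazy quantifiers; the helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: matches "a" optionally followed by one "b".
// The pattern is a group containing a, b, and a lazy optional quantifier.
// Writing both question marks directly before ')' in a string literal
// would form a trigraph, so the second one uses the \? escape sequence.
bool matches_a_then_optional_b(const std::string& text) {
    static const std::regex pattern("(ab?\?)");
    return std::regex_match(text, pattern);
}
```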


I would be interested to hear whether you agree with these statements and what opinion you have on the deprecation of trigraphs.