UTF(3)                DragonFly Library Functions Manual                UTF(3)

NAME
       runetochar, chartorune, runelen, fullrune, utflen, utfrune, utfrrune,
       utfutf - Unicode Text Format functionality

SYNOPSIS
       #include <utf.h>

       int runetochar(char *cp, Rune *rp);

       int chartorune(Rune *rp, char *cp);

       int runelen(long r);

       int fullrune(char *cp, int n);

       int utflen(char *s);

       int utfbytes(char *s);

       char *utfrune(char *cp, long r);

       char *utfrrune(char *cp, long r);

       char *utfutf(char *big, char *little);

       int utf_snprintf(char *buf, size_t size, char *format, ...);

       int utfcmp(char *s1, char *s2);

       int utfncmp(char *s1, char *s2, int rc);

       char *utfcpy(char *dst, char *src);

       char *utfncpy(char *dst, char *src, int nbytes);

       char *utfcat(char *src, char *append);

       char *utfncat(char *src, char *append, int nbytes);

DESCRIPTION
       The UTF routines pack Unicode text into a standard byte-oriented
       character stream.  To keep that stream compatible with existing text,
       the 128 ASCII characters (values 0 to 127) are encoded as themselves
       in UTF-8, so ASCII text is interchangeable between the two encodings.
       A Rune is a Unicode character; the type is defined in the header file
       utf.h.

       runetochar translates a single Rune to a UTF sequence and returns the
       number of bytes produced. chartorune is the inverse of this function,
       returning the number of bytes consumed.  runelen returns the number of
       bytes in the encoding of a Rune.  fullrune checks that the first n
       bytes of the UTF string cp contain a complete UTF encoding.
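
       The conversions described above can be sketched as follows.  This is
       a minimal illustration of standard UTF-8 encoding and decoding, not
       the library's own source: the SketchRune type and every sketch_
       function name are hypothetical, the encoder takes its argument by
       value rather than through a Rune pointer, and no check is made for
       overlong forms or surrogates.

```c
#include <stddef.h>

typedef unsigned long SketchRune;   /* hypothetical stand-in for Rune */

/* Bytes needed to encode r (1 to 4), or 0 if r is out of range. */
int sketch_runelen(SketchRune r)
{
    if (r < 0x80) return 1;
    if (r < 0x800) return 2;
    if (r < 0x10000) return 3;
    if (r < 0x110000) return 4;
    return 0;
}

/* Encode r into cp and return the number of bytes produced. */
int sketch_runetochar(char *cp, SketchRune r)
{
    unsigned char *p = (unsigned char *)cp;
    int n = sketch_runelen(r);

    switch (n) {
    case 1:
        p[0] = (unsigned char)r;
        break;
    case 2:
        p[0] = (unsigned char)(0xC0 | (r >> 6));
        p[1] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    case 3:
        p[0] = (unsigned char)(0xE0 | (r >> 12));
        p[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    case 4:
        p[0] = (unsigned char)(0xF0 | (r >> 18));
        p[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    }
    return n;
}

/* Sequence length implied by a lead byte; invalid lead bytes count as 1. */
static int sketch_seqlen(unsigned char c)
{
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Do the first n bytes at cp hold one complete sequence? */
int sketch_fullrune(const char *cp, int n)
{
    if (n <= 0) return 0;
    return n >= sketch_seqlen((unsigned char)cp[0]);
}

/* Decode one sequence into *rp and return the number of bytes consumed.
 * Malformed input yields U+FFFD and consumes a single byte. */
int sketch_chartorune(SketchRune *rp, const char *cp)
{
    const unsigned char *p = (const unsigned char *)cp;
    int i, n = sketch_seqlen(p[0]);
    SketchRune r;

    if (n == 1) {
        *rp = p[0] < 0x80 ? p[0] : 0xFFFD;
        return 1;
    }
    r = p[0] & (0xFF >> (n + 1));   /* payload bits of the lead byte */
    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            *rp = 0xFFFD;
            return 1;
        }
        r = (r << 6) | (p[i] & 0x3F);
    }
    *rp = r;
    return n;
}
```

       Encoding U+20AC with this sketch produces the three bytes E2 82 AC,
       and decoding those bytes gives U+20AC back.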

       utflen returns the number of runes in a UTF string.  utfbytes returns
       the number of bytes in a UTF string.  utfrune returns a pointer to the
       first occurrence of a rune in a UTF string.  utfrrune returns a pointer
       to the last.  utfutf searches for the first occurrence of a UTF string
       in another UTF string.
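
       The difference between counting runes and counting bytes can be
       sketched as below.  The sketch_ names are hypothetical and the code
       assumes well-formed, null-terminated UTF-8; it is an illustration of
       the idea, not the library's implementation.  Because no UTF-8
       sequence can appear inside another one, a plain byte search such as
       strstr is enough to locate one UTF string in another.

```c
#include <stddef.h>
#include <string.h>

/* Count runes: every byte that is not a continuation byte (10xxxxxx)
 * starts a new sequence. */
int sketch_utflen(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Find the first occurrence of little in big.  UTF-8 is
 * self-synchronizing, so a byte-wise search cannot match in the
 * middle of a sequence when both strings are well formed. */
const char *sketch_utfutf(const char *big, const char *little)
{
    return strstr(big, little);
}
```

       For the string "na" "\xC3\xAF" "ve" the sketch counts 5 runes, while
       strlen reports 6 bytes.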

       utf_snprintf is a particularly dumb implementation of snprintf for UTF
       strings: it only interprets %%, %s and %d sequences in the format
       string, and does no field width calculation on those.
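
       A formatter restricted to %%, %s and %d might look like the sketch
       below.  The name sketch_snprintf is hypothetical, and two behaviours
       are assumptions not stated above: the return value is the length the
       full output would have had, and output that does not fit is cut at a
       byte boundary, which can split a multi-byte sequence.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Store n bytes from src at position pos, keeping one byte free for
 * the terminator.  Returns the position the output would have reached. */
static size_t sketch_put(char *buf, size_t size, size_t pos,
                         const char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++, pos++)
        if (pos + 1 < size)
            buf[pos] = src[i];
    return pos;
}

int sketch_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    char num[32];

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            pos = sketch_put(buf, size, pos, fmt, 1);
        } else if (fmt[1] == '%') {
            pos = sketch_put(buf, size, pos, "%", 1);
            fmt++;
        } else if (fmt[1] == 's') {
            char *s = va_arg(ap, char *);
            pos = sketch_put(buf, size, pos, s, strlen(s));
            fmt++;
        } else if (fmt[1] == 'd') {
            int len = sprintf(num, "%d", va_arg(ap, int));
            pos = sketch_put(buf, size, pos, num, (size_t)len);
            fmt++;
        } else {
            /* Unknown conversion: copy the '%' through unchanged. */
            pos = sketch_put(buf, size, pos, fmt, 1);
        }
    }
    va_end(ap);

    if (size > 0)
        buf[pos < size ? pos : size - 1] = '\0';
    return (int)pos;
}
```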

       utfcmp compares two strings lexicographically, Rune by Rune, and
       returns a value greater than zero, equal to zero, or less than zero
       depending on whether the first UTF string is greater than, the same as,
       or less than the second string.  utfncmp does the same comparison as
       utfcmp, but compares at most rc Runes.
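
       For well-formed UTF-8, comparing the encoded bytes as unsigned
       values gives the same ordering as comparing the Runes themselves, so
       a Rune-by-Rune comparison can be sketched without decoding.  The
       sketch_ names below are hypothetical and the code is an illustration
       under that well-formedness assumption, not the library's source.

```c
#include <stddef.h>

/* Compare two UTF strings; the sign of the result follows Rune order. */
int sketch_utfcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* As above, but stop once rc Runes have compared equal. */
int sketch_utfncmp(const char *s1, const char *s2, int rc)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    int seen = 0;

    for (;; a++, b++) {
        /* A byte that is not a continuation byte starts a new Rune.
         * Reaching the start of Rune number rc + 1 means the first
         * rc Runes matched in full. */
        if ((*a & 0xC0) != 0x80 && seen++ == rc)
            return 0;
        if (*a != *b || *a == '\0')
            return (*a > *b) - (*a < *b);
    }
}
```

       With this sketch, "z" sorts before "\xC3\xA9" (U+00E9), matching the
       order of the underlying Runes.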

       utfcpy copies from source to destination, Rune by Rune, and returns
       the dst argument.  No bounds checking is done on the number of Runes
       copied, or on their individual sizes.  utfncpy copies at most nbytes
       bytes from source to destination, stopping when a null Rune is found
       in the source.  If fewer than nbytes bytes are copied, the remainder
       of the destination is padded with null (0) bytes.  If the source
       holds nbytes bytes or more, no null byte is added, so the
       destination is not terminated.  The dst argument is returned.
       utfcat appends the UTF string append onto the UTF string src.
       utfncat does the same, taking into account that the buffer src is
       only nbytes long.
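
       The bounded copy can be sketched as below.  The name sketch_utfncpy
       and its helper are hypothetical.  One behaviour is an assumption
       rather than something stated above: when the next sequence would
       straddle the nbytes limit, the sketch stops and zero-fills the
       remaining bytes instead of copying part of a Rune.

```c
#include <stddef.h>
#include <string.h>

/* Sequence length implied by a lead byte; invalid lead bytes count as 1. */
static int sketch_leadlen(unsigned char c)
{
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Copy whole sequences from src into dst, using at most nbytes bytes.
 * Unused bytes are set to zero; if the copy fills the buffer exactly,
 * no terminator is written. */
char *sketch_utfncpy(char *dst, const char *src, int nbytes)
{
    int used = 0;

    while (*src != '\0') {
        int n = sketch_leadlen((unsigned char)*src);
        int i;

        /* Do not read past the terminator of a truncated sequence. */
        for (i = 1; i < n; i++) {
            if (src[i] == '\0') {
                n = i;
                break;
            }
        }
        if (used + n > nbytes)
            break;
        memcpy(dst + used, src, (size_t)n);
        used += n;
        src += n;
    }
    if (used < nbytes)
        memset(dst + used, 0, (size_t)(nbytes - used));
    return dst;
}
```

       A caller that needs a terminated result can reserve one extra byte
       in the destination and set it to zero after the copy.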

IMPLEMENTATION
       This implementation of UTF, nominally UTF-8, can encode a null Unicode
       character using either a one-byte or a two-byte encoding.  Typically,
       Plan 9 uses a one-byte encoding, whilst Java uses a two-byte encoding.
       The Plan 9 style of encoding makes backwards compatibility much
       easier, and loses nothing: all the Java functionality is there; a UTF
       string contains no embedded null bytes, because the second and third
       bytes of a multi-byte sequence are never zero; and ordinary C strings
       are recognised as well, which is not the case in Java.  By default,
       the one-byte null encoding is used.
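
       The two null encodings can be sketched as follows.  The function
       name and its two_byte flag are hypothetical; the byte values are the
       usual ones, a single 00 byte for the one-byte form and the pair
       C0 80 for the two-byte form used by Java's modified UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Write an encoded U+0000 at cp and return the number of bytes used.
 * two_byte selects the C0 80 form instead of a single zero byte. */
int sketch_encode_null(char *cp, int two_byte)
{
    unsigned char *p = (unsigned char *)cp;

    if (two_byte) {
        p[0] = 0xC0;   /* lead byte 110 00000 */
        p[1] = 0x80;   /* continuation byte 10 000000 */
        return 2;
    }
    p[0] = 0x00;
    return 1;
}
```

       With the one-byte form, C string functions such as strlen stop at
       the encoded null.  With the two-byte form they run past it, because
       neither byte of the pair is zero.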

       UTF-8 is defined in X/Open Company Ltd., "File System Safe UCS
       Transformation Format (FSS_UTF)", X/Open Preliminary Specification,
       Document Number: P316, which also appears in ISO/IEC 10646, Annex P.

BUGS
       Undoubtedly, these are many, and legion.

AUTHOR
       Written by Alistair Crooks (agc@amdahl.com, or
       agc@westley.demon.co.uk), from a draft document written by Rob Pike and
       Ken Thompson, detailing the implementation of UTF in the Plan 9
       operating system.

                                                                        UTF(3)