MANDOC_ESCAPE(3)      DragonFly Library Functions Manual      MANDOC_ESCAPE(3)

NAME
     mandoc_escape - parse roff escape sequences

SYNOPSIS
     #include <sys/types.h>
     #include <mandoc.h>

     enum mandoc_esc
     mandoc_escape(const char **end, const char **start, int *sz);

DESCRIPTION
     This function scans a roff(7) escape sequence.

     An escape sequence consists of
     -   an initial backslash character (`\'),
     -   a single ASCII character called the escape sequence identifier,
     -   and, with only a few exceptions, an argument.

     Arguments can be given in the following forms; some escape sequence
     identifiers only accept some of these forms as specified below.  The
     first three forms are called the standard forms.

     In brackets: [argument]
         The argument starts after the initial `[', ends before the final `]',
         and the escape sequence ends with the final `]'.

     Two-character argument short form: (ar
         This form can only be used for arguments consisting of exactly two
         characters.  It has the same effect as [ar].

     One-character argument short form: a
         This form can only be used for arguments consisting of exactly one
         character.  It has the same effect as [a].

     Delimited form: CargumentC
         The argument starts after the initial delimiter character C, ends
         before the next occurrence of the delimiter character C, and the
         escape sequence ends with that second C.  Some escape sequences allow
         arbitrary characters C as quoting characters; others restrict the
         range of characters that can be used as quoting characters.
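
     The following C sketch illustrates how the four argument forms delimit
     the argument.  The helper scan_arg() is hypothetical and not part of the
     mandoc(3) API.  It expects *end to point just past the identifier,
     requires the caller to say whether the delimited form applies, and,
     unlike the real function, does not handle escape sequences nested inside
     the argument.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: scan the argument of an
 * escape sequence.  On entry, *end points to the first character after
 * the identifier.  If delimited is nonzero, the form CargumentC is
 * expected; otherwise one of [argument], (ar, or a.
 * Returns 0 on success or -1 if the argument is incomplete.
 */
static int
scan_arg(const char **end, const char **start, int *sz, int delimited)
{
	const char *p, *close;
	char term;

	assert(end != NULL && *end != NULL);
	assert(start != NULL && sz != NULL);
	p = *end;
	if (*p == '\0')
		return -1;

	if (delimited) {
		term = *p++;            /* the opening delimiter C */
	} else if (*p == '[') {
		term = ']';
		p++;
	} else if (*p == '(') {
		/* Exactly two argument characters must follow. */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return 0;
	} else {
		/* One-character short form. */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return 0;
	}

	/* Bracketed or delimited form: the argument ends before term. */
	close = strchr(p, term);
	if (close == NULL)
		return -1;              /* end of input before the terminator */
	*start = p;
	*sz = (int)(close - p);
	*end = close + 1;               /* the sequence ends with term */
	return 0;
}
```

     For example, with *end pointing at `(emx', scan_arg() sets start to the
     `e', sz to 2, and leaves end pointing at the `x'.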

     Upon function entry, end is expected to point to the escape sequence
     identifier.  Any values that start and sz point to on entry are ignored
     and overwritten.

     By design, this function cannot handle those roff(7) escape sequences
     that require in-place expansion, in particular user-defined strings \*,
     number registers \n, width measurements \w, and numerical expression
     control \B.  These are handled by roff_res(), a private preprocessor
     function called from roff_parseln(), see the file roff.c.

     The function mandoc_escape() is used
     -   recursively by itself, because some escape sequence arguments can in
         turn contain other escape sequences,
     -   for error detection internally by the roff(7) parser part of the
         mandoc(3) library, see the file roff.c,
     -   above all externally by the mandoc formatting modules, in particular
         -Tascii and -Thtml, for formatting purposes, see the files term.c and
         html.c,
     -   and rarely externally by high-level utilities using the mandoc
         library, for example makewhatis(8), to purge escape sequences from
         text.

RETURN VALUES
     Upon function return, the pointer end is set to the character after the
     end of the escape sequence, such that the calling higher-level parser can
     easily continue.

     For escape sequences taking an argument, the pointer start is set to the
     beginning of the argument and sz is set to the length of the argument.
     For escape sequences not taking an argument, start is set to the
     character after the end of the sequence and sz is set to 0.  Both start
     and sz may be NULL; in that case, the argument and the length are not
     returned.
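
     The following C sketch shows this convention from the caller's side: a
     loop that purges escape sequences from text, in the spirit of the
     makewhatis(8) use mentioned above.  To stay self-contained it relies on
     toy_escape(), a hypothetical and heavily simplified stand-in with the
     same pointer conventions.  Neither toy_escape() nor purge_escapes() is
     part of mandoc(3); a real caller would call mandoc_escape() and test for
     ESCAPE_ERROR instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Result codes local to this sketch. */
enum toy_esc { TOY_ERROR, TOY_ARG, TOY_NOARG };

/*
 * Hypothetical, heavily simplified stand-in for mandoc_escape() with
 * the same pointer conventions.  It only knows \(xx, \[name], \f in
 * standard form, and one-character sequences without an argument.
 */
static enum toy_esc
toy_escape(const char **end, const char **start, int *sz)
{
	const char *p = *end, *arg, *close;
	enum toy_esc rc = TOY_ARG;
	int len;

	if (*p == 'f')                  /* \f takes a standard-form argument */
		p++;
	if (*p == '\0')
		return TOY_ERROR;
	if (*p == '(') {
		if (p[1] == '\0' || p[2] == '\0')
			return TOY_ERROR;
		arg = p + 1;
		len = 2;
		p += 3;
	} else if (*p == '[') {
		arg = p + 1;
		if ((close = strchr(arg, ']')) == NULL)
			return TOY_ERROR;
		len = (int)(close - arg);
		p = close + 1;
	} else if (p != *end) {         /* \fa: one-character argument */
		arg = p;
		len = 1;
		p++;
	} else {                        /* no argument, for example \c */
		arg = p + 1;            /* start: just past the sequence */
		len = 0;
		p++;
		rc = TOY_NOARG;
	}
	*end = p;                       /* first character after the sequence */
	if (start != NULL)
		*start = arg;
	if (sz != NULL)
		*sz = len;
	return rc;
}

/* Copy src into dst (size dstsz) with escape sequences removed. */
static void
purge_escapes(const char *src, char *dst, size_t dstsz)
{
	size_t i = 0;

	assert(dst != NULL && dstsz > 0);
	while (*src != '\0' && i + 1 < dstsz) {
		if (*src != '\\') {
			dst[i++] = *src++;
			continue;
		}
		src++;                  /* now at the identifier */
		if (toy_escape(&src, NULL, NULL) == TOY_ERROR)
			break;          /* incomplete: drop the rest */
	}
	dst[i] = '\0';
}
```

     Passing NULL for start and sz, as purge_escapes() does, is enough when
     the caller only needs to skip the sequence.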

     For sequences taking an argument, the function mandoc_escape() returns
     one of the following values:

     ESCAPE_FONT
         The escape sequence \f taking an argument in standard form: \f[, \f(,
         \fa.  Two-character arguments starting with the character `C' are
         reduced to one-character arguments by skipping the `C'.  More
         specific values are returned for the most commonly used arguments:

         argument    return value
         R or 1      ESCAPE_FONTROMAN
         I or 2      ESCAPE_FONTITALIC
         B or 3      ESCAPE_FONTBOLD
         P           ESCAPE_FONTPREV
         BI          ESCAPE_FONTBI
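
         The argument-to-value mapping above can be sketched in C as follows.
         The enum font_kind and the function classify_font() are
         hypothetical local names chosen for this example; they only mirror
         the table and are not declared in mandoc.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local enum mirroring the table; not from mandoc.h. */
enum font_kind {
	FONT_OTHER,     /* corresponds to plain ESCAPE_FONT */
	FONT_ROMAN,
	FONT_ITALIC,
	FONT_BOLD,
	FONT_PREV,
	FONT_BI
};

/* Classify the \f argument of length sz starting at arg. */
static enum font_kind
classify_font(const char *arg, int sz)
{
	assert(arg != NULL && sz >= 0);

	/* Two-character arguments starting with 'C' lose the 'C'. */
	if (sz == 2 && arg[0] == 'C') {
		arg++;
		sz = 1;
	}
	if (sz == 1) {
		switch (arg[0]) {
		case 'R':
		case '1':
			return FONT_ROMAN;
		case 'I':
		case '2':
			return FONT_ITALIC;
		case 'B':
		case '3':
			return FONT_BOLD;
		case 'P':
			return FONT_PREV;
		default:
			break;
		}
	} else if (sz == 2 && strncmp(arg, "BI", 2) == 0) {
		return FONT_BI;
	}
	return FONT_OTHER;
}
```

         For instance, \f(CB is classified like \fB, while \f(CW, which is
         not listed in the table, falls back to FONT_OTHER.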

     ESCAPE_SPECIAL
         The escape sequence \C taking an argument delimited with the single
         quote character and, as a special exception, the escape sequences not
         having an identifier, that is, those where the argument, in standard
         form, directly follows the initial backslash: \C', \[, \(, \a.  Note
         that the one-character argument short form can only be used for
         argument characters that do not clash with escape sequence
         identifiers.

         If the argument matches one of the forms described below under
         ESCAPE_UNICODE, that value is returned instead.

         The ESCAPE_SPECIAL special character escape sequences can be rendered
         using the functions mchars_spec2cp() and mchars_spec2str() described
         in the mchars_alloc(3) manual.

     ESCAPE_UNICODE
         Escape sequences of the same format as described above under
         ESCAPE_SPECIAL, but with an argument of the forms uXXXX, uYXXXX, or
         u10XXXX where X and Y are hexadecimal digits and Y is not zero: \C'u,
         \[u.  As a special exception, start is set to the character after the
         u, and the sz return value does not include the u either.

         Such Unicode character escape sequences can be rendered using the
         function mchars_num2uc() described in the mchars_alloc(3) manual.
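
         A minimal C sketch of the argument check described above, assuming
         the argument has already been isolated as for ESCAPE_SPECIAL.  The
         function unicode_arg() is hypothetical.  Besides checking the form,
         it computes the code point, a step this manual otherwise leaves to
         mchars_num2uc(), and it reports the digits without the leading u.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: if the sz-byte argument at arg has one of the
 * forms uXXXX, uYXXXX (Y nonzero), or u10XXXX, return the code point
 * and report the digits, excluding the leading 'u', through digits
 * and ndigits (either may be NULL).  Otherwise return -1.
 */
static long
unicode_arg(const char *arg, int sz, const char **digits, int *ndigits)
{
	long cp = 0;
	int i;

	assert(arg != NULL);
	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return -1;
	if (sz == 6 && arg[1] == '0')
		return -1;              /* Y must not be zero */
	if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
		return -1;              /* only u10XXXX is allowed */
	for (i = 1; i < sz; i++) {
		unsigned char c = (unsigned char)arg[i];

		if (!isxdigit(c))
			return -1;
		cp = cp * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
	}
	if (digits != NULL)
		*digits = arg + 1;
	if (ndigits != NULL)
		*ndigits = sz - 1;
	return cp;
}
```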

     ESCAPE_NUMBERED
         The escape sequence \N followed by a delimited argument.  The
         delimiter character is arbitrary except that digits cannot be used.
         If a digit is encountered instead of the opening delimiter, that
         digit is considered to be the argument and the end of the sequence,
         and ESCAPE_IGNORE is returned.

         Such ASCII character escape sequences can be rendered using the
         function mchars_num2char() described in the mchars_alloc(3) manual.
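
         The \N rules can be sketched in C as follows.  The names enum
         num_esc and scan_numbered() are hypothetical, and *end is assumed to
         point to the character following the N.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Result codes local to this sketch. */
enum num_esc { NUM_ERROR, NUM_NUMBERED, NUM_IGNORE };

/*
 * Hypothetical sketch of the \N rules: *end points just past the 'N'.
 * Any non-digit character may serve as the delimiter.
 */
static enum num_esc
scan_numbered(const char **end, const char **start, int *sz)
{
	const char *p, *close;

	assert(end != NULL && *end != NULL);
	assert(start != NULL && sz != NULL);
	p = *end;
	if (*p == '\0')
		return NUM_ERROR;
	if (isdigit((unsigned char)*p)) {
		/* A digit instead of a delimiter is the whole argument. */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return NUM_IGNORE;
	}
	close = strchr(p + 1, *p);      /* find the closing delimiter */
	if (close == NULL)
		return NUM_ERROR;       /* end of input inside the argument */
	*start = p + 1;
	*sz = (int)(close - *start);
	*end = close + 1;
	return NUM_NUMBERED;
}
```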

     ESCAPE_OVERSTRIKE
         The escape sequence \o followed by an argument delimited by an
         arbitrary character.

     ESCAPE_IGNORE

         *   The escape sequence \s followed by an argument in standard form
             or by an argument delimited by the single quote character: \s',
             \s[, \s(, \sa.  As a special exception, an optional `+' or `-'
             character is allowed after the `s' for all forms.

         *   The escape sequences \F, \g, \k, \M, \m, \n, \V, and \Y followed
             by an argument in standard form.

         *   The escape sequences \A, \b, \D, \R, \X, and \Z followed by an
             argument delimited by an arbitrary character.

         *   The escape sequences \H, \h, \L, \l, \S, \v, and \x followed by
             an argument delimited by a character that cannot occur in
             numerical expressions.  However, if any character that can occur
             in numerical expressions is found instead of a delimiter, the
             sequence is considered to end with that character, and
             ESCAPE_ERROR is returned.
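
         The delimiter rule for \H, \h, \L, \l, \S, \v, and \x can be
         sketched in C as follows.  The function scan_numexpr_delimited() is
         hypothetical, and the exact set of characters treated as possible
         parts of numerical expressions is an assumption of this example,
         not taken from this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: characters that may occur in numerical expressions. */
static const char numexpr_chars[] = " %&()*+-./0123456789:<=>";

/*
 * Hypothetical sketch: *end points just past the identifier.
 * Returns 0 on success or -1 for the ESCAPE_ERROR cases.
 */
static int
scan_numexpr_delimited(const char **end, const char **start, int *sz)
{
	const char *p, *close;

	assert(end != NULL && *end != NULL);
	assert(start != NULL && sz != NULL);
	p = *end;
	/* Check for NUL first: strchr() would also match the terminator. */
	if (*p == '\0')
		return -1;
	if (strchr(numexpr_chars, *p) != NULL) {
		/* Not a valid delimiter: the sequence ends with it. */
		*end = p + 1;
		return -1;
	}
	close = strchr(p + 1, *p);
	if (close == NULL)
		return -1;
	*start = p + 1;
	*sz = (int)(close - *start);
	*end = close + 1;
	return 0;
}
```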

     ESCAPE_ERROR
         Escape sequences taking an argument but not matching any of the above
         patterns.  In particular, that happens if the end of the logical
         input line is reached before the end of the argument.

     For sequences that do not take an argument, the function mandoc_escape()
     returns one of the following values:

     ESCAPE_SKIPCHAR
         The escape sequence "\z".

     ESCAPE_NOSPACE
         The escape sequence "\c".

     ESCAPE_IGNORE
         The escape sequences "\d" and "\u".
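
     A minimal C sketch of how these argument-less sequences might be
     classified while following the start and sz conventions given above.
     The names enum noarg_esc and scan_noarg() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes local to this sketch. */
enum noarg_esc {
	NOARG_UNKNOWN,   /* not one of the sequences listed above */
	NOARG_SKIPCHAR,  /* \z */
	NOARG_NOSPACE,   /* \c */
	NOARG_IGNORE     /* \d and \u */
};

/*
 * Hypothetical: classify an argument-less sequence.  On entry *end
 * points to the identifier; on success it is advanced past it, start
 * is set to that same position, and sz to 0.  Either may be NULL.
 */
static enum noarg_esc
scan_noarg(const char **end, const char **start, int *sz)
{
	enum noarg_esc kind;

	assert(end != NULL && *end != NULL);
	switch (**end) {
	case 'z':
		kind = NOARG_SKIPCHAR;
		break;
	case 'c':
		kind = NOARG_NOSPACE;
		break;
	case 'd':
	case 'u':
		kind = NOARG_IGNORE;
		break;
	default:
		return NOARG_UNKNOWN;   /* leave *end untouched */
	}
	++*end;                         /* the sequence is just the identifier */
	if (start != NULL)
		*start = *end;
	if (sz != NULL)
		*sz = 0;
	return kind;
}
```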

FILES
     This function is implemented in mandoc.c.

SEE ALSO
     mchars_alloc(3), mandoc_char(7), roff(7)

HISTORY
     This function has been available since mandoc 1.11.2.

AUTHORS
     Kristaps Dzonsons <kristaps@bsd.lv>
     Ingo Schwarze <schwarze@openbsd.org>

BUGS
     The function doesn't cleanly distinguish between sequences that are valid
     and supported, valid and ignored, valid and unsupported, syntactically
     invalid, or undefined.  For sequences that are ignored or unsupported, it
     doesn't tell whether that deficiency is likely to cause major formatting
     problems and/or loss of document content.  The function is already rather
     complicated and still parses some sequences incorrectly.

DragonFly 6.5-DEVELOPMENT      January 21, 2015      DragonFly 6.5-DEVELOPMENT