KHTTP_PARSE(3) DragonFly Library Functions Manual KHTTP_PARSE(3)
NAME
khttp_parse, khttp_parsex - parse a CGI instance for kcgi
LIBRARY
library "libkcgi"
SYNOPSIS
#include <stdint.h>
#include <kcgi.h>
enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz,
const char *const *pages, size_t pagesz, size_t defpage);
enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes,
const char *const *mimes, size_t mimemax, const struct kvalid *keys,
size_t keysz, const char *const *pages, size_t pagesz,
size_t defmime, size_t defpage, void *arg,
void (*argfree)(void *arg), unsigned int debugging);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kresps[KRESP__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];
DESCRIPTION
The khttp_parse and khttp_parsex functions parse and validate input and
the HTTP environment (compression, paths, MIME types, and so on). They
are the central functions in the kcgi(3) library, parsing and validating
key-value form (query string, message body, cookie) data and opaque
message bodies.
The collective arguments are as follows:
arg A pointer to private application data. It is not touched unless
argfree is provided.
argfree
Function invoked with arg by the child process starting to parse
untrusted network data. This makes sure that no unnecessary data
is leaked into the child.
debugging
This bit-field sets debugging of the underlying parse and/or
write routines. Debugging messages are sent to stderr and
consist of the process ID, a colon, then the logged data. Logged
data consists of printable ASCII characters and spaces. A
newline will flush the existing line. There are at most BUFSIZ
characters per line. Other characters are either escaped (\v,
\r, \b) or replaced with a question mark. If the
KREQ_DEBUG_WRITE bit is set, write operations directly or
indirectly via khttp_write(3) will be logged. When the request
is torn down with khttp_free(3), the process ID and total logged
bytes are printed on their own line. If the KREQ_DEBUG_READ_BODY
bit is set, the entire input body is logged. The total byte
count is printed on its own line afterward.
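The character handling described for debugging output can be sketched as
follows. This is a minimal illustration, not the library's own routine:
ex_sanitise is a hypothetical helper, and treating every byte that is
neither printable ASCII nor one of the three escaped characters as a
question mark is an assumption drawn from the text above. Newline
flushing and the BUFSIZ limit are left out.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of kcgi: write the sanitised form of
 * src[0..srcsz) into dst, using at most dstsz - 1 bytes, and terminate
 * it.  Returns the number of bytes written, excluding the terminator.
 */
size_t
ex_sanitise(const char *src, size_t srcsz, char *dst, size_t dstsz)
{
	size_t	 i, n = 0;

	if (dstsz == 0)
		return 0;
	for (i = 0; i < srcsz; i++) {
		unsigned char	 c = (unsigned char)src[i];
		char		 esc = '\0';

		if (c == '\v')
			esc = 'v';
		else if (c == '\r')
			esc = 'r';
		else if (c == '\b')
			esc = 'b';

		if (esc != '\0') {
			/* Escaped characters need two output bytes. */
			if (n + 2 >= dstsz)
				break;
			dst[n++] = '\\';
			dst[n++] = esc;
		} else {
			if (n + 1 >= dstsz)
				break;
			/* Printable ASCII (including space) passes through. */
			dst[n++] = (c < 128 && isprint(c)) ? (char)c : '?';
		}
	}
	dst[n] = '\0';
	return n;
}
```

Output is truncated rather than overflowed when dst is too small.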
defmime
If no MIME type is specified (that is, there's no suffix to the
page request), use this index in the mimes array.
defpage
If no page was specified (e.g., the default landing page), this
is provided as the requested page index.
keys An array of input and validation fields.
keysz The number of elements in keys.
mimemax
The MIME index used if no MIME type was matched.
mimes An array of MIME types (e.g., "text/html"), mapped into a MIME
index during MIME body parsing. This relates both to pages and
input fields with a body type.
pages An array of recognised pathnames. When pathnames are parsed,
they're matched to indices in this array.
pagesz The number of pages in pages. Also used if the requested page
was not in pages.
req Fill with input fields and HTTP context parsed from the CGI
environment. This is the main structure carried around in a
kcgi(3) application.
suffixes
Define the MIME type (suffix) mapping.
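How suffixes, defmime, and mimemax interact can be sketched as a lookup
over a NULL-terminated table. This is an illustration only: struct
ex_mimemap and ex_suffix_to_mime are hypothetical stand-ins, the member
names are assumptions, and the choice of defmime for a missing suffix
follows the defmime description above.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a suffix map entry; the member names are assumptions. */
struct ex_mimemap {
	const char	*name;	/* suffix, e.g. "html"; NULL ends the map */
	size_t		 mime;	/* index into the mimes array */
};

/*
 * Return the MIME index for a suffix: defmime when there is no suffix,
 * mimemax when the suffix is not in the map.
 */
size_t
ex_suffix_to_mime(const struct ex_mimemap *map, const char *suffix,
    size_t defmime, size_t mimemax)
{
	if (suffix == NULL || *suffix == '\0')
		return defmime;
	for (; map->name != NULL; map++)
		if (strcmp(map->name, suffix) == 0)
			return map->mime;
	return mimemax;
}
```

Because mimemax is never a valid index into mimes, a caller can compare
the result against it to detect an unrecognised suffix.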
The first form, khttp_parse, is for applications using the system-
recognised MIME types. This should work well enough for most
applications. It is equivalent to invoking the second form,
khttp_parsex, as follows:
khttp_parsex(req, ksuffixmap,
kmimetypes, KMIME__MAX, keys, keysz,
pages, pagesz, KMIME_TEXT_HTML,
defpage, NULL, NULL, 0);
The req object filled in by khttp_parse or khttp_parsex must be
subsequently freed by khttp_free.
Types
A struct kreq object is filled in by khttp_parse and khttp_parsex. It
consists of the following fields:
arg Private application data. This is set during khttp_parse().
auth Type of "managed" HTTP authorisation, if any. This is digest
(KAUTH_DIGEST) or basic (KAUTH_BASIC) authorisation performed by
the web server. See the rawauth field for raw authorisation
requests. If a managed authorisation is specified but with
unknown type (i.e., not digest or basic authentication), this is
set to KAUTH_UNKNOWN.
cookies
All key-value pairs read from user cookies.
cookiemap
Entries in successfully-parsed (or un-parsed) cookies mapped into
field indices as defined by the keys argument to khttp_parse().
cookienmap
Entries in unsuccessfully-parsed (but still attempted) cookies
mapped into field indices as defined by the keys argument to
khttp_parse().
cookiesz
The size of the cookies array.
fields All key-value pairs read from the requests (query string,
cookies, message body).
fieldmap
Entries in successfully-parsed (or un-parsed) fields mapped into
field indices as defined by the keys argument to khttp_parse().
fieldnmap
Entries in unsuccessfully-parsed (but still attempted) fields
mapped into field indices as defined by the keys argument to
khttp_parse().
fieldsz
The number of elements in the fields array.
fullpath
The full path following the server name or NULL if there is no
path following the server. For example, if foo.cgi/bar/baz is
the PATH_INFO, this would be /bar/baz.
host The host name requested by the client (i.e., the host of the web
application), as passed to the application. This shouldn't be
confused with the application host's canonical name.
method The KMETHOD_ACL, KMETHOD_CONNECT, KMETHOD_COPY, KMETHOD_DELETE,
KMETHOD_GET, KMETHOD_HEAD, KMETHOD_LOCK, KMETHOD_MKCALENDAR,
KMETHOD_MKCOL, KMETHOD_MOVE, KMETHOD_OPTIONS, KMETHOD_POST,
KMETHOD_PROPFIND, KMETHOD_PROPPATCH, KMETHOD_PUT, KMETHOD_REPORT,
KMETHOD_TRACE, or KMETHOD_UNLOCK submission method. If the
method was not understood, KMETHOD__MAX is used. If no method
was used, the default is KMETHOD_GET.
Note: applications will usually accept only KMETHOD_GET and
KMETHOD_POST, so be sure to emit a KHTTP_405 status for non-
conforming methods.
kdata Internal data. Should not be touched.
keys Value passed to khttp_parse().
keysz Value passed to khttp_parse().
mime The MIME type of the requested file as determined by its suffix
matched to the suffixes map passed to khttp_parsex() or the
default ksuffixmap if using khttp_parse(). This defaults to the
mimemax value passed to khttp_parsex() or the default KMIME__MAX
if using khttp_parse() when no suffix is specified or when the
suffix is specified but not known.
page The page index as defined by the pages array passed to
khttp_parse() and parsed from the requested file. This is the
first path component! The default page provided to khttp_parse()
is used if no path was specified or pagesz if the path failed
lookup.
pagename
The string corresponding to page.
port The server's receiving TCP port.
path The path (or empty string) following the parsed component
regardless of whether it was located in the pages array provided
to khttp_parse(). For example, if the PATH_INFO is
foo.cgi/bar/baz.html, the path component would be baz (with the
leading slash stripped).
pname The script name (which may be an empty string in degenerate
cases) passed to the server. This may not reflect a file-system
entity if re-written by the web server.
rawauth
If the web server passes the "Authorization" header (which, for
example, Apache doesn't by default), then the header is parsed
into this field, which is of type struct khttpauth.
remote The string form of the client's IPv4 or IPv6 address.
reqmap Mapping of enum krequ enumeration values to reqs parsed from the
input stream.
reqs List of all HTTP request headers, known via enum krequ and not
known, parsed from the input stream.
reqsz Number of request headers in reqs.
scheme The access scheme, which is either KSCHEME_HTTP or KSCHEME_HTTPS.
The scheme defaults to KSCHEME_HTTP if not specified by the
request.
suffix The suffix part of the PATH_INFO or NULL if none exists. For
example, if the PATH_INFO is foo.cgi/bar/baz.html, the suffix
would be html. See the mime field for the MIME type parsed from
the suffix.
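The relationship between the page, pagename, path, and suffix fields can
be sketched as below. This is not the library's parser: ex_split and
struct ex_parts are hypothetical, the fixed buffer sizes are arbitrary,
and the input is assumed to be the part of the request after the script
name (for example, /bar/baz.html).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; the names mirror the struct kreq fields. */
struct ex_parts {
	char	 pagename[64];	/* first path component */
	char	 path[256];	/* remainder after the first component */
	char	 suffix[32];	/* text after the last dot, without the dot */
	size_t	 page;		/* index into pages, defpage, or pagesz */
};

void
ex_split(const char *pathinfo, const char *const *pages, size_t pagesz,
    size_t defpage, struct ex_parts *out)
{
	char	 buf[256];
	char	*dot, *slash;
	size_t	 i;

	memset(out, 0, sizeof(*out));

	/* Work on a private copy with the leading slash stripped. */
	if (*pathinfo == '/')
		pathinfo++;
	snprintf(buf, sizeof(buf), "%s", pathinfo);

	/* A suffix only counts if the dot follows the last slash. */
	dot = strrchr(buf, '.');
	slash = strrchr(buf, '/');
	if (dot != NULL && (slash == NULL || dot > slash)) {
		*dot = '\0';
		snprintf(out->suffix, sizeof(out->suffix), "%s", dot + 1);
	}

	/* Split the first component (the page) from the rest (the path). */
	slash = strchr(buf, '/');
	if (slash != NULL) {
		*slash = '\0';
		snprintf(out->path, sizeof(out->path), "%s", slash + 1);
	}
	snprintf(out->pagename, sizeof(out->pagename), "%s", buf);

	/* No page at all selects defpage; an unknown page yields pagesz. */
	if (buf[0] == '\0') {
		out->page = defpage;
		return;
	}
	out->page = pagesz;
	for (i = 0; i < pagesz; i++)
		if (strcmp(pages[i], buf) == 0) {
			out->page = i;
			break;
		}
}
```

With pages set to { "index", "bar" }, the input /bar/baz.html gives page
index 1, path baz, and suffix html, matching the examples in the field
descriptions.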
The application may optionally define keys provided to khttp_parse and
khttp_parsex as an array of struct kvalid. This structure is central to
the validation of input data. It consists of the following fields:
name The field name, i.e., how it appears in the HTML form input name.
This cannot be NULL. If the field name is an empty string and
the HTTP message consists of an opaque body (and not key-value
pairs), then that field will be used to validate the HTTP message
body. This is useful for KMETHOD_PUT style requests.
valid Validating function. This function accepts a single struct kpair
* argument and returns an int. If the function is NULL, then no
validation is performed and the data is considered as always
valid. If you provide your own valid function, it must set the
type and parsed variables in the key-value pair. You can also
allocate new memory for the val and thus valsz: if the value of
val changes during your validation, the new value will be freed
with free(3) after being passed out of the sandbox. Note: these
functions are invoked from within a system-specific sandbox. You
should assume that you cannot invoke any "invasive" system calls
such as opening files, sockets, etc. In other words, these must
be pure computation.
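A validation function restricted to pure computation might look like the
following sketch. The types are stand-ins declared locally so the
example compiles on its own: struct ex_vpair, enum ex_type, and
ex_valid_age are hypothetical, and the convention that a non-zero return
means "valid" is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the kcgi types; names and layout are assumptions. */
enum ex_type { EX_INTEGER, EX_STRING, EX_DOUBLE, EX_TYPE_MAX };

struct ex_vpair {
	const char	*val;	/* nil-terminated input value */
	size_t		 valsz;	/* true length of val */
	enum ex_type	 type;	/* type of parsed, if valid */
	union {
		int64_t		 i;
		const char	*s;
		double		 d;
	} parsed;
};

/*
 * Accept a decimal integer in [1, 150].  Returns non-zero if valid.
 * Only string and arithmetic routines are used: no files, no sockets.
 */
int
ex_valid_age(struct ex_vpair *p)
{
	char		*end;
	long long	 v;

	/* Reject empty input and binary data with embedded nils. */
	if (p->valsz == 0 || strlen(p->val) != p->valsz)
		return 0;

	errno = 0;
	v = strtoll(p->val, &end, 10);
	if (errno != 0 || *end != '\0' || v < 1 || v > 150)
		return 0;

	/* Record both the parsed value and its type. */
	p->type = EX_INTEGER;
	p->parsed.i = (int64_t)v;
	return 1;
}
```

The function touches nothing outside the pair it is given, which is what
allows it to run inside a sandbox.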
The struct kpair structure presents the user with fields parsed from
input and (possibly) matched to the keys variable passed to khttp_parse
and khttp_parsex. It is also passed to the validation function to be
filled in. In this case, the MIME-related fields are already filled in
and may be examined to determine the method of validation. This is
useful when validating opaque message bodies.
ctype The value's MIME content type (e.g., image/jpeg), or NULL if not
defined.
ctypepos
If ctype is not NULL, it is looked up in the mimes parameter
passed to khttp_parsex or kmimetypes if using khttp_parse. If
found, it is set to the appropriate index. Otherwise, it's
mimemax.
file The value's MIME source filename or NULL if not defined.
key The nil-terminated key (input) name. If the HTTP message body is
opaque (e.g., KMETHOD_PUT), then an empty-string key is cooked
up.
keypos If looked up in the keys variable passed to khttp_parse, the
index of the looked-up key. Otherwise keysz.
next In a cookie or field map, next points to the next parsed key-
value pair with the same key name. This occurs most often in
HTML checkbox forms, where many fields may have the same name.
parsed The parsed, validated value. These may be integer, for a 64-bit
signed integer; string, for a nil-terminated character string; or
double, for a double-precision floating-point number. This is
intentionally basic because the resulting data must be reliably
passed from the parsing context back into the web application.
state The validation state: whether validated by a parse, invalidated
by a parse, or non-validated (unparsed).
type If parsed, the type of data in parsed, otherwise KFIELD__MAX.
val The (input) value, which is always nil-terminated, but if the
data is binary, nil terminators may occur before the true data
length of valsz.
valsz The true length of val.
xcode The value's MIME content transfer encoding (e.g., base64), or
NULL if not defined.
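Walking the next chain for a repeated key, such as a group of checkboxes
sharing one name, can be sketched as follows. struct ex_chain_pair is a
reduced, hypothetical stand-in for struct kpair, and ex_count_values is
an illustrative helper rather than a library function.

```c
#include <stddef.h>

/* Stand-in for struct kpair, reduced to the members used here. */
struct ex_chain_pair {
	const char		*key;
	const char		*val;
	struct ex_chain_pair	*next;	/* next pair with the same key */
};

/* Count every value submitted under one key by following next. */
size_t
ex_count_values(const struct ex_chain_pair *first)
{
	size_t	 n = 0;

	for (; first != NULL; first = first->next)
		n++;
	return n;
}
```

In an application the starting pointer would come from the relevant
fieldmap or cookiemap entry; a NULL entry simply yields zero.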
The struct khttpauth structure holds authorisation data if passed by the
server. If no data was passed by the server, the type value is
KAUTH_NONE. Otherwise it's KAUTH_BASIC or KAUTH_DIGEST, with
KAUTH_UNKNOWN if the authorisation type was not recognised. The specific
fields are as follows.
authorised
For KAUTH_BASIC or KAUTH_DIGEST authorisation, this field
indicates whether all required values were specified.
d A union containing parsed fields per type: basic for KAUTH_BASIC
or digest for KAUTH_DIGEST.
If the field for an HTTP authorisation request is KAUTH_BASIC, it will
consist of the following for its parsed entities in its struct khttpbasic
structure:
response
The hashed and encoded response string.
If the field for an HTTP authorisation request is KAUTH_DIGEST, it will
consist of the following in its struct khttpdigest structure:
alg The encoding algorithm, parsed from the possible MD5 or MD5-Sess
values.
qop The quality of protection algorithm, which may be unspecified,
Auth or Auth-Int.
user The user coordinating the request.
uri The URI for which the request is designated. (This must match
the request URI).
realm The request realm.
nonce The server-generated nonce value.
cnonce The (optional) client-generated nonce value.
response
The hashed and encoded response string, which entangles fields
depending on algorithm and quality of protection.
count The (optional) cnonce counter.
opaque The (optional) opaque string requested by the server.
Lastly, the struct khead structure holds parsed HTTP headers.
key Holds the HTTP header name. This is not the CGI header name
(e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g.,
Cookie).
val The opaque header value, which may be an empty string.
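The difference between the CGI header name and the reconstituted HTTP
name can be sketched as below. ex_http_name is a hypothetical helper,
and the capitalisation rule (upper-case the first letter of each word,
lower-case the rest) is an assumption; the library's exact rule may
differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "HTTP_ACCEPT_ENCODING" as
 * "Accept-Encoding" in dst.  Returns 0 if the name lacks the HTTP_
 * prefix or does not fit in dst, 1 otherwise.
 */
int
ex_http_name(const char *cgi, char *dst, size_t dstsz)
{
	size_t	 i, len;
	int	 start = 1;	/* at the first letter of a word */

	if (strncmp(cgi, "HTTP_", 5) != 0)
		return 0;
	cgi += 5;
	len = strlen(cgi);
	if (len + 1 > dstsz)
		return 0;

	for (i = 0; i < len; i++) {
		unsigned char	 c = (unsigned char)cgi[i];

		/* Underscores separate words and become hyphens. */
		if (c == '_') {
			dst[i] = '-';
			start = 1;
			continue;
		}
		dst[i] = (char)(start ? toupper(c) : tolower(c));
		start = 0;
	}
	dst[len] = '\0';
	return 1;
}
```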
Variables
A number of variables are defined in <kcgi.h> to simplify invocations of
the khttp_parse family. Applications are strongly encouraged to use
these variables (and associated enumerations) in khttp_parse instead of
overriding them with hand-rolled sets in khttp_parsex.
kmimetypes
Indexed list of common MIME types, for example, "text/html" and
"application/json". Corresponds to enum kmime.
khttps Indexed list of HTTP status code and identifier, for example,
"200 OK". Corresponds to enum khttp.
kschemes
Indexed list of URL schemes, for example, "https" or "ftp".
Corresponds to enum kscheme.
kresps Indexed list of header response names, for example,
"Cache-Control" or "Content-Length". Corresponds to enum kresp.
kmethods
Indexed list of HTTP methods, for example, "GET" and "POST".
Corresponds to enum kmethod.
ksuffixmap
Map of MIME types defined in enum kmime to possible suffixes.
This array is terminated with a MIME type of KMIME__MAX and name
NULL.
ksuffixes
Indexed list of canonical suffixes for MIME types corresponding
to enum kmime. Note: this may be a NULL pointer for types that
have no canonical suffix, for example,
"application/octet-stream".
RETURN VALUES
khttp_parse and khttp_parsex return an error code:
KCGI_OK
Success (not an error).
KCGI_ENOMEM
Memory failure. This can occur in many places: spawning a child,
allocating memory, creating sockets, etc.
KCGI_ENFILE
Could not allocate file descriptors.
KCGI_EAGAIN
Could not spawn a child.
KCGI_FORM
Malformed data between parent and child whilst parsing an HTTP
request. (Internal system error.)
KCGI_SYSTEM
Opaque operating system error.
On failure, the calling application should terminate as soon as possible.
Applications should not try to write an HTTP 505 error or similar, but
allow the web server to handle the empty CGI response on its own.
SEE ALSO
kcgi(3), khttp_free(3)
AUTHORS
The khttp_parse and khttp_parsex functions were written by Kristaps
Dzonsons <kristaps@bsd.lv>.
DragonFly 6.5-DEVELOPMENT January 4, 2016 DragonFly 6.5-DEVELOPMENT