src/unibreakdef.h File Reference

Header file for private definitions in the libunibreak library.

#include <stddef.h>
#include "unibreakbase.h"

Defines

#define EOS   0xFFFFFFFF
 Constant value to mark the end of string.

Typedefs

typedef utf32_t(* get_next_char_t )(const void *, size_t, size_t *)
 Abstract function interface for ub_get_next_char_utf8, ub_get_next_char_utf16, and ub_get_next_char_utf32.

Functions

utf32_t ub_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-8 sequence.
utf32_t ub_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-16 sequence.
utf32_t ub_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-32 sequence.

Detailed Description

Header file for private definitions in the libunibreak library.

Version:
3.0, 2015/05/10
Author:
Wu Yongwei

Define Documentation

#define EOS   0xFFFFFFFF

Constant value to mark the end of string.

It is not a valid Unicode character: the Unicode code space ends at U+10FFFF, so 0xFFFFFFFF can never result from decoding valid input.


Typedef Documentation

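typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)

Abstract function interface for ub_get_next_char_utf8, ub_get_next_char_utf16, and ub_get_next_char_utf32. The parameters are the input string, its length in code units, and a pointer to the current index.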
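
A function pointer of this shape lets one loop serve every encoding. The following is a minimal sketch under stated assumptions, not the library's own code: the `demo_*` names are hypothetical, `utf32_t` is assumed to be a 32-bit unsigned type, and the getters take `const void *` directly so that no function-pointer cast is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOS 0xFFFFFFFF

typedef uint32_t utf32_t;   /* assumption: matches the type in unibreakbase.h */

/* Same shape as get_next_char_t in this header. */
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Hypothetical getter: treats the buffer as one-byte (Latin-1) characters. */
utf32_t demo_next_latin1(const void *s, size_t len, size_t *ip)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    return p[(*ip)++];
}

/* Hypothetical getter: treats the buffer as UTF-32 code points. */
utf32_t demo_next_utf32(const void *s, size_t len, size_t *ip)
{
    const utf32_t *p = (const utf32_t *)s;
    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    return p[(*ip)++];
}

/* Encoding-agnostic loop: counts characters through the function pointer
 * and stops when the getter reports EOS. */
size_t demo_count_chars(get_next_char_t next, const void *s, size_t len)
{
    size_t i = 0, n = 0;
    while (next(s, len, &i) != EOS)
        ++n;
    return n;
}
```

The caller picks the getter once and passes it in; the loop itself never needs to know the width of a code unit.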

Function Documentation

utf32_t ub_get_next_char_utf16 ( const utf16_t * s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-16 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:
[in] s input UTF-16 string
[in] len length of the string in 16-bit words (UTF-16 code units)
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
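
The surrogate-pair handling described above can be sketched as follows. This is an illustrative decoder written for this page, not the library's source: `demo_next_utf16` is a hypothetical name, and the typedefs are assumed to match those in unibreakbase.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOS 0xFFFFFFFF

typedef uint16_t utf16_t;   /* assumption: 16-bit unsigned code unit */
typedef uint32_t utf32_t;   /* assumption: 32-bit unsigned code point */

/* Hypothetical sketch of a UTF-16 getter. */
utf32_t demo_next_utf16(const utf16_t *s, size_t len, size_t *ip)
{
    utf32_t ch;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    ch = s[(*ip)++];

    /* Anything that is not a high surrogate is returned as-is. */
    if (ch < 0xD800 || ch > 0xDBFF)
        return ch;

    /* High surrogate at the very end: step back and report EOS, so the
     * index still points at the start of the incomplete pair. */
    if (*ip == len) {
        --(*ip);
        return EOS;
    }

    /* High surrogate not followed by a low surrogate: return it unpaired. */
    if (s[*ip] < 0xDC00 || s[*ip] > 0xDFFF)
        return ch;

    /* Combine the pair into a supplementary-plane code point. */
    ch = 0x10000 + ((ch - 0xD800) << 10) + (utf32_t)(s[*ip] - 0xDC00);
    ++(*ip);
    return ch;
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600 and advances the index by two; if the buffer ends after 0xD83D, the sketch returns EOS and leaves the index unchanged.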

utf32_t ub_get_next_char_utf32 ( const utf32_t * s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-32 sequence.

The index will be advanced to the next character.

Parameters:
[in] s input UTF-32 string
[in] len length of the string in 32-bit dwords (UTF-32 code units)
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered

utf32_t ub_get_next_char_utf8 ( const utf8_t * s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-8 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:
[in] s input UTF-8 string
[in] len length of the string in bytes
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
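
The multi-byte behaviour described above can be sketched as follows. This is an illustrative decoder, not the library's source: `demo_next_utf8` is a hypothetical name, the typedefs are assumed to match unibreakbase.h, and continuation bytes are not validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOS 0xFFFFFFFF

typedef uint8_t  utf8_t;    /* assumption: 8-bit unsigned code unit */
typedef uint32_t utf32_t;   /* assumption: 32-bit unsigned code point */

/* Hypothetical sketch of a UTF-8 getter. */
utf32_t demo_next_utf8(const utf8_t *s, size_t len, size_t *ip)
{
    utf8_t ch;
    utf32_t res;
    size_t n, k;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;
    ch = s[*ip];

    /* ASCII, stray continuation bytes, and invalid lead bytes are passed
     * through one byte at a time. */
    if (ch < 0xC2 || ch > 0xF4) {
        *ip += 1;
        return ch;
    }

    /* Sequence length and initial payload bits follow from the lead byte. */
    if (ch < 0xE0)      { n = 2; res = ch & 0x1F; }
    else if (ch < 0xF0) { n = 3; res = ch & 0x0F; }
    else                { n = 4; res = ch & 0x07; }

    /* Truncated sequence: leave the index alone and report EOS. */
    if (len - *ip < n)
        return EOS;

    /* Fold in six payload bits from each continuation byte
     * (this sketch does not check that they are of the form 10xxxxxx). */
    for (k = 1; k < n; ++k)
        res = (res << 6) | (utf32_t)(s[*ip + k] & 0x3F);

    *ip += n;
    return res;
}
```

For example, the bytes 0xE2 0x82 0xAC decode to U+20AC and advance the index by three; if only the first two of those bytes are available, the sketch returns EOS and the index stays where it was.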
