linebreakdef.h File Reference

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Data Structures

struct  LineBreakProperties
 Struct for entries of line break properties.
struct  LineBreakPropertiesLang
 Struct for association of language-specific line breaking properties with language names.

Defines

#define EOS   0xFFFF
 Constant value to mark the end of string.

Typedefs

typedef utf32_t(* get_next_char_t )(const void *, size_t, size_t *)
 Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.

Enumerations

enum  LineBreakClass {
  LBP_Undefined, LBP_OP, LBP_CL, LBP_CP,
  LBP_QU, LBP_GL, LBP_NS, LBP_EX,
  LBP_SY, LBP_IS, LBP_PR, LBP_PO,
  LBP_NU, LBP_AL, LBP_ID, LBP_IN,
  LBP_HY, LBP_BA, LBP_BB, LBP_B2,
  LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
  LBP_H3, LBP_JL, LBP_JV, LBP_JT,
  LBP_AI, LBP_BK, LBP_CB, LBP_CR,
  LBP_LF, LBP_NL, LBP_SA, LBP_SG,
  LBP_SP, LBP_XX
}
 Line break classes.

Functions

utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-8 sequence.
utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-16 sequence.
utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
 Gets the next Unicode character in a UTF-32 sequence.
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)
 Sets the line breaking information for a generic input string.

Variables

struct LineBreakProperties lb_prop_default []
 Default line breaking properties, taken from the Unicode Web site.
struct LineBreakPropertiesLang lb_prop_lang_map []
 Association data of language-specific line breaking properties with language names.

Detailed Description

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Version:
2.1, 2011/05/07
Author:
Wu Yongwei

Define Documentation

#define EOS   0xFFFF

Constant value to mark the end of string.

It is not a valid Unicode character.
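U+FFFF is a Unicode noncharacter (as are U+FDD0..U+FDEF and the last two code points of every plane), so a decoder can return it as a sentinel without colliding with decoded text. The minimal sketch below checks that property; it assumes `utf32_t` is a 32-bit unsigned type, and `is_noncharacter` is a hypothetical helper, not part of this header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */

#define EOS 0xFFFF          /* end-of-string sentinel, as in linebreakdef.h */

/* Hypothetical helper: nonzero if c is a Unicode noncharacter.
 * Noncharacters are U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in each
 * of the 17 planes (n = 0x0..0x10). */
static int is_noncharacter(utf32_t c)
{
    assert(c <= 0x10FFFF);  /* only meaningful for Unicode code points */
    if (c >= 0xFDD0 && c <= 0xFDEF)
        return 1;
    return (c & 0xFFFE) == 0xFFFE;
}
```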


Typedef Documentation

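typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)

Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.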
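Any function with this signature can drive the generic line breaking code. As a minimal sketch, the hypothetical `get_next_char_latin1` below reads ISO-8859-1 text, where each byte is exactly one code point, and the hypothetical `count_chars` consumes input only through a `get_next_char_t` pointer until it sees EOS. The `utf32_t` definition is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */
#define EOS 0xFFFF          /* end-of-string sentinel */

/* Same shape as get_next_char_t in linebreakdef.h. */
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Hypothetical getter for ISO-8859-1 input: every byte is one code
 * point U+0000..U+00FF, so the index always advances by one unit. */
static utf32_t get_next_char_latin1(const void *s, size_t len, size_t *ip)
{
    const unsigned char *p = s;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;             /* end of input */
    return p[(*ip)++];
}

/* Hypothetical consumer: counts characters using only the abstract
 * interface, so it works for any encoding that has a getter. */
static size_t count_chars(const void *s, size_t len, get_next_char_t next)
{
    size_t i = 0, n = 0;

    while (next(s, len, &i) != EOS)
        ++n;
    return n;
}
```

Because the getter owns the index arithmetic, a consumer never needs to know how many code units a character occupies.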

Enumeration Type Documentation

enum LineBreakClass

Line break classes.

This is a direct mapping of Table 1 of Unicode Standard Annex 14, Revision 26.

Enumerator:
LBP_Undefined 

Undefined.

LBP_OP 

Opening punctuation.

LBP_CL 

Closing punctuation.

LBP_CP 

Closing parenthesis.

LBP_QU 

Ambiguous quotation.

LBP_GL 

Glue.

LBP_NS 

Non-starters.

LBP_EX 

Exclamation/Interrogation.

LBP_SY 

Symbols allowing break after.

LBP_IS 

Infix separator.

LBP_PR 

Prefix.

LBP_PO 

Postfix.

LBP_NU 

Numeric.

LBP_AL 

Alphabetic.

LBP_ID 

Ideographic.

LBP_IN 

Inseparable characters.

LBP_HY 

Hyphen.

LBP_BA 

Break after.

LBP_BB 

Break before.

LBP_B2 

Break on either side (but not pair).

LBP_ZW 

Zero-width space.

LBP_CM 

Combining marks.

LBP_WJ 

Word joiner.

LBP_H2 

Hangul LV.

LBP_H3 

Hangul LVT.

LBP_JL 

Hangul L Jamo.

LBP_JV 

Hangul V Jamo.

LBP_JT 

Hangul T Jamo.

LBP_AI 

Ambiguous (alphabetic or ideograph).

LBP_BK 

Break (mandatory).

LBP_CB 

Contingent break.

LBP_CR 

Carriage return.

LBP_LF 

Line feed.

LBP_NL 

Next line.

LBP_SA 

South-East Asian.

LBP_SG 

Surrogates.

LBP_SP 

Space.

LBP_XX 

Unknown.
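To show how the enumerators are used, here is a sketch of a hypothetical `classify_ascii` that assigns a few ASCII characters the classes UAX #14 gives them (for example, space is SP, digits are NU, and the parentheses are OP and CP). It is not how the library classifies characters: the real classes come from property tables, and any character not listed in this sketch is returned as LBP_Undefined. The enum is copied in the order listed above so that the enumerator values match.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */

/* Enumerators in the order listed in linebreakdef.h. */
enum LineBreakClass {
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP,
    LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO,
    LBP_NU, LBP_AL, LBP_ID, LBP_IN,
    LBP_HY, LBP_BA, LBP_BB, LBP_B2,
    LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
    LBP_H3, LBP_JL, LBP_JV, LBP_JT,
    LBP_AI, LBP_BK, LBP_CB, LBP_CR,
    LBP_LF, LBP_NL, LBP_SA, LBP_SG,
    LBP_SP, LBP_XX
};

/* Hypothetical helper: UAX #14 classes for a handful of ASCII
 * characters; everything else is "not covered by this sketch". */
static enum LineBreakClass classify_ascii(utf32_t c)
{
    if (c >= '0' && c <= '9')
        return LBP_NU;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return LBP_AL;
    switch (c) {
    case ' ':  return LBP_SP;
    case '\t': return LBP_BA;
    case '\n': return LBP_LF;
    case '\r': return LBP_CR;
    case '(':  return LBP_OP;
    case ')':  return LBP_CP;
    case '"':  return LBP_QU;
    case '!': case '?': return LBP_EX;
    case ',': case '.': return LBP_IS;
    case '/':  return LBP_SY;
    case '-':  return LBP_HY;
    case '$':  return LBP_PR;
    case '%':  return LBP_PO;
    default:   return LBP_Undefined;
    }
}
```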


Function Documentation

utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-16 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:
[in] s input UTF-16 string
[in] len length of the string in 16-bit code units
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index, or EOS if the end of input is reached
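A minimal sketch of the documented UTF-16 contract, not the library's source. Assumptions: `utf16_t` and `utf32_t` are 16- and 32-bit unsigned types; an unpaired surrogate is passed through as a single unit; and when the input ends right after a high surrogate, this sketch returns EOS and leaves the index unchanged (this page does not say exactly where the library leaves the index in that case). `next_char_utf16_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t utf16_t;   /* assumption: 16-bit code unit type */
typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */
#define EOS 0xFFFF

static utf32_t next_char_utf16_sketch(const utf16_t *s, size_t len, size_t *ip)
{
    utf16_t hi, lo;

    assert(*ip <= len);
    if (*ip == len)
        return EOS;                     /* end of input */
    hi = s[*ip];
    if (hi < 0xD800 || hi > 0xDBFF) {
        ++*ip;                          /* BMP unit or stray low surrogate */
        return hi;
    }
    if (*ip + 1 == len)
        return EOS;                     /* input ends inside a surrogate pair */
    lo = s[*ip + 1];
    if (lo < 0xDC00 || lo > 0xDFFF) {
        ++*ip;                          /* unpaired high surrogate: pass through */
        return hi;
    }
    *ip += 2;                           /* consume the whole pair */
    return 0x10000 + ((utf32_t)(hi - 0xD800) << 10) + (utf32_t)(lo - 0xDC00);
}
```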
utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-32 sequence.

The index will be advanced to the next character.

Parameters:
[in] s input UTF-32 string
[in] len length of the string in 32-bit code units
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index, or EOS if the end of input is reached
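For UTF-32 every code unit is a whole character, so the getter reduces to a bounds check. A minimal sketch under the same type assumptions (`next_char_utf32_sketch` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */
#define EOS 0xFFFF

static utf32_t next_char_utf32_sketch(const utf32_t *s, size_t len, size_t *ip)
{
    assert(*ip <= len);
    if (*ip == len)
        return EOS;         /* end of input */
    return s[(*ip)++];      /* one unit is one character */
}
```

Since the index always advances by one, UTF-32 input never produces LINEBREAK_INSIDEACHAR entries. In this sketch a literal U+FFFF in the input would read as EOS; the sentinel relies on U+FFFF being a noncharacter.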
utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-8 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:
[in] s input UTF-8 string
[in] len length of the string in bytes
[in,out] ip pointer to the index
Returns:
the Unicode character beginning at the index, or EOS if the end of input is reached
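A minimal UTF-8 sketch with the documented behavior: the lead byte determines the sequence length, and a sequence cut off by the end of the input yields EOS without advancing the index. It is an illustration, not the library's code: `utf8_t` and `utf32_t` are assumed to be 8- and 32-bit unsigned types, continuation bytes are not validated, bytes that cannot start a multi-byte sequence are returned as they are, and `next_char_utf8_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  utf8_t;    /* assumption: 8-bit code unit type */
typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */
#define EOS 0xFFFF

static utf32_t next_char_utf8_sketch(const utf8_t *s, size_t len, size_t *ip)
{
    utf8_t ch;
    utf32_t res;
    size_t n, k;            /* n = length of the sequence in bytes */

    assert(*ip <= len);
    if (*ip == len)
        return EOS;         /* end of input */
    ch = s[*ip];
    if (ch < 0xC2 || ch > 0xF4) {
        ++*ip;              /* ASCII, stray continuation, or invalid lead */
        return ch;
    }
    if (ch < 0xE0)      { n = 2; res = ch & 0x1F; }
    else if (ch < 0xF0) { n = 3; res = ch & 0x0F; }
    else                { n = 4; res = ch & 0x07; }
    if (len - *ip < n)
        return EOS;         /* truncated sequence: index left unchanged */
    for (k = 1; k < n; ++k)
        res = (res << 6) | (s[*ip + k] & 0x3F);  /* append 6 payload bits */
    *ip += n;
    return res;
}
```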
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)

Sets the line breaking information for a generic input string.

Parameters:
[in] s input string
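[in] len length of the input, in code units of its encoding (bytes for UTF-8, 16-bit units for UTF-16, 32-bit units for UTF-32)
[in] lang language of the input
[out] brks pointer to the output breaking data, one entry per code unit (room for at least len entries), containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR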
[in] get_next_char function to get the next UTF-32 character

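The brks array holds one entry per code unit of the input. The sketch below illustrates that contract with a deliberately simplified rule set (it is NOT the UAX #14 algorithm): a break is mandatory after LF, allowed after a space, and forbidden elsewhere, with a mandatory break after the last character. Each character's status is stored at its last code unit, and the units before it get LINEBREAK_INSIDEACHAR. The numeric values of the LINEBREAK_* constants, the helper names `next_utf8` and `toy_linebreaks`, and the handling of a trailing incomplete sequence are assumptions made for this sketch; `lang` is accepted but ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  utf8_t;    /* assumption */
typedef uint32_t utf32_t;   /* assumption */
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);
#define EOS 0xFFFF

/* Assumed values for the output constants. */
#define LINEBREAK_MUSTBREAK   0
#define LINEBREAK_ALLOWBREAK  1
#define LINEBREAK_NOBREAK     2
#define LINEBREAK_INSIDEACHAR 3

/* Compact UTF-8 getter (no validation), used only to drive the demo. */
static utf32_t next_utf8(const void *p, size_t len, size_t *ip)
{
    const utf8_t *s = p;
    utf32_t c;
    size_t n, k;

    if (*ip >= len)
        return EOS;
    c = s[*ip];
    n = c < 0xC0 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
    if (len - *ip < n)
        return EOS;                 /* truncated sequence */
    if (n > 1)
        c &= 0x7F >> n;             /* keep payload bits of the lead byte */
    for (k = 1; k < n; ++k)
        c = (c << 6) | (s[*ip + k] & 0x3F);
    *ip += n;
    return c;
}

/* Toy stand-in with the same parameters as set_linebreaks. */
static void toy_linebreaks(const void *s, size_t len, const char *lang,
                           char *brks, get_next_char_t get_next_char)
{
    size_t pos = 0, last = 0;       /* last = first unit of current char */
    utf32_t ch;

    (void)lang;                     /* ignored in this sketch */
    while ((ch = get_next_char(s, len, &pos)) != EOS) {
        while (last < pos - 1)
            brks[last++] = LINEBREAK_INSIDEACHAR;
        brks[last++] = ch == '\n' ? LINEBREAK_MUSTBREAK
                     : ch == ' '  ? LINEBREAK_ALLOWBREAK
                     :              LINEBREAK_NOBREAK;
    }
    if (last > 0)
        brks[last - 1] = LINEBREAK_MUSTBREAK;  /* break at end of text */
    while (last < len)
        brks[last++] = LINEBREAK_INSIDEACHAR;  /* incomplete trailing bytes */
}
```

For the 6-byte UTF-8 input `"a b\xC3\xA9\n"` the toy produces NOBREAK, ALLOWBREAK, NOBREAK, INSIDEACHAR, NOBREAK, MUSTBREAK: the two bytes of é share one decision, recorded on the second byte.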
Variable Documentation

struct LineBreakProperties lb_prop_default[]

Default line breaking properties, taken from the Unicode Web site.
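A sorted table of code point ranges like this can be searched with a binary search. The sketch assumes each entry is an inclusive range `start`..`end` with a class `prop`, sorted and non-overlapping; these field names, the hypothetical `lookup_prop`, and the four sample entries (a few ASCII ranges, not the real data) are illustrative only. Code points outside every range are reported as LBP_XX by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;   /* assumption: 32-bit code point type */

enum LineBreakClass {       /* order as listed in linebreakdef.h */
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_ID, LBP_IN,
    LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
    LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_AI, LBP_BK, LBP_CB, LBP_CR,
    LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP, LBP_XX
};

/* Assumed layout: an inclusive code point range and its class. */
struct LineBreakProperties {
    utf32_t start;
    utf32_t end;
    enum LineBreakClass prop;
};

/* Illustrative entries only, sorted by start. */
static const struct LineBreakProperties demo_props[] = {
    { 0x0020, 0x0020, LBP_SP },
    { 0x0030, 0x0039, LBP_NU },
    { 0x0041, 0x005A, LBP_AL },
    { 0x0061, 0x007A, LBP_AL }
};

/* Binary search over n sorted, non-overlapping ranges. */
static enum LineBreakClass lookup_prop(const struct LineBreakProperties *tbl,
                                       size_t n, utf32_t ch)
{
    size_t lo = 0, hi = n, mid;

    assert(tbl != NULL || n == 0);
    while (lo < hi) {
        mid = lo + (hi - lo) / 2;
        if (ch < tbl[mid].start)
            hi = mid;
        else if (ch > tbl[mid].end)
            lo = mid + 1;
        else
            return tbl[mid].prop;
    }
    return LBP_XX;          /* not covered by the table */
}
```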

struct LineBreakPropertiesLang lb_prop_lang_map[]

Association data of language-specific line breaking properties with language names.

This is the definition for the static data in this file. If you want more flexibility, or do not need the data here, you may want to redefine lb_prop_lang_map in your C source file.
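Redefining the map means supplying your own array of language entries. A minimal sketch, assuming each entry holds a language name, the number of leading characters to compare, and a pointer to a properties table, with a NULL name ending the array. The field names, `props_zh`, `props_ja`, `my_lang_map`, and `find_lang_props` are all hypothetical, and the placeholder tables carry no real data. With a prefix comparison, a tag such as `zh_CN` selects the `zh` entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t utf32_t;   /* assumption */
enum LineBreakClass { LBP_Undefined /* other enumerators omitted here */ };

struct LineBreakProperties {            /* assumed layout */
    utf32_t start, end;
    enum LineBreakClass prop;
};

struct LineBreakPropertiesLang {        /* assumed layout */
    const char *lang;                   /* language prefix, e.g. "zh" */
    size_t namelen;                     /* number of characters to compare */
    const struct LineBreakProperties *lbp;
};

/* Placeholder tables: real language-specific data is not shown here. */
static const struct LineBreakProperties props_zh[] = { { 0, 0, LBP_Undefined } };
static const struct LineBreakProperties props_ja[] = { { 0, 0, LBP_Undefined } };

/* A replacement map; the NULL language name ends it. */
static const struct LineBreakPropertiesLang my_lang_map[] = {
    { "zh", 2, props_zh },
    { "ja", 2, props_ja },
    { NULL, 0, NULL }
};

/* Return the table whose prefix matches lang, or NULL if none does. */
static const struct LineBreakProperties *
find_lang_props(const struct LineBreakPropertiesLang *map, const char *lang)
{
    if (lang == NULL)
        return NULL;
    for (; map->lang != NULL; ++map)
        if (strncmp(lang, map->lang, map->namelen) == 0)
            return map->lbp;
    return NULL;
}
```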


Generated by  doxygen 1.6.2