linebreakdef.h File Reference

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Data Structures
struct	LineBreakProperties
	Struct for entries of line break properties.
struct	LineBreakPropertiesLang
	Struct for association of language-specific line breaking properties with language names.
Defines
#define	EOS 0xFFFF
	Constant value to mark the end of string.
Typedefs
typedef utf32_t(*	get_next_char_t )(const void *, size_t, size_t *)
	Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.
Enumerations
enum	LineBreakClass { LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX, LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_ID, LBP_IN, LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ, LBP_H2, LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_AI, LBP_BK, LBP_CB, LBP_CR, LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP, LBP_XX }
	Line break classes.
Functions
utf32_t	lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
	Gets the next Unicode character in a UTF-8 sequence.
utf32_t	lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
	Gets the next Unicode character in a UTF-16 sequence.
utf32_t	lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
	Gets the next Unicode character in a UTF-32 sequence.
void	set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)
	Sets the line breaking information for a generic input string.
Variables
struct LineBreakProperties	lb_prop_default []
	Default line breaking properties, as published on the Unicode Web site.
struct LineBreakPropertiesLang	lb_prop_lang_map []
	Association data of language-specific line breaking properties with language names.

Detailed Description

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Version: 2.1, 2011/05/07

Author: Wu Yongwei

Define Documentation

#define EOS 0xFFFF

Constant value to mark the end of string.

It is not a valid Unicode character.

Typedef Documentation

typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)

Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.
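To illustrate the contract this typedef implies, here is a minimal sketch of a custom decoder and a loop that drives it. It is not part of the library: next_char_latin1 and count_chars are hypothetical names, and utf32_t, EOS, and get_next_char_t are redeclared locally (with utf32_t assumed to be an unsigned 32-bit type) only so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch is self-contained; real code would
   include linebreakdef.h instead. utf32_t as uint32_t is an assumption. */
typedef uint32_t utf32_t;
#define EOS 0xFFFF
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Hypothetical decoder for Latin-1 input: one byte per character. */
utf32_t next_char_latin1(const void *s, size_t len, size_t *ip)
{
    const unsigned char *p = (const unsigned char *)s;
    if (*ip >= len)
        return EOS;        /* nothing left to read */
    return p[(*ip)++];     /* return the character, then advance the index */
}

/* Counts characters by calling the decoder until it reports EOS. */
size_t count_chars(const void *s, size_t len, get_next_char_t next)
{
    size_t i = 0, n = 0;
    while (next(s, len, &i) != EOS)
        ++n;
    return n;
}
```

Any function with this shape could be passed wherever a get_next_char_t is expected, which is how the three lb_get_next_char_* functions share one interface.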

Enumeration Type Documentation

enum LineBreakClass

Line break classes.

This is a direct mapping of Table 1 of Unicode Standard Annex #14, Revision 26.

Enumerator:

LBP_Undefined	Undefined.
LBP_OP	Opening punctuation.
LBP_CL	Closing punctuation.
LBP_CP	Closing parenthesis.
LBP_QU	Ambiguous quotation.
LBP_GL	Glue.
LBP_NS	Non-starters.
LBP_EX	Exclamation/Interrogation.
LBP_SY	Symbols allowing break after.
LBP_IS	Infix separator.
LBP_PR	Prefix.
LBP_PO	Postfix.
LBP_NU	Numeric.
LBP_AL	Alphabetic.
LBP_ID	Ideographic.
LBP_IN	Inseparable characters.
LBP_HY	Hyphen.
LBP_BA	Break after.
LBP_BB	Break before.
LBP_B2	Break on either side (but not pair).
LBP_ZW	Zero-width space.
LBP_CM	Combining marks.
LBP_WJ	Word joiner.
LBP_H2	Hangul LV.
LBP_H3	Hangul LVT.
LBP_JL	Hangul L Jamo.
LBP_JV	Hangul V Jamo.
LBP_JT	Hangul T Jamo.
LBP_AI	Ambiguous (alphabetic or ideograph).
LBP_BK	Break (mandatory).
LBP_CB	Contingent break.
LBP_CR	Carriage return.
LBP_LF	Line feed.
LBP_NL	Next line.
LBP_SA	South-East Asian.
LBP_SG	Surrogates.
LBP_SP	Space.
LBP_XX	Unknown.

Function Documentation

utf32_t lb_get_next_char_utf16	(	const utf16_t *	s,
		size_t	len,
		size_t *	ip
	)

Gets the next Unicode character in a UTF-16 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:

`[in]`	s	input UTF-16 string
`[in]`	len	length of the string in 16-bit words
`[in,out]`	ip	pointer to the index

Returns: the Unicode character beginning at the index, or EOS if the end of input is encountered
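The behaviour described above can be sketched as follows. This is an independent illustration, not the library's source: sketch_next_utf16 and SKETCH_EOS are hypothetical names, and the handling of an unpaired high surrogate (returned unchanged) is an assumption that the text above does not specify.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EOS 0xFFFF   /* same value as EOS in this file */

uint32_t sketch_next_utf16(const uint16_t *s, size_t len, size_t *ip)
{
    uint16_t hi, lo;

    if (*ip >= len)
        return SKETCH_EOS;
    hi = s[*ip];
    if (hi < 0xD800 || hi > 0xDBFF) {   /* not a high surrogate */
        ++*ip;
        return hi;
    }
    if (*ip + 1 >= len)                 /* input ends mid-pair: index is not advanced */
        return SKETCH_EOS;
    lo = s[*ip + 1];
    if (lo < 0xDC00 || lo > 0xDFFF) {   /* unpaired high surrogate (assumed pass-through) */
        ++*ip;
        return hi;
    }
    *ip += 2;                           /* consume both halves of the pair */
    return 0x10000 + (((uint32_t)hi - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
}
```

For the input 0x0041, 0xD83D, 0xDE00, 0xD83D, successive calls yield U+0041, then U+1F600, then the EOS value with the index left at 3, because the final high surrogate has no partner.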

utf32_t lb_get_next_char_utf32	(	const utf32_t *	s,
		size_t	len,
		size_t *	ip
	)

Gets the next Unicode character in a UTF-32 sequence.

The index will be advanced to the next character.

Parameters:

`[in]`	s	input UTF-32 string
`[in]`	len	length of the string in 32-bit words (dwords)
`[in,out]`	ip	pointer to the index

Returns: the Unicode character beginning at the index, or EOS if the end of input is encountered

utf32_t lb_get_next_char_utf8	(	const utf8_t *	s,
		size_t	len,
		size_t *	ip
	)

Gets the next Unicode character in a UTF-8 sequence.

The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:

`[in]`	s	input UTF-8 string
`[in]`	len	length of the string in bytes
`[in,out]`	ip	pointer to the index

Returns: the Unicode character beginning at the index, or EOS if the end of input is encountered
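A minimal sketch of a decoder with the behaviour described above is shown below. It is not the library's implementation: sketch_next_utf8 and SKETCH_EOS are hypothetical names, continuation bytes are not validated, and passing stray or invalid bytes through unchanged is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EOS 0xFFFF   /* same value as EOS in this file */

uint32_t sketch_next_utf8(const unsigned char *s, size_t len, size_t *ip)
{
    unsigned char b;
    size_t n, k;
    uint32_t cp;

    if (*ip >= len)
        return SKETCH_EOS;
    b = s[*ip];
    if (b < 0xC2 || b > 0xF4) {   /* ASCII, stray continuation byte, or invalid lead byte */
        ++*ip;
        return b;                 /* passed through unchanged (assumption) */
    }
    /* The lead byte determines the sequence length and the initial payload bits. */
    if (b < 0xE0)      { n = 2; cp = b & 0x1F; }
    else if (b < 0xF0) { n = 3; cp = b & 0x0F; }
    else               { n = 4; cp = b & 0x07; }
    if (len - *ip < n)            /* input ends mid-sequence: index is not advanced */
        return SKETCH_EOS;
    for (k = 1; k < n; ++k)       /* append 6 payload bits per continuation byte */
        cp = (cp << 6) | (s[*ip + k] & 0x3F);
    *ip += n;
    return cp;
}
```

For the bytes 'A', 0xE2, 0x82, 0xAC, 0xE2, 0x82, successive calls yield U+0041, then U+20AC, then the EOS value with the index left at 4, because the last sequence is truncated.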

void set_linebreaks	(	const void *	s,
		size_t	len,
		const char *	lang,
		char *	brks,
		get_next_char_t	get_next_char
	)

Sets the line breaking information for a generic input string.

Parameters:

`[in]`	s	input string
`[in]`	len	length of the input, in the units expected by get_next_char (bytes, words, or dwords)
`[in]`	lang	language of the input
`[out]`	brks	pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
`[in]`	get_next_char	function to get the next UTF-32 character
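As a sketch of how a caller might consume the brks output, the helper below finds the last position at or before a width limit where a line may end. It rests on assumptions not stated above: that brks holds one entry per input code unit and that entry i describes the break opportunity after unit i. The SK_* constants are local stand-ins with illustrative values; real code should use the LINEBREAK_* constants from the library header. last_break_before is a hypothetical name.

```c
#include <stddef.h>

/* Stand-in values for illustration only; real code should use the
   LINEBREAK_* constants that the library defines. */
enum { SK_MUSTBREAK, SK_ALLOWBREAK, SK_NOBREAK, SK_INSIDEACHAR };

/* Returns the index of the first mandatory break at or before `limit`,
   otherwise the latest allowed break at or before `limit`,
   or (size_t)-1 if no break is possible in that range. */
size_t last_break_before(const char *brks, size_t len, size_t limit)
{
    size_t i, best = (size_t)-1;

    for (i = 0; i < len && i <= limit; ++i) {
        if (brks[i] == SK_MUSTBREAK)
            return i;       /* a mandatory break ends the line here */
        if (brks[i] == SK_ALLOWBREAK)
            best = i;       /* remember the latest opportunity so far */
    }
    return best;
}
```

A simple wrapping loop could call such a helper repeatedly, starting each new line just after the returned index.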

Variable Documentation

struct LineBreakProperties lb_prop_default[]

Default line breaking properties, as published on the Unicode Web site.

struct LineBreakPropertiesLang lb_prop_lang_map[]

Association data of language-specific line breaking properties with language names.

This is the definition for the static data in this file. If you want more flexibility, or do not need the data here, you may want to redefine lb_prop_lang_map in your C source file.
