src/linebreakdef.h File Reference

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

#include "unibreakdef.h"

Data Structures

struct  LineBreakProperties
 Struct for entries of line break properties.
struct  LineBreakPropertiesLang
 Struct for association of language-specific line breaking properties with language names.
struct  LineBreakContext
 Context representing internal state of the line breaking algorithm.

Enumerations

enum  LineBreakClass {
  LBP_Undefined, LBP_OP, LBP_CL, LBP_CP,
  LBP_QU, LBP_GL, LBP_NS, LBP_EX,
  LBP_SY, LBP_IS, LBP_PR, LBP_PO,
  LBP_NU, LBP_AL, LBP_HL, LBP_ID,
  LBP_IN, LBP_HY, LBP_BA, LBP_BB,
  LBP_B2, LBP_ZW, LBP_CM, LBP_WJ,
  LBP_H2, LBP_H3, LBP_JL, LBP_JV,
  LBP_JT, LBP_RI, LBP_AI, LBP_BK,
  LBP_CB, LBP_CJ, LBP_CR, LBP_LF,
  LBP_NL, LBP_SA, LBP_SG, LBP_SP,
  LBP_XX
}
 

Line break classes.


Functions

void lb_init_break_context (struct LineBreakContext *lbpCtx, utf32_t ch, const char *lang)
 Initializes line breaking context for a given language.
int lb_process_next_char (struct LineBreakContext *lbpCtx, utf32_t ch)
 Updates LineBreakContext for the next code point and returns the detected break.
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)
 Sets the line breaking information for a generic input string.

Variables

struct LineBreakProperties lb_prop_default []
 Default line breaking properties, derived from the data on the Unicode Web site.
struct LineBreakPropertiesLang lb_prop_lang_map []
 Association data of language-specific line breaking properties with language names.

Detailed Description

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Version:
3.0, 2015/05/10
Author:
Wu Yongwei
Petr Filipsky

Enumeration Type Documentation

enum LineBreakClass

Line break classes.

This is a direct mapping of Table 1 of Unicode Standard Annex 14, Revision 26.

Enumerator:
LBP_Undefined 

Undefined.

LBP_OP 

Opening punctuation.

LBP_CL 

Closing punctuation.

LBP_CP 

Closing parenthesis.

LBP_QU 

Ambiguous quotation.

LBP_GL 

Glue.

LBP_NS 

Non-starters.

LBP_EX 

Exclamation/Interrogation.

LBP_SY 

Symbols allowing break after.

LBP_IS 

Infix separator.

LBP_PR 

Prefix.

LBP_PO 

Postfix.

LBP_NU 

Numeric.

LBP_AL 

Alphabetic.

LBP_HL 

Hebrew letter.

LBP_ID 

Ideographic.

LBP_IN 

Inseparable characters.

LBP_HY 

Hyphen.

LBP_BA 

Break after.

LBP_BB 

Break before.

LBP_B2 

Break on either side (but not pair).

LBP_ZW 

Zero-width space.

LBP_CM 

Combining marks.

LBP_WJ 

Word joiner.

LBP_H2 

Hangul LV.

LBP_H3 

Hangul LVT.

LBP_JL 

Hangul L Jamo.

LBP_JV 

Hangul V Jamo.

LBP_JT 

Hangul T Jamo.

LBP_RI 

Regional indicator.

LBP_AI 

Ambiguous (alphabetic or ideograph).

LBP_BK 

Break (mandatory).

LBP_CB 

Contingent break.

LBP_CJ 

Conditional Japanese starter.

LBP_CR 

Carriage return.

LBP_LF 

Line feed.

LBP_NL 

Next line.

LBP_SA 

South-East Asian.

LBP_SG 

Surrogates.

LBP_SP 

Space.

LBP_XX 

Unknown.
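
When debugging break decisions it can help to print a class by its abbreviation rather than its numeric value. The following is a minimal sketch of one way to do that. The helper `lb_class_abbrev` is hypothetical and not part of the library, and the enumeration is copied locally from the listing above only so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the enumeration listed above, so the sketch compiles on
 * its own; real code would include linebreakdef.h instead. */
enum LineBreakClass {
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_HL, LBP_ID,
    LBP_IN, LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ,
    LBP_H2, LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_RI, LBP_AI, LBP_BK,
    LBP_CB, LBP_CJ, LBP_CR, LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP,
    LBP_XX
};

/* Hypothetical helper (not a library function): maps a class value to
 * its abbreviation, or "?" for a value outside the enumeration. */
static const char *lb_class_abbrev(enum LineBreakClass cls)
{
    static const char *const names[] = {
        "Undefined", "OP", "CL", "CP", "QU", "GL", "NS", "EX",
        "SY", "IS", "PR", "PO", "NU", "AL", "HL", "ID",
        "IN", "HY", "BA", "BB", "B2", "ZW", "CM", "WJ",
        "H2", "H3", "JL", "JV", "JT", "RI", "AI", "BK",
        "CB", "CJ", "CR", "LF", "NL", "SA", "SG", "SP",
        "XX"
    };
    const size_t count = sizeof names / sizeof names[0];

    /* The table must stay in step with the enumeration order. */
    assert(count == (size_t)LBP_XX + 1);

    if ((int)cls < 0 || (size_t)cls >= count)
        return "?";
    return names[cls];
}
```

The table relies on the enumerators keeping the order shown above, so it would need updating whenever the enumeration changes.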


Function Documentation

void lb_init_break_context ( struct LineBreakContext *  lbpCtx,
utf32_t  ch,
const char *  lang 
)

Initializes line breaking context for a given language.

Parameters:
[in,out] lbpCtx pointer to the line breaking context
[in] ch the first character to process
[in] lang language of the input
Postcondition:
the line breaking context is initialized

int lb_process_next_char ( struct LineBreakContext *  lbpCtx,
utf32_t  ch 
)

Updates LineBreakContext for the next code point and returns the detected break.

Parameters:
[in,out] lbpCtx pointer to the line breaking context
[in] ch Unicode code point
Returns:
break result, one of LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, and LINEBREAK_NOBREAK
Postcondition:
the line breaking context is updated
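
A caller that feeds code points one at a time usually needs to remember where the current line could end. The sketch below shows one possible way to consume the returned break results. `struct WrapTracker` and `wrap_feed` are hypothetical names, and the numeric values given to the three constants are assumptions used only so that the example compiles without the library headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed placeholder values; real code takes these constants from the
 * library's public header rather than defining them. */
#ifndef LINEBREAK_MUSTBREAK
#define LINEBREAK_MUSTBREAK  0
#define LINEBREAK_ALLOWBREAK 1
#define LINEBREAK_NOBREAK    2
#endif

/* Hypothetical bookkeeping for a caller that wraps text while streaming. */
struct WrapTracker {
    size_t last_allowed; /* position of the latest optional break */
    int has_allowed;     /* nonzero once last_allowed is valid */
};

/* Records one break result. `index` is whatever position the caller
 * associates with that result. Returns 1 when the line must end there,
 * 0 otherwise. */
static int wrap_feed(struct WrapTracker *w, size_t index, int result)
{
    assert(w != NULL);

    if (result == LINEBREAK_MUSTBREAK) {
        w->has_allowed = 0; /* a new line starts; drop old candidates */
        return 1;
    }
    if (result == LINEBREAK_ALLOWBREAK) {
        w->last_allowed = index;
        w->has_allowed = 1;
    }
    return 0; /* LINEBREAK_NOBREAK or an allowed break: keep going */
}
```

In a real loop, `result` would be the value returned by lb_process_next_char for each code point after the context has been set up with lb_init_break_context.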

void set_linebreaks ( const void *  s,
size_t  len,
const char *  lang,
char *  brks,
get_next_char_t  get_next_char 
)

Sets the line breaking information for a generic input string.

Parameters:
[in] s input string
[in] len length of the input
[in] lang language of the input
[out] brks pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
[in] get_next_char function to get the next UTF-32 character
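
The get_next_char parameter lets the same routine walk input in any encoding. As an illustration only, the sketch below shows what such a callback might look like for UTF-32 input. The callback shape (buffer, length, in/out index) and the end-of-string sentinel are assumptions here, since get_next_char_t is declared in unibreakdef.h and not on this page, and every `demo_`/`DEMO_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles alone. The real types come from
 * unibreakdef.h; these definitions are assumptions, not the library's. */
typedef uint32_t demo_utf32_t;
#define DEMO_EOS 0xFFFFFFFFu /* assumed end-of-string sentinel */
typedef demo_utf32_t (*demo_get_next_char_t)(const void *s, size_t len,
                                             size_t *ip);

/* Reads one code point from a UTF-32 buffer of `len` units and advances
 * *ip past it. Returns DEMO_EOS once the buffer is exhausted. */
static demo_utf32_t demo_next_utf32(const void *s, size_t len, size_t *ip)
{
    const demo_utf32_t *p = (const demo_utf32_t *)s;

    assert(s != NULL && ip != NULL);
    if (*ip >= len)
        return DEMO_EOS;
    return p[(*ip)++];
}

/* Drains the callback the way a driver loop would, counting the code
 * points it yields. */
static size_t demo_count_chars(const void *s, size_t len,
                               demo_get_next_char_t next)
{
    size_t i = 0;
    size_t n = 0;

    while (next(s, len, &i) != DEMO_EOS)
        ++n;
    return n;
}
```

A callback for a multi-unit encoding such as UTF-8 would advance the index by more than one unit per call, which is presumably where LINEBREAK_INSIDEACHAR entries in brks come from.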

Variable Documentation

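struct LineBreakProperties lb_prop_default[]

Default line breaking properties, derived from the data on the Unicode Web site.

struct LineBreakPropertiesLang lb_prop_lang_map[]

Association data of language-specific line breaking properties with language names.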

This is the definition for the static data in this file. If you want more flexibility, or do not need the data here, you may want to redefine lb_prop_lang_map in your C source file.
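
To make the idea of an association table concrete, here is a rough sketch of how a language name might be matched against such a table. It does not use the library's types: `struct DemoLangEntry`, its field names, the NULL-terminated layout, and the prefix-matching rule are all assumptions made for illustration, so the real struct LineBreakPropertiesLang should be checked in the header before redefining lb_prop_lang_map.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of one association entry; the field names and
 * layout are assumptions, not the library's definition. */
struct DemoLangEntry {
    const char *lang;  /* language name prefix; NULL ends the table */
    size_t namelen;    /* number of leading characters to compare */
    const char *label; /* stand-in for a pointer to a property table */
};

/* Example table, terminated by an entry whose name is NULL. */
static const struct DemoLangEntry demo_lang_map[] = {
    { "en", 2, "english" },
    { "de", 2, "german" },
    { NULL, 0, NULL }
};

/* Returns the data of the first entry whose name is a prefix of `lang`,
 * or NULL when nothing matches or no language is given. */
static const char *demo_lookup_lang(const struct DemoLangEntry *map,
                                    const char *lang)
{
    assert(map != NULL);
    if (lang == NULL)
        return NULL;
    for (; map->lang != NULL; ++map) {
        if (strncmp(lang, map->lang, map->namelen) == 0)
            return map->label;
    }
    return NULL;
}
```

With prefix matching of this kind, a tag such as "en-US" would select the "en" entry, and an unmatched language would fall back to the default properties.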

