src/linebreak.h File Reference

Header file for the line breaking algorithm.

#include <stddef.h>
#include "unibreakbase.h"

Defines

#define LINEBREAK_MUSTBREAK   0
 Break is mandatory.
#define LINEBREAK_ALLOWBREAK   1
 Break is allowed.
#define LINEBREAK_NOBREAK   2
 No break is possible.
#define LINEBREAK_INSIDEACHAR   3
 A UTF-8/16 sequence is unfinished.

Functions

void init_linebreak (void)
 Initializes the second-level index to the line breaking properties.
void set_linebreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)
 Sets the line breaking information for a UTF-8 input string.
void set_linebreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)
 Sets the line breaking information for a UTF-16 input string.
void set_linebreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)
 Sets the line breaking information for a UTF-32 input string.
int is_line_breakable (utf32_t char1, utf32_t char2, const char *lang)
 Tells whether a line break can occur between two Unicode characters.

Detailed Description

Header file for the line breaking algorithm.

Version:
3.0, 2015/05/10
Author:
Wu Yongwei

Define Documentation

#define LINEBREAK_ALLOWBREAK   1

Break is allowed.

#define LINEBREAK_INSIDEACHAR   3

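A UTF-8/16 sequence is unfinished: this code unit is not the last one of its character, so the position after it is not a character boundary and no break can occur there.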

#define LINEBREAK_MUSTBREAK   0

Break is mandatory.

#define LINEBREAK_NOBREAK   2

No break is possible.


Function Documentation

void init_linebreak (void)

Initializes the second-level index to the line breaking properties.

If this function is not called, the performance of get_char_lb_class_lang (and hence of the main line-breaking functions) can be poor, especially for large code points such as those of Chinese characters.

int is_line_breakable (utf32_t char1, utf32_t char2, const char *lang)

Tells whether a line break can occur between two Unicode characters.

This is a wrapper function that exposes a simple interface. In general, set_linebreaks_utf32 is the better choice, because complicated cases involving combining marks, spaces, etc. cannot be processed correctly when only two characters are examined.

Parameters:
char1 the first Unicode character
char2 the second Unicode character
lang language of the input
Returns:
one of LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR

void set_linebreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)

Sets the line breaking information for a UTF-16 input string.

Parameters:
[in] s input UTF-16 string
[in] len length of the input, in UTF-16 code units
[in] lang language of the input
[out] brks pointer to the output breaking data, one entry per input code unit (it must hold at least len elements), containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
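In UTF-16, a character outside the Basic Multilingual Plane takes two code units (a surrogate pair), and brks should then contain LINEBREAK_INSIDEACHAR for the first of them. The following is a minimal sketch, not part of this library, assuming the libunibreak convention that brks[i] describes the position after code unit i. The hypothetical helper clamp_to_char_boundary moves a truncation offset back to the nearest character boundary so that a surrogate pair is never split. The constants are repeated locally only so that the snippet compiles without linebreak.h.

```c
#include <assert.h>
#include <stddef.h>

/* Values as documented in linebreak.h; real code should include that header. */
#ifndef LINEBREAK_INSIDEACHAR
#define LINEBREAK_MUSTBREAK   0
#define LINEBREAK_ALLOWBREAK  1
#define LINEBREAK_NOBREAK     2
#define LINEBREAK_INSIDEACHAR 3
#endif

/*
 * Hypothetical helper: given the brks array produced for a string of len
 * code units, return the largest offset <= k that lies on a character
 * boundary. Offset 0 is always a boundary and any k >= len is clamped to
 * len; any other offset k is a boundary unless brks[k - 1] is
 * LINEBREAK_INSIDEACHAR (assumed: brks[i] describes the position after
 * code unit i).
 */
static size_t clamp_to_char_boundary(const char *brks, size_t len, size_t k)
{
    assert(brks != NULL || len == 0);
    if (k >= len)
        return len;
    /* Step back over every code unit that is not the last of its character. */
    while (k > 0 && brks[k - 1] == LINEBREAK_INSIDEACHAR)
        --k;
    return k;
}
```

Because the loop skips any run of LINEBREAK_INSIDEACHAR entries, the same helper should also work on the output of set_linebreaks_utf8, where a character can span up to four bytes.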

void set_linebreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)

Sets the line breaking information for a UTF-32 input string.

Parameters:
[in] s input UTF-32 string
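[in] len length of the input, in UTF-32 code units (that is, in characters)
[in] lang language of the input
[out] brks pointer to the output breaking data, one entry per input character (it must hold at least len elements), containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR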
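Because each UTF-32 code unit is a whole character, the brks array maps one-to-one onto characters, which makes it a convenient basis for a simple wrapping loop. The sketch below is a hypothetical greedy wrapper, not part of this library. It assumes that brks[i] describes the position after character i and that every character is one column wide (ignoring East Asian width and combining marks). It ends a line at a mandatory break, otherwise at the last allowed break within max_cols, and falls back to a hard cut when a line contains no break opportunity.

```c
#include <assert.h>
#include <stddef.h>

/* Values as documented in linebreak.h; real code should include that header. */
#ifndef LINEBREAK_MUSTBREAK
#define LINEBREAK_MUSTBREAK   0
#define LINEBREAK_ALLOWBREAK  1
#define LINEBREAK_NOBREAK     2
#define LINEBREAK_INSIDEACHAR 3
#endif

/*
 * Hypothetical greedy wrapper over a UTF-32 brks array of len entries.
 * Writes the end offset (exclusive) of each line into line_ends, which
 * must have room for len entries, and returns the number of lines.
 * Assumes one column per character; max_cols must be positive.
 */
static size_t wrap_utf32(const char *brks, size_t len, size_t max_cols,
                         size_t *line_ends)
{
    size_t nlines = 0;
    size_t start = 0;

    assert(max_cols > 0);
    while (start < len) {
        size_t last_ok = 0;   /* 0 means no break opportunity seen yet */
        size_t end = 0;       /* 0 means no mandatory break found yet */
        size_t i;

        /* Scan at most max_cols characters of the current line. */
        for (i = start; i < len && i - start < max_cols; ++i) {
            if (brks[i] == LINEBREAK_MUSTBREAK) {
                end = i + 1;          /* mandatory: end the line here */
                break;
            }
            if (brks[i] == LINEBREAK_ALLOWBREAK)
                last_ok = i + 1;      /* remember the latest opportunity */
        }
        if (end == 0) {
            if (i == len)
                end = len;            /* the rest of the text fits */
            else if (last_ok > start)
                end = last_ok;        /* wrap at the last opportunity */
            else
                end = i;              /* no opportunity: hard cut */
        }
        line_ends[nlines++] = end;
        start = end;
    }
    return nlines;
}
```

Trailing spaces are counted toward the width here; a production wrapper would usually let them hang past the margin and would measure display width rather than code points.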

void set_linebreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)

Sets the line breaking information for a UTF-8 input string.

Parameters:
[in] s input UTF-8 string
[in] len length of the input, in bytes
[in] lang language of the input
[out] brks pointer to the output breaking data, one entry per input byte (it must hold at least len elements), containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
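A common way to consume the UTF-8 output is to walk the text chunk by chunk, where each chunk ends at a position marked LINEBREAK_ALLOWBREAK or LINEBREAK_MUSTBREAK. The following is a minimal sketch under the assumption that brks[i] describes the position after byte i. The names next_break_utf8 and print_chunks are hypothetical, and the text is treated as plain char bytes for printing (set_linebreaks_utf8 itself takes const utf8_t *).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values as documented in linebreak.h; real code should include that header. */
#ifndef LINEBREAK_MUSTBREAK
#define LINEBREAK_MUSTBREAK   0
#define LINEBREAK_ALLOWBREAK  1
#define LINEBREAK_NOBREAK     2
#define LINEBREAK_INSIDEACHAR 3
#endif

/*
 * Hypothetical helper: starting at byte offset start, return the offset
 * just past the next position where a line may or must end, or len if
 * there is none. Bytes marked LINEBREAK_INSIDEACHAR are never chosen, so
 * for a well-formed brks array the result lies on a character boundary.
 */
static size_t next_break_utf8(const char *brks, size_t len, size_t start)
{
    size_t i;

    assert(start <= len);
    for (i = start; i < len; ++i) {
        if (brks[i] == LINEBREAK_MUSTBREAK || brks[i] == LINEBREAK_ALLOWBREAK)
            return i + 1;
    }
    return len;
}

/* Hypothetical helper: prints each unbreakable chunk of s in brackets. */
static void print_chunks(const char *s, const char *brks, size_t len)
{
    size_t start = 0;

    while (start < len) {
        size_t end = next_break_utf8(brks, len, start);
        printf("[%.*s]", (int)(end - start), s + start);
        start = end;
    }
    putchar('\n');
}
```

Since every chunk boundary comes from brks, a multi-byte character such as "é" (two bytes in UTF-8) is never split across chunks.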

Generated by doxygen 1.6.2