src/wordbreak.h File Reference

Header file for the word breaking (segmentation) algorithm.

#include <stddef.h>
#include "unibreakbase.h"

Defines

#define WORDBREAK_BREAK   0
 Break is allowed.
#define WORDBREAK_NOBREAK   1
 No break is allowed.
#define WORDBREAK_INSIDEACHAR   2
 A UTF-8/16 sequence is unfinished.

Functions

void init_wordbreak (void)
 Initializes the wordbreak internals.
void set_wordbreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-8 input string.
void set_wordbreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-16 input string.
void set_wordbreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-32 input string.

Detailed Description

Header file for the word breaking (segmentation) algorithm.

Version:
3.0, 2015/05/10
Author:
Tom Hacohen

Define Documentation

#define WORDBREAK_BREAK   0

Break is allowed.

#define WORDBREAK_INSIDEACHAR   2

The position is inside a UTF-8/16 sequence that is not yet finished.

#define WORDBREAK_NOBREAK   1

No break is allowed.
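
When debugging break data it can help to print these values by name. The following is a minimal sketch; `wordbreak_name` is a hypothetical helper and not part of the library. The three values are copied from the defines above and guarded so that the real header takes precedence when it is included first.

```c
/* Values copied from the defines documented above. The guard lets the
 * real wordbreak.h take precedence if it has already been included. */
#ifndef WORDBREAK_BREAK
#define WORDBREAK_BREAK       0
#define WORDBREAK_NOBREAK     1
#define WORDBREAK_INSIDEACHAR 2
#endif

/* Hypothetical helper (not a library function): maps one entry of a
 * brks array to a printable name. */
static const char *wordbreak_name(char brk)
{
    switch (brk) {
    case WORDBREAK_BREAK:       return "BREAK";
    case WORDBREAK_NOBREAK:     return "NOBREAK";
    case WORDBREAK_INSIDEACHAR: return "INSIDEACHAR";
    default:                    return "UNKNOWN";
    }
}
```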


Function Documentation

void init_wordbreak ( void   ) 

Initializes the wordbreak internals.

It currently does nothing, but that may change in a future version.

void set_wordbreaks_utf16 ( const utf16_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the word breaking information for a UTF-16 input string.

Parameters:
[in] s input UTF-16 string
[in] len length of the input
[in] lang language of the input
[out] brks pointer to the output breaking data, containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR

void set_wordbreaks_utf32 ( const utf32_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the word breaking information for a UTF-32 input string.

Parameters:
[in] s input UTF-32 string
[in] len length of the input
[in] lang language of the input
[out] brks pointer to the output breaking data, containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR

void set_wordbreaks_utf8 ( const utf8_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the word breaking information for a UTF-8 input string.

Parameters:
[in] s input UTF-8 string
[in] len length of the input
[in] lang language of the input
[out] brks pointer to the output breaking data, containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR
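
The output array can be walked to recover segment boundaries. The sketch below shows one way to do that. It rests on two assumptions that this page does not state: that `brks` holds one entry per input code unit (so `len` entries), and that WORDBREAK_BREAK at index i means a break is allowed after code unit i. `segment_end` and `count_segments` are hypothetical helpers, not library functions, and the defines are copied from this page so the block compiles without the header.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the defines documented on this page. The guard lets
 * the real wordbreak.h take precedence if it has already been included. */
#ifndef WORDBREAK_BREAK
#define WORDBREAK_BREAK       0
#define WORDBREAK_NOBREAK     1
#define WORDBREAK_INSIDEACHAR 2
#endif

/* Hypothetical helper: returns the index one past the last code unit of
 * the segment that begins at `start`.
 * Assumption: WORDBREAK_BREAK at index i allows a break after unit i. */
static size_t segment_end(const char *brks, size_t len, size_t start)
{
    size_t i;

    assert(brks != NULL);
    assert(start <= len);

    for (i = start; i < len; ++i) {
        if (brks[i] == WORDBREAK_BREAK)
            return i + 1;
    }
    /* No further break: the segment runs to the end of the input. */
    return len;
}

/* Hypothetical helper: counts the segments described by brks[0..len). */
static size_t count_segments(const char *brks, size_t len)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < len) {
        pos = segment_end(brks, len, pos);
        ++count;
    }
    return count;
}
```

In real use the array would be filled by a call such as set_wordbreaks_utf8(s, len, lang, brks), with `brks` allocated by the caller to hold at least `len` entries (again an assumption about the required size). Entries equal to WORDBREAK_INSIDEACHAR and WORDBREAK_NOBREAK are both skipped by the loop, so a segment never ends in the middle of a multi-unit character.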

Generated by  doxygen 1.6.2