v0.0.1.5: mecab-ko 0.999 support, Windows build fix, dict_index()

- Fix Windows build: add $(SHLIB):$(MECAB_OBJS) dependency so MeCab
  .o files are compiled before linking; patch dllimport, _stdcall,
  size_t overload, progress_bar gutting, and HAVE_WINDOWS_H for
  MinGW/Rtools45 compatibility.
- Korean builds now use mecab-ko-msvc 0.999 (Pusnow/mecab-ko-msvc)
  instead of mecab-ko 0.9.2. Japanese builds continue using
  taku910/mecab 0.996. Selected via MECAB_LANG env var (default: ko).
- Add dict_index() R function wrapping mecab_dict_index, allowing
  user dictionary compilation directly from R without the external
  mecab-dict-index command-line tool.
- Update CI to install mecab-ko 0.999 pre-built binaries on Ubuntu
  and macOS for Korean builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Linux and macOS, if MeCab is already installed system-wide (detected via `mecab-config`), RcppMeCab uses the system installation regardless of `MECAB_LANG`.
If you already have MeCab installed (e.g. via `brew install mecab` on macOS, or `apt install libmecab-dev` on Linux), RcppMeCab will use your system installation.
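
The precedence described above can be checked before installing. The following is a minimal shell sketch, not the package's actual configure logic: `mecab_source` is a hypothetical helper name, and the `bundled` label is an assumption about what happens when `mecab-config` is absent (the commit message only states that the variant is selected via `MECAB_LANG`, defaulting to `ko`).

```sh
#!/bin/sh
# Hypothetical helper (not part of RcppMeCab): report which MeCab a source
# install would likely pick up on Linux/macOS, following the rule above.
mecab_source() {
  if command -v mecab-config >/dev/null 2>&1; then
    # A system-wide MeCab takes precedence regardless of MECAB_LANG.
    echo "system"
  else
    # Assumption: otherwise the variant named by MECAB_LANG is used (default: ko).
    echo "bundled:${MECAB_LANG:-ko}"
  fi
}

mecab_source
```

If the helper prints `system`, setting `MECAB_LANG` should have no effect on that machine, per the README text above.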

-### Language selection (Windows)
+### Language selection

-On Windows, set `MECAB_LANG` before installation to choose the MeCab language variant. The default is `ko` (Korean).
+Set `MECAB_LANG` before installation to choose the MeCab variant:

 ```r
 # Korean (default)
@@ -44,7 +55,7 @@ install.packages("RcppMeCab", type = "source")
 You need a MeCab dictionary for your target language:

 +**Japanese**: Install [MeCab](http://taku910.github.io/mecab/) and IPAdic, or on macOS: `brew install mecab mecab-ipadic`
-+**Korean**: Install [mecab-ko](https://bitbucket.org/eunjeon/mecab-ko) and [mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic). Mirrors of these files are also available on the [RcppMeCab releases page](https://github.com/junhewk/RcppMeCab/releases/tag/0.0.1.0) in case Bitbucket is unavailable. On Windows: install [mecab-ko-msvc](https://github.com/Pusnow/mecab-ko-msvc) and [mecab-ko-dic-msvc](https://github.com/Pusnow/mecab-ko-dic-msvc) in `C:\mecab`
++**Korean**: Install [mecab-ko-dic](https://github.com/Pusnow/mecab-ko-msvc/releases) (available as `mecab-ko-dic.zip`/`mecab-ko-dic.tar.gz` from mecab-ko-msvc releases)
 +**Chinese**: Install MeCab with [MeCab Chinese Dic](http://www.52nlp.cn/%E7%94%A8mecab%E6%89%93%E9%80%A0%E4%B8%80%E5%A5%97%E5%AE%9E%E7%94%A8%E7%9A%84%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D%E7%B3%BB%E7%BB%9F%E4%B8%89%EF%BC%9Amecab-chinese)

 ## Usage
@@ -65,32 +76,27 @@ posParallel(sentence) # parallelized, faster for large inputs
 +`join`: if `TRUE` (default), output is `morpheme/tag`; if `FALSE`, output is `morpheme` with tag as attribute
 +`format`: `"list"` (default) or `"data.frame"`
 +`sys_dic`: directory containing `dicrc`, `model.bin`, etc. Set a default with `options(mecabSysDic = "/path/to/dic")`
-+`user_dic`: path to a user dictionary compiled by `mecab-dict-index`
++`user_dic`: path to a user dictionary compiled by `dict_index()`

 Note: provide full paths for `sys_dic` and `user_dic` (no tilde `~/` expansion).

 ## Compiling a user dictionary

-MeCab's `DictionaryCompiler` API calls `die()`, which would crash the R session, so it is not exposed through RcppMeCab. Use the `mecab-dict-index` command-line tool instead.
-
-You need a `model_file` for automatic cost estimation:
-
-+ Japanese: [model_file in ipadic](https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7bnc5aFZSTE9qNnM)
-+ Korean: `model.bin` in [mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic)
+RcppMeCab provides the `dict_index()` function to compile user dictionaries directly from R, without needing the `mecab-dict-index` command-line tool.

 Prepare your entries as a CSV file ([Japanese format](http://taku910.github.io/mecab/dic.html), [Korean format](https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/final/user-dic/README.md)), then compile:

-```sh
-/usr/local/libexec/mecab/mecab-dict-index \
-  -m /path/to/model.bin \
-  -d /path/to/mecab-dic \
-  -u userdic.dic \
-  -f utf8 -t utf8 \
-  entries.csv
+```r
+dict_index(
+  dic_csv="entries.csv",
+  out_dic="userdic.dic",
+  dic_dir="/path/to/mecab-dic"
+)
+
+# Then use the compiled dictionary:
+pos("some text", user_dic="userdic.dic")
 ```

-On Windows, use `mecab-dict-index.exe` bundled with [mecab-ko-msvc](https://github.com/Pusnow/mecab-ko-msvc) or the MeCab binary installer.
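
Since the README notes that `sys_dic` and `user_dic` need full paths with no `~/` expansion, it can help to resolve paths in the shell before handing them to R. A minimal sketch, assuming a POSIX shell; `abs_path` is a hypothetical helper, and the `Rscript` line in the trailing comment only reuses the `pos()` call shown in the diff.

```sh
#!/bin/sh
# Hypothetical helper: print an absolute path for a file whose directory exists.
abs_path() {
  # CDPATH= keeps cd from echoing a directory name into the captured output.
  dir=$(CDPATH= cd "$(dirname "$1")" && pwd) || return 1
  # Avoid a doubled slash when the directory is the filesystem root.
  case $dir in /) dir= ;; esac
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Example: resolve a relative path against the current directory.
abs_path ./userdic.dic

# The result could then be passed on, e.g. (requires R and RcppMeCab):
# Rscript -e 'RcppMeCab::pos("some text", user_dic = commandArgs(TRUE)[1])' "$(abs_path ./userdic.dic)"
```

The helper only rewrites the path string; it does not check that the dictionary file itself exists.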