Add Chinese (mecab-jieba) dictionary support to download_dic and lang
- download_dic("zh") downloads and compiles the mecab-jieba dictionary
  (584k entries, jieba word frequencies + CC-CEDICT enrichment)
- lang = "zh" now available in pos(), posParallel(), and set_dic()
- Source: https://github.com/lindera/mecab-jieba
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
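
The usage this commit enables might look like the sketch below. It is illustrative only: it assumes the package is RcppMeCab (the commit page does not name it) and that a build containing this commit is installed. `tag_chinese` is a hypothetical helper, not part of the package; only `download_dic("zh")` and the `lang`, `format`, and `sentence` arguments come from the commit message and the diff.

```r
# Sketch only: assumes the package is RcppMeCab and includes this commit.
# tag_chinese() is a hypothetical wrapper introduced for illustration.
tag_chinese <- function(sentences, parallel = FALSE) {
  # Choose the serial or parallel tagger; both document a `lang` argument.
  tagger <- if (parallel) RcppMeCab::posParallel else RcppMeCab::pos
  # lang = "zh" selects the dictionary installed by download_dic("zh")
  # and, per the docs in this diff, overrides sys_dic.
  tagger(sentences, format = "data.frame", lang = "zh")
}

# One-time setup, then tagging (not run here: needs MeCab and network access):
# RcppMeCab::download_dic("zh")
# tag_chinese("我爱北京天安门")
```

The dictionary download is kept outside the helper so that the one-time, network-dependent step is not repeated on every call.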
R/pos.r
+4 -4 (4 additions & 4 deletions)
@@ -21,9 +21,9 @@
 #' @param sentence A character vector of any length. For analyzing multiple sentences, put them in one character vector.
 #' @param join A bool to decide the output format. The default value is TRUE. If FALSE, the function will return morphemes only, and tags put in the attribute. if \code{format="data.frame"}, then this will be ignored.
 #' @param format A data type for the result. The default value is "list". You can set this to "data.frame" to get a result as data frame format.
-#' @param lang Optional language code (\code{"ja"} or \code{"ko"}) to select
-#' a dictionary installed via \code{\link{download_dic}}. When specified, this
-#' overrides \code{sys_dic}.
+#' @param lang Optional language code (\code{"ja"}, \code{"ko"}, or \code{"zh"})
+#' to select a dictionary installed via \code{\link{download_dic}}. When
+#' specified, this overrides \code{sys_dic}.
 #' @param sys_dic A location of system MeCab dictionary. The default value is "".
 #' @param user_dic A location of user-specific MeCab dictionary. The default value is "".
 #' @return A string vector or a list of POS tagged morpheme will be returned in conjoined character
@@ -52,7 +52,7 @@ pos <- function(sentence, join = TRUE, format = c("list", "data.frame"), lang =
R/posParallel.R
+4 -4 (4 additions & 4 deletions)
@@ -26,9 +26,9 @@
 #' @param sentence A character vector of any length. For analyzing multiple sentences, put them in one character vector.
 #' @param join A bool to decide the output format. The default value is TRUE. If FALSE, the function will return morphemes only, and tags put in the attribute. if \code{format="data.frame"}, then this will be ignored.
 #' @param format A data type for the result. The default value is "list". You can set this to "data.frame" to get a result as data frame format.
-#' @param lang Optional language code (\code{"ja"} or \code{"ko"}) to select
-#' a dictionary installed via \code{\link{download_dic}}. When specified, this
-#' overrides \code{sys_dic}.
+#' @param lang Optional language code (\code{"ja"}, \code{"ko"}, or \code{"zh"})
+#' to select a dictionary installed via \code{\link{download_dic}}. When
+#' specified, this overrides \code{sys_dic}.
 #' @param sys_dic A location of system MeCab dictionary. The default value is "".
 #' @param user_dic A location of user-specific MeCab dictionary. The default value is "".
 #' @return A string vector or a list of POS tagged morpheme will be returned in conjoined character