SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.
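Subtitle-based frequency norms rest on simple counts over a corpus of films. The sketch below shows the kind of measures involved: word and character frequency per million, and word contextual diversity (the percentage of films in which a word appears). It assumes the subtitles are already segmented into words. The function name `subtitle_frequencies` and the input layout (film ID mapped to lists of segmented lines) are hypothetical, and the sketch omits details of the published norms such as cleaning and log transforms.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input layout: film ID -> subtitle lines, each line a list of words.
Films = Dict[str, List[List[str]]]


def per_million(counts: Counter) -> Dict[str, float]:
    """Convert raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {item: n * 1_000_000 / total for item, n in counts.items()}


def subtitle_frequencies(
    films: Films,
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Return (word per-million, character per-million, word CD %)."""
    word_counts: Counter = Counter()
    char_counts: Counter = Counter()
    word_cd: Counter = Counter()  # number of films containing each word
    for lines in films.values():
        seen_in_film = set()
        for line in lines:
            for word in line:
                word_counts[word] += 1
                char_counts.update(word)  # counts each character of the word
                seen_in_film.add(word)
        word_cd.update(seen_in_film)  # each word counted once per film
    n_films = len(films)
    cd_percent = {w: 100 * n / n_films for w, n in word_cd.items()}
    return per_million(word_counts), per_million(char_counts), cd_percent
```

For example, `subtitle_frequencies({"film_a": [["我们", "走"], ["走"]], "film_b": [["我们"]]})` gives both words 500000 per million, but 我们 a contextual diversity of 100% and 走 only 50%.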
It uses three categories of features: character identity n-grams, morphological features, and character reduplication features.
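A minimal sketch of how the three feature categories might be extracted for one character position in a sequence-labeling word segmenter. The window size (±2), the feature-string format, and the use of a small affix set to stand in for "morphological features" are all assumptions for illustration; the exact templates in the source work may differ. `char_features` and `PAD` are hypothetical names.

```python
from typing import FrozenSet, List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_features(
    chars: List[str], i: int, affixes: FrozenSet[str] = frozenset()
) -> List[str]:
    """Features for position i: identity n-grams, affix flag, reduplication."""

    def c(offset: int) -> str:
        j = i + offset
        return chars[j] if 0 <= j < len(chars) else PAD

    feats: List[str] = []
    # Character identity unigrams in a +-2 window.
    for k in range(-2, 3):
        feats.append(f"C{k}={c(k)}")
    # Adjacent character bigrams in the same window.
    for k in range(-2, 2):
        feats.append(f"C{k}C{k + 1}={c(k)}{c(k + 1)}")
    # Bigram skipping the current character.
    feats.append(f"C-1C1={c(-1)}{c(1)}")
    # Morphological feature (assumption): is the current character a known affix?
    if c(0) in affixes:
        feats.append("AFFIX_C0")
    # Reduplication features: repeated characters around the current position.
    if c(-1) == c(0):
        feats.append("REDUP_C-1=C0")
    if c(-1) == c(1) and c(1) != PAD:
        feats.append("REDUP_C-1=C1")
    return feats
```

For 看看书 at position 1, the features include `C0=看`, `C-1C0=看看`, and `REDUP_C-1=C0`. String-valued features like these are the usual input format for linear-chain CRF toolkits, which learn one weight per feature and label pair.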
In NAACL 2009 Third Workshop on Syntax and Structure in Statistical Translation.
Nevertheless, electronic medical records (EMRs) make sensitive healthcare data much easier to collect, process, store, and publish.
Full-text article, June 2010.
Abstract: We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts produced by learners that can be used for educational tasks.
Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency.
Overview: We work on a wide variety of research in Chinese natural language processing and speech processing, including word segmentation, part-of-speech tagging, syntactic and semantic parsing, machine translation, disfluency detection, prosody, and other areas.
Outstanding Paper Award Honorable Mention: Wanxiang Che, Mengqiu Wang and Christopher.
In line with findings for other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition.
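The claim that one frequency measure "explains more of the variance" in naming or lexical decision times can be made concrete with R². The sketch below is a minimal illustration, assuming a simple one-predictor linear regression of response times on log frequency, where R² equals the squared Pearson correlation. It is not the analysis reported in the source; `r_squared` and `compare_predictors` are hypothetical helpers.

```python
from typing import Dict, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Proportion of variance in y explained by an OLS line fitted on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and outcome must both vary")
    # For a single predictor, R^2 is the squared correlation coefficient.
    return (sxy * sxy) / (sxx * syy)


def compare_predictors(
    rts: Sequence[float], **log_freqs: Sequence[float]
) -> Dict[str, float]:
    """R^2 of each named log-frequency measure as a predictor of the RTs."""
    return {name: r_squared(x, rts) for name, x in log_freqs.items()}
```

A call such as `compare_predictors(rts, subtitle=log_sub, written=log_written)` (all three inputs hypothetical, aligned by word) returns one R² per measure, so the measure with the larger value accounts for more of the variance under this simple model. Published comparisons typically use richer regression models, so treat this only as the basic quantity being compared.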