Academia Sinica Balanced Corpus
1. The Sinica Corpus

Academia Sinica Balanced Corpus (Sinica Corpus) is the first proportionally sampled Chinese corpus with part-of-speech tagging. The first version (Sinica 1.0), containing two million words, was compiled and made available to the research community under direct license in 1995 (Huang et al. 1995). After ten years of further development, it was upgraded in 2005 to Sinica 5.0, which contains ten million words. An online web service is available at http://asbc.iis.sinica.edu.tw. The corpus can also be a…

