Is there any good open-source or freely available Chinese segmentation algorithm?


As phrased in the question, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I understand it is a difficult task to solve, as there are many ambiguities involved. I know there's Google's API, but it is rather a black box, i.e. not much information about what it is doing is passed through.

The keyword for Chinese text segmentation should be 中文分词 in Chinese.

Good, active open-source text-segmentation algorithms:

  1. 盘古分词 (Pan Gu Segment): C#, snapshot
  2. IK-Analyzer: Java
  3. ICTCLAS: C/C++, Java, C#, demo
  4. nlpbamboo: C, PHP, PostgreSQL
  5. HTTPCWS: based on ICTCLAS, demo
  6. mmseg4j: Java
  7. FudanNLP: Java, demo
  8. smallseg: Python, Java, demo
  9. nseg: Node.js
  10. mini-segmenter: Python

Other

  1. Google Code: http://code.google.com/query/#q=中文分词
  2. OSChina (Open Source China)

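Sample

  1. Google Chrome (Chromium): src, cc_cedict.txt (73,145 Chinese words/phrases)

    • In a text field or textarea in Google Chrome that contains Chinese sentences, press Ctrl+ or Ctrl+

    • Double-click on 中文分词指的是将一个汉字序列切分成一个一个单独的词 ("Chinese word segmentation means splitting a sequence of Chinese characters into individual words")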
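
To make the idea of dictionary-driven segmentation concrete, here is a minimal sketch of greedy forward maximum matching, one of the simplest dictionary-based approaches. It is not claimed to be what Chrome or any of the libraries listed above actually does. The function name `segment`, the parameter `max_len`, and the toy dictionary are assumptions made up for illustration; a real word list such as cc_cedict.txt would have to be parsed into a set of words first.

```python
def segment(text, dictionary, max_len=None):
    """Greedy forward maximum matching (illustrative sketch only).

    At each position, take the longest dictionary word that starts there;
    if no dictionary word matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest word in the dictionary bounds how far ahead we look.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, not taken from any real word list.
TOY_DICT = {"中文", "分词", "中文分词", "汉字", "序列", "研究", "研究生", "生命", "起源"}

if __name__ == "__main__":
    print(segment("中文分词", TOY_DICT))
    print(segment("研究生命起源", TOY_DICT))
```

With this toy dictionary, 研究生命起源 ("research the origin of life") is split as 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源. That is the kind of ambiguity the question mentions, and it is why practical segmenters usually add statistical models or extra disambiguation rules on top of a dictionary.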

