Project author: Aurelius84

Project description:
A project of N-gram model comparing FMM/BMM
Language: Python
Repository: git://github.com/Aurelius84/N-gram.git
Created: 2016-11-19T08:59:58Z
Project community: https://github.com/Aurelius84/N-gram

License:


N-gram

An N-gram word segmentation project that compares Unigram and Bigram models against FMM and BMM (forward and backward maximum matching).
Document: CocoNLP
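
For readers new to the two baselines, here is a minimal sketch of forward and backward maximum matching. It is not taken from this repository: the function names `fmm` and `bmm`, the `max_len` parameter, and the toy vocabulary are all assumptions made for illustration.

```python
def fmm(text, vocab, max_len=4):
    """Forward maximum matching: scan left to right, taking the longest match."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def bmm(text, vocab, max_len=4):
    """Backward maximum matching: scan right to left, taking the longest match."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


# Toy vocabulary (hypothetical, not the project's dictionary).
vocab = {"研究", "研究生", "生命", "的", "起源"}
print(fmm("研究生命的起源", vocab))  # ['研究生', '命', '的', '起源']
print(bmm("研究生命的起源", vocab))  # ['研究', '生命', '的', '起源']
```

The two scans disagree on this sentence, which is the kind of ambiguity that dictionary matching cannot resolve on its own.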

Usage

First, download the corpus file ‘199801.txt’ and place it in the project directory.
Then run:

  python statistic.py

You should see output like the following (the last three lines report recall, precision, and F-measure, in that order):

  successfully to split corpus by train = 0.900000 test = 0.100000
  the total number of words is:53260
  The total number of bigram is : 403121.
  successfully witten-Bell smoothing! smooth_value:1.3372788850370981e-05
  the total number of punction is:47
  召回率为:0.962036929819092
  准确率为:0.9401303935308096
  F值为:0.950957517059212
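
The README does not say how these three scores are computed. A common convention for word segmentation, sketched below as an assumption, is to compare character spans: a predicted word counts as correct only if both of its boundaries match the reference. The helper names `to_spans` and `prf` are hypothetical.

```python
def to_spans(words):
    """Turn a word list into a set of (start, end) character offsets."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf(gold_words, pred_words):
    """Span-level precision, recall, and F-measure for one segmentation."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: two of the four predicted words have the right boundaries.
gold = ["研究", "生命", "的", "起源"]
pred = ["研究生", "命", "的", "起源"]
print(prf(gold, pred))  # (0.5, 0.5, 0.5)

# The reported F value is the harmonic mean of the reported precision and recall.
p, r = 0.9401303935308096, 0.962036929819092
print(2 * p * r / (p + r))  # about 0.95096
```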

Result

Metric     FMM     BMM     Unigram  Bigram
Precision  91.54%  92.13%  93.20%   94.01%
Recall     94.66%  95.07%  96.14%   96.20%
F1         93.07%  93.58%  94.64%   95.10%
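
The Bigram column depends on smoothed bigram probabilities, and the sample output mentions Witten-Bell smoothing. The project's exact variant, and what its `smooth_value` refers to, are not stated here, so the following is only a sketch of the textbook interpolated form, P(w|h) = λ·c(h,w)/c(h) + (1−λ)·P(w) with λ = c(h)/(c(h)+T(h)), where T(h) is the number of distinct words seen after h. The names `train_bigram` and `witten_bell` are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect the counts needed for interpolated Witten-Bell smoothing."""
    unigrams, bigrams, history = Counter(), Counter(), Counter()
    followers = {}  # history word -> set of distinct words seen after it
    for words in sentences:
        unigrams.update(words)
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
            followers.setdefault(prev, set()).add(cur)
    return unigrams, bigrams, history, followers


def witten_bell(prev, cur, model):
    """Interpolated Witten-Bell estimate of P(cur | prev)."""
    unigrams, bigrams, history, followers = model
    p_uni = unigrams[cur] / sum(unigrams.values())
    c_h = history[prev]
    if c_h == 0:
        return p_uni  # unseen history: back off to the unigram estimate
    t_h = len(followers[prev])
    lam = c_h / (c_h + t_h)
    return lam * bigrams[(prev, cur)] / c_h + (1 - lam) * p_uni


# Toy corpus of pre-segmented sentences (illustrative only).
model = train_bigram([["a", "b", "a", "c"], ["a", "b"]])
print(witten_bell("a", "b", model))  # seen bigram
print(witten_bell("b", "c", model))  # unseen bigram, still non-zero
```

Words that never appear in training still receive zero probability in this sketch; handling them would need an extra unknown-word estimate.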