Summarises text for articles in Hindi
This is a project made by:
Install the required packages listed in requirements.txt.
This model generates a summary using a Document Term Matrix and frequency count. To use this method, go to the method_1 folder and place your article in the valid folder as a file named article.txt. Run the extractive.py file using python3.
You will end up with a summary named summary.txt inside the valid folder.
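The repository's extractive.py is not reproduced here. The following is only a minimal sketch of frequency-based extractive summarization with a document term matrix, and it rests on several assumptions. Sentences are taken to end with the danda (।), '.', '?' or '!'. Words are assumed to be whitespace-separated. Each sentence is scored by the average article-wide frequency of its words. The hypothetical `ratio` parameter (default 0.3, borrowed from the 0.3N rule used for the gold summaries) controls how many sentences are kept. All function names are hypothetical.

```python
import re

# Assumption: sentences end with the Devanagari danda (।), '.', '?' or '!'.
SENTENCE_END = re.compile(r"(?<=[।.?!])\s+")
PUNCTUATION = "।,.?!;:\"'()"


def split_sentences(text: str) -> list[str]:
    """Split an article into sentences, keeping the end punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Whitespace tokenization with surrounding punctuation stripped."""
    tokens = (word.strip(PUNCTUATION) for word in sentence.split())
    return [t for t in tokens if t]


def document_term_matrix(sentences: list[str]) -> tuple[list[str], list[list[int]]]:
    """Rows are sentences, columns are vocabulary terms, cells are counts."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in sentences]
    for i, tokens in enumerate(token_lists):
        for t in tokens:
            matrix[i][column[t]] += 1
    return vocab, matrix


def summarise(text: str, ratio: float = 0.3) -> str:
    """Hypothetical frequency-count summarizer (not the project's extractive.py)."""
    sentences = split_sentences(text)
    _, matrix = document_term_matrix(sentences)
    # Column sums of the matrix give each term's frequency in the whole article.
    freq = [sum(col) for col in zip(*matrix)]
    scores = []
    for row in matrix:
        length = sum(row)
        weighted = sum(count * f for count, f in zip(row, freq))
        scores.append(weighted / length if length else 0.0)
    k = max(1, round(ratio * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Keep the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))


print(summarise("राम घर गया। राम स्कूल गया। बिल्ली सोई।"))  # prints: राम घर गया।
```

A real implementation would probably also remove Hindi stop words such as है or के. Without that step, these words dominate the frequency counts.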
This model generates a summary using a modified TF-IDF over the document dataset, with weights attached. To use this method, go to the method_2 folder and place your article in the valid folder. Input the name of your file (it must be within that directory).
You will end up with a summary and a word cloud in the output folder :)
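This README does not describe the authors' TF-IDF modifications or their weights, so the sketch below uses plain smoothed TF-IDF instead. It applies the same smoothed IDF formula that scikit-learn uses by default, ln((1 + N) / (1 + df)) + 1. IDF is computed over a dataset of articles, TF is counted within the article being summarized, and each sentence is scored by the average TF-IDF of its words. The function names, the tokenization rules, and the `ratio` parameter are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: sentences end with the danda (।), '.', '?' or '!'.
SENTENCE_END = re.compile(r"(?<=[।.?!])\s+")
PUNCTUATION = "।,.?!;:\"'()"


def split_sentences(text: str) -> list[str]:
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    tokens = (word.strip(PUNCTUATION) for word in sentence.split())
    return [t for t in tokens if t]


def smoothed_idf(corpus: list[str]) -> tuple[dict[str, float], float]:
    """IDF over a dataset of articles: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    unseen = math.log(1 + n_docs) + 1  # df = 0 for terms missing from the corpus
    return idf, unseen


def tfidf_summarise(text: str, corpus: list[str], ratio: float = 0.3) -> str:
    """Plain TF-IDF sentence ranking (not the project's modified, weighted version)."""
    idf, unseen = smoothed_idf(corpus)
    sentences = split_sentences(text)
    # Term frequency is counted over the whole article being summarised.
    tf = Counter(t for s in sentences for t in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(tf[t] * idf.get(t, unseen) for t in tokens) / len(tokens)

    k = max(1, round(ratio * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))
```

Words that are rare across the dataset but present in the article push a sentence up the ranking. That is the usual intuition behind TF-IDF sentence scoring.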
To evaluate a new article, add its gold-standard summary as n.txt in the Gold folder in the Summaries directory. Here n is the next number in the sequence in the Gold folder (1.txt, 2.txt, … 7.txt, etc.). Do the same in the Extractive and RuleBased directories.
In the accuracy.py file, on line number 15, change the code to for i in range(1, n+1): where n is the same variable as above. For example, if you added 9.txt, you would change the code to for i in range(1, 10):
Run python accuracy.py from the same directory as the Rouge_1.py file. You can also run python accuracy.py > output.txt for better formatting.
For Method I we got an accuracy of 74.1%
For Method II we got an accuracy of 83.4%
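The evaluation steps above depend on range(1, n+1) covering every numbered summary file. The sketch below shows that loop; it is not the contents of accuracy.py. It assumes each directory holds files 1.txt to n.txt and that the reported figure is an average of per-file scores. Both the averaging and the function names are assumptions.

```python
from pathlib import Path
from statistics import mean
from typing import Callable


def summary_files(n: int) -> list[str]:
    """Names 1.txt ... n.txt; range stops before n + 1, so n.txt is included."""
    return [f"{i}.txt" for i in range(1, n + 1)]


def evaluate(gold_dir: str, system_dir: str, n: int,
             score: Callable[[str, str], float]) -> float:
    """Average score(gold, system) over the n numbered summary pairs (hypothetical)."""
    scores = []
    for name in summary_files(n):
        gold = (Path(gold_dir) / name).read_text(encoding="utf-8")
        system = (Path(system_dir) / name).read_text(encoding="utf-8")
        scores.append(score(gold, system))
    return mean(scores)
```

With 9.txt as the last file, range(1, 9) would silently skip it. That is why the upper bound has to be n + 1, which is 10. A call might look like evaluate("Summaries/Gold", "Summaries/Extractive", 9, some_score). This assumes Extractive sits inside Summaries, and some_score is a placeholder for any scoring function.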
The evaluation was done using the ROUGE method proposed by Chin-Yew Lin. Since the summarization in this project is extractive, only ROUGE-1 has been used. The gold-standard summaries were annotated manually: for any given article, the annotators were asked to pick the most important sentences. The only rule was that they could choose 0.3N sentences, where N is the number of sentences in the original article.
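ROUGE-1 counts overlapping unigrams between a gold summary and a system summary. A minimal sketch follows; it is not the code in Rouge_1.py. It assumes plain whitespace tokenization with no stemming, so punctuation such as । should be stripped first. Lin's ROUGE-N is defined as recall over the reference n-grams; precision and F1 are included because many implementations report them as well. The rounding in the hypothetical `gold_sentence_budget` helper is also an assumption, since the text gives only 0.3N.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict[str, float]:
    """Unigram overlap between a gold summary and a system summary."""
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    # Counter intersection keeps the minimum count, so repeated words are clipped.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def gold_sentence_budget(n_sentences: int) -> int:
    """0.3N sentences, rounded half up (assumed rule), but at least one."""
    return max(1, (3 * n_sentences + 5) // 10)
```

Using integer arithmetic for 0.3N avoids floating-point surprises: in Python, 0.3 * 10 evaluates to 3.0000000000000004.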
We thank the following for creating the gold standard summaries:
git remote add upstream https://github.com/AurumnPegasus/Text-Summariser.git
requirements.txt