Analysed the syntax and semantics of a corpus of text documents retrieved by web scraping news articles from Inshorts, following the standard NLP workflow of the CRISP-DM model.
Credits: @dipanzan.sarkar
An NLP-based project that scrapes news articles in mainly 3 categories:
from Inshorts using website URLs.
Finally, after numerous preprocessing steps such as text wrangling, removing accented characters, removing HTML tags, lemmatization and stemming, a text normalizer is built to create the dataset for sentiment analysis.
Sentiment analysis is perhaps one of the most popular applications of NLP.
The key aspect of sentiment analysis is analyzing a body of text to understand the opinion it expresses. Typically, this sentiment is quantified with a positive or negative value, called polarity.
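As a rough illustration of polarity, the sketch below sums per-word scores from a tiny lexicon. The lexicon `TOY_LEXICON` and the function `polarity` are hypothetical and made up for this example. They are not taken from the notebook or from any real lexicon.

```python
import re

# Hypothetical mini-lexicon for illustration only; real lexicons are far larger.
TOY_LEXICON = {
    "good": 2, "great": 3, "win": 2,
    "bad": -2, "terrible": -3, "loss": -2,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Return the summed word scores: > 0 positive, < 0 negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(polarity("A great win for the team"))  # positive total
print(polarity("Terrible loss"))             # negative total
```

Words missing from the lexicon contribute nothing, so a sentence with no scored words gets a polarity of 0.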
This project can be used to create the following key features:
Built this project to learn the nuances of handling text data in NLP.
Note: spaCy may give a lot of errors, so make sure it is installed properly.
Furthermore, refer to requirements.txt.
To run the project on your local machine:
Make sure you install all the packages mentioned in requirements.txt.
$ git clone https://github.com/codekhal/Inshorts-NLP
$ cd Inshorts-NLP
$ jupyter notebook NLP_main.ipynb   # or open the notebook in any other editor you prefer
.__Inshorts-NLP__
├── contractions.py
├── img
│ ├── scraping.png
│ ├── Sentiment_Score_News_Category.png
│ ├── sentiments.png
│ ├── stemming.png
│ ├── Visualizing_Sentiments_Box_Plot.png
│ └── workflow.png
├── LICENSE
├── news.csv
├── NLP_main.ipynb
├── __pycache__
│ └── contractions.cpython-35.pyc
├── README.md
└── requirements.txt
2 directories, 13 files
Built a web scraper that scrapes news articles from Inshorts website URLs.
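A minimal sketch of such a scraper, using only the standard library, might look like the following. It assumes that each headline and article body is marked with `itemprop="headline"` and `itemprop="articleBody"` attributes and that the category is the last segment of the URL. Both are assumptions about the page layout, and the class and function names are hypothetical; the notebook may do this differently.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}  # tags without a closing tag


class NewsCardParser(HTMLParser):
    """Collect the text of elements marked as headline or articleBody (assumed markup)."""

    FIELDS = {"headline", "articleBody"}

    def __init__(self):
        super().__init__()
        self.articles = []   # one dict per article
        self._field = None   # field currently being captured, if any
        self._depth = 0      # nesting depth inside the captured element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        itemprop = dict(attrs).get("itemprop")
        if itemprop in self.FIELDS:
            self._field, self._depth, self._buffer = itemprop, 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._buffer).split())
            if self._field == "headline":
                self.articles.append({"headline": text, "article": ""})
            elif self.articles:
                self.articles[-1]["article"] = text
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)


def parse_articles(html, category):
    """Turn one page of HTML into a list of article dicts tagged with a category."""
    parser = NewsCardParser()
    parser.feed(html)
    parser.close()
    return [dict(article, category=category) for article in parser.articles]


def fetch_category(url):
    """Download one category page and parse it (requires network access)."""
    category = url.rstrip("/").split("/")[-1]
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_articles(html, category)


# Offline demonstration on a hand-written snippet (not real Inshorts markup).
sample = """
<div class="news-card">
  <span itemprop="headline">Sample headline</span>
  <div itemprop="articleBody">Sample <b>article</b> text.</div>
</div>
"""
rows = parse_articles(sample, "technology")
print(rows)
```

Keeping `parse_articles` separate from `fetch_category` lets the parsing logic be checked on a saved page without hitting the website.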
Then, using numerous text-preprocessing techniques, the data was cleaned for further processing.
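The cleaning steps could be chained into a single normalizer roughly as sketched below. This is a standard-library approximation, not the notebook's code: the `CONTRACTIONS` map is a tiny hypothetical stand-in for the fuller mapping in `contractions.py`, all function names are illustrative, and lemmatization and stemming are left out because they need spaCy or NLTK.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Tiny illustrative map; a hypothetical stand-in for the one in contractions.py.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "don't": "do not"}


class _TextExtractor(HTMLParser):
    """Keep only the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    extractor = _TextExtractor()
    extractor.feed(text)
    extractor.close()
    return "".join(extractor.parts)


def remove_accents(text):
    # Decompose accented characters, then drop the non-ASCII combining marks.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def expand_contractions(text):
    pattern = re.compile("|".join(re.escape(key) for key in CONTRACTIONS), flags=re.IGNORECASE)
    return pattern.sub(lambda match: CONTRACTIONS[match.group(0).lower()], text)


def remove_special_characters(text):
    return re.sub(r"[^a-zA-Z0-9\s]", "", text)


def normalize(text):
    """Apply the cleaning steps in order and collapse extra whitespace."""
    text = strip_html(text)
    text = remove_accents(text)
    text = expand_contractions(text)
    text = text.lower()
    text = remove_special_characters(text)
    return " ".join(text.split())


print(normalize("<p>Café owners can't wait!</p>"))
```

Contractions are expanded before special characters are removed, since stripping the apostrophe first would leave tokens such as "cant" that no longer match the map.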
Next, sentiment analysis was performed on the data.
Various popular lexicons are used for sentiment analysis, including the following.
Used the NLTK, AFINN and TextBlob libraries.
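Whichever lexicon is used, the scoring step reduces to mapping each text to a number and then to a label. The sketch below shows that shape with a pluggable `scorer` argument. The function names, the zero threshold and the `toy_scorer` stand-in are assumptions for illustration, not the notebook's code.

```python
def label_sentiment(score):
    """Map a numeric score to one of three labels (zero threshold is an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(texts, scorer):
    """Score every text with the given scorer and attach a sentiment label."""
    results = []
    for text in texts:
        score = scorer(text)
        results.append({"text": text, "score": score, "sentiment": label_sentiment(score)})
    return results


def toy_scorer(text):
    # Hypothetical stand-in for a real lexicon-based scorer.
    words = text.lower().split()
    return words.count("good") - words.count("bad")


for row in analyze(["good good news", "bad day", "plain report"], toy_scorer):
    print(row)
```

Assuming the `afinn` or `textblob` packages are installed, a real scorer such as `Afinn().score` or `lambda t: TextBlob(t).sentiment.polarity` could be passed in place of `toy_scorer`.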
Both data visualization tools and pandas DataFrame techniques are used to show the results on the dataset.
The sentiment scores of the different news categories are shown in the following plots.
Lastly, the counts of the three sentiments across the different news categories are shown with a factor plot or bar plot.
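The numbers behind such plots are a per-category mean score and a per-category count of each sentiment label. The sketch below computes both with the standard library rather than pandas, so it runs without extra dependencies. The `summarize` function is hypothetical and the records are made-up placeholder rows, not results from `news.csv`.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize(records):
    """Return the mean score and sentiment counts for each news category."""
    counts = defaultdict(Counter)
    scores = defaultdict(list)
    for record in records:
        counts[record["category"]][record["sentiment"]] += 1
        scores[record["category"]].append(record["score"])
    return {
        category: {"mean_score": mean(scores[category]), "counts": dict(counts[category])}
        for category in counts
    }


# Placeholder rows for illustration only.
records = [
    {"category": "sports", "score": 2, "sentiment": "positive"},
    {"category": "sports", "score": -1, "sentiment": "negative"},
    {"category": "world", "score": 0, "sentiment": "neutral"},
]
summary = summarize(records)
print(summary)
```

With pandas, the same summary would typically come from a `groupby` on the category column, and the counts could then be fed to a bar plot.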
Future work that could be done:
Flask App Deployment - Deploy the app so that it can be used efficiently.
Use of Deep Learning - One may try using deep learning to build a text summarizer and tone detector.
Kindly follow the Contribution Guidelines before you create any pull requests or issues, though feel free to contribute in any form.
Open Source <3
Feel free to reach out to me