Wednesday, March 28, 2018

Python: Installation of newspaper3k module


The Newspaper library (packaged for Python 3 as newspaper3k), developed and maintained by Lucas Ou-Yang, is designed to extract and curate articles from the websites of newspapers, magazines, and similar publications. To install it on a Debian- or Ubuntu-based system, run the following commands.
  • Install pip3, which is needed to install the newspaper3k package:
    $ sudo apt-get install python3-pip
    
  • Python 3 development headers, needed for Python.h:
    $ sudo apt-get install python3-dev
    
  • lxml requirements:
    $ sudo apt-get install libxml2-dev libxslt-dev
    
  • For PIL to recognize .jpg images:
    $ sudo apt-get install libjpeg-dev zlib1g-dev libpng12-dev
    
    NOTE: If libpng12-dev cannot be installed, try installing libpng-dev instead.
  • Download the NLP-related corpora:
    $ curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python3
    
  • Install the distribution via pip:
    $ pip3 install newspaper3k
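
Once the steps above have finished, a quick check can confirm that Python 3 is able to locate the package and the libraries it builds on. The following is only a minimal sketch: the helper name missing_modules and the REQUIRED_MODULES list are illustrative assumptions, not part of Newspaper itself. The list uses import names rather than pip names (newspaper for newspaper3k, PIL for the imaging library), and nltk is included on the assumption that it is the library behind the corpora download.

```python
import importlib.util

# Import names (not pip package names) that a working install is assumed
# to provide. This list is an illustrative assumption, not an official one.
REQUIRED_MODULES = ["newspaper", "lxml", "PIL", "nltk"]


def missing_modules(names):
    """Return the names that Python cannot locate, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If any name is reported as missing, rerunning the matching apt-get or pip3 step is a reasonable first thing to try.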
