Scrapy

Posted on: September 23, 2022 at 03:22 PM
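  1. To start a new project
$ scrapy startproject projectname
  2. To create a simple spider within a project. The domain argument is the site to crawl, e.g. www.reddit.com/r/gameofthrones/.
$ scrapy genspider spidername domain
  3. To generate a “crawler” (a spider based on the crawl template) within a project. The domain is the same kind of argument as above, e.g. www.reddit.com/r/gameofthrones/.
$ scrapy genspider -t crawl crawlername domain
  4. To run a spider (here named netaproj) from the command line and save the scraped items to a file. The output format is inferred from the file extension, so the older -t csv option is unnecessary.
$ scrapy crawl netaproj -o items.json
$ scrapy crawl netaproj -o items.csv
  5. To run scrapy shell
$ scrapy shell
  6. To fetch and view a page from scrapy shell (these are typed at the shell's Python prompt, not the system prompt)
>>> fetch('url')
>>> view(response) # will open page in default browser
>>> response.text # contains text of the response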

Note: The fetched response is stored in the response variable. Use response.css() or response.xpath() to extract specific information from it.