Scraping the data (folder name: neta_scraper):
Run the following command from the project folder (neta_scraper) to get Lok Sabha data:
$ scrapy crawl lsbot -t csv -o lsabha.csv
Run the following command from the project folder (neta_scraper) to get Assembly data:
$ scrapy crawl netabot -t csv -o mla.csv
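After a crawl finishes, it can help to confirm that the output files are not empty before moving on to cleaning. The following is a minimal sketch using only the Python standard library; the helper name summarize_csv is hypothetical and not part of this project, and it assumes the CSV files have a header row.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # counts data rows, not the header
        return list(reader.fieldnames or []), row_count


if __name__ == "__main__":
    # File names match the -o arguments used in the crawl commands above.
    for name in ("lsabha.csv", "mla.csv"):
        if Path(name).exists():
            columns, rows = summarize_csv(name)
            print(f"{name}: {rows} rows, columns = {columns}")
        else:
            print(f"{name}: not found, run the crawl first")
```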
Cleaning the data (folder name: neta_cleaner):
- Using a script to clean the data: clean_data.py. Use the following Python script to clean the data. Here lsabha.csv is the raw Lok Sabha data and mla.csv is the raw Assembly data. The cleaned data will be in the files ls_cleaned_data.csv and mla_cleaned_data.csv respectively.
$ python clean_data.py lsabha.csv mla.csv
- IPython notebook to clean the data: neta_cleaner.ipynb
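To illustrate the raw-file-in, cleaned-file-out flow described above, here is a minimal sketch of a generic cleaning pass. It is not the logic of clean_data.py, whose actual steps are not described here. It assumes only that cleaning includes trimming whitespace and dropping exact duplicate rows, and the function names are hypothetical.

```python
import csv


def clean_rows(rows):
    """Strip surrounding whitespace from every value and drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        stripped = tuple(value.strip() for value in row)
        if stripped in seen:
            continue  # skip rows already emitted (compared after stripping)
        seen.add(stripped)
        cleaned.append(list(stripped))
    return cleaned


def clean_file(raw_path, cleaned_path):
    """Read raw_path, clean its rows, and write the result to cleaned_path."""
    with open(raw_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with open(cleaned_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(clean_rows(rows))
```

For example, clean_file("lsabha.csv", "ls_cleaned_data.csv") would follow the same naming as the files listed above. The header row is kept because it is the first occurrence of that row.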
Analyzing the data (folder name: neta_analysis):
- IPython notebook to analyze the data: neta_analysis.ipynb
Visualizing the data using geopandas (folder name: neta_geopandas_visual):
- IPython notebook to visualize state data on a map: neta_state_mapviz.ipynb
- IPython notebook to visualize PC data on a map: neta_pc_mapviz.ipynb
Note: Check the GeoJSON file for compatibility at http://geojson.io/#map=2/20.0/0.0
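Alongside the visual check on geojson.io, a quick structural check can be run locally. The sketch below uses only the standard library and tests a few basics of a GeoJSON FeatureCollection (top-level type, feature type, geometry type, presence of properties). It is not a full validator, and the function names are hypothetical.

```python
import json

GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def check_feature_collection(data):
    """Return a list of problems found in a parsed GeoJSON object (empty if none)."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    if data.get("type") != "FeatureCollection":
        return [f"top-level type is {data.get('type')!r}, expected 'FeatureCollection'"]
    features = data.get("features")
    if not isinstance(features, list):
        return ["'features' is missing or not a list"]
    problems = []
    for index, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"feature {index}: type is not 'Feature'")
            continue
        geometry = feature.get("geometry")  # a null geometry is allowed
        if geometry is not None and geometry.get("type") not in GEOMETRY_TYPES:
            problems.append(f"feature {index}: unknown geometry type")
        if "properties" not in feature:
            problems.append(f"feature {index}: missing 'properties'")
    return problems


def check_file(path):
    """Load a GeoJSON file from disk and run the structural checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_feature_collection(json.load(handle))
```

An empty list from check_file means none of these basic problems were found; it does not guarantee that geopandas will read the file.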
Convert and simplify a shapefile to GeoJSON or another format:
- Upload the dbf, shp, shx and prj files to https://mapshaper.org.
- (Optional) Simplify the file to reduce its size. Click on Simplify, select "Prevent shape removal" and also select weighted area (default). Then click Apply.
- Reduce the file size by using the slider at the top. Also repair line intersections by clicking on "repair" on the left.
- Export the file to JSON and select "don't remove shapes".
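The web steps above can also be approximated with the mapshaper command-line tool, which is useful when the conversion has to be repeated. The sketch below only builds and prints the command. It assumes mapshaper is installed separately, that the weighted and keep-shapes options of -simplify correspond to the web options described above, and that the file names and the 10% retention value are placeholders.

```python
import shlex


def mapshaper_command(shapefile, output, keep_percent=10):
    """Build a mapshaper argument list that simplifies a shapefile and exports GeoJSON."""
    return [
        "mapshaper", shapefile,
        # weighted simplification, retaining keep_percent of removable vertices,
        # and keep-shapes so that small polygons are not dropped entirely
        "-simplify", "weighted", f"{keep_percent}%", "keep-shapes",
        "-o", "format=geojson", output,
    ]


if __name__ == "__main__":
    # Hypothetical input and output names; replace with the real files.
    command = mapshaper_command("india_pc.shp", "india_pc.json")
    print(shlex.join(command))
    # To actually run it: import subprocess; subprocess.run(command, check=True)
```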