User Tools

Site Tools


code:shell:fetchpm25

Fetch the pm 2.5 for all the cities in China

Get the pm 2.5 from http://www.soupm25.com

With this script you can get all the pm 2.5 in just one file “all_city/pm25_all.txt”, and it will auto backup the pm 2.5 in the directory “history”.

So have fun.

Before you can run it, you need install “dos2unix”

apt-get install dos2unix

#!/bin/sh
 
RAW_HTML="index_html"
CITY="all_city.txt"
TMP="all_city"
HISTORY="history"
OUTPUT="${TMP}/pm25_all.txt"
OUTPUT_TMP="${TMP}/pm25_all_latest.txt"
wget http://www.soupm25.com/ -O ${RAW_HTML}
ALL_URLS=`cat ${RAW_HTML}|grep ".html"|sed "s/<a href=//g"|sed "s/\"//g"|sed "s/\/\///g"|sed "s/>//g"|sed "s/<\/a//g"|cut -d "=" -f4|sed "s/<\/li//g"|grep -v -E "^$|</html|DOCTYPE"|awk -F "www" '{print "http://www"$2}'|awk -F "html" '{print $1"html "$2}'|sed 's/  / /g'|sort|uniq|dos2unix`
ALL_CITY=`echo "${ALL_URLS}"|grep "city"`
echo "${ALL_CITY}" > ${CITY}
 
rm "${OUTPUT_TMP}"
 
mkdir ${HISTORY}
mkdir "${TMP}"
cd "${TMP}"
 
while read line
do
    i="${line}"
    url=`echo "${i}"|cut -d " " -f1`
    city=`echo "${i}"|awk -F "city/" '{print $2}'|sed 's/.html /_/g'`
    pingyin=`echo "${city}"|cut -d "_" -f1`
    #echo "${pingyin}"
    mkdir "${pingyin}"
    cd ${pingyin}
    echo wget "${url}" -O "${pingyin}.html"
    wget "${url}" -O "${pingyin}.html"
    PM25=`cat "${pingyin}.html"|grep cityid|grep -v "config"|awk -F "cityid" '{print $2}'|cut -d "<" -f1|cut -d ">" -f2|sed 's/ //g'`
    echo ""${city}" "${PM25}"" >>"../../${OUTPUT_TMP}"
    cd ..
done < ../${CITY}
 
cd ..
 
cp ${OUTPUT_TMP} ${OUTPUT}
cp ${OUTPUT} ${HISTORY}/"`date`.txt"

I have deploy it on my vps, so you can get it from here

/var/www/dokuwiki/wiki/data/pages/code/shell/fetchpm25.txt · Last modified: 2016/05/05 13:07 by 127.0.0.1