Helo

Scraping road congestion indices from the Baidu city congestion index platform
2019/10/11


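Note: this post scrapes road congestion indices from the Baidu Smart Traffic city congestion index platform (http://jiaotong.baidu.com/top/) and stores them in a MySQL database.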

Baidu China City Congestion Index Platform

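The Baidu China City Congestion Index Platform (http://jiaotong.baidu.com/top/) is the Baidu Maps platform for publishing road congestion conditions. Its data refreshes every 5 minutes. First, a few URLs. The one used by the scripts in this post is https://jiaotong.baidu.com/trafficindex/city/roadcurve, called with a cityCode and a road id.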
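The scripts in this post all call the `roadcurve` endpoint with a `cityCode` and a road `id`. As a minimal sketch, the request URL can be assembled with the standard library so that the Chinese road id is percent-encoded. The helper name `road_curve_url` is made up for illustration, and `194` is simply the city code that appears in the scripts (presumably Xiamen, judging by the road names).

```python
from urllib.parse import urlencode

# Endpoint taken from the scripts in this post
BASE_URL = "https://jiaotong.baidu.com/trafficindex/city/roadcurve"


def road_curve_url(road_id, city_code=194):
    """Build the roadcurve URL for one road (hypothetical helper)."""
    # urlencode percent-encodes the UTF-8 bytes of the Chinese road id
    query = urlencode({"cityCode": city_code, "id": road_id})
    return BASE_URL + "?" + query


print(road_curve_url("厦门大桥-1"))
```

Passing the same dictionary as `params=` to `requests.get` has the same effect without building the string by hand.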

Scraping one day of congestion index and speed data for specific roads

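Before running, update the database connection details in the code and manually create every table listed in tablename. Each table has the columns (name, speed, congestIndex, datatime, crawlertime) with types (text, float, float, text, datetime).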
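Because every road gets its own table with the same five columns, the tables could also be created from a script rather than by hand. The sketch below only builds the `CREATE TABLE` statements. The helper name `create_table_sql` is an assumption, and the column types follow the list above.

```python
# Column names and types as listed in the post
COLUMNS = [
    ("name", "TEXT"),
    ("speed", "FLOAT"),
    ("congestIndex", "FLOAT"),
    ("datatime", "TEXT"),
    ("crawlertime", "DATETIME"),
]


def create_table_sql(table):
    """Return a CREATE TABLE statement for one road table (hypothetical helper)."""
    cols = ", ".join("{} {}".format(name, sql_type) for name, sql_type in COLUMNS)
    return "CREATE TABLE IF NOT EXISTS {} ({})".format(table, cols)


# Two of the table names used in the scripts, as a demonstration
for table in ["bridge_xiamen", "bridge_jimei"]:
    print(create_table_sql(table))
```

Each returned string can be handed to `cursor.execute()` on a pymysql connection, followed by a commit.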

import re
import time

import pymysql
import requests

def bridge():
    # Road ids to query; the order must match tablename in write_sql
    road_list = ['厦门大桥-1','集美大桥-1','杏林大桥-1','海沧大桥-5','新阳大桥-1','同安大桥-8','翔安隧道-1','银江路辅路-2','嘉庚路-1','石鼓路-1','盛光路-1','浔江路-1','集源路-3','鳌园路-6','海堤路-2','滨水路-1','集美大道-3','同集南路-3','乐海路-1','岑西路-1']
    sUrl = 'https://jiaotong.baidu.com/trafficindex/city/roadcurve'
    result = []
    for road in road_list:
        # requests percent-encodes the Chinese road id; the timeout avoids hanging forever
        res = requests.get(sUrl, params={'cityCode': 194, 'id': road}, timeout=10)
        res.raise_for_status()
        # data.curve holds the list of records for this road
        result.append(res.json().get('data').get('curve'))
    return result

def write_sql(result):
    # Replace these connection details with your own (keyword arguments are required by recent PyMySQL)
    client = pymysql.connect(host="119.23.111.11", user="root", password="password", database="traffic")
    col = client.cursor()
    # One table per road, in the same order as road_list
    tablename = ['bridge_xiamen','bridge_jimei','bridge_xinglin','bridge_haicang','bridge_xinyang','bridge_tongan','tunnel_xiangan','road_yinjiangfulu','road_jiageng','road_shigu','road_shengguang','road_xinjiang','road_jiyuan','road_aoyuan','road_haidi','road_bingshui','road_jimeidadao','road_tongjinan','road_lehai','road_cenxi']
    for t0, curve in zip(tablename, result):
        # A table name cannot be bound as a parameter, so only it is formatted into the SQL
        sql = "insert into {} (name,speed,congestIndex,datatime,crawlertime) values (%s,%s,%s,%s,%s)".format(t0)
        for record in curve:
            t1 = re.sub(r"[A-Za-z0-9!%\[\],。\-]", "", record.get('roadsegid'))    # strip the "-" and the sequence number from the name
            t2 = record.get('speed')
            t3 = record.get('congestIndex')
            t4 = record.get('datatime')
            t5 = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())    # time of the crawl
            print(t0, t1, t2, t3, t4, t5)
            col.execute(sql, (t1, t2, t3, t4, t5))
        client.commit()
    client.close()

result = bridge()
write_sql(result)
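The `re.sub` call in `write_sql` removes Latin letters, digits, and a few punctuation marks from `roadsegid`, which leaves only the Chinese road name. Here is a small standalone check of that pattern, assuming `roadsegid` looks like the ids in `road_list`. The name `clean_name` is a hypothetical helper, not part of the original script.

```python
import re

# Same character class as in write_sql, written as a raw string
SUFFIX_CHARS = re.compile(r"[A-Za-z0-9!%\[\],。\-]")


def clean_name(roadsegid):
    """Drop the '-' and sequence number so only the road name remains."""
    return SUFFIX_CHARS.sub("", roadsegid)


print(clean_name("厦门大桥-1"))   # 厦门大桥
print(clean_name("同安大桥-8"))   # 同安大桥
```

Note that the pattern would also remove any Latin letters or digits that are part of a road's actual name, so it suits ids like the ones listed here rather than arbitrary roads.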

Scraping the most recent congestion index and speed record for specific roads

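Before running, update the database connection details in the code and manually create every table listed in tablename. Each table has the columns (name, speed, congestIndex, datatime, crawlertime) with types (text, float, float, text, datetime).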

import re
import threading
import time

import pymysql
import requests

def bridge():
    # Road ids to query; the order must match tablename in write_sql
    road_list = ['厦门大桥-1','集美大桥-1','杏林大桥-1','海沧大桥-5','新阳大桥-1','同安大桥-8','翔安隧道-1','银江路辅路-2','嘉庚路-1','石鼓路-1','盛光路-1','浔江路-1','集源路-3','鳌园路-6','海堤路-2','滨水路-1','集美大道-3','同集南路-3','乐海路-1','岑西路-1']
    sUrl = 'https://jiaotong.baidu.com/trafficindex/city/roadcurve'
    result = []
    for road in road_list:
        # requests percent-encodes the Chinese road id; the timeout avoids hanging forever
        res = requests.get(sUrl, params={'cityCode': 194, 'id': road}, timeout=10)
        res.raise_for_status()
        # data.curve holds the list of records for this road
        result.append(res.json().get('data').get('curve'))
    return result

def write_sql(result):
    # Replace these connection details with your own (keyword arguments are required by recent PyMySQL)
    client = pymysql.connect(host="119.23.111.11", user="root", password="password", database="traffic")
    col = client.cursor()
    # One table per road, in the same order as road_list
    tablename = ['bridge_xiamen','bridge_jimei','bridge_xinglin','bridge_haicang','bridge_xinyang','bridge_tongan','tunnel_xiangan','road_yinjiangfulu','road_jiageng','road_shigu','road_shengguang','road_xinjiang','road_jiyuan','road_aoyuan','road_haidi','road_bingshui','road_jimeidadao','road_tongjinan','road_lehai','road_cenxi']
    for t0, curve in zip(tablename, result):
        latest = curve[-1]    # only the most recent record of each road
        t1 = re.sub(r"[A-Za-z0-9!%\[\],。\-]", "", latest.get('roadsegid'))    # strip the "-" and the sequence number from the name
        t2 = latest.get('speed')
        t3 = latest.get('congestIndex')
        t4 = latest.get('datatime')
        t5 = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())    # time of the crawl
        #print(t0,t1,t2,t3,t4,t5)
        # A table name cannot be bound as a parameter, so only it is formatted into the SQL
        sql = "insert into {} (name,speed,congestIndex,datatime,crawlertime) values (%s,%s,%s,%s,%s)".format(t0)
        col.execute(sql, (t1, t2, t3, t4, t5))
    client.commit()
    client.close()

def pydata():
    # Schedule the next run first, so a failed crawl does not end the cycle
    timer = threading.Timer(300, pydata)
    timer.start()
    result = bridge()
    write_sql(result)

if __name__ == "__main__":
    # The first crawl starts after 300 seconds and then repeats every 5 minutes
    timer = threading.Timer(300, pydata)
    timer.start()
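The same periodic crawl can also be written as a plain loop instead of a chain of timers. The following is a minimal sketch that uses `threading.Event.wait` as an interruptible sleep, so that an error in one round is reported and the loop carries on. `run_every` and its parameters are illustrative names, not part of the original script, and `max_runs` exists only to make the sketch easy to try.

```python
import threading


def run_every(interval, job, stop_event, max_runs=None):
    """Call job() every `interval` seconds until stop_event is set.

    Returns the number of rounds that were attempted.
    """
    runs = 0
    while not stop_event.is_set():
        try:
            job()
        except Exception as exc:
            # Report the failure and keep the loop alive for the next round
            print("job failed:", exc)
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Sleeps for `interval` seconds, but returns early if stop_event is set
        stop_event.wait(interval)
    return runs


# Quick demonstration with a tiny interval and a stand-in job
demo_calls = []
run_every(0.01, lambda: demo_calls.append(1), threading.Event(), max_runs=3)
print(len(demo_calls))
```

For the crawler, `job` would be a function that calls `bridge()` and `write_sql(result)`, and `interval` would be 300 to match the platform's 5-minute refresh.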
Last modification: October 12th, 2019 at 08:36 am

2 comments

  1. 趣知识

    In Beijing, 5-minute data is sometimes not very useful.

    1. Helo
      @趣知识

      The main point is to collect data over the long term, in preparation for later analysis.
