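How to crawl JD.com data in a loop with the Scrapy framework and import it into MySQL - 木庄网络博客

This article is excerpted from php中文网 (author: 零到壹度); it will be removed on request if it infringes any rights.

This article shares a method for crawling JD.com data in a loop with the Scrapy framework and importing it into MySQL. Hopefully it serves as a useful reference.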

JD.com has anti-scraping measures, so I set a User-Agent header to make the requests look as if they come from a browser.
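As a minimal sketch of the browser disguise, the snippet below attaches a desktop Chrome User-Agent string to a request using only the standard library. The helper name `build_request` is hypothetical, the User-Agent string is the one used in the spider code in this post, and nothing is sent over the network here.

```python
import urllib.request

# User-Agent string copied from the spider code in this post.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
)


def build_request(url):
    """Hypothetical helper: build a Request that identifies itself as a browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = build_request("https://list.jd.com/list.html?cat=9987,653,655&page=1")
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Inside Scrapy itself the same header can be passed through the `headers` argument of `Request`, as the spider in this post does.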

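The data crawled is the mobile phone listing on JD.com. URL: https://list.jd.com/list.html?cat=9987,653,655&page=1

That comes to a little over 9,000 records; products that do not appear in the listing pages are not counted.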

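Problems I ran into:

1. It is best to wrap the User-Agent request code in a method (use_proxy). I originally wrote that code directly inside parse and hit a "not enough values to unpack" error. I had no idea which statement caused it, so I added a print after every statement and found that the problem was in urlopen(). I kept trying and searched online without finding the cause, but moving the code into its own method fixed it. In hindsight, this may be because the parse method is meant to process the response.

2. Before importing the data into MySQL, I first tried writing it to a file. During the run, the size of x.txt kept flipping between 0 KB and 1 KB and never grew, so the contents were evidently being overwritten. At first I thought fh.close() was in the wrong place, but then I realized that fh = open("D:/pythonlianxi/result/4.txt", "w") was the mistake: the mode 'w' has to be changed to 'a'.

3. When importing into the database, the main problem was Chinese character encoding. First open mysql and run show variables like '%char%'; to check which character set the database uses, then use the matching one. Mine is utf8, for example, and gbk did not work. Also, do not forget charset='utf8' when writing the MySQL connection.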
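The overwrite problem in point 2 can be reproduced with a small sketch. It assumes the file is reopened for every batch of results, as happens when the open call sits inside a callback that runs once per page. The helper `write_records` and the placeholder records are made up for illustration, and a temporary directory stands in for the real output path.

```python
import os
import tempfile


def write_records(path, records, mode):
    """Reopen the file for each record, as a per-page callback effectively does."""
    for record in records:
        with open(path, mode, encoding="utf-8") as fh:
            fh.write(record + "\n")


records = ["phone A", "phone B", "phone C"]  # placeholder data

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w.txt")
    a_path = os.path.join(tmp, "a.txt")
    write_records(w_path, records, "w")  # every open truncates the file
    write_records(a_path, records, "a")  # every open appends to the end
    with open(w_path, encoding="utf-8") as fh:
        w_lines = fh.read().splitlines()
    with open(a_path, encoding="utf-8") as fh:
        a_lines = fh.read().splitlines()

print(w_lines)  # only the last record survives
print(a_lines)  # all records are kept
```

With mode 'w' the file never grows past the most recent write, which matches the 0 KB / 1 KB flip-flop described above.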

Here is the full code:

    import re
    import urllib.error
    import urllib.request

    import pymysql
    import scrapy
    from scrapy.http import Request

    from jingdong.items import JingdongItem


    class JdSpider(scrapy.Spider):
        name = 'jd'
        allowed_domains = ['jd.com']
        # start_urls = ['http://jd.com/']
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
        # fh = open("D:/pythonlianxi/result/4.txt", "w")

        def start_requests(self):
            # Start from page 1 of the phone listing, disguised as a browser.
            return [Request("https://list.jd.com/list.html?cat=9987,653,655&page=1",
                            callback=self.parse, headers=self.header, meta={"cookiejar": 1})]

        def use_proxy(self, proxy_addr, url):
            # Fetch a URL with urllib; returns the decoded body, or None on error.
            try:
                req = urllib.request.Request(url, headers=self.header)
                # Only the "http" scheme is mapped here, so https:// URLs bypass the proxy.
                proxy = urllib.request.ProxyHandler({"http": proxy_addr})
                opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
                urllib.request.install_opener(opener)
                data = urllib.request.urlopen(req).read().decode("utf-8", "ignore")
                return data
            except urllib.error.URLError as e:
                if hasattr(e, "code"):
                    print(e.code)
                if hasattr(e, "reason"):
                    print(e.reason)
            except Exception as e:
                print(str(e))

        def parse(self, response):
            item = JingdongItem()
            proxy_addr = "61.135.217.7:80"
            try:
                item["title"] = response.xpath("//p[@class='p-name']/a[@target='_blank']/em/text()").extract()
                item["pricesku"] = response.xpath("//li[@class='gl-item']/p/@data-sku").extract()
                # Queue the remaining listing pages; Scrapy filters duplicate requests.
                for j in range(2, 166):
                    url = "https://list.jd.com/list.html?cat=9987,653,655&page=" + str(j)
                    print(j)
                    # yield item
                    yield Request(url)
                pricepat = '"p":"(.*?)"'
                personpat = '"CommentCountStr":"(.*?)",'
                print("2k")
                # fh = open("D:/pythonlianxi/result/5.txt", "a")
                conn = pymysql.connect(host="127.0.0.1", user="root", passwd="root", db="jingdong", charset="utf8")
                for i in range(0, len(item["pricesku"])):
                    # Price and comment count come from two separate endpoints, keyed by SKU.
                    priceurl = "https://p.3.cn/prices/mgets?&ext=11000000&pin=&type=1&area=1_72_4137_0&skuIds=" + item["pricesku"][i]
                    personurl = "https://club.jd.com/comment/productCommentSummaries.action?referenceIds=" + item["pricesku"][i]
                    pricedata = self.use_proxy(proxy_addr, priceurl)
                    price = re.compile(pricepat).findall(pricedata)
                    persondata = self.use_proxy(proxy_addr, personurl)
                    person = re.compile(personpat).findall(persondata)
                    title = item["title"][i]
                    print(title)
                    price1 = float(price[0])
                    # print(price1)
                    person1 = person[0]
                    # fh.write(title+"\n"+price+"\n"+person+"\n")
                    cursor = conn.cursor()
                    # Parameterized insert; pymysql escapes the values.
                    sql = "insert into jd(title,price,person) values(%s,%s,%s);"
                    params = (title, price1, person1)
                    print("4")
                    cursor.execute(sql, params)
                    conn.commit()
                # fh.close()
                conn.close()
                # parse is a generator, so the item is yielded; a returned value would be discarded.
                yield item
            except Exception as e:
                print(str(e))
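The price and comment-count extraction in the spider can be tried offline with a short sketch. The two regular expressions are the ones used in the code above; the sample payloads are made up for illustration and the real responses may look different. The helper `first_match` is hypothetical and also guards against a missing body, since use_proxy returns nothing when a request fails.

```python
import re

# Same patterns as in the spider.
PRICE_PAT = re.compile(r'"p":"(.*?)"')
PERSON_PAT = re.compile(r'"CommentCountStr":"(.*?)",')

# Made-up sample payloads; the real responses may contain other fields.
sample_price_data = '[{"id":"J_100000","p":"1999.00","m":"2999.00"}]'
sample_person_data = '{"CommentsCount":[{"SkuId":100000,"CommentCountStr":"12万+","GoodRate":0.98}]}'


def first_match(pattern, text):
    """Return the first captured group, or None if the text is missing or has no match."""
    found = pattern.findall(text or "")
    return found[0] if found else None


price_text = first_match(PRICE_PAT, sample_price_data)
price = float(price_text) if price_text is not None else None
person = first_match(PERSON_PAT, sample_person_data)
print(price, person)
```

Checking for None before calling float() avoids an exception on one product aborting the rest of the page.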
