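csv - How do I save scraped web page data into multiple columns with Python?
Problem description
I want to save the scraped data into separate columns of a TSV file, but none of the approaches I have tried produce two columns. I want the numbered scraped entries in the first column and the type shown beneath each entry in the second column.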
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import csv

html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
bs0bj = BeautifulSoup(html)

def GPname():
    GPnameList = bs0bj.find_all('dd', {'class': re.compile('ddappname')})
    str = ''
    for name in GPnameList:
        str += name.get_text()
        str += '\n'
        print(name.get_text())
    return str

def GPcompany():
    GPcompanyname = bs0bj.find_all('dd', {'style': re.compile('color')})
    str = ''
    for cpa in GPcompanyname:
        str += cpa.get_text()
        str += '\n'
        print(cpa.get_text())
    return str

with open('0217.tsv', 'w', newline='', encoding='utf-8') as f:
    f.write(GPname())
    f.write(GPcompany())
f.close()
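I am probably not familiar enough with zip: after saving, each character ends up in its own cell. I also found this reference, but no matter what I tried I could not get it to work: https://segmentfault.com/q/10...
Answers
Answer 1: Writing a CSV file is simpler. Your data needs to be structured like this: [['1. 東森新聞雲','新聞'],['2. 創世黎明(Dawn of world)','遊戲']]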
from urllib.request import urlopen  # in Python 3, urlopen lives in urllib.request
from bs4 import BeautifulSoup
import re
import csv

html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
bs0bj = BeautifulSoup(html)

# Collect the text of each matching <dd> element into two parallel lists
GPnameList = [name.get_text() for name in bs0bj.find_all('dd', {'class': re.compile('ddappname')})]
GPcompanyname = [cpa.get_text() for cpa in bs0bj.find_all('dd', {'style': re.compile('color')})]

# zip pairs the i-th name with the i-th type; each pair becomes one comma-separated line
data = '\n'.join([','.join(d) for d in zip(GPnameList, GPcompanyname)])

with open('C:/Users/sa/Desktop/0217.csv', 'wb') as f:
    f.write(data.encode('utf-8'))
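Since the question asks for a TSV file, here is a minimal sketch of the same zip idea using the standard csv module with a tab delimiter. The helper names rows_to_tsv and save_tsv are hypothetical, and the sample lists simply reuse the two example rows from the answer in place of real scraped data. A likely cause of the "one character per cell" result is passing a plain string to writerow: the writer treats its argument as a sequence of fields, and a string is a sequence of characters.

```python
import csv
import io

def rows_to_tsv(names, categories):
    """Return tab-separated text with one (name, category) pair per line."""
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, delimiter='\t')
    # zip pairs the i-th name with the i-th category; each pair is one row.
    writer.writerows(zip(names, categories))
    return buf.getvalue()

def save_tsv(path, names, categories):
    """Write the same rows to a file (path is a hypothetical example)."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f, delimiter='\t').writerows(zip(names, categories))

# Sample values taken from the answer's example, standing in for scraped lists.
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']
tsv_text = rows_to_tsv(names, categories)
```

With the two lists built in the answer's code, a call such as save_tsv('0217.tsv', GPnameList, GPcompanyname) should give one app per row with its type in the second column. Note that zip stops at the shorter list, so if the two selectors match different numbers of elements, the extra items are silently dropped.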