Time: 2023-07-27 10:30:01 | Source: Website Operations
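Crawler Notes (2): Scraping Dynamic Websites with Selenium

Many e-commerce sites load their content dynamically: more content is loaded step by step as the user scrolls down the page. requests only retrieves the HTML the server sends back initially and does not run the page's JavaScript, so it cannot fetch all of the content. Selenium can scrape dynamic content, and it provides many operations for driving a browser.

    import requests

    url = 'https://www.jd.com/'
    # Only the initial HTML is returned; scripts on the page are not executed
    r = requests.get(url).text
    print(len(r))
    # utf-8 can encode every character in the page
    with open('jd.html', 'w', encoding='utf-8') as f:
        f.write(r)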
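This fetched 108399 characters.

With Selenium, the page is scrolled down in steps so that the content has a chance to load before the source is saved:

    from selenium import webdriver
    import time

    def scroll(n, i):
        # JavaScript that scrolls to i/n of the page's total height
        return "window.scrollTo(0,(document.body.scrollHeight/{0})*{1});".format(n, i)

    url = 'https://www.jd.com/'
    firefox = webdriver.Firefox()
    firefox.maximize_window()
    firefox.get(url)

    n = 10
    for i in range(0, n + 1):
        s = scroll(n, i)
        print(s)
        firefox.execute_script(s)
        time.sleep(2)  # wait for the newly exposed content to load

    print(len(firefox.page_source))
    with open("jd2.html", 'w', encoding="utf-8", errors='ignore') as f:
        f.write(firefox.page_source)
    firefox.quit()  # close the browser when done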
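The fixed ten steps above work when the page length is roughly known. A possible variation, shown here only as a sketch, is to keep scrolling to the bottom until `document.body.scrollHeight` stops growing. The helper name `scroll_until_stable` and its parameters `pause`, `max_rounds`, and `sleep` are hypothetical and do not come from the original post. The function only relies on the driver's `execute_script` method, so it should accept the `firefox` object created above. Note that jumping straight to the bottom may skip sections that load only when they are scrolled into view, so stepping through the page can still be the safer choice on some sites.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20, sleep=time.sleep):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's execute_script(script) method.
    All names here are illustrative assumptions, not part of Selenium itself.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing new was appended, so assume the page is complete
        last_height = new_height
    return rounds
```

With a real browser this could be called as `scroll_until_stable(firefox)` after `firefox.get(url)` and before reading `firefox.page_source`. The `max_rounds` cap is there so that pages with endless feeds do not loop forever.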
Keywords: dynamic, usage, crawler