Scraping Web Page Data with Python: A Step-by-Step Guide

Author: 思念的回忆  Published: 2025-11-06  Views: 157

# Example code: scraping web page data with Python, step by step

import requests
from bs4 import BeautifulSoup

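# 1. Send the HTTP request
url = 'https://example.com'  # replace with the URL of the target page
response = requests.get(url, timeout=10)  # timeout so a stalled server cannot hang the script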

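# 2. Check whether the request succeeded
if response.status_code == 200:
    print("Request succeeded")
else:
    print("Request failed, status code:", response.status_code)
    raise SystemExit(1)  # stop with a non-zero exit code; exit() is meant for interactive use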

# 3. Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# 4. Extract the data you need
# Here we assume we want every top-level heading (<h1> tags)
titles = soup.find_all('h1')
for title in titles:
    print(title.get_text())

# 5. Save the data (optional)
with open('output.txt', 'w', encoding='utf-8') as file:
    for title in titles:
        file.write(title.get_text() + '\n')

# 6. Close the session (only if you are using a session)
# session.close()
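
Steps 3 to 5 can also be tried without any network access. The sketch below is one possible way to wrap them in small functions and run them on an HTML string. The names extract_titles and save_lines and the sample HTML are illustrative assumptions, not part of the original script.

```python
from bs4 import BeautifulSoup


def extract_titles(html, tag="h1"):
    """Return the text of every <tag> element, with surrounding whitespace removed."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text().strip() for element in soup.find_all(tag)]


def save_lines(lines, path):
    """Write one entry per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as file:
        for line in lines:
            file.write(line + "\n")


# Hypothetical sample page, used here instead of a live request.
sample_html = """
<html><body>
  <h1> First title </h1>
  <p>Some text</p>
  <h1>Second title</h1>
</body></html>
"""

titles = extract_titles(sample_html)
print(titles)
```

Calling save_lines(titles, 'output.txt') would then write the same file as step 5. Keeping the parsing in a function that takes a string makes it easy to check the selector before pointing the script at a real site.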

Explanation:

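Send the HTTP request: call requests.get() to send a GET request to the target page and receive the response.
Check whether the request succeeded: inspect the response status code (response.status_code) and continue only if it is 200; otherwise the script prints the code and stops.
Parse the HTML content: BeautifulSoup parses the HTML text into a tree that is easier to navigate and extract data from.
Extract the data you need: find_all() returns every matching tag (here <h1>), and get_text() returns the text inside each one.
Save the data: write each extracted title to output.txt, one per line (optional step).
Close the session: if you use a session object, close it once scraping is finished (this example does not use a session, so there is nothing to close).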
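
The last item mentions sessions, which the example script does not use. As a rough sketch of what a session-based variant might look like when several pages are fetched in one run: fetch_html and scrape_with_session are hypothetical helper names, and the 10-second timeout is an arbitrary choice. Using requests.Session in a with block closes it automatically, so no explicit session.close() call is needed.

```python
import requests


def fetch_html(session, url, timeout=10):
    """Fetch one page with the given session and return its HTML text.

    Raises requests.HTTPError if the server answers with a 4xx or 5xx status.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # turn error status codes into exceptions
    return response.text


def scrape_with_session(urls, timeout=10):
    """Fetch several pages over one session and return a {url: html} dict."""
    pages = {}
    with requests.Session() as session:  # the with block closes the session on exit
        for url in urls:
            pages[url] = fetch_html(session, url, timeout=timeout)
    return pages
```

For example, scrape_with_session(['https://example.com']) would return a dictionary mapping that URL to its HTML. A session reuses the underlying connection between requests to the same host, which is the main reason to prefer it over repeated requests.get() calls.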
