Python crawler: to_sql keeps throwing errors when writing to a MySQL database



There are plenty of tutorials online that use a crawler to scrape data and write it out to CSV.

Scraping the data itself is not much of a problem. But if you have set up a local MySQL database, many of those public tutorials simply do not work for the write step and keep throwing one error or another, most commonly: UserWarning: pandas only support SQLAlchemy connectable(engine/connection).

After quite a bit of fiddling I finally found the cause: most tutorials online pass to_sql the old style of database connection, while newer versions of pandas only accept a SQLAlchemy connectable (an engine or connection).
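If you want to confirm which pandas you are actually running before digging further, a quick sanity-check sketch:

import pandas as pd

# Newer pandas releases expect a SQLAlchemy engine/connection in to_sql;
# passing a raw pymysql connection is what triggers the UserWarning above.
print(pd.__version__)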

The old way of connecting to the database usually looks like this:

import pandas as pd
import pymysql

# Create the database connection
conn = pymysql.connect(host='localhost', port=3306, user='root', password='password', db='database_name')

# Create a cursor object
cursor = conn.cursor()

# Build a sample DataFrame
data = {'Name': ['John', 'Mike', 'Bob'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Insert the DataFrame into a MySQL table
# (if the table does not exist, you could create it first with a CREATE TABLE statement)
df.to_sql(name='table_name', con=conn, if_exists='replace', index=False)

# Close the cursor and the connection
cursor.close()
conn.close()

Run it like this and it errors out straight away!

So we have to connect the new way instead, through SQLAlchemy:

import pandas as pd
from sqlalchemy import create_engine

MYSQL_HOST = 'localhost'
MYSQL_PORT = '3306'
MYSQL_USER = 'root'
MYSQL_PASSWORD = '12345678'
MYSQL_DB = 'stocks'

### Write the data with pandas to_sql
engine = create_engine('mysql+pymysql://%s:%s@%s:%s/%s?charset=utf8' % (MYSQL_USER, MYSQL_PASSWORD, MYSQL_HOST, MYSQL_PORT, MYSQL_DB))
#engine = create_engine(conn)

# all_products and keys come from the crawl (see the full script below)
df = pd.DataFrame(all_products, columns=keys)

# if_exists='replace' clears and rewrites the table, 'append' adds to the existing rows
df.to_sql('test_ifeng', engine, if_exists='replace', index=False)
#raise SystemExit

OK, with that in place the data shows up in the MySQL database.

(Screenshot: the stored data viewed in the database.)
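To double-check that the rows really landed, you can read the table straight back with pandas; a minimal sketch, reusing the engine and the test_ifeng table from the snippet above:

# Read the freshly written table back into a DataFrame
check = pd.read_sql_table('test_ifeng', con=engine)
print(len(check), 'rows in test_ifeng')
print(check.head())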

Here is yesterday's complete crawler-to-database script, which shows three ways of saving the scraped data:

import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd
from sqlalchemy import create_engine

url = 'http://www.ifeng.com'

# Set the request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
}

# Set the cookies
cookies = {}
cookies['mall'] = '1638237165.7.1148.523814'
cookies['JSESSIONID'] = '7D7F08E6CAC6989FDE82EBDEBBF9CB21'

# Send the request (with headers and cookies) and parse the page
res = requests.get(url, cookies=cookies, headers=headers, timeout=5)
soup = BeautifulSoup(res.content, 'html.parser')
#title = soup.title.text

all_products = []
#print(soup)

products = soup.select('p.index_news_list_p_5zOEF')
#raise SystemExit

for product in products:
    title = product.select('a')[0].text
    url = product.select('a')[0].attrs['href']
    all_products.append({
        "biaoti": title,
        "link": url
    })

### Method 1: write to CSV with csv.DictWriter
keys = all_products[0].keys()
with open('ifeng新闻.csv', 'w', newline='', encoding='utf-8-sig') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(all_products)

MYSQL_HOST = 'localhost'
MYSQL_PORT = '3306'
MYSQL_USER = 'root'
MYSQL_PASSWORD = '12345678'
MYSQL_DB = 'stocks'

### Method 2: write to MySQL with pandas to_sql
engine = create_engine('mysql+pymysql://%s:%s@%s:%s/%s?charset=utf8' % (MYSQL_USER, MYSQL_PASSWORD, MYSQL_HOST, MYSQL_PORT, MYSQL_DB))
#engine = create_engine(conn)

df = pd.DataFrame(all_products, columns=keys)

# if_exists='replace' clears and rewrites the table, 'append' adds to the existing rows
df.to_sql('test_ifeng', engine, if_exists='replace', index=False)
#raise SystemExit

### Method 3: write to CSV with pandas to_csv
pd.DataFrame(all_products, columns=keys).to_csv('ifeng新闻1.csv', encoding='utf-8-sig')
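One note on if_exists: 'replace' wipes and rewrites the table on every run, which is fine for a one-off scrape. If you rerun the crawler and want to keep accumulating rows instead, switch to 'append'; a minimal sketch, reusing the df and engine from the script:

# Append this run's rows instead of replacing the table
# (the table is created automatically if it does not exist yet)
df.to_sql('test_ifeng', engine, if_exists='append', index=False)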
