How To Manipulate SQL Data Using SQLAlchemy and Pandas
Dealing with databases through Python is easily achieved using SQLAlchemy. Manipulating data through SQLAlchemy can be accomplished in most tasks, but there are some cases you need to integrate your database solution with the Pandas library.
This is useful, especially if you’re comfortable manipulating data through Pandas. There are also cases when you have a large CSV file, and you want to import the contained data into your SQLAlchemy. You can also use Pandas with SQLAlchemy when you already have a DataFrame that you want to import to your database instead of manual SQL inserts.
In this tutorial, you’ll learn how to import data from SQLAlchemy to a Pandas data frame, how to export Pandas data frame to SQLAlchemy, and how to load a CSV file into a database.
Database setup with SQLAlchemy ORM
The following snippet contains a database setup that we will use in this tutorial:
from sqlalchemy import create_engine, Column, Integer, String, DateTime, Text, ForeignKey
from sqlalchemy.engine import URL
from sqlalchemy.orm import declarative_base, sessionmaker
from datetime import datetime
import pandas as pd
Base = declarative_base()
class Article(Base):
__tablename__ = 'articles'
id = Column(Integer(), primary_key=True)
slug = Column(String(100), nullable=False, unique=True)
title = Column(String(100), nullable=False)
created_on = Column(DateTime(), default=datetime.now)
updated_on = Column(DateTime(), default=datetime.now, onupdate=datetime.now)
content = Column(Text)
url = URL.create(
drivername="postgresql",
username="coderpad",
host="/tmp/postgresql/socket",
database="coderpad"
)
engine = create_engine(url)
Session = sessionmaker(bind=engine)
session = Session()
Code language: Python (python)
ℹ️ If you want to know the details of this setup and what every line entails, you can consult the previous SQLAlchemy with PostgreSQL tutorial that we discussed in detail.
Now, the session object is ready to be used to interact with the database. You can insert a new article and commit that change to the PostgreSQL engine:
article1 = Article(
slug="clean-python",
title="How to Write Clean Python",
content="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
)
Base.metadata.create_all(engine)
session.add(article1)
session.commit()
Code language: Python (python)
Here, we add the article object that has the title “How to Write Clean Python”.
How to import data from SQLAlchemy to Pandas DataFrame
We want the article we inserted into our database to be imported to a Pandas data frame. To do that, we need to use the read_sql() Pandas method:
first_article_query = session.query(Article).filter(Article.id == 1)
first_article_df = pd.read_sql(
sql=first_article_query.statement,
con=engine
)
print(first_article_df['title'])
Code language: Python (python)
So here, the first_article_query
is a query object from SQLAlchemy retrieving the article with an id
of 1
. We then pass this object to the read_sql
method with the statement method applied to it. This statement method retrieves the SQLAlchemy selectable object that we need for the SQL parameter in the read_sql
method.
We then pass the engine object to the con
parameter of that method. Finally, we have our new Pandas data frame: first_article_df
.
If you print the title, you’ll get the expected result:
>>> first_article_df['title']
0 How to Write Clean Python
Name: title, dtype: object
Code language: CSS (css)
ℹ️ This
read_sql
method only reads SQL queries. It does not write data in the database. For example, if you want to change the title of that article:
first_article_df['title'] = 'Cleaner Python'
Code language: Python (python)
You’ll only see that this value is changed in the data frame:
>>> first_article_df['title']
0 Cleaner Python
Name: title, dtype: object
Code language: CSS (css)
Not in the database:
>>> session.query(Article).filter(Article.id == 1).first().title
u'How to Write Clean Python'
Code language: JavaScript (javascript)
How to export Pandas DataFrame to SQLAlchemy
If you want to change the database through a DataFrame
, you can use the to_sql() Pandas method. This method can write records stored in a DataFrame
to a SQL database.
So here is a sample data frame that contains a new article:
new_article_df = pd.DataFrame(
{
'slug': ['sqlalchemy-pandas'],
'title': ['SQLAlchemy Pandas Article'],
'content': ['Bla bla bla'],
})
Code language: Python (python)
If you want to insert this new article into the database, you need to use the to_sql()
method as follows:
new_article_df.to_sql(
name='articles',
con=engine,
if_exists='append',
index=False,
)
Code language: Python (python)
Here, we pass the table name to the name argument and the engine object to the con
argument.
We also specify the append
option to the if_exists
parameter to indicate that the table already exists and that we want to append that new article to the articles
table. If you leave this parameter blank, it will automatically fail because the if_exists
param is by default set to “fail”, and the articles table exists. However, if you set it to “replace”, the articles
table will be dropped before inserting that new article.
You can make sure this article is already committed to the database by the following query statement:
session.query(Article).filter(
Article.slug=='sqlalchemy-pandas'
).first().title
Code language: Python (python)
>>> session.query(Article).filter(Article.slug == 'sqlalchemy-pandas').first().title
u'SQLAlchemy Pandas Article'
Code language: JavaScript (javascript)
How to load a CSV file into a database
In some cases, you may have a large CSV file that you want to import into the database. Instead of having a data entry clerk who can insert each record one by one, you can grab all data in that file and insert it into the database.
You can do so by loading the CSV file to a data frame:
new_df = pd.read_csv("more_articles.csv")
Code language: Python (python)
And then use the same to_sql()
method to insert these records:
new_df.to_sql(
name='articles',
con=engine,
if_exists='append',
index=False,
)
Code language: PHP (php)
That way, you have inserted new articles into your database. Just make sure that the column names exactly match the name of your columns. If this is not the case, you may use the rename()
Pandas method on your data frame.
Try manipulating the SQL data in the sandbox below:
Conclusion
This tutorial has covered how to interact with SQLAlchemy and Pandas libraries to manipulate data. We discussed how to import data from SQLAlchemy to Pandas DataFrame
using read_sql
, how to export Pandas DataFrame
to the database using to_sql
, and how to load a CSV file to get a DataFrame
that can be shipped to the database.
I’m Ezz. I’m an AWS Certified Machine Learning Specialist and a Data Platform Engineer. I help SaaS companies rank on Google. Check out my website for more.