How To Manipulate SQL Data Using SQLAlchemy and Pandas

January 23, 2023 • Development

By EzzEddin Abdullah

Dealing with databases through Python is easily achieved using SQLAlchemy. Manipulating data through SQLAlchemy can be accomplished in most tasks, but there are some cases you need to integrate your database solution with the Pandas library.

This is useful, especially if you’re comfortable manipulating data through Pandas. There are also cases when you have a large CSV file, and you want to import the contained data into your SQLAlchemy. You can also use Pandas with SQLAlchemy when you already have a DataFrame that you want to import to your database instead of manual SQL inserts.

In this tutorial, you’ll learn how to import data from SQLAlchemy to a Pandas data frame, how to export Pandas data frame to SQLAlchemy, and how to load a CSV file into a database.

Database setup with SQLAlchemy ORM

The following snippet contains a database setup that we will use in this tutorial:

from sqlalchemy import create_engine, Column, Integer, String, DateTime, Text, ForeignKey
from sqlalchemy.engine import URL
from sqlalchemy.orm import declarative_base, sessionmaker
from datetime import datetime
import pandas as pd


Base = declarative_base()


class Article(Base):
    __tablename__ = 'articles'

    id = Column(Integer(), primary_key=True)
    slug = Column(String(100), nullable=False, unique=True)
    title = Column(String(100), nullable=False)
    created_on = Column(DateTime(), default=datetime.now)
    updated_on = Column(DateTime(), default=datetime.now, onupdate=datetime.now)
    content = Column(Text)

url = URL.create(
    drivername="postgresql",
    username="coderpad",
    host="/tmp/postgresql/socket",
    database="coderpad"
    )

engine = create_engine(url)
Session = sessionmaker(bind=engine)
session = Session()Code language: Python (python)

ℹ️ If you want to know the details of this setup and what every line entails, you can consult the previous SQLAlchemy with PostgreSQL tutorial that we discussed in detail.

Now, the session object is ready to be used to interact with the database. You can insert a new article and commit that change to the PostgreSQL engine:

article1 = Article(
    slug="clean-python",
    title="How to Write Clean Python",
    content="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
    )

Base.metadata.create_all(engine)

session.add(article1)
session.commit()Code language: Python (python)

Here, we add the article object that has the title “How to Write Clean Python”.

How to import data from SQLAlchemy to Pandas DataFrame

We want the article we inserted into our database to be imported to a Pandas data frame. To do that, we need to use the read_sql() Pandas method:

first_article_query = session.query(Article).filter(Article.id == 1)
first_article_df = pd.read_sql(
    sql=first_article_query.statement,
    con=engine
)

print(first_article_df['title'])Code language: Python (python)

So here, the first_article_query is a query object from SQLAlchemy retrieving the article with an id of 1. We then pass this object to the read_sql method with the statement method applied to it. This statement method retrieves the SQLAlchemy selectable object that we need for the SQL parameter in the read_sql method.

We then pass the engine object to the con parameter of that method. Finally, we have our new Pandas data frame: first_article_df.

If you print the title, you’ll get the expected result:

>>> first_article_df['title']
0    How to Write Clean Python
Name: title, dtype: objectCode language: CSS (css)

ℹ️ This read_sql method only reads SQL queries. It does not write data in the database. For example, if you want to change the title of that article:

first_article_df['title'] = 'Cleaner Python'Code language: Python (python)

You’ll only see that this value is changed in the data frame:

>>> first_article_df['title']
0    Cleaner Python
Name: title, dtype: objectCode language: CSS (css)

Not in the database:

>>> session.query(Article).filter(Article.id == 1).first().title
u'How to Write Clean Python'Code language: JavaScript (javascript)

How to export Pandas DataFrame to SQLAlchemy

If you want to change the database through a DataFrame, you can use the to_sql() Pandas method. This method can write records stored in a DataFrame to a SQL database.

So here is a sample data frame that contains a new article:

new_article_df = pd.DataFrame(
    {
        'slug': ['sqlalchemy-pandas'],
        'title': ['SQLAlchemy Pandas Article'],
        'content': ['Bla bla bla'],
    })Code language: Python (python)

If you want to insert this new article into the database, you need to use the to_sql() method as follows:

new_article_df.to_sql(
    name='articles',
    con=engine,
    if_exists='append',
    index=False,
)Code language: Python (python)

Here, we pass the table name to the name argument and the engine object to the con argument.

We also specify the append option to the if_exists parameter to indicate that the table already exists and that we want to append that new article to the articles table. If you leave this parameter blank, it will automatically fail because the if_exists param is by default set to “fail”, and the articles table exists. However, if you set it to “replace”, the articles table will be dropped before inserting that new article.

You can make sure this article is already committed to the database by the following query statement:

session.query(Article).filter(
    Article.slug=='sqlalchemy-pandas'
).first().titleCode language: Python (python)

>>> session.query(Article).filter(Article.slug == 'sqlalchemy-pandas').first().title
u'SQLAlchemy Pandas Article'Code language: JavaScript (javascript)

How to load a CSV file into a database

In some cases, you may have a large CSV file that you want to import into the database. Instead of having a data entry clerk who can insert each record one by one, you can grab all data in that file and insert it into the database.

You can do so by loading the CSV file to a data frame:

new_df = pd.read_csv("more_articles.csv")Code language: Python (python)

And then use the same to_sql() method to insert these records:

new_df.to_sql(
    name='articles',
    con=engine,
    if_exists='append',
    index=False,
)Code language: PHP (php)

That way, you have inserted new articles into your database. Just make sure that the column names exactly match the name of your columns. If this is not the case, you may use the rename() Pandas method on your data frame.

Try manipulating the SQL data in the sandbox below:

Conclusion

This tutorial has covered how to interact with SQLAlchemy and Pandas libraries to manipulate data. We discussed how to import data from SQLAlchemy to Pandas DataFrame using read_sql, how to export Pandas DataFrame to the database using to_sql, and how to load a CSV file to get a DataFrame that can be shipped to the database.

I’m Ezz. I’m an AWS Certified Machine Learning Specialist and a Data Platform Engineer. I help SaaS companies rank on Google. Check out my website for more.