Sunday, 23 May 2021

SQL Server with Python - An Intro

Everyone uses SQL, and everyone uses Python. SQL is the de-facto standard for databases. Python on the other hand is an all-star, a top language for data analytics, machine learning, and web development. Imagine both, together.

This is actually incredibly easy to setup. We can quickly utilize the dynamic nature of Python, to control and build queries in SQL. The best part? After set-up, you don’t need to do anything.

Both of these amazing tools together, allow us to reach new heights of automation and efficiency.

pyodbc

Our bridge between the two technologies is pyodbc. This library allows easy access to ODBC databases.

ODBC, short for Open Database Connectivity, is a standardised application programming interface (API) for accessing databases, developed by the SQL Access group back in the early 90's.

Compliant database management systems (DBMS) include:

  • IBM Db2
  • MS Access
  • MS SQL Server
  • MySQL
  • Oracle

In this article, we will be using MS SQL Server. For the most part, this should be directly transferable for use with any ODBC compliant database. The only change required should be with the connection setup.

Connect

The first thing we need to do is create a connection to the SQL server. We can do this using pyodbc.connect. Within this function we must also pass a connection string.

This connection string must specify the DBMS Driver, the Server, a specific Database to connect to, and our connection settings.

So, lets assume we want to connect to server UKXXX00123,45600, database DB01 , to do this we want to use SQL Server Native Client 11.0.

We will be connecting from an internal, and thus trusted connection (we do not need to enter our username and password).

cnxn_str = ("Driver={SQL Server Native Client 11.0};"
"Server=UKXXX00123,45600;"
"Database=DB01;"
"Trusted_Connection=yes;")

Our connection is now initialized with:

cnxn = pyodbc.connect(cnxn_str)

If we are not accessing the database via a trusted connection, we will need to enter the username and password that we would usually use to access the server via SQL Server Management Studio (SSMS).

For example, if our username is JoeBloggs, and our password is Password123, we should immediately change our password.

But before changing that horrible password, we can connect like so:

cnxn_str = ("Driver={SQL Server Native Client 11.0};"
"Server=UKXXX00123,45600;"
"Database=DB01;"
"UID=JoeBloggs;"
"PWD=Password123;")
cnxn = pyodbc.connect(cnxn_str)

Now we are connected to the database, we can begin performing SQL queries via Python.

Run a Query

Every query we run on SQL Server now will consist of a cursor initialization, and query execution. Additionally, if we make any changes inside the server, we also need to commit these changes to the server (which we cover in the next section).

To initialize a cursor:

cursor = cnxn.cursor()

Now, whenever we want to perform a query, we use this cursor object.

Let’s first select the top 1000 rows from a table called customers:

cursor.execute("SELECT TOP(1000) * FROM customers")

This performs the operation, but within the server, and so nothing is actually returned to Python. So let’s look at extracting this data from SQL.

Extract Data

To extract our data from SQL into Python, we use pandas. Pandas provides us with a very convenient function called read_sql, this function, as you may have guessed, reads data from SQL.

read_sql requires both a query and the connection instance cnxn, like so:

data = pd.read_sql("SELECT TOP(1000) * FROM customers", cnxn)

This returns a dataframe containing the top 1000 rows from the customers table.

Changing Data in SQL

Now, if we wanted to change the data in SQL, we need to add another step to the original initialize connection, execute query process.

When we execute queries in SQL, these changes are kept in a temporarily existing space, they are not made directly to the data.

To make these changes permanent, we must commit them. Lets concatenate the firstName and lastName columns, to create a fullName column.

cursor = cnxn.cursor()# first alter the table, adding a column
cursor.execute("ALTER TABLE customer " +
"ADD fullName VARCHAR(20)")
# now update that column to contain firstName + lastName
cursor.execute("UPDATE customer " +
"SET fullName = firstName + " " + lastName")

At this point, fullName does not exist in our database. We must commit these changes to make them permanent:

cnxn.commit()

No comments:

Post a Comment