Loading & Saving Data In Python Pandas

This article explains how load data from different data sources into Python using the Pandas library.

Read Data

Read specific Excel worksheet:

import pandas as pd
xls_df = pd.read_excel(
    "na_values=["NA", "?"]

Read json file:

import pandas as pd
json_df = pd.read_json(

Read CSV file:

import pandas as pd
csv_df = pd.read_csv(

Read all HTML tables and take one:

table_df = pd.read_html(

Query SQL Server or Azure SQL Database:

import pandas as pd
import pyodbc
connection = pyodbc.connect(
    "DRIVER={ODBC Driver 13 for SQL Server};"
sql_df = pd.read_sql(
    "SELECT * "
    "FROM [SalesLT].[SalesOrderDetail] "

Note that the DRIVER parameter can differ in your own system.
If pyodbc complains, use this…


… to get a list of all drivers installed on your workstation and select the correct one.

Write Data

Write a data frame to an Excel worksheet:


Write a data frame to a json file:


Write a data frame to a csv file:


Fix Data

Want to rename your columns/features?

df.columns = ["Column1", "Column2", "Column3"]

Want to take a quick peek at your data?

   OrderQty  UnitPrice
0        10    672.294
1         8     32.394
2         1     38.100
3        23     34.925
4         9   1020.594

Want to get some basic stats?

         OrderQty    UnitPrice
count  105.000000   105.000000
mean     3.171429   406.358515
std      3.445893   487.939042
min      1.000000     2.894200
25%      1.000000    37.254000
50%      2.000000   158.430000
75%      3.000000   728.910000
max     23.000000  1466.010000

Want to check out the name of your columns?

Index(['OrderQty', 'UnitPrice'], dtype='object')

Want to check out the indices on the data?

RangeIndex(start=0, stop=105, step=1)

Want to to check out the data types Pandas has inferred?

OrderQty       int64
UnitPrice    float64
dtype: object

Common Parameters

  • sep: The delimiter to use for reading csv-like files. This defaults to comma (,) if you don’t provide it.
  • delimiter: Means the same as sep. Use whichever you like.
  • header: The row number to use for column header names. Data reading also starts from here. You can provide a list of row numbers if you’re loading weird stuff that requires a hierarchical multi-index.
  • names: A list of column names to use. If the data has no header of its own, make sure you also pass in header=None so pandas doesn’t gobble up the first row by accident.
  • index_col: The column to use for row labels. Use this to override the default row numbering that pandas provides.
  • skiprows: Number of rows to ignore at the start of the file. Useful if you’re loading weird report data extracted from a human-made excel file with fancy headers.
  • na_values: A list of source data values to recognize as NA or NaN values. Saves you the trouble of having to clean them up afterwards.
  • thousands: The character to recognize as a thousands separator. Defaults no None, so specify if you need it.
  • decimal: The character to recognize as a decimal separator. Defaults to point (.) so change it if you’re loading data with a different separator. Most European countries use comma (,) for example.

There are many more parameters to look out for in the documentation.

There are also many more ways to load and save data using pandas. Check the official docs here for the lot.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.