class: center, titleslide
# Python Workshop:
# Handling data with Pandas
## Ties de Kok
## Tilburg University

---
layout: true
class: mainlayout

---
class: tocslide

.left-column[
## Pandas Library
]
.right-column[
]

---
class: tocslide

.left-column[
## Pandas Library
]
.right-column[
]

--
.right-column-next[
]

---
class: tocslide

.left-column[
## Pandas Library
## Agenda
]
.right-column[
### What are we going to do this session?

1. Terminology
2. Specific topics:
    - Opening files
    - Saving files
    - Navigating a DataFrame
    - Selecting data
    - Creating new columns
    - Merging data
    - GroupBy operations
    - Plotting with Pandas
    - Plotting with Seaborn
]

---
class: tocslide

.left-column[
## Pandas Library
## Agenda
## Terminology
]
.right-column[
## Terminology
### Pandas vs. NumPy

NumPy provides a powerful N-dimensional array object.
Pandas builds upon the NumPy functionality.
]

--
.right-column-next[
### pd.DataFrame vs. pd.Series

A Pandas Series is a 1D data structure (like a vector).

A Pandas DataFrame is a 2D data structure (like a matrix).
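
As a small illustration of the two structures, here is a sketch with made-up values (the names `prices` and `df` are only for this example):

```python
import numpy as np
import pandas as pd

# A Series is one-dimensional: a labelled vector of values.
prices = pd.Series([1.5, 2.0, 3.25], name="price")

# A DataFrame is two-dimensional: a table with labelled rows and columns.
df = pd.DataFrame({"item": ["a", "b", "c"], "price": [1.5, 2.0, 3.25]})

print(prices.ndim)  # number of dimensions of the Series
print(df.ndim)      # number of dimensions of the DataFrame
print(df.shape)     # (rows, columns)

# The values of a numeric Series can be handed to NumPy as an array.
values = prices.to_numpy()
print(type(values))
```
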
Columns and rows in a DataFrame are Series.
]

---
class: tocslide

.left-column[
## Open data
]
.right-column[
## Opening data

Pandas can open pretty much any data file!
Opening and Saving files with Pandas notebook
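
A minimal sketch of opening a CSV file. To keep it runnable without external data, the "file" is an in-memory string with made-up content; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Stand-in for a file on disk (illustrative data, not from the workshop).
csv_text = "name,age\nAnna,31\nBob,27\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # column types inferred by Pandas
```

Other formats have their own readers, for example `pd.read_excel`, `pd.read_json`, `pd.read_stata`, and `pd.read_sas`.
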
]

---
class: tocslide

.left-column[
## Open data
## Save data
]
.right-column[
## Saving data

Pandas can save to pretty much any data format (except SAS)!
Opening and Saving files with Pandas notebook
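
A minimal sketch of saving a DataFrame and reading it back. The data and the file name `people.csv` are made up, and a temporary folder is used so nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"name": ["Anna", "Bob"], "age": [31, 27]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "people.csv"
    # index=False leaves the row index out of the file.
    df.to_csv(csv_path, index=False)
    # Read the file back to check what was written.
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

Other writers follow the same pattern, for example `df.to_excel`, `df.to_json`, and `df.to_stata`.
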
]

---
class: tocslide

.left-column[
## Open data
## Save data
## HDF files
]
.right-column[
## HDF files
Tip: HDF files are awesome!
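
A hedged sketch of a round trip through an HDF file. The data, the file name `data.h5`, and the key `firms` are made up. Pandas needs the optional PyTables package for HDF support, so the example skips itself when that package is missing.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"year": [2017, 2018], "revenue": [10.5, 20.0]})

restored = None
with tempfile.TemporaryDirectory() as tmp_dir:
    hdf_path = str(Path(tmp_dir) / "data.h5")
    try:
        # key names the dataset inside the file; mode="w" overwrites the file.
        df.to_hdf(hdf_path, key="firms", mode="w")
        restored = pd.read_hdf(hdf_path, key="firms")
    except ImportError:
        # Raised by Pandas when PyTables is not installed.
        print("PyTables is not installed; skipping the HDF example.")

print(restored)
```
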
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
]
.right-column[
## How to inspect your data?

There is no standard data browser for DataFrames.

## My recommendation

Use basic operations to view parts of the data in the notebook:
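
A minimal sketch of such basic operations, using a small made-up DataFrame (all names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anna", "Bob", "Cara", "Dan", "Eve", "Finn"],
    "age": [31, 27, 45, 38, 29, 52],
})

first_rows = df.head(3)   # first three rows
last_rows = df.tail(2)    # last two rows
summary = df.describe()   # summary statistics for the numeric columns

print(df.shape)           # (number of rows, number of columns)
print(first_rows)
print(last_rows)
print(summary)
df.info()                 # column names, dtypes, and non-null counts
```

In a notebook, putting `df.head()` on the last line of a cell shows the result as a formatted table.
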
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
]
.right-column[
## Alternative: use the QGrid extension
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
## Select data
]
.right-column[
## Selecting data
Selecting data based on a condition, Jupyter Notebook
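
A minimal sketch of selecting rows based on a condition. The DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anna", "Bob", "Cara", "Dan"],
    "age": [31, 27, 45, 38],
    "city": ["Tilburg", "Seattle", "Tilburg", "Seattle"],
})

# A comparison on a column gives a boolean Series (a "mask");
# indexing with the mask keeps the rows where it is True.
over_30 = df[df["age"] > 30]

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
tilburg_over_40 = df[(df["city"] == "Tilburg") & (df["age"] > 40)]

# .loc selects rows by condition and columns by label in one step.
names_over_30 = df.loc[df["age"] > 30, "name"]

print(over_30)
print(tilburg_over_40)
print(names_over_30)
```
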
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
## Select data
## Create Columns
]
.right-column[
## Creating columns
Various methods to create columns, Jupyter Notebook
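
A short sketch of a few common ways to create columns. The data, the column names, and the 1.21 multiplier are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [3, 1, 2]})

# Arithmetic on columns is applied element by element.
df["revenue"] = df["price"] * df["quantity"]

# A single value is repeated for every row.
df["currency"] = "EUR"

# apply runs a Python function on each value of a column.
df["size"] = df["quantity"].apply(lambda q: "large" if q >= 2 else "small")

# assign returns a new DataFrame and leaves the original unchanged.
with_tax = df.assign(price_incl_tax=df["price"] * 1.21)

print(df)
print(with_tax)
```
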
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
## Select data
## Create Columns
## Merge data
]
.right-column[
## Merging DataFrames
Various methods to merge, join, and append, Jupyter Notebook
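
A minimal sketch of merging, joining, and appending with two made-up DataFrames. Note that `DataFrame.append` was removed in Pandas 2.0, so the sketch appends rows with `pd.concat`.

```python
import pandas as pd

firms = pd.DataFrame({"firm_id": [1, 2, 3], "name": ["A", "B", "C"]})
sales = pd.DataFrame({"firm_id": [1, 2, 4], "sales": [100, 250, 75]})

# Inner merge keeps only the keys present in both DataFrames.
inner = pd.merge(firms, sales, on="firm_id", how="inner")

# Left merge keeps every row of the left DataFrame;
# rows without a match get NaN in the columns from the right.
left = pd.merge(firms, sales, on="firm_id", how="left")

# join aligns on the index, so the key is set as index first.
joined = firms.set_index("firm_id").join(sales.set_index("firm_id"), how="outer")

# concat stacks DataFrames on top of each other (appending rows).
more_firms = pd.DataFrame({"firm_id": [5], "name": ["E"]})
appended = pd.concat([firms, more_firms], ignore_index=True)

print(inner)
print(left)
print(joined)
print(appended)
```
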
]

---
class: tocslide

.left-column[
## Open data
## Save data
## Navigate
## Select data
## Create Columns
## Merge data
## GroupBy Operation
]
.right-column[
## GroupBy Operations
Various methods to group and aggregate data, Jupyter Notebook
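
A minimal sketch of a GroupBy operation on made-up data: split the rows by a key, compute something per group, and combine the results.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Tilburg", "Seattle", "Tilburg", "Seattle", "Tilburg"],
    "sales": [10, 20, 30, 40, 50],
})

# Split the rows by city, then compute one statistic per group.
mean_sales = df.groupby("city")["sales"].mean()

# agg computes several statistics at once.
stats = df.groupby("city")["sales"].agg(["sum", "count"])

# transform returns one value per original row, aligned with df.
df["city_total"] = df.groupby("city")["sales"].transform("sum")

print(mean_sales)
print(stats)
print(df)
```
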
]

---
class: tocslide

.left-column[
## Save data
## Navigate
## Select data
## Create Columns
## Merge data
## GroupBy Operation
## Plotting
]
.right-column[
## Plotting data (Pandas and Seaborn)
Comprehensive notebook for plotting with Pandas
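
A minimal sketch of plotting directly from a DataFrame, with made-up data. It assumes Matplotlib is installed, since Pandas uses it for drawing. Seaborn is not covered in this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017],
    "sales": [10, 15, 12, 20],
})

# DataFrame.plot draws with Matplotlib and returns a Matplotlib Axes object.
ax = df.plot(x="year", y="sales", kind="line", title="Sales per year")

# The Axes object can be used to adjust the figure further.
ax.set_ylabel("sales")
```

In a Jupyter Notebook the figure appears below the cell; other values for `kind`, such as `"bar"` or `"hist"`, give other chart types.
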
]

---
class: tocslide

.left-column[
## Closing remarks
]
.right-column[
Questions?
]

---
class: tocslide

.left-column[
## Closing remarks
## Demonstration
]
.right-column[
Demonstration
]

---
class: tocslide

.left-column[
## Closing remarks
## Demonstration
## Mini-Task Instructions
]
.right-column[
## Mini Task

**Goal:** Get hands-on experience with Pandas on a real-life dataset.

### Instructions

1. Open (start) a Jupyter Notebook in the `UW_python_2018` folder
2. Solve tasks in: `Materials > Session_2 > pandas_mini_task.ipynb`

*Feel free to also work on:* `Session_1 > basic_python_tasks.ipynb`

### For help:

- Python tutorial
- Python Basics Notebook
- Opening files with Python / Pandas
- Data handling with Pandas
]