class: center, titleslide
# Python Workshop: # Introduction
##
Ties de Kok
## Tilburg University --- layout: true class: mainlayout --- class: tocslide .left-column[ ## About me ] .right-column[
] --- class: tocslide .left-column[ ## About me ## Program ] .right-column[ ### What will we be doing?
**Four main blocks:**
1) Introduction to Python (+ Python workflow)
Today
2) Handling data with `Pandas`
Today
3) Gathering data from the web
Thursday
4) Natural Language Processing
Friday
] --- class: tocslide .left-column[ ## About me ## Program ## Basic
Principles ] .right-column[
### Basic Principles of this course:
1) I cannot inject you with Python skills 2) **It is up to you to make yourself proficient with Python** ] -- .right-column-next[
### My goal:
Make it more efficient for you to **teach Python to yourself**
] -- .right-column-next[
### How? 1. By providing starting points 2. By pointing out common pitfalls ] --- class: tocslide .left-column[ ## About me ## Program ## Basic
Principles ## Structure ] .right-column[
### Structure: **Each block consists of three elements:**
1) Conceptual introduction
Introduce basic constructs and terminology 2) Setup + Get started
Make sure everything is set up and working 3) Mini-task
Get hands-on experience ] --- class: tocslide .left-column[ ## About me ## Program ## Basic
Principles ## Structure ## Materials ] .right-column[
### Slides: All of the slides are made available here:
GitHub page
### Python materials:
All materials are available here: 1)
Learn Python for Research (GitHub)
2)
Natural Language Processing (NLP) Tutorial (GitHub)
] --- class: tocslide .left-column[ ## About me ## Program ## Basic
Principles ## Structure ## Materials ## Agenda ] .right-column[ ### Agenda
1. Python eco-system 2. Using Python 3. Jupyter Notebook 4. Python syntax 5. Extra topics - Folder structure - GitHub crash-course - Begin-to-End example - General tips ] --- class: tocslide .left-column[ ## Why
Python? ] .right-column[
(Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/)
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ] .right-column[
### Python 2 vs. Python 3:
Simple: always use Python 3 unless you have to use Python 2. Python 3 receives new updates, while Python 2.7 is slowly being phased out.
**Note!** Python 3 syntax is not always backwards compatible!

`print 'Hello, world!'`: only works in Python 2.7

`print('Hello, world!')`: works in Python 2.7 and Python 3.X
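
A minimal sketch of the function-style `print` shown above. The helper name `make_greeting` is made up for illustration and is not part of the workshop materials:

```python
def make_greeting(name):
    """Build the greeting text (hypothetical helper, for illustration only)."""
    return 'Hello, {}!'.format(name)


# print is a function in Python 3, so the arguments go inside parentheses.
print(make_greeting('world'))

# Keyword arguments such as sep are only available on the function form.
print('Python', '3', sep=' version ')
```

The parenthesized call with a single argument is the form that runs unchanged on both Python 2.7 and Python 3.
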
### We will use Python 3.6 ] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ] .right-column[
### Modules and packages

A module/package is Python code that you "import" to add functionality.

**Two types of modules/packages:**

1. Built-in modules that are included with Python
2. Third-party modules/packages
The Python Package Index hosts more than 130,000 packages!

**Example:**

`import os`: standard module

`import pandas as pd`: third-party module ] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ] .right-column[
### Modules and packages **How to install third-party modules/packages?** Use `pip` to install packages hosted on the Python Package Index Use `conda` to install packages hosted by `Anaconda` or `Conda-Forge`
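
Before installing anything, it can help to check whether a package is already available. The sketch below is one possible way to do that with the standard library. The helper name `is_installed` is an assumption for illustration, and it is only meant for top-level module names:

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ['os', 'pandas']:
    if is_installed(name):
        print(name, '-> available')
    else:
        # Third-party packages that are missing need pip or conda.
        print(name, '-> missing, install it with pip or conda')
```

Built-in modules such as `os` are always found. Whether `pandas` is found depends on your installation.
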
] -- .right-column-next[ ### Recommendation:
always start with the default Anaconda 3 distribution! **Anaconda:** Python bundled with most used data science packages.
**For more info:**
Anaconda Distribution website
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ] .right-column[ ### How to run Python code?
1) Save code to `.py` file and run from command line: `python file.py`
2) Use an interactive console in the command line: `python` or `ipython`
**3) Use Jupyter Notebooks!** ] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ] .right-column[ ### Jupyter Notebook
Try it in your browser
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ] .right-column[ ### Jupyter Notebook
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ] .right-column[ ### Jupyter Notebook #### How does it work:
**How to start:** 1. Open up a command line / terminal 2. Change to project directory using `cd` 3. Type: `jupyter notebook` **How to stop:**
Press `ctrl+c` in command line / terminal ] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ] .right-column[ ### Jupyter Notebook Using a Jupyter Notebook is largely self-explanatory.
**Most relevant shortcuts for reference purposes:**
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ## Python
syntax ] .right-column[ ### Python syntax is easy!
] -- .right-column-next[ ### Where to start?
I recommend using my
Python Basics Notebook
] --- class: tocslide .left-column[ ## Why
Python? ## Python
Eco-system ## Using
Python ## Jupyter
Notebook ## Python
syntax ] .right-column[ ### A couple of caveats
1) It is best practice to include all imports at the start
2) The spacing (i.e. the indentation) is not just for looks!
3) Avoid "blind" `try` and `except` blocks.
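
The three caveats can be sketched in one small example. The function name `parse_number` and the sample values are made up for illustration:

```python
import statistics  # 1) imports go at the top of the file


def parse_number(text):
    """Convert text to a float, returning None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        # 3) Catch only the error you expect. A bare "except:" would also
        #    hide unrelated bugs, such as typos in variable names.
        return None


values = []
for raw in ['1.5', 'abc', '3']:
    # 2) The indentation decides what belongs to the loop body.
    number = parse_number(raw)
    if number is not None:
        values.append(number)

print(values)
print(statistics.mean(values))
```

Because only `ValueError` is caught, passing something unexpected such as `None` still raises an error instead of failing silently.
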
] --- class: tocslide .left-column[ ## Folder
Structure ] .right-column[ ### Folder structure
] --- class: tocslide .left-column[ ## Folder
Structure ## GitHub
Repository ] .right-column[ ### GitHub Repository
You should really look into using version control with Git + GitHub. GitHub provides **free** private repositories for academics:
Apply for GitHub Education
**Steps to get your project on GitHub:** 1. Create a new empty repository on
GitHub.com
2. Clone empty repository using
GitHub Desktop app
to your computer 3. Copy your project files into this folder 4. Commit + Push using
GitHub Desktop app
The earlier you do this the better! ] --- class: tocslide .left-column[ ## Folder
Structure ## GitHub
Repository ## Project
Begin-to-End ] .right-column[ ### Project Begin-to-End
**My usual workflow for a research project:**
1. Create empty repository on GitHub + set up folder structure
can save a lot of headache later on! 2. Start with Python to gather and clean data
usually 70% of the work 3. Once the data is ready, I switch over to R or Stata
but still in the Jupyter Notebook! 4. Write the paper and create the tables in LaTeX
I highly recommend
ShareLaTeX
for LaTeX! ] --- class: tocslide .left-column[ ## Closing
remarks ] .right-column[
Questions?
] --- class: tocslide .left-column[ ## Closing
remarks ## Setup +
Get Started ] .right-column[ ## Setup: 1. Make sure you have Anaconda installed 2. Make sure you can start / open a Jupyter Notebook
## Get Started: **Goal:** Solve some of the "Basic Python Tasks" in a Jupyter Notebook. 1. Open a Jupyter Notebook in the `Materials` folder 2. Solve some of the "Basic Python Tasks"
Find them in `Materials > Day_1 > mini_task` For help: -
Python tutorial
-
Python Basics Notebook
]