` parent:  
 CSS Selector: `div > p.title`
For a full overview I recommend checking this page:  
https://www.w3schools.com/cssref/css_selectors.asp
]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
]
.right-column[
#### `SelectorGadget` Chrome extension:

#### Chrome DevTools:

]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
]
.right-column[
        
### Recap: web-scraping a page
**Step 1: determine the URL of the page you need**
> URL = https://www.tiesdekok.com
**Step 2 and Step 3: download and parse the HTML of the webpage**
Note: the `Requests-HTML` library parses the HTML automatically for you.
**Step 4: use CSS Selectors to extract information**
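
The four steps above can be sketched with `Requests-HTML` roughly as follows. This is a minimal sketch, not the course's own code: the helper name `scrape_text` and its optional `session` argument are assumptions made for illustration, and Step 1 is simply the `url` you pass in.

```python
def scrape_text(url, selector, session=None):
    """Return the text of every element on the page at `url` matching `selector`."""
    if session is None:
        # Imported here so the library is only needed when no session is passed in.
        from requests_html import HTMLSession
        session = HTMLSession()

    # Steps 2 and 3: download the page; Requests-HTML parses the HTML for us.
    response = session.get(url)
    response.raise_for_status()  # stop early on HTTP errors such as 404

    # Step 4: use a CSS selector to extract information.
    return [element.text for element in response.html.find(selector)]
```

For example, `scrape_text("https://www.tiesdekok.com", "h1")` would return the text of the page's top-level headings, assuming the page contains any `h1` elements.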

 
]      
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
]
.right-column[
        
### JavaScript-heavy webpages
Some webpages rely heavily on JavaScript to load their data elements:

 
]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
]
.right-column[
        
### JavaScript-heavy webpages
Can we still scrape them?
**Sure, but with a different approach:**
**Option 1:** use browser automation tools   
 Two primary tools:
1. Use a headless browser (`requests-html` can do this)
2. Use `Selenium` with Chrome bindings
**Option 2:** try to reverse-engineer the HTTP Requests  
]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
]
.right-column[
**Option 1:** use browser automation tools   
 Use a headless browser (`requests-html` can do this)
Note: the first time you run `html.render()`, it downloads Chromium, so expect a short one-time delay.
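
A minimal sketch of the headless-browser approach is shown below. The helper name `scrape_rendered_text` and its optional `session` argument are assumptions made for illustration; the key step is the call to `render()` on the parsed HTML before selecting elements.

```python
def scrape_rendered_text(url, selector, session=None):
    """Run the page's JavaScript first, then select elements with a CSS selector."""
    if session is None:
        # Imported here so the library is only needed when no session is passed in.
        from requests_html import HTMLSession
        session = HTMLSession()

    response = session.get(url)
    response.raise_for_status()

    # Render the page in headless Chromium; afterwards `response.html`
    # holds the HTML as it looks once the JavaScript has run.
    response.html.render()

    return [element.text for element in response.html.find(selector)]
```

Without the `render()` call, elements that JavaScript adds to the page would typically be missing from the result.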
]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
]
.right-column[
**Option 1:** use browser automation tools   
 Use `Selenium` with Chrome bindings
GIF courtesy of the PyWhatsapp GitHub page
]
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
## HTTP
Requests
]
.right-column[
        
### HTTP Requests
Modern webpages often load their data in the background using HTTP requests.
**Tip: reverse-engineer the APIs that are used and mimic them!**
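
Mimicking such a request could look roughly like the sketch below. The endpoint `https://example.com/api/ratings` and its parameters are hypothetical placeholders, not a real API, and the helper names are made up for illustration. The sketch uses the standard library's `urllib` to stay self-contained; the `requests` library would work just as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_url(base_url, params):
    """Append query parameters to the endpoint observed in the browser."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, user_agent="Mozilla/5.0"):
    """Send a GET request resembling the browser's own and decode the JSON reply."""
    headers = {"User-Agent": user_agent, "Accept": "application/json"}
    request = Request(url, headers=headers)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, for illustration only.
url = build_api_url("https://example.com/api/ratings", {"employer": "amazon", "page": 1})
```

Calling `fetch_json(url)` on a real endpoint would then return the decoded data directly, with no HTML parsing needed.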
]
--
.right-column-next[
### Example:
Let's say we want to get data on the approval rating for Jeff Bezos:

 
]    
---
class: tocslide
.left-column[
## API
Requests 
## Web
Scraping
## JavaScript
Pages
## HTTP
Requests
] 
.right-column[
        
What do we see in the `Network Sniffer` Chrome extension?

   
]
--
.right-column-next[

 
]    
---
class: tocslide
.left-column[
  ## Get
Started!
]
.right-column[
What is next?
## Demonstration:
Watch the demonstration video (see Discord for the link).
## Problems:
Solve the tasks in the "web_gathering_problems.ipynb" notebook.
]
---
class: tocslide
exclude: true
.left-column[
  ## Closing
remarks
]
.right-column[
Questions?

 
      
]
---
class: tocslide
exclude: true
.left-column[
  ## Closing
remarks
  ## Demonstration
]
.right-column[
Demonstration

 
      
]
---
class: tocslide
exclude: true
.left-column[
  ## Closing
remarks
  ## Demonstration
  ## Mini-task
]
.right-column[
## Setup:
1. Download the day 3 materials from GitHub
2. Make sure you have Chrome installed
3. Install the `SelectorGadget` extension
## Mini-tasks:
**Goal:** Solve tasks in the "web_gathering_tasks.ipynb" notebook.
1. Open a Jupyter Notebook in the `limperg_python_2019` folder
2. Solve the web gathering tasks  
 Find them in `minitasks > day_3 > web_gathering_tasks.ipynb`
### You will need these notebooks: 
      
]