Query Yahoo finance historical data via python requests

Overview

This term, one of my friends is the TA for a course on algorithmic trading, and she needed historical data for the course project. However, querying data from the Yahoo API at ichart.finance.yahoo.com always failed. Reportedly, Yahoo shut that API down on May 18th, 2017, but left a download button on its web pages. For example, on https://finance.yahoo.com/quote/BIDU/history?p=BIDU you can find a download link for “BIDU” such as https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao. We can then build a similar URL to download the historical data as a CSV file.
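To make the structure of that link explicit, here is a small sketch that rebuilds it from its parts. The helper name build_download_url is hypothetical, and it assumes the v7 download endpoint behaves as in the link above: period1 and period2 are Unix timestamps in seconds, interval=1d requests daily bars, and crumb is the token explained further down.

```python
from urllib.parse import urlencode

# Endpoint copied from the download link above (assumption: it stays unchanged).
BASE = "https://query1.finance.yahoo.com/v7/finance/download/"

def build_download_url(symbol, period1, period2, crumb, interval="1d"):
    """Hypothetical helper: assemble the CSV download URL for one symbol.

    period1/period2: start and end of the range as Unix timestamps (seconds).
    crumb: the token scraped from the quote page.
    """
    query = urlencode({
        "period1": period1,
        "period2": period2,
        "interval": interval,
        "events": "history",
        "crumb": crumb,
    })
    return BASE + symbol + "?" + query

# Rebuilds the BIDU link shown above.
print(build_download_url("BIDU", 1506780002, 1509372002, "NbjLKgotcao"))
```

urlencode keeps the parameters in dictionary order and percent-encodes special characters in the crumb, which plain string formatting would not do.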

I found a post, NEW YAHOO FINANCE QUOTE DOWNLOAD URL, that clearly explains how to build the download URL. Here I present only a brief summary.

https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao

Looking at the URL above, there is a crumb parameter we are not familiar with. It can be found in the HTML of the quote page:

curl -s --cookie-jar cookie.txt https://finance.yahoo.com/quote/BIDU?p=BIDU > baidu.html

Curl the page and save both the HTML and the cookie. Several crumbs appear in the page, but there is only one "CrumbStore":{"crumb":"SCYl9KtqqXZ"} entry, so we can search for it and save the crumb. That is not enough, though: if you query the download URL with only the crumb, you will probably get an error. We also need a value from the cookie file:

# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.yahoo.com	TRUE	/	FALSE	1540910372	B	0mjeuqhcveed4&b=3&s=62

The B value is what we need; store it for later use.
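Both values can also be pulled out of the files curl saved. The sketch below is one way to do it; extract_crumb and parse_netscape_cookies are hypothetical helpers, the regex assumes the CrumbStore entry looks exactly like the one quoted above, and the crumb is treated as a JSON string because it may contain escapes such as \u002F. The standard library's http.cookiejar.MozillaCookieJar can also read this cookie format; the manual parser just keeps the example short.

```python
import json
import re

# Matches the single "CrumbStore":{"crumb":"..."} entry in the page source.
CRUMB_RE = re.compile(r'"CrumbStore":\{"crumb":"(.*?)"\}')

def extract_crumb(html):
    """Return the crumb from the saved quote page (e.g. baidu.html)."""
    match = CRUMB_RE.search(html)
    if match is None:
        raise ValueError("CrumbStore not found; the page layout may have changed")
    # Decode JSON escapes in the crumb, e.g. \u002F becomes "/".
    return json.loads('"%s"' % match.group(1))

def parse_netscape_cookies(text):
    """Return {name: value} from a Netscape-format cookie file (e.g. cookie.txt)."""
    cookies = {}
    for line in text.splitlines():
        # curl writes HttpOnly cookies with this prefix; they are not comments.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 7:
            # domain, include-subdomains, path, secure, expiry, name, value
            cookies[fields[5]] = fields[6]
    return cookies
```

With the files from the curl command, extract_crumb(open("baidu.html").read()) gives the crumb and parse_netscape_cookies(open("cookie.txt").read())["B"] gives the B value, which can be passed to requests as cookies={"B": ...}.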

Query from python requests

OK, now let's do it with Python requests. Here is what we need to do:

  1. build the URL for the request
  2. save the B value from the cookie
  3. find the crumb parameter in the page
  4. set the start and end dates; for the whole history, that is from the very first record up to now
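For step 4, period1 and period2 are plain Unix timestamps. Below is a minimal sketch for converting calendar dates to and from those values; the helper names are hypothetical, and treating a date as midnight UTC is an assumption (the script below simply uses 0 and the current time). Decoding the example link this way shows it covers exactly 30 days, starting at 2017-09-30 14:00:02 UTC.

```python
from datetime import datetime, timezone

def to_epoch(year, month, day):
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def from_epoch(seconds):
    """Inverse: a timezone-aware UTC datetime for a Unix timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(to_epoch(2017, 10, 1))    # 1506816000
print(from_epoch(1506780002))   # 2017-09-30 14:00:02+00:00
# Length of the range in the example link, in days.
print((1509372002 - 1506780002) // 86400)  # 30
```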

Here is the script, adapted from YAHOO FINANCE QUOTE DOWNLOAD PYTHON; it is largely self-explanatory:

import re
import time
import requests

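def get_cookie_value(r):
    # The B cookie set by the quote page must accompany the download request.
    return {'B': r.cookies['B']}

def get_page_data(symbol):
    url = "https://finance.yahoo.com/quote/%s/?p=%s" % (symbol, symbol)
    r = requests.get(url)
    r.raise_for_status()
    cookie = get_cookie_value(r)
    # Decode escapes such as \u002F, then break the page at every '}' so the
    # CrumbStore entry ends up on a line of its own.
    lines = r.content.decode('unicode-escape').strip().replace('}', '\n')
    return cookie, lines.split('\n')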

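def find_crumb_store(lines):
    # Looking for
    # ,"CrumbStore":{"crumb":"9q.A4D1c.b9
    for l in lines:
        if 'CrumbStore' in l:
            return l
    # Fail loudly instead of returning None, which would crash later.
    raise ValueError("Did not find CrumbStore")

def split_crumb_store(v):
    # ',"CrumbStore":{"crumb":"9q.A4D1c.b9"' -> '9q.A4D1c.b9'
    return v.split(':')[2].strip('"')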

def get_cookie_crumb(symbol):
    cookie, lines = get_page_data(symbol)
    crumb = split_crumb_store(find_crumb_store(lines))
    return cookie, crumb

def get_data(symbol, start_date, end_date, cookie, crumb):
    filename = '%s.csv' % (symbol)
    url = "https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s" % (symbol, start_date, end_date, crumb)
    # stream=True lets iter_content() write the CSV in chunks.
    response = requests.get(url, cookies=cookie, stream=True)
    # Stop on HTTP errors (e.g. a stale crumb) instead of saving an error page.
    response.raise_for_status()
    with open(filename, 'wb') as handle:
        for block in response.iter_content(1024):
            handle.write(block)

def get_now_epoch():
    # @see https://www.linuxquestions.org/questions/programming-9/python-datetime-to-epoch-4175520007/#post5244109
    return int(time.time())

def download_quotes(symbol):
    start_date = 0  # Unix epoch (1970-01-01), i.e. request the full history
    end_date = get_now_epoch()
    cookie, crumb = get_cookie_crumb(symbol)
    get_data(symbol, start_date, end_date, cookie, crumb)

if __name__ == "__main__":
    symbol = input('Enter the symbol: ').strip()
    print("--------------------------------------------------")
    print("Downloading %s to %s.csv" % (symbol, symbol))
    download_quotes(symbol)
    print("--------------------------------------------------")

Reading from the bottom up: we enter the symbol whose data we want, then call download_quotes(). It sets the date range, gets the B cookie value and the crumb parameter, and then downloads the data through requests. The historical data ends up in the file symbol.csv. If Yahoo changes its API later, we can probably do this in a similar way.
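Once the CSV is on disk, the standard csv module is enough to load it. This sketch assumes the header row is Date,Open,High,Low,Close,Adj Close,Volume and that days without data show up as empty or "null" values; both are assumptions about Yahoo's output, and read_adj_closes is a hypothetical helper.

```python
import csv

def read_adj_closes(lines):
    """Return a list of (date, adjusted close) pairs from Yahoo CSV lines.

    `lines` is any iterable of text lines, such as an open file.
    """
    pairs = []
    for row in csv.DictReader(lines):
        # Skip rows where Yahoo reports no value for the day.
        if row["Adj Close"] in ("", "null"):
            continue
        pairs.append((row["Date"], float(row["Adj Close"])))
    return pairs
```

For example, with open("BIDU.csv", newline="") as f: prices = read_adj_closes(f).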

Frank Lin
