In last week’s post, I pulled data about local restaurants from Yelp to generate a dataset. I was happy to find that Yelp actually has a very friendly API. This guide will walk you through setting up some boilerplate code that you can then configure to your specific needs.
Step 1: Obtaining Access to the Yelp API
Before you can use the Yelp API, you need to submit a developer request. This can be done here. I’m not sure what the requirements are, but my guess is they approve almost everyone. After getting access, you will need to get your API keys from the Manage API access section on the site.
Step 2: Getting the rauth library
Yelp’s API uses OAuth authentication for API calls. Unless you want to do a lot of work, I suggest that you use a third party library to handle the OAuth for you. For this tutorial I’m using rauth, but feel free to use any library of your choice.
You can use easy_install rauth
or pip install rauth
to download the library.
Step 3: Write the code to query the Yelp API
You’ll first need to figure out what information you actually want to query. The API Documentation gives you all of the different parameters that you can specify and the correct syntax.
For this example, we’re going to be doing some location-based searching for restaurants. If you store each of the search parameters in a dictionary, you can save yourself some formatting. Here’s a method that accepts a latitude and longitude and returns the search parameter dictionary:
def get_search_parameters(lat, long):
    # See the Yelp API for more details
    params = {}
    params["term"] = "restaurant"
    params["ll"] = "{},{}".format(str(lat), str(long))
    params["radius_filter"] = "2000"
    params["limit"] = "10"

    return params
Next we need to build our actual API call. Using the codes from the Manage API access page, we’re going to create an OAuth session. After we have a session, we can make an actual API call using our search parameters. Finally, we take that data and put it into a Python dictionary.
import rauth

def get_results(params):
    # Obtain these from Yelp's manage access page
    consumer_key = "YOUR_KEY"
    consumer_secret = "YOUR_SECRET"
    token = "YOUR_TOKEN"
    token_secret = "YOUR_TOKEN_SECRET"

    session = rauth.OAuth1Session(
        consumer_key=consumer_key,
        consumer_secret=consumer_secret,
        access_token=token,
        access_token_secret=token_secret)

    request = session.get("http://api.yelp.com/v2/search", params=params)

    # Transforms the JSON API response into a Python dictionary
    data = request.json()
    session.close()

    return data
Now we can put it all together. Since Yelp will only return a max of 40 results at a time, you will likely want to make several API calls if you’re putting together any sort of sizable dataset. Currently, Yelp allows 10,000 API calls per day, which should be way more than enough for compiling a dataset! However, when I’m making repeated API calls, I always make sure to rate-limit myself.
Companies with APIs will almost always have mechanisms in place to prevent too many requests from being made at once. Often this is done by IP address. They may have some code in place to only handle X calls in Y time per IP or X concurrent calls per IP, etc. If you rate limit yourself you can increase your chances of always getting back a response.
import time

def main():
    locations = [(39.98, -82.98), (42.24, -83.61), (41.33, -89.13)]
    api_calls = []
    for lat, long in locations:
        params = get_search_parameters(lat, long)
        api_calls.append(get_results(params))
        # Be a good internet citizen and rate-limit yourself
        time.sleep(1.0)

    # Do other processing
At this point you have a list of dictionaries that represent each of the API calls you made. You can then do whatever additional processing you want to each of those dictionaries to extract the information you are interested in.
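As a rough illustration of that additional processing, here is a minimal sketch that flattens the list of response dictionaries into one row per business. The helper name `extract_rows` and the choice of fields are my own assumptions; the keys (`businesses`, `name`, `rating`, `review_count`, `location.city`) match a v2 search response like the one a reader quotes in the comments, and the sample data stands in for real API calls.

```python
def extract_rows(api_calls):
    """Flatten a list of search responses into one dict per business.

    extract_rows is a hypothetical helper, not part of the original script.
    """
    rows = []
    for response in api_calls:
        # .get() keeps the loop from failing on an error response
        # that has no "businesses" key.
        for business in response.get("businesses", []):
            rows.append({
                "name": business.get("name"),
                "rating": business.get("rating"),
                "review_count": business.get("review_count"),
                "city": business.get("location", {}).get("city"),
            })
    return rows


# Stand-in for the api_calls list built in main(); values are sample data.
sample_api_calls = [
    {
        "total": 876,
        "businesses": [
            {
                "name": "Foo Foo Tei",
                "rating": 4.0,
                "review_count": 1332,
                "location": {"city": "Hacienda Heights"},
            }
        ],
    }
]

rows = extract_rows(sample_api_calls)
for row in rows:
    print(row)
```

A flat list of dictionaries like this is easy to hand to `csv.DictWriter` or a dataframe library afterwards.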
When working with a new API, I sometimes find it useful to open an interactive Python session and actually play with the API responses in the console. This helps me understand the structure so I can code the logic to find what I’m looking for.
You can get this complete script here. Every API is different, but Yelp is a friendly introduction to the world of making API calls through Python. With this skill you can construct your own datasets from any of the companies with public APIs.
When attempting this script – Python returns
File “API1.py”, line 31
return data
SyntaxError: ‘return’ outside function
28 #Transforms the JSON API response into a Python dictionary
29 data = request.json()
30 session.close()
31 return data
Any thoughts
You need to indent the lines, like in the code block. Python uses whitespace to mark what’s part of a function definition, much like C uses { }. This error is saying that it encountered a `return` but — to its knowledge — it wasn’t inside of a function at all.
Tabbing in line 31 (and the lines above it) makes the `return data` line the last line of the `get_results` function definition.
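To see the difference for yourself, here is a small sketch that compiles two snippets: one with `return` at the left margin and one with it indented under a `def`. The function name `get_data` and the snippet contents are made up for the demonstration.

```python
# A `return` at column 0 is not inside any function, so compiling fails.
broken = "data = 1\nreturn data\n"
try:
    compile(broken, "<broken>", "exec")
    broken_compiles = True
except SyntaxError as err:
    broken_compiles = False
    print("SyntaxError:", err.msg)

# Indenting the body under `def` puts `return` inside the function.
fixed = "def get_data():\n    data = 1\n    return data\n"
namespace = {}
exec(compile(fixed, "<fixed>", "exec"), namespace)
result = namespace["get_data"]()
print("fixed version returned", result)
```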
How would you implement the API in R?
You could use RCurl to download the data and rjson to process the data.
Thanks. I was trying with httR, but it seems to be limited.
What kind of latency are you getting with the api?
It is taking me something like 25 seconds to authenticate and retrieve the data.
Does that sound right?
That doesn’t sound too extreme if you’re pulling a lot of data down. If you want to eliminate variables, you can always just use curl with your parameters to see if that’s any faster. You might also try to make a request that will return no data or a request that will intentionally 404 to see where the slowdown is.
Hello,
I am trying to make that work with my python script, but all my requests are failing with the following error:
———–
{‘error’: {‘description’: ‘Invalid signature. Expected signature base string: GET&http%3A%2F%2Fapi.yelp.com%2Fv2%2Fsearch&location%3Dlocation%26oauth_consumer_key%3D1sqhFxRdGZS1Ksy_tLcLxg%26oauth_nonce%3D80fec8017c0a0834ca225761c2c9246522b81192%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D1410534092%26oauth_token%3D6_ewtaaJU5YWM4MTiYBVUOoqmmKWqrOx%26oauth_version%3D1.0%26term%3Dsearchterm’, ‘id’: ‘INVALID_SIGNATURE’, ‘text’: ‘Signature was invalid’}}
————-
Upon inspecting that more closely, I see that oauth_signature is not present in the request, and it is required by yelp’s API. Why is that?
The code:
———
params = { }
params[‘term’] = urllib.parse.quote(query)
params[‘location’] = urllib.parse.quote(self.city)
session = rauth.OAuth1Session(consumer_key=self._consumer_key,
consumer_secret=self._consumer_secret,
access_token=self._token,
access_token_secret=self._token_secret)
response = session.get(“http://api.yelp.com/v2/search”, params=params)
return response.json()
———————
How can I make the request work?
This is most likely an issue with your keys. It means that the signature created by the OAuth library does not match what Yelp is expecting. Double-check all your keys and make sure that there are no extra special characters. Also compare the URL in the error message to the actual URL you are generating. That may help to point out differences.
Hi Phillip,
I’m currently using your script to query the Yelp API (thanks), and end up with a block of JSON embedded in a Python list. Like this:
[{u'region': {u'span': {u'latitude_delta': 0.0215103900000031, u'longitude_delta': 0.024831400000010717}, u'center': {u'latitude': 34.00977745, u'longitude': -117.98871299999999}}, u'total': 876, u'businesses': [{u'is_claimed': True, u'distance': 3009.570851637409, u'mobile_url': u'http://m.yelp.com/biz/foo-foo-tei-hacienda-heights', u'rating_img_url': u'http://s3-media4.fl.yelpcdn.com/assets/2/www/img/c2f3dd9799a5/ico/stars/v1/stars_4.png', u'review_count': 1332, u'name': u'Foo Foo Tei', u'rating': 4.0, u'url': u'http://www.yelp.com/biz/foo-foo-tei-hacienda-heights', u'categories': [[u'Japanese', u'japanese']], u'is_closed': False, u'phone': u'6269376585', u'snippet_text': u"This place lives up to its hype!\n\nIf it's your first time here, you'd be surprised with that many ramen soup choices they offer off from menu. Def will try...", u'image_url': u'http://s3-media4.fl.yelpcdn.com/bphoto/sWEFkZw2u39YFSd4qmhlhg/ms.jpg', u'location': {u'city': u'Hacienda Heights', u'display_address': [u'15018 Clark Ave', u'Hacienda Heights, CA 91745'], u'postal_code': u'91745', u'country_code': u'US', u'address': [u'15018 Clark Ave'], u'state_code': u'CA'}, u'display_phone': u'+1-626-937-6585', u'rating_img_url_large': u'http://s3-media2.fl.yelpcdn.com/assets/2/www/img/ccf2b76faa2c/ico/stars/v1/stars_large_4.png', u'id': u'foo-foo-tei-hacienda-heights', u'snippet_image_url': u'http://s3-media3.fl.yelpcdn.com/photo/anEM2GKE1bl26QD1sq00BA/ms.jpg', u'rating_img_url_small': u'http://s3-media4.fl.yelpcdn.com/assets/2/www/img/f62a5be2f902/ico/stars/v1/stars_small_4.png'}]}]
I’m rather new to APIs and the Python language but am trying to learn. Would you be able to provide any hints as to how one can instantiate some Python classes, store this raw output into a readable format, and perhaps even read the data into a dataframe of some sort? I’ve been stuck for quite some time now and am reaching out for some guidance if you could spare some.
Thanks for the help,
Frank Chen
Well, using json.loads you can easily convert JSON to Python objects. Then you traverse the lists and dictionaries the same way you would any ordinary dictionary or list. In this example, you probably would loop through each of the “businesses” and create a new custom Business object for each item in the list.
I’m pretty new to Python, but immersing myself in various aspects. This is going to sound like a silly, dumb question but with those functions written… now what? How do I look at the data that they are returning?
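For anyone wondering what looping through the “businesses” and building custom objects might look like, here is a minimal sketch. The `Business` class, its three fields, and the `to_businesses` helper are assumptions chosen for illustration; the JSON string is a trimmed-down stand-in for a real response.

```python
import json
from dataclasses import dataclass


@dataclass
class Business:
    """Hypothetical container for the few fields we care about."""
    name: str
    rating: float
    phone: str


def to_businesses(responses):
    """Build one Business per entry in each response's "businesses" list."""
    return [
        Business(b.get("name"), b.get("rating"), b.get("phone"))
        for response in responses
        for b in response.get("businesses", [])
    ]


# Trimmed sample data standing in for the raw JSON text of the responses.
raw = (
    '[{"total": 876, "businesses": '
    '[{"name": "Foo Foo Tei", "rating": 4.0, "phone": "6269376585"}]}]'
)

responses = json.loads(raw)  # JSON text -> Python lists and dicts
businesses = to_businesses(responses)

for business in businesses:
    print(business.name, business.rating, business.phone)
```

If the data is already a Python list, as it is when it comes back from `get_results`, the `json.loads` step can be skipped.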
Well, the simplest thing you can do is print them to the console. But usually when you use an API it’s because you want to do something more interesting with the data. If you just wanted to see the data, you could use the site directly. As far as what that interesting thing is, well, the sky’s the limit!
Hi!
Thanks for your code! I’m trying to use it exactly as given, but it does not return anything.
What do I have to put in at the end to get the API calls returned as visible data in my python console? (using python3.4 and ipython notebook). I will then write it to a csv file, but for now I just want to see it in the output here. Thanks!
I removed the code from your comment because API keys should never be shared publicly. However, I was able to confirm that there is an error with your keys. I tried my keys and there was no problem. You might want to try regenerating your keys. Also make sure you are entering the correct key in the correct place in the code. Sorry I can’t offer more specific advice, but this error means there’s an issue with your token.
Ah, yes it works with a new key. Thank you so much!
Thank you very much for the tutorial. It is really helpful. My question is: what if I want to search all the restaurants in Toronto? Do I need to supply the lat and long every time, or is there an easier way? Thanks!
There are lots of ways to specify location; latitude and longitude is just one. Here’s the documentation; it sounds like the location parameter will be generic enough.
Thank you very much for your prompt reply! I think setting the location parameter to Toronto will work. My concern is that you mentioned the Yelp API will only return 40 results each time. So if I send the same request again (in my case, requesting the list of restaurants in Toronto), will the API return the same 40 results? Thanks!
I wouldn’t rely on the API always returning the same exact 40 results. Yelp may do a number of things to control what results you get, such as if the restaurant is open, the rating, number of reviews, etc.
Hello,
I have already downloaded the Yelp data as a JSON file.
My question is: how do I extract the long and lat data points from the JSON file? I am using Python.
Would you please advise me?
I’m not quite sure what you’re looking for, but take a look at the “Response Values” section of the API docs. I think region.center may be what you’re looking for.
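If region.center is what you need, a minimal sketch of pulling it out could look like this. The inline JSON string is sample data; with a file on disk you would use `json.load` on the open file instead, and the variable names are my own.

```python
import json

# Sample data standing in for the contents of a downloaded JSON file.
# With a real file: `with open(path) as f: data = json.load(f)`
raw = (
    '{"region": {"center": {"latitude": 34.00977745, '
    '"longitude": -117.98871299999999}}, "businesses": []}'
)

data = json.loads(raw)

# region.center holds the coordinates of the middle of the search area.
center = data["region"]["center"]
lat = center["latitude"]
lng = center["longitude"]

print(lat, lng)
```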
Hi I’m trying to use your code but am getting a TypeError: __init__() got an unexpected keyword argument ‘access_token_secret’ in rauth’s OAuth1Service. I just got a fresh token and token secret and am still getting error. Any help would be greatly appreciated.
My example uses an OAuth1Session, not an OAuth1Service. These are really similarly named, but a service uses URLs to obtain tokens and a session passes in tokens that you already have. Try changing to a session and see if that works. Happy programming!
Thank you for this tutorial, it’s really helpful and worked quite well for me. In Yelp’s documentation they mention you can get up to 1000 results using the limit and offset parameters. I want to get the most restaurants for a specific city (around 800), and I believe you would need to update those parameters by looping through and updating them, however I am having a hard time getting started. Any advice?
Hi Marissa,
I would probably do something like this:
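One possible shape for such a loop is sketched below. It assumes a hypothetical `fetch_page(limit, offset)` callable that returns one response dictionary; in practice that would wrap `get_results` with `limit` and `offset` added to the search parameters. The page size of 20 is a guess, so set it to whatever the API allows, and the fake fetcher only exists so the sketch runs without network access.

```python
def fetch_all(fetch_page, page_size=20, max_results=1000):
    """Collect businesses by advancing `offset` until a page comes back short.

    fetch_page is a hypothetical callable: fetch_page(limit=..., offset=...)
    must return a dict with a "businesses" list.
    """
    businesses = []
    offset = 0
    while offset < max_results:
        page = fetch_page(limit=page_size, offset=offset)
        batch = page.get("businesses", [])
        businesses.extend(batch)
        if len(batch) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += page_size
    return businesses


def fake_fetch_page(limit, offset):
    """Stand-in for a real API call: pretends 45 businesses exist."""
    total = 45
    ids = range(offset, min(offset + limit, total))
    return {"businesses": [{"id": "biz-{}".format(i)} for i in ids]}


all_businesses = fetch_all(fake_fetch_page)
print(len(all_businesses), "businesses collected")
```

With a real `fetch_page`, it would be worth sleeping between calls, as in `main()` above.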
Hi, thank you for the tutorial. It is really helpful. Your code works correctly for me but returns only one restaurant :( I changed only locations = [(45.4301928892447, -73.6253511424274)] in your code.
How do I read this in SQL?