3 Tips to Make Your API Dazzlingly Fast

Mar 6, 2019

We all want fast APIs. We want to see the charts in NewRelic or Heroku show high request volume and low latency. In this article I’m going to show you three tips to make sure your most critical API endpoints are dazzlingly fast.

Make Your APIs Faster Than This Time Lapse of a Highway — Photo by zhang kaiyv from Pexels

At Drift, we have a few critical endpoints that we call “critical path”. If these are slow, then something critical to Drift is hurting. The example that I’m going to use in this article is the “create a new message” endpoint.

Example Setup

from flask import Flask, request, session
...
from drift.message.utils import extract_message_params
from drift.conversation.models import Conversation
from drift.geo import ip_based_lookup
from drift.routing import do_routing_for_conversation

app = Flask(__name__)

class Message(MyORMBaseClass):
    """This class represents a message row in the DB."""
    ...

@app.route('/messages', methods=['POST'])
def create_new_message():
    message_params = extract_message_params(request.json)
    conversation = Conversation.find(message_params['conversation_id']) or Conversation.create()
    message = Message.create(conversation=conversation, **message_params)
    location = ip_based_lookup(request.remote_addr)
    message.location = location
    if not conversation.participants:
        do_routing_for_conversation(conversation)
    return message, 201

There are a few things to note here:

This isn’t how Drift actually works; it’s just a contrived example
There are a few things we’re missing here from any normal API (authentication, error handling, etc.)
We have a couple functions that I won’t be going into for the sake of brevity (let me know if they’re unclear and I can clarify them)

So we have our endpoint in production. It’s getting a ton of use. But we’re pulled aside by the director of engineering and told that the median response time is 1.2 seconds! 😱 Our director says she can show us the timing of the endpoint broken down by line of code and this is what it looks like:

message_params = extract_message_params(request.json)  # 80ms
conversation = Conversation.find(message_params['conversation_id']) or Conversation.create()  # 10ms
message = Message.create(conversation=conversation, **message_params)  # 125ms
location = ip_based_lookup(request.remote_addr)  # 380ms
if not conversation.participants:  # 0.2ms
    do_routing_for_conversation(conversation)  # 605ms

What do we do first? It’s hard not to go with something we know. For example, if I spent a year and a half writing a JSON parser in Python, I want to say “We can definitely shave time off of parsing JSON”. The answer, though, is to order the work by “biggest payoff first”. Routing takes 605ms, about half of the total, so we should try to optimize routing first.
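The “biggest payoff first” ordering can be made mechanical. Here is a minimal sketch, assuming the per-line timings from the breakdown above have been copied by hand into a dictionary (the `timings_ms` name and the labels are illustrative, not part of any real tooling):

```python
# Per-line timings (ms) copied from the breakdown above; labels are illustrative.
timings_ms = {
    "extract_message_params": 80,
    "Conversation.find/create": 10,
    "Message.create": 125,
    "ip_based_lookup": 380,
    "participants check": 0.2,
    "do_routing_for_conversation": 605,
}

total_ms = sum(timings_ms.values())

# Sort slowest first so the biggest potential payoff is at the top.
ranked = sorted(timings_ms.items(), key=lambda item: item[1], reverse=True)

for name, ms in ranked:
    print(f"{name:30s} {ms:7.1f} ms  {ms / total_ms:6.1%}")
```

Routing and the IP lookup land at the top of the list and together account for a little over 80% of the total, which is why the rest of this article focuses on them.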

Let’s say, for one reason or another, that we can’t currently make routing faster. Maybe it’s too big a project, maybe the routing team is busy. Whatever the reason, we can’t make do_routing_for_conversation any faster. This brings me to the first method you can use.

Defer Processing to Threads

If we think of our endpoint, it has the job “create a new message”. It isn’t our endpoint’s job to do routing for a conversation. Just because we need to do it, doesn’t mean we need to do it now. What if we did routing in a separate thread, since it’s not changing the object that’s being returned (the message)?

Let’s make a small utility that allows us to spin up separate threads that still have the Flask app context:

from threading import Thread
from flask import current_app


class FlaskThread(Thread):
    """A utility class for threading in a flask app."""

    def __init__(self, *args, **kwargs):
        """Create a new thread with a flask context."""
        super().__init__(*args, **kwargs)
        # current_app is a proxy; grab the real app object so it can be
        # used from the new thread.
        self.app = current_app._get_current_object()

    def run(self):
        """Run the thread."""
        # Make this an effective no-op if we're testing.
        if not self.app.config['TESTING']:
            with self.app.app_context():
                super().run()

Then we can use that helper class like this:

...
from drift.utils import FlaskThread
...

@app.route('/messages', methods=['POST'])
...
    if not conversation.participants:
        th = FlaskThread(target=do_routing_for_conversation,
                         args=(conversation,))
        th.start()
    return message, 201

And now our endpoint is much faster!

But we now have a scenario where we’re manually spinning up a new thread every time we create a message with that endpoint. While this isn’t always bad, we should at least be aware of some of its drawbacks. In our case, we should worry that a spike in traffic will cause our server to run two threads per message creation instead of one. This could push our CPU to its ceiling and slow down processing for other things.

So threads are a useful tool, but might be best used for a lower-traffic endpoint: one that has a slow operation but still needs to respond quickly.
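One way to cap the number of background threads is to replace the one-thread-per-request approach with a shared, bounded pool. This is a different technique from the `FlaskThread` helper above, sketched here with the standard library’s `ThreadPoolExecutor`. The names `slow_routing` and `handle_request` and the pool size of 4 are placeholders, and Flask app-context handling is left out:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One shared pool for the whole process: at most 4 background tasks run
# at once, and extra submissions wait in the pool's internal queue.
background_pool = ThreadPoolExecutor(max_workers=4)

routed = []  # stands in for the side effects of routing


def slow_routing(conversation_id):
    """Placeholder for a slow operation such as routing."""
    time.sleep(0.05)
    routed.append(conversation_id)


def handle_request(conversation_id):
    """Placeholder handler: schedule the slow work and return immediately."""
    background_pool.submit(slow_routing, conversation_id)
    return {"conversation_id": conversation_id}, 201


responses = [handle_request(i) for i in range(8)]

# Wait for the background work to finish before inspecting the results.
background_pool.shutdown(wait=True)
print(len(responses), "responses,", len(routed), "conversations routed")
```

The trade-off is that during a spike the pending work piles up in memory instead of spawning more threads, and anything still waiting is lost if the process dies.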

Process Work on a Queue

If you need real-time, distributed work, then you’re going to need queues. They give you a lot of benefits: asynchronous work, speed, separation of concerns, and easy distribution and fanout. The cost is that we will have to debug our problems on a distributed system. This is much easier when your queue consumers don’t share resources.

Routing a new Drift conversation to the right Drift user is a great example of when to use queues to our advantage because:

1) it takes a long time

2) we don’t need to route before creating the message

At Drift, we use AWS resources as much as we can. For queueing, we send queue messages to SNS topics and let them “fan out” to SQS queues. For Python, you can use a library like Celery, RQ, or Pyres to easily manage queue consumers. The idea is that you make consumers for things that need to happen ASAP, but don’t need to block message creation. With queues, your code might end up looking like this:

@app.route('/messages', methods=['POST'])
def create_new_message():
    message_params = extract_message_params(request.json)
    conversation = Conversation.find(message_params['conversation_id']) or Conversation.create()
    message = Message.create(conversation=conversation, **message_params)
    location = ip_based_lookup(request.remote_addr)
    message.location = location
    send_message_to_sns(message) # 25ms average
    return message, 201

While sending the message to SNS is in the critical path, it allowed us to move routing a new conversation to a queue consumer. This means that we replaced a 605ms operation with a 25ms one, and the work of routing a conversation is deferred to a worker. We just cut our endpoint’s median response time roughly in half, from about 1.2 seconds to about 620ms!
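The producer/consumer split can be sketched in-process with the standard library. This is only a stand-in to show the shape of the pattern: `queue.Queue` plays the role of SNS/SQS, and `send_message_to_queue`, `routing_consumer`, and the message dictionary are hypothetical names, not Drift’s code or an AWS API:

```python
import queue
import threading

routing_queue = queue.Queue()  # in-process stand-in for an SQS queue
routed_conversations = []


def send_message_to_queue(message):
    """Producer: enqueue a small description of the work and return quickly."""
    routing_queue.put({"conversation_id": message["conversation_id"]})


def routing_consumer():
    """Consumer: pull work off the queue until a None sentinel arrives."""
    while True:
        job = routing_queue.get()
        if job is None:
            routing_queue.task_done()
            break
        # The slow routing work would happen here, off the request path.
        routed_conversations.append(job["conversation_id"])
        routing_queue.task_done()


worker = threading.Thread(target=routing_consumer, daemon=True)
worker.start()

# The "endpoint" only pays the cost of the enqueue.
for conversation_id in (101, 102, 103):
    send_message_to_queue({"conversation_id": conversation_id, "body": "hi"})

routing_queue.put(None)  # tell the consumer to stop
routing_queue.join()     # block until every queued job is processed
worker.join()
print(routed_conversations)
```

In a real deployment the consumer runs in a separate process or on a separate machine, which is where both the scaling benefits and the distributed debugging costs come from.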

Now that we’ve cut down the response time by moving routing out of the critical path, let’s focus on the next biggest bottleneck: IP-based location lookup. For this example, we use a lookup database, which brings me to my final strategy:

Know Your Database Performance

One of the most important things you can know is the performance of your database queries. Let’s take a look under the hood of our contrived example. We’ve got two tables in PostgreSQL:

create table geo_id_to_name (
  geoname_id bigint UNIQUE,
  locale_code varchar,
  continent_code varchar,
  continent_name varchar,
  country_iso_code varchar,
  country_name varchar,
  subdivision_1_iso_code varchar,
  subdivision_1_name varchar,
  subdivision_2_iso_code varchar,
  subdivision_2_name varchar,
  city_name varchar,
  metro_code varchar,
  time_zone varchar,
  is_in_european_union boolean
);

create table ip_to_geo_id (
  network cidr,
  geoname_id bigint references geo_id_to_name(geoname_id),
  registered_country_geoname_id bigint,
  represented_country_geoname_id bigint,
  is_anonymous_proxy boolean,
  is_satellite_provider boolean,
  postal_code varchar(50),
  latitude float(20),
  longitude float(20),
  accuracy_radius bigint
);

And our ip_based_lookup function does this:

SELECT * FROM geo_id_to_name NATURAL JOIN ip_to_geo_id where network >>= inet('IP ADDRESS HERE');
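For intuition about what this query asks: a CIDR value describes a block of addresses, and the `>>=` operator checks whether a network contains or equals the given address. The same check can be sketched in Python with the standard `ipaddress` module (the addresses are made-up examples from documentation ranges):

```python
import ipaddress

network = ipaddress.ip_network("203.0.113.0/24")  # a block of 256 addresses

inside = ipaddress.ip_address("203.0.113.42")
outside = ipaddress.ip_address("198.51.100.7")

print(inside in network)       # True: the address falls within the block
print(outside in network)      # False: the address is in a different block
print(network.num_addresses)   # 256
```

A database that has no suitable index has to run this kind of containment check against every row in the table.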

So what’s making that query so slow if that’s all the IP lookup is doing? We could analyze the Python code for speed, but let’s use some intuition. Anytime you see a simple database query take a long time (more than 5ms), an alarm should go off in your head. “Where’s the index?” your inner self should say. If you look at those tables, you might be horrified to find that there is no index on the network column we’re filtering by 😱.

To understand why indexing is important, let’s have SQL explain these queries to us. This is the plan for that query before we add the indices:

Let’s add an index on our “ip_to_geo_id” table. We’re looking up a column of type CIDR. What is that and how should we index it? Lucky for us, PostgreSQL has great docs on what a CIDR type is, what operations you can use on it, and how to index those operations. For our index, we want this:

create index ix_inet_ops_network on ip_to_geo_id using gist (network inet_ops);

And here’s a look at the explain after adding the index!

After indexing, that line of code went from 380ms on average to 10ms on average!

You may not be using PostgreSQL, but the lesson is the same: make sure you know and take advantage of your database’s strengths.
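To see the same effect without a PostgreSQL server, here is a small sketch using SQLite from Python’s standard library. It is a stand-in only: SQLite has no CIDR type or GiST index, so the hypothetical `ip_ranges` table uses an integer column and a plain B-tree index, and the plan text differs from PostgreSQL’s `EXPLAIN` output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_ranges (range_start INTEGER, geoname_id INTEGER)")
conn.executemany(
    "INSERT INTO ip_ranges VALUES (?, ?)",
    [(i * 256, i) for i in range(10_000)],
)

query = "SELECT * FROM ip_ranges WHERE range_start = ?"


def plan_for(sql):
    """Return SQLite's query plan as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (256,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return "; ".join(row[-1] for row in rows)


plan_before = plan_for(query)  # no index yet: SQLite scans the table
conn.execute("CREATE INDEX ix_range_start ON ip_ranges (range_start)")
plan_after = plan_for(query)   # with the index: SQLite searches via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of the whole table and the second reports a search using `ix_range_start`.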

In Conclusion

We talked about three ways to make your API endpoints faster. Keep in mind that the first two methods didn’t make the code any faster; they just moved the processing somewhere else. Either a new thread or a queue worker is now doing the slow, non-urgent work.

Treat my coverage of these three topics as an introduction, not a reference. There are a lot of nuances that I didn’t cover. If you’re interested in either of the first two topics, check out this distributed systems book. If you want to know more about SQL query optimizations, this website is a wonderful (and free) reference.

Happy coding! ⚡️