Chris White's Blog

Software developer from Edinburgh. I mainly do PHP stuff with Laravel and Angular stuff with TypeScript. I like to pair APIs to SPAs while drinking IPAs.

The making of XKCDBot

May 08, 2016 in #reflection #php #slack #bots #slim | | | Share on Google+

I love XKCD and I spend an embarrasing amount of time in various Slack teams (productivity tool... lol). A couple of weeks ago I went on a search for a Slack integration to insert XKCD comics based on some choice phrases. What I was envisoning was basically Giphy for XKCD comics.

Imagine my surprise when I found that no such integration exists. There are some self hosted solutions floating around, but nothing that's integrated as a Slack app and is easy to install. Lame! However, all is not lost. Being a developer who is highly motivated by nerdy side projects that offer no monetary value, I made one myself.

It lives at xkcdbot.com, and it's a simple process of completing the OAuth flow to add it to your Slack team. It registers a slash command, so you just need to /xkcd <search terms> and it will find a relevant XKCD and post it as a message.

XKCDBot preview

For those who aren't familiar with how Slack apps work, all XKCDBot does is register a slash command for your team. It isn't a bot that constantly watches your channels for new messages. When a user in your team uses the /xkcd slash command, Slack's servers send a POST request to a HTTP endpoint that I've set in XKCDBot's settings. In the POST request they send a string of text that followed the command, which would be your search terms. Then on the server side the bot finds a XKCD comic that's relevant to the search terms and returns a specially formed JSON response which defines how the Slack message is formatted. Other than that, there's just some cruft on top to handle the OAuth flow and static pages. Bare bones, dead simple web app that uses nginx and PHP with the Slim framework.

Below is a pretty high level overview of how XKCDBot handles an incoming comic request from Slack.

Overview

Let's break down each part of the application.

The OAuth authentication flow

Before XKCDBot even gets comic requests from your team, you need to authorize it. You may be familiar with the "Add to Slack" buttons that every Slack app under the sun seems to have as the main call to action on their website. When you click this button, it follows the standard OAuth behaviour where you'll be presented with a page and asked to authorize the app against a Slack team. You'll then be redirected back to the app's website with a unique code in a query string parameter, and the app will use that code to request a more permanent access token for your user account on Slack's API.

Depending on what permissions an app uses and what it needs to do on Slack's API, that process can be a bit complicated. XKCDBot is a little bit unique in the way that it doesn't actually need to do anything on behalf of the user authorizing the app, since the only thing it does is register a slash command for your team.

Below is the route that's used to handle a user being redirected back to https://xkcdbot.com/auth after they've authorized the app to access one of their teams.

/**
 * Respond to Slack redirecting back from its OAuth flow.
 */
$app->get('/auth', function (Request $request, Response $response) {
    $provider = new \League\OAuth2\Client\Provider\GenericProvider([
        'clientId'                => getenv('SLACK_CLIENT_ID'),
        'clientSecret'            => getenv('SLACK_CLIENT_SECRET'),
        'redirectUri'             => getenv('SLACK_REDIRECT_URI'),
        'urlAuthorize'            => 'https://slack.com/oauth/authorize',
        'urlAccessToken'          => 'https://slack.com/api/oauth.access',
        'urlResourceOwnerDetails' => 'https://slack.com/api/users.info'
    ]);

    if ($request->getParam('code')) {
        try {
            // We'll just request an access token and do nothing with it, which will complete the OAuth flow.
            $provider->getAccessToken('authorization_code', [
                'code' => $request->getParam('code')
            ]);
        } catch (\League\OAuth2\Client\Provider\Exception\IdentityProviderException $e) {
            // Silently fail... shhhh.
        }
    }

    return $response->withRedirect('/thanks');
});

Most of the complicated OAuth things are taken care of by thephpleague/oauth2-client, all I'm doing myself is requesting an access token and then doing bugger all with it. As it turns out, if you don't request an access token the OAuth flow doesn't complete and XKCDBot's slash command doesn't get registered. Even though I don't actually use the access token for anything. Whatever.

Finding a relevant XKCD

Sweet, so the bot is now registered with your team. Whenever somebody in your Slack team uses the /xkcd slash command, Slack is going to send a POST request to the URL that I've specified in XKCDBot's settings. It's up to me to respond to that POST request within 3 seconds, and the response will be posted as a message in the channel. As part of that POST request, Slack will send a text parameter. If you were to do /xkcd one two, the value of the text parameter is going to be one two. Simples.

So how do I go about finding a relevant XKCD? I built a small "search engine" out of a few classes. The general principle is the search engine has one or more "sources" provided to it. The search engine will search these sources in a priority list until one of them returns a result from the query entered by the user.

My own Google

The search engine class is dead simple, and exposes only one public method: search($terms).

<?php

namespace ChrisWhite\XkcdSlack\Search;

use ChrisWhite\XkcdSlack\Comic\ComicRepository;

class Engine implements EngineInterface  
{
    protected $sources = [];
    protected $comics;

    public function __construct(array $sources, ComicRepository $comics)
    {
        $this->sources = $sources;
        $this->comics = $comics;
    }

    public function search($terms)
    {
        foreach ($this->sources as $source) {
            $comicId = $source->search($terms);

            if (is_null($comicId)) {
                // We didn't get a match, try next source.
                continue;
            }

            return $this->comics->find($comicId);
        }

        // We didn't get a comic match over any of our sources :(
        return null;
    }
}

When I instantiate a new search engine, I provide the various sources through its constructor. Since the sources are only responsible for returning the ID of a comic, we also pass in a ComicRepository which can take this ID and return the JSON representation of an XKCD comic, which is stored in a directory on the server and updated every 10 minutes via a cron job.

If no comic can be found through any of our sources, the search engine just returns null and the calling code is responsible for handling that case.

The sources

Each search engine source is also very simple, and again only exposes a public search($terms) method which accepts the search terms from a user and returns a comic ID, if one can be found.

LocalSource

The local source simply scans a local directory of XKCD comics, attempting to match the user's search terms against a comic title. If it finds a match, we can be realtively certain that the comic it has found is the one the user is looking for, so we return its ID back to the search engine. Note from the snippet below that this source uses a generator to scan the comics directory. XKCD has thousands of comics, and the bot is storing the JSON representation of each one. Copying the JSON representation of each comic into memory, while doable, would be a pretty big waste of resources. Generators solve that problem by only loading one comic into memory at a time. Yay, scale!

public function search($terms)  
{
    $searchTerms = strtolower($terms);

    foreach ($this->comics() as $comic) {
        if (strtolower($comic['safe_title']) === $searchTerms) {
            return $comic['num'];
        }
    }

    return null;
}

protected function comics()  
{
    $dir = new \DirectoryIterator($this->comicsDirectory);

    foreach ($dir as $file) {
        if ($file->isDir() || $file->isDot()) continue;

        yield json_decode(file_get_contents($file->getPathname()), true);
    }
}

BingSource

This is the source that provides most of the "relevance" in "find a relevant XKCD". If the local source can't find a comic by its title, the bot will fall back to searching the web with Bing. Now I know what you're thinking. Bing has a reputation for innacuracy compared to other search engines but it's free for up to 5,000 queries per month, provides a decent API and returns results that are good enough for this bot.

I wrote a small API wrapper around the Bing API in order to perform web searches in a fluent way, and I dependency inject this wrapper into BingSource. When I call the performWebSearch() method, I'm using this wrapper.

Thanks to the way search engines work in general, the top result is always the most relevant. After performing the web search, the bot will take the first result and parse the comic ID out of the URL, and then pass it back to the search engine. You can see I'm using an advanced operator, site:xkcd.com, to make Bing only search the XKCD website.

public function search($terms)  
{
    $terms = 'site:xkcd.com '.$terms;

    $bingResults = $this->client->performWebSearch($terms);

    if (!isset($bingResults['d']['results'][0])) {
        // No results from Bing :(
        return null;
    }

    $firstResult = $bingResults['d']['results'][0];

    return $this->parseComicId($firstResult);
}

protected function parseComicId($result)  
{
    $url = $result['Url'];

    preg_match('~^.+xkcd.com\/([0-9]+)\/$~', $url, $matches);

    if (isset($matches[1])) {
        return $matches[1];
    }

    return null;
}

Time to decorate

The search engine that I described above works well functionaly, and was surprisingly good at finding relevant XKCDs thanks to it falling back to a Bing search. The problem with searching via Bing is the posibility of hitting the Bing API's rate limit. As I said, you get 5,000 queries for free per month which is around 166 per day. Although I'm not expecting the bot to get that popular, if it ever does I can see myself blowing right past that limit and into paid Bing territory which starts at £13 per month for 10,000 queries.

As well as rate limits, falling back to sending HTTP requests for most search terms (let's be honest - most won't match a comic title exactly) isn't a very good solution. I have just three seconds to return a response from Slack's POST request, and a HTTP request to Bing's API would take up a good chunk of that time. Clearly some sort of caching was required.

Enter the decorator pattern! This is one of my favourite patterns in software engineering, and is pretty much perfect for adding a simple caching layer to a component like the search engine.

I created another class called CachedEngine, which takes an instance of the search engine and an instance of a cache through its constructor. The cache I decided to use was Redis, as I just needed a simple and fast key value store. To abstract away Redis's complexity, I used the Predis package.

The CachedEngine class has exactly the same method signatures as the real search engine, and just wraps it with some caching functionality:

public function search($terms)  
{
    // Attempt to retrieve the comic from Redis.
    $cachedValue = $this->client->get($terms);

    if ($cachedValue) {
        return unserialize($cachedValue);
    }

    // Cache miss - do a real search and add the result to cache.
    $freshValue = $this->engine->search($terms);

    $this->client->set($terms, serialize($freshValue));

    return $freshValue;
}

And that's it!

This was a pretty simple but really fun project. Slack has made it really easy to create integrations, which is evidenced by the rich variety of apps in their directory. Feel free to give XKCDBot a spin and tell me what you think!

May 08, 2016 in #reflection #php #slack #bots #slim | | | Share on Google+