2018-February-10
I recently came across Spark, a micro framework for developing web applications for the JVM. I decided to try it out from Clojure by writing a URL shortening service. This post will walk you through the actual implementation of the service. In the process, we will create a Clojure wrapper for Spark's programming interface. We will also explore some ideas around configuring and scaling the service.
The technique for shortening a long URL is quite simple: convert the hash of the URL to a base-62 value. This is accomplished by the following code:
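;; Digits used for base-62 output, indexed by digit value.
(def ^String base62-lookup "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

(defn decimal->b62
  "Converts the integer n to a base-62 string. The sign of n is ignored."
  [n]
  (loop [n n, rs []]
    (if (zero? n)
      (clojure.string/join (reverse rs))
      (let [[q r] [(quot n 62) (rem n 62)]]
        ;; rem is negative when n is negative, hence Math/abs
        (recur q (conj rs (.charAt base62-lookup (Math/abs r))))))))

(defn encode
  "Returns the base-62 form of the URL's Java hash code."
  [^String url]
  (decimal->b62 (.hashCode url)))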
Calling encode on a URL will return the shortened version of it:
user> (encode "http://sparkjava.com/documentation")
;; "AuV1V"
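One thing to watch for: .hashCode returns a signed 32-bit integer and decimal->b62 ignores the sign, so two URLs whose hash codes are n and -n get the same short form, and a hash code of 0 encodes to an empty string. Here is a minimal sketch of one way around this, assuming we first reinterpret the hash as an unsigned 32-bit value. The names base62-chars, unsigned->b62 and encode-unsigned are mine and not part of the project, and the short forms this produces will differ from the ones shown in this post whenever the hash code is negative.

```clojure
(require '[clojure.string :as str])

(def ^String base62-chars
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

(defn unsigned->b62
  "Encodes a non-negative integer as a base-62 string; 0 encodes as \"0\"."
  [n]
  (if (zero? n)
    "0"
    ;; conj on a list prepends, so the most significant digit ends up first
    (loop [n n, rs ()]
      (if (zero? n)
        (str/join rs)
        (recur (quot n 62)
               (conj rs (.charAt base62-chars (int (rem n 62)))))))))

(defn encode-unsigned
  "Hypothetical alternative to encode: masks the hash code to its unsigned
  32-bit value so that negative and positive hashes never share a short form."
  [^String url]
  (unsigned->b62 (bit-and (long (.hashCode url)) 0xFFFFFFFF)))
```

This only removes the sign-related collisions. Two different URLs can still have the same hash code, so a production service would need a collision check on top of this.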
The service exposes two HTTP API endpoints. One is for shortening a URL. This endpoint will internally invoke the encode function that was defined in the preceding section.
The second endpoint will receive a value generated by encode and return a redirect to the original URL.
Here is the specification for the API:
POST /
  Request:
    Content-Type: application/json
    Body: {"url": "a_url"}
  Response on Success:
    Status: 200 OK
    Content-Type: application/json
    Body: {"hash": "hash_of_url"}

GET /:hash
  Response on Success:
    Status: 302 Found
    Location: "original_url_from_which_hash_was_generated"
  Response if :hash was not generated here:
    Status: 404 Not Found
A Spark application uses a set of routes to declare its public interface. The HTTP API that we described above can be implemented with the post and get routes:
(defn -main
  []
  (Spark/post "/" (make-handler post-handler))
  (Spark/get "/:hash" (make-handler get-handler)))
A route has three components: the HTTP verb (get, post, put, etc.), a path ("/", "/:hash") and a handler.
The handler must be an implementation of a Java interface named Route. This interface has a single method called handle, which is invoked to process a client request and generate a response. The convenience function make-handler generates an implementation of Route. It takes the actual handler function as its argument and arranges for Route's handle method to call it.
(defn make-handler
  [f]
  (reify Route
    (handle [_ ^Request request ^Response response]
      (f request response))))
The Request object contains information about the HTTP request, like its headers, content type, body, etc. The Response object exposes methods that the handler can call to generate a valid HTTP response.
Now we can proceed to implement the handler functions themselves. First, we will define the handler for the POST request. This handler will read the JSON-encoded request body, parse it to extract the URL and generate the base-62 hash of the URL. This hash is then sent back in the response. The function will also map the hash to the original URL in an in-memory lookup table.
The initial implementation of the POST handler is shown below:
(def db (atom {}))

(defn post-handler
  [request response]
  (let [r (cheshire.core/parse-string (.body request) true)
        url (:url r)
        short-url (encode url)]
    (swap! db assoc short-url url)
    (.status response 200)
    (.header response "Content-Type" "application/json")
    (cheshire.core/generate-string {:hash short-url})))
The GET handler receives a hash (or short-url) as input. This hash will be used as the key to query the lookup table. If a URL is found mapped to this key, a redirect is generated for this URL. If no mapping is found, an HTTP 404 (Not Found) response is returned.
(defn get-handler
  [request response]
  (if-let [url (get @db (.params request ":hash"))]
    (.redirect response url)
    (.status response 404)))
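Stripped of the Spark plumbing, the two handlers boil down to one write and one read against the atom. The sketch below isolates that logic so it can be tried at the REPL without starting a server. The names lookup-table, shorten!, expand and toy-encode are hypothetical helpers of mine, and toy-encode is only a stand-in for the real encode function.

```clojure
;; Hypothetical helpers that isolate the lookup-table logic of the two handlers.
(def lookup-table (atom {}))

(defn shorten!
  "Stores the short->long mapping in the table and returns the short form.
  encode-fn is any function from a URL string to a short string."
  [encode-fn url]
  (let [short-url (encode-fn url)]
    (swap! lookup-table assoc short-url url)
    short-url))

(defn expand
  "Returns the original URL for short-url, or nil if it was never stored."
  [short-url]
  (get @lookup-table short-url))

;; Stand-in encoder for experimenting; the service would pass encode instead.
(defn toy-encode
  [^String url]
  (str "u" (count url)))

;; Round trip: whatever shorten! returns can be handed back to expand.
(expand (shorten! toy-encode "http://sparkjava.com/documentation"))
```

A nil result from expand corresponds to the 404 branch of get-handler.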
The first version of the URL shortening service is ready! You can download the complete project here.
Execute lein run from the extracted project folder. The service should come up and start listening for incoming HTTP requests on port 4567. Here are a few curl sessions to test the service:
$ curl -v -X POST -d '{"url": "http://sparkjava.com/documentation#getting-started"}' \
    -H 'Content-Type: application/json' 'http://localhost:4567'
HTTP/1.1 200 OK
{"hash":"1BqMVO"}
$ curl -v 'http://localhost:4567/1BqMVO'
HTTP/1.1 302 Found
Location: http://sparkjava.com/documentation#getting-started
One problem with the current implementation is that the hash->url mappings are stored in the memory of the service itself. If the JVM is shut down, all data is lost and the users of the service will not be very happy :-). Moreover, it becomes impossible to scale the service by running multiple instances behind a load balancer, because each instance would only know about the URLs it shortened itself. So it is necessary to add a data store that can be shared by multiple instances of the service. This could be an RDBMS server like MySQL or a key-value store like Couchbase. I will use Couchbase for this example.
It is straightforward to make the service talk to a data store like Couchbase, which just maps a string key to a string value. This model is similar to the one used by the current in-memory store.
We can add a store abstraction to the service which internally uses the Couchbase client library for Clojure to talk to a Couchbase cluster:
(ns url-shortner.store
  (:require [couchbase-clj.client :as cb]))

(defn open-connection
  [props]
  (cb/create-client props))

(defn close-connection
  [conn]
  (when conn
    (cb/shutdown conn)))

(defn set-data
  [conn k v]
  (cb/set conn k v))

(defn get-data
  [conn k]
  (cb/get conn k))
The props argument passed to open-connection is a Clojure map that specifies the configuration (username, server URLs, etc.) required to connect to the Couchbase cluster. This configuration may change from one deployment site to another, so our service needs to be able to load site-specific configuration dynamically. An easy way to manage configuration is to encode it in EDN format. This allows the application to reuse Clojure's built-in reader to load and decode the configuration, as shown below:
(def config (read-string (slurp "./config.edn")))
(def db (store/open-connection (:store config)))
The contents of the configuration file are:
;; config.edn
{:web-server-port 8000
 :store
 {:username "Administrator"
  :bucket "default"
  :uris ["http://localhost:8091/pools"]}}
Note that we have made the port on which the service listens for incoming requests configurable as well.
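A word of caution about read-string: the version in clojure.core can evaluate code embedded in its input through the #= reader macro unless *read-eval* is bound to false. For a file that other people may edit, clojure.edn/read-string is the safer choice because it only reads data. Below is a minimal sketch of a loader built on it. The names default-config, parse-config and load-config are my own, and falling back to Spark's default port 4567 when the key is missing is an assumption about what a sensible default would be.

```clojure
(require '[clojure.edn :as edn])

;; Assumed fallback values; 4567 is the port Spark listens on by default.
(def default-config {:web-server-port 4567})

(defn parse-config
  "Parses EDN text into a config map, filling in defaults for missing keys."
  [^String text]
  (merge default-config (edn/read-string text)))

(defn load-config
  "Reads and parses the EDN config file at path."
  [path]
  (parse-config (slurp path)))

;; Parsing a string directly makes it easy to check the behaviour at the REPL.
(parse-config "{:web-server-port 8000 :store {:bucket \"default\"}}")
```

Separating parse-config from load-config keeps the file access in one small function, so the parsing and defaulting can be checked without touching the disk.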
Now we should update post-handler to store the mapping in the remote store:
(store/set-data db short-url url)
get-handler can look up its response as:
(store/get-data db (.params request ":hash"))
The -main function has to be updated to start the server on the configured port:
(Spark/port (:web-server-port config))
Spark is not a framework designed with Clojure in mind. So it's a good idea to write a simpler and more idiomatic Clojure interface on top of Spark. This interface should expose the Request and Response objects as native Clojure data structures. The route specification should directly accept functions instead of implementations of the Route interface. The handlers should also be more functional in their behavior: accept a request map and return a response map.
I wrote a Clojure wrapper for Spark that does all this. It provides enough abstractions for the web layer to be re-written in better Clojure style. The code for the new web layer that makes use of this wrapper is reproduced below:
(ns url-shortner.core
  (:require [cheshire.core :as json]
            [url-shortner.encoder :as e]
            [url-shortner.spark :refer :all]
            [url-shortner.store :as store]))

(def config (read-string (slurp "./config.edn")))
(def db (store/open-connection (:store config)))

(defn post-handler
  [request]
  (let [r (json/parse-string (:body request) true)
        url (:url r)
        short-url (e/encode url)]
    (store/set-data db short-url url)
    {:status 200
     :headers {:Content-Type "application/json"}
     :body (json/generate-string {:hash short-url})}))

(defn get-handler
  [request]
  (if-let [url (store/get-data db (:hash (:params request)))]
    {:status 302
     :headers {:Location url}}
    {:status 404
     :body "Not Found"}))

(defn -main
  []
  (port! (:web-server-port config))
  (GET "/:hash" get-handler)
  (POST "/" post-handler))
The application code no longer has to deal directly with low-level Java abstractions. Instead all request handling is implemented using first-class Clojure data structures and functions.
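A nice side effect of handlers that take a map and return a map is that they can be exercised without a running server. The sketch below shows the idea with a variant of get-handler that receives its lookup function as an argument. make-get-handler and fake-store are hypothetical names of mine, and the request shape ({:params {:hash ...}}) is assumed from the handler code above rather than taken from the wrapper's documentation.

```clojure
;; Hypothetical variant of get-handler that takes its lookup function as an
;; argument, so it can be exercised without Spark or Couchbase.
(defn make-get-handler
  [lookup-fn]
  (fn [request]
    (if-let [url (lookup-fn (:hash (:params request)))]
      {:status 302
       :headers {:Location url}}
      {:status 404
       :body "Not Found"})))

;; A plain map is a function of its keys, so it can stand in for the store.
(def fake-store
  {"1BqMVO" "http://sparkjava.com/documentation#getting-started"})

(def handler (make-get-handler fake-store))

;; Known hash: the response map carries a redirect.
(handler {:params {:hash "1BqMVO"}})

;; Unknown hash: the response map carries a 404.
(handler {:params {:hash "nope"}})
```

In the real service the lookup function would be something like (partial store/get-data db).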
The complete source code for the updated service can be downloaded here.
Now you can start multiple instances of the service, by configuring a unique port number for each. Also make sure the :store configuration can connect the service to a running Couchbase cluster. You can distribute POST and GET requests across these instances and see them serving the requests from data in the shared data store.
Let us finish this post by automating the task of load-balancing between the several instances of the service. Nginx is a popular HTTP server and load balancer. To test load-balancing on my development box, I added the following to the local nginx configuration:
http {
    upstream localhost {
        server localhost:8000;
        server localhost:8002;
    }
    server {
        listen 8080;
        location / {
            proxy_pass http://localhost;
        }
    }
}
The above configuration means nginx will accept connections on port 8080 and forward them to one of the service instances running on ports 8000 and 8002 of the same machine. The load-balancing method will be round-robin, which is the default.
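If round-robin is new to you, the idea is simply to hand each new request to the next backend in the list and wrap around at the end. Here is a toy illustration in Clojure. It is not how nginx implements it (nginx also takes server weights and failures into account), and the function name round-robin is my own.

```clojure
(defn round-robin
  "Returns a zero-argument function that yields the backends in order,
  starting over from the first one after the last."
  [backends]
  (let [counter (atom -1)]
    (fn []
      (nth backends (mod (swap! counter inc) (count backends))))))

(def next-backend (round-robin ["localhost:8000" "localhost:8002"]))

;; Four consecutive picks alternate between the two instances.
(vec (repeatedly 4 next-backend))
```

Because every instance reads from and writes to the same Couchbase cluster, it does not matter which instance a particular request lands on.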
Start two instances of the URL shortener, one on port 8000 and the other on 8002, and restart nginx. The calls for POST http://localhost:8080 and GET http://localhost:8080/:hash will be distributed between the two instances by nginx. To scale the service, start new instances and add them to the upstream configuration.
Spark provides a simple and clean interface for writing HTTP based services that can be easily integrated with any language running on the JVM. Adding a functional wrapper on top of its basic interface definitely makes it more appealing for server-side development in Clojure.