Variously CORS

26 August 2014

To make things snappier for web clients, all our APIs at work support CORS. Avid readers of the MetaBroadcast blog will remember that we've already talked about the fun and games this involves, but today I'm going to cover a particular subtlety that we missed previously.

Less-than-helpful error messages are less-than-helpful

Debugging CORS problems is Not Fun. Browsers don't deliberately mislead, but they certainly don't go out of their way to give helpful error messages. So when we had an intermittent problem with a small number of requests failing, seemingly randomly, it proved a little tricky to hunt down.

Before I go into the detail, I'll do a quick recap on CORS. CORS, cross-origin resource sharing, allows a website to request content from others. By default browsers block such cross-origin requests under the same-origin policy, which guards against one site quietly reading a user's data from another; CORS lets a server opt in to specific callers. It has a number of benefits over JSONP, an older alternative, such as performance and support for a wider set of HTTP verbs.

In essence, a server tells a web browser whether it's allowed to make a request, based on which website is making the call. For example, if I've developed a site at http://greattvmetadata.com/ which makes calls to Voila, Voila will check whether a given API key is allowed to get content when called from http://greattvmetadata.com/. Max already covered most of the server-side changes needed in that earlier post: adding a number of headers telling a web browser which request types are allowed, and from which hosts.

So, back to the problem we were having. Occasionally, apparently at random, some requests were failing with the usual, not very informative, "something went wrong with CORS" error. We checked the server configuration, and checked for any concurrency problems on the server side when updating said configuration. Nothing.

Vary-ious caching

After a little pondering, we finally figured out what was going on. Caching. Web browsers – and proxies between browsers and origin servers – will cache responses from origin servers. To do this they associate a response from a web server with a key, for example:

"https://voila.metabroadcast.com/1.0/content?id=cf2" => lots of data about EastEnders

The trick is deciding the key to use when caching responses. Obviously that involves the request URI, but with the likes of content negotiation, that's not quite good enough. Step forward the Vary response header. This chap tells clients what request headers a web server will change the response for. Accept is a common one to include, to cover that content negotiation case.
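To make the idea concrete, here is a minimal sketch of how a cache might build its key from the URI plus whichever request headers Vary names. The helper name cache_key and the Accept values are illustrative assumptions, and real HTTP caches do rather more than this.

```python
def cache_key(uri, request_headers, vary=""):
    """Combine the URI with the request headers named in a Vary value.

    A toy model only: cache_key is a hypothetical helper, not a real cache API.
    """
    # Header names are case-insensitive, so compare them in lower case.
    sent = {name.lower(): value for name, value in request_headers.items()}
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    return (uri, tuple((name, sent.get(name, "")) for name in names))


uri = "https://voila.metabroadcast.com/1.0/content?id=cf2"
as_json = {"Accept": "application/json"}  # example values, not from the post
as_xml = {"Accept": "application/xml"}

# Without Vary, both requests map to the same cache entry...
same_entry = cache_key(uri, as_json) == cache_key(uri, as_xml)
# ...with Vary: Accept, each representation gets its own entry.
separate_entries = cache_key(uri, as_json, "Accept") != cache_key(uri, as_xml, "Accept")
```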

How does this affect CORS? Well, the web browser sends an Origin header to tell the server which website has made the request. The server then decides whether to allow the request, and sets a response header Access-Control-Allow-Origin if so. Often a server just responds with * which means requests originating from any website are allowed. We need to be more selective, though, so we respond with the client's Origin header if the server is happy to carry out the request.
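As a rough sketch of that selective behaviour, the server-side decision might look something like this. The allow-list contents and the function name cors_response_headers are assumptions for illustration; the real check is tied to API keys, which this leaves out.

```python
# Hypothetical allow-list. The post describes a per-API-key check instead.
ALLOWED_ORIGINS = {
    "https://oursite.metabroadcast.com",
    "https://oursite-stage.metabroadcast.com",
}


def cors_response_headers(origin, allowed=ALLOWED_ORIGINS):
    """Echo the caller's Origin back if it is on the allow-list."""
    if origin in allowed:
        return {"Access-Control-Allow-Origin": origin}
    # No CORS header at all: the browser won't expose the response to the page.
    return {}
```

Note that the value returned depends on the request's Origin header, which is exactly what matters next.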

You got it: the response varies depending on the Origin header. As we weren't telling the client that through the Vary header, it had no idea. This is what was happening:

  1. Client requests https://voila.metabroadcast.com/1.0/content?id=cf2 from website https://oursite-stage.metabroadcast.com
  2. Server responds with details about upcoming episodes of EastEnders, and includes the header Access-Control-Allow-Origin: https://oursite-stage.metabroadcast.com
  3. We then switch to using the production website at https://oursite.metabroadcast.com (in the same browser)
  4. When a request is made for https://voila.metabroadcast.com/1.0/content?id=cf2 the browser decides it already has a cached copy of that, so returns it instead
  5. BUT that cached response has the header Access-Control-Allow-Origin: https://oursite-stage.metabroadcast.com, so the web browser denies the request with a CORS error
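Those five steps can be replayed with a toy model. This is only a sketch: UrlOnlyCache, server and cors_allows are made-up names standing in for the browser cache, Voila and the browser's CORS check, and a real browser's logic is considerably more involved.

```python
class UrlOnlyCache:
    """Toy browser cache keyed on the URI alone, as if no Vary were sent."""

    def __init__(self):
        self._responses = {}

    def fetch(self, uri, origin, server):
        if uri not in self._responses:
            self._responses[uri] = server(uri, origin)
        return self._responses[uri]


def server(uri, origin):
    # Stand-in for the API: echoes the Origin back, with no Vary header.
    return {"Access-Control-Allow-Origin": origin}


def cors_allows(page_origin, response_headers):
    # Simplified version of the browser's check on the response.
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed == "*" or allowed == page_origin


cache = UrlOnlyCache()
uri = "https://voila.metabroadcast.com/1.0/content?id=cf2"
stage = "https://oursite-stage.metabroadcast.com"
prod = "https://oursite.metabroadcast.com"

first = cache.fetch(uri, stage, server)   # steps 1 and 2: fetched and cached
second = cache.fetch(uri, prod, server)   # steps 3 and 4: served from cache

stage_ok = cors_allows(stage, first)
prod_ok = cors_allows(prod, second)       # step 5: header still names stage
print(stage_ok, prod_ok)                  # prints "True False"
```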

By adding the header Vary: Origin, the browser knows to include the value of the Origin request header in its cache key, so it'll look like

"Origin: https://oursite.metabroadcast.com,https://voila.metabroadcast.com/1.0/content?id=cf2" => lots of data about EastEnders

and so won't reuse the response across different origins. A one-line change, and no more CORS errors!
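Here is a small sketch of a cache that honours Vary, to show why the extra header is enough. VaryAwareCache and server are hypothetical names, header names are matched exactly for brevity, and real caches handle many more cases.

```python
class VaryAwareCache:
    """Toy browser cache that folds the headers named in Vary into its key."""

    def __init__(self):
        self._responses = {}
        self._vary_names = {}  # URI -> header names from the last Vary seen

    def _key(self, uri, request_headers):
        names = self._vary_names.get(uri, ())
        return (uri, tuple(request_headers.get(n, "") for n in names))

    def fetch(self, uri, request_headers, server):
        key = self._key(uri, request_headers)
        if key in self._responses:
            return self._responses[key]
        response = server(uri, request_headers)
        vary = response.get("Vary", "")
        self._vary_names[uri] = tuple(
            sorted(n.strip() for n in vary.split(",") if n.strip())
        )
        # Store under a key that now includes the varying header values.
        self._responses[self._key(uri, request_headers)] = response
        return response


calls = []


def server(uri, request_headers):
    # Stand-in for the API: echoes Origin and, crucially, sends Vary: Origin.
    calls.append(request_headers["Origin"])
    return {
        "Access-Control-Allow-Origin": request_headers["Origin"],
        "Vary": "Origin",
    }


cache = VaryAwareCache()
uri = "https://voila.metabroadcast.com/1.0/content?id=cf2"
stage = {"Origin": "https://oursite-stage.metabroadcast.com"}
prod = {"Origin": "https://oursite.metabroadcast.com"}

from_stage = cache.fetch(uri, stage, server)
from_prod = cache.fetch(uri, prod, server)    # cache miss: different Origin
stage_again = cache.fetch(uri, stage, server)  # cache hit for stage
```

Each origin gets its own entry, so the production site receives a response carrying its own Access-Control-Allow-Origin value, while repeat requests from the same origin are still served from cache.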

In summary: if you're doing CORS and returning specific hosts in your Access-Control-Allow-Origin response header, make damn sure to include Origin in the Vary response header too or you'll be in a world of pain.
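If the response already carries a Vary header, say for Accept, Origin needs adding to it rather than replacing it. A minimal sketch of that, assuming response headers are held in a plain dict under the exact key "Vary" (add_vary is a hypothetical helper, not part of any framework):

```python
def add_vary(headers, name="Origin"):
    """Return a copy of headers with `name` listed in Vary, keeping existing entries."""
    updated = dict(headers)
    existing = [v.strip() for v in updated.get("Vary", "").split(",") if v.strip()]
    already_listed = name.lower() in (v.lower() for v in existing)
    # "Vary: *" already means the response varies on everything.
    if "*" not in existing and not already_listed:
        existing.append(name)
    updated["Vary"] = ", ".join(existing)
    return updated


with_origin = add_vary({"Vary": "Accept"})
print(with_origin["Vary"])  # prints "Accept, Origin"
```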


This originally appeared on our company blog. Picture credit: A Syn


