Every project that starts growing needs to track down performance issues and bottlenecks, and audiobox.fm is no exception.
However, there are cases when the cure is worse than the disease, and this is one of them.
We do streaming, a lot of it. The entire concept of audiobox.fm revolves around streaming your media over the Internet, and it should be as fast as possible.
We needed a way to store and stream this content privately. After some evaluation we settled on Amazon S3, and things shaped up pretty nicely from then on: instant streaming even from Europe, fetching data stored in a US bucket.
I then received an email from Amazon announcing that they had opened up CloudFront, a content delivery network working over HTTP and RTMP.
The advantage over the traditional fetch-from-bucket system is that the content is served from the location nearest to the requesting user, thus greatly reducing latency.
It was all fun and games until I noticed that streaming was actually slower than fetching from the US bucket. It may be that the servers are suffering from heavy load, but I think there's more to it.
I started digging around and I think I have an explanation, one that makes audiobox a non-use case for CloudFront.
A bit of background
My doubts about CloudFront started when I was investigating the possibility of using this CDN as an asset server, serving CSS/JavaScript to the end user. I noticed that many users on the Amazon forums were asking why their assets were out of sync with the actual content of the bucket after an update.
The answer is simple: CloudFront fetches the file from the S3 bucket and then caches it, which is in fact the purpose of a CDN.
It's not possible, at the moment, to manually expire a file so that it gets re-fetched from the bucket. Instead, the developer is asked to either:
- wait 24 hours
- rename the asset
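The "rename the asset" workaround is usually automated by putting a content hash in the file name, so every update produces a new name that the CDN has never cached. A minimal sketch, assuming a hypothetical helper called `versioned_key` and an example key `css/site.css` (neither comes from CloudFront or from our code base):

```python
import hashlib
from pathlib import PurePosixPath


def versioned_key(key: str, content: bytes, digest_len: int = 8) -> str:
    """Return `key` with a short content hash inserted before the extension.

    Hypothetical helper: the naming scheme is an assumption for illustration,
    not something CloudFront or S3 requires.
    """
    path = PurePosixPath(key)
    digest = hashlib.sha1(content).hexdigest()[:digest_len]
    # "css/site.css" becomes "css/site.<digest>.css"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = versioned_key("css/site.css", b"body { color: black; }")
new_name = versioned_key("css/site.css", b"body { color: navy; }")
# Different content yields a different name, so the stale cached copy is never requested again.
print(old_name, new_name)
```

The page then has to reference the new name, which is the price of this approach: the upload step and the templates must agree on the hash.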
So?
While many developers try to find a way to expire their assets, we have the opposite problem: we would like them to stay cached forever (or at least for a long time).
On the initial request, CloudFront checks that the file exists in the bucket, then transfers it over Amazon's internal network to the location nearest to the user.
If I ask for a file while I'm in Europe, CloudFront will fetch it from the US before serving it to me, thus adding overhead to the request.
A 25MB file needs to travel through Amazon's internal network, get stored on a CloudFront server, and only then be served to me.
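The overhead described above can be put into rough numbers. This is only a back-of-the-envelope sketch: the throughput figures are made up for illustration, not measurements, and it assumes the CloudFront server stores the whole file before it starts serving it.

```python
def transfer_seconds(size_mb: float, throughput_mbit_s: float) -> float:
    """Time to move `size_mb` megabytes over a link of the given throughput."""
    return size_mb * 8 / throughput_mbit_s


FILE_MB = 25  # the 25MB file from the example above

# All three throughputs are illustrative assumptions, not measured values.
US_BUCKET_TO_USER = 8   # Mbit/s, Europe fetching straight from the US bucket
BUCKET_TO_EDGE = 20     # Mbit/s, Amazon internal network, US bucket to European server
EDGE_TO_USER = 10       # Mbit/s, European CloudFront server to the user

direct = transfer_seconds(FILE_MB, US_BUCKET_TO_USER)
edge_hit = transfer_seconds(FILE_MB, EDGE_TO_USER)
# On a miss the file is copied to the CloudFront server first, then served.
edge_miss = transfer_seconds(FILE_MB, BUCKET_TO_EDGE) + edge_hit

print(f"direct from bucket: {direct:.0f}s")
print(f"CDN, already cached: {edge_hit:.0f}s")
print(f"CDN, not cached:     {edge_miss:.0f}s")
```

With these made-up numbers the CDN only wins when the file is already cached; a miss pays for both hops.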
Now, this solution is super-fine when streaming content to the general public, let's say a video, because it will be requested multiple times and chances are that the file is already cached at the geographically nearest CloudFront location.
But for a private collection of audio, say 1000 files, this is impractical because cached copies expire after 24 hours. Files don't stay on the CDN any longer than that. There will always be a "fetch-again" from the US bucket, adding the extra overhead.
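The difference between the two access patterns can be sketched with a toy cache that expires entries after 24 hours. Everything here is an assumption made for illustration: the `EdgeCache` class is hypothetical, the request rates are invented, and real CloudFront behaviour is certainly more involved.

```python
from dataclasses import dataclass, field

TTL_SECONDS = 24 * 60 * 60  # the 24-hour expiry mentioned above


@dataclass
class EdgeCache:
    """Toy model of a single CDN location with a fixed time-to-live."""
    ttl: int = TTL_SECONDS
    expires_at: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def request(self, key: str, now: int) -> bool:
        """Return True on a cache hit, False when the bucket must be contacted."""
        if self.expires_at.get(key, 0) > now:
            self.hits += 1
            return True
        self.misses += 1
        self.expires_at[key] = now + self.ttl  # cache the freshly fetched copy
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Public video: one file requested once a minute for a day (assumed rate).
public = EdgeCache()
for minute in range(24 * 60):
    public.request("video.mp4", now=minute * 60)

# Private collection: one user plays 1000 tracks in order, four minutes each,
# then starts over. A full pass takes longer than the 24-hour expiry.
private = EdgeCache()
for play in range(2000):
    private.request(f"track-{play % 1000}", now=play * 240)

print(f"public video hit ratio:       {public.hit_ratio():.3f}")
print(f"private collection hit ratio: {private.hit_ratio():.3f}")
```

In this toy model the public video misses once and hits every time after, while the private collection never hits at all, because no track is replayed before its cached copy has expired.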
We will continue monitoring and testing CloudFront, but for our use there are disadvantages:
- slower streaming for a private, single-user cloud
- sometimes the stream starts only once the file has been fully downloaded
- the copy on the CDN is useless: if the user wants to listen to a file again within the 24-hour window, chances are the browser cache will serve it, so no request reaches the CDN and its purpose is defeated
- cost (CloudFront is not free)
- coupled with the fact that Safari/WebKit suffers from an HTML5 bug where the src of audio and video tags gets requested twice (sometimes even three times), it's a killer
The ideal solution would be for CloudFront to proactively mirror buckets in every one of its geographic locations, but that will never happen, for many reasons.