In CDN land there is a long-running debate over the number of PoPs (Points of Presence) required to get the best performance for your website.
Here is the match up:
In the Left Corner, wearing blue trunks, is the all-time heavyweight argument that more PoPs are better. This argument states that a wider distribution of lots of PoPs gets your content closer to the end user and thus speeds up delivery.
In the Right Corner, wearing multi-coloured trunks, is the people’s favourite: Super PoPs are better. This argument states that fewer PoPs, placed in strategic Internet peering locations, will deliver superior performance for your website.
Having these arguments settled by coming out of two corners to duke it out in the ring sounds like fun but in reality, this will be settled mathematically.
When do PoPs Cache?
Firstly, let’s investigate when a PoP comes into play when serving a web page.
When a user requests a webpage for the very first time, they will not receive anything from the cache PoP. This is because the PoP cache is not primed. The PoP has not seen the content before and won’t have a copy, so it has to go back to the origin to fetch the content. As it serves out the content, it may keep a copy of the files to serve to the next person who requests the page.
In a simple configuration this would look as follows:
This shows that the first request for the page will go back to origin while the second and subsequent requests for the page will be served cached content from the CDN. All other factors being equal, the pages will load faster for the users and there will be less compute cost (and time) from the origin servers.
Mathematically, with one PoP, one page (content wholly cacheable) and no cache expiry (see the next section), the cache hit ratio for, say, 100 pages served would be 99%: only the first request goes to origin, and the other 99 of the 100 are served from the PoP.
To keep the above simple, two key assumptions are made:
- There is one CDN PoP
- The 2nd and subsequent users are requesting the same webpage
As soon as you start to flex either of these assumptions, the game changes significantly. Flexing both becomes even more fun.
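Before flexing anything, the single-PoP, single-page case can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular CDN is implemented: the `PoP` class, the `origin_fetch` function and the `/index.html` URL are all made-up names, and the cache never expires.

```python
def origin_fetch(url):
    """Stand-in for the origin server (hypothetical, for illustration only)."""
    return f"<html>content of {url}</html>"


class PoP:
    """A toy cache PoP: one dictionary, no expiry."""

    def __init__(self):
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.cache:
            # The PoP has been primed, so serve the stored copy.
            self.hits += 1
            return self.cache[url]
        # Not primed yet: go back to origin and keep a copy.
        self.misses += 1
        body = origin_fetch(url)
        self.cache[url] = body
        return body


pop = PoP()
total_requests = 100
for _ in range(total_requests):
    pop.get("/index.html")

hit_ratio = pop.hits / total_requests
print(f"hits={pop.hits} misses={pop.misses} hit ratio={hit_ratio:.0%}")
# hits=99 misses=1 hit ratio=99%
```

Only the very first request reaches the origin, which is where the 99% figure comes from.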
More Than One Webpage Served
Most websites (but not all) have more than one page, so the first-request scenario described above applies to every page served. We also need to consider that the cache life of the assets for each page is limited: for a variety of reasons, the cache life (the time an asset is stored in the PoP) will expire.
Let’s say we have 1000 requests (R) in one day for 100 different pages (P), and the cache expiry time (ET) on the content in those pages is 12 hours. Let’s also assume that the pages are requested equally by 50 users. The Priming Page (PP) request primes the PoP the first time each page is served, i.e. PP = 1, and each page has to be primed again after every expiry, so it costs PP*24/ET = 2 misses per page per day out of R/P = 10 requests. With a single PoP, the cache hit ratio we would expect over a 24 hour period is:
(R/P-(PP*24/ET))/(R/P) or 80%
As the number of requests increases or the ET increases, the cache hit ratio increases.
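As a quick check of the arithmetic, the formula can be written as a small Python function. The function and parameter names are my own; the model is the same simplified one used in the text (one PoP, requests spread evenly across pages).

```python
def cache_hit_ratio(requests, pages, expiry_hours, priming_pages=1):
    """Single-PoP hit ratio over 24 hours: (R/P - (PP*24/ET)) / (R/P)."""
    requests_per_page = requests / pages                 # R/P
    priming_misses = priming_pages * 24 / expiry_hours   # PP*24/ET
    return (requests_per_page - priming_misses) / requests_per_page


print(f"{cache_hit_ratio(1000, 100, 12):.0%}")  # 80%
print(f"{cache_hit_ratio(2000, 100, 12):.0%}")  # 90%  (more requests)
print(f"{cache_hit_ratio(1000, 100, 24):.0%}")  # 90%  (longer expiry)
```

The second and third calls show the ratio moving up when either the request count or the expiry time is doubled.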
In reality, most websites do not have an equal spread of pages served to customers. Some pages will have a much higher hit rate (like the home page or category pages), while others will have a much lower hit rate because they are requested less frequently (like a specific product page).
More Than One PoP
Now the maths gets more fun. To keep things simple again, let’s assume you engage a CDN with two PoPs (NP = 2), and that we have the same number of pages, requests and cache expiry time as above. Let’s say half of your users connect to one PoP and half to the other.
In order to serve cached content, each PoP must be primed first. Requests per PoP becomes relevant.
What does this mean for our cache hit ratio?
(R/P-(PP*NP*24/ET))/(R/P) or 60%
Doubling the number of PoPs has decreased our cache hit ratio.
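One way to sanity-check that figure is to replay a day of traffic against toy PoP caches. The sketch below is a simulation under assumptions of my own, not a model of any real CDN: requests are evenly spaced through the day, they cycle through the pages in order, and successive passes over the pages alternate between PoPs. The function name and the millisecond clock are illustrative choices.

```python
DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds (integer clock)


def simulate_hit_ratio(requests, pages, expiry_hours, pops):
    """Replay one day of evenly spaced requests and count cache hits."""
    expiry_ms = expiry_hours * 60 * 60 * 1000
    interval_ms = DAY_MS // requests      # assumption: even spacing
    cached_at = {}                        # (pop, page) -> time it was cached
    hits = 0
    for i in range(requests):
        now = i * interval_ms
        page = i % pages                  # assumption: cycle through pages
        pop = (i // pages) % pops         # assumption: alternate PoPs per pass
        key = (pop, page)
        stored = cached_at.get(key)
        if stored is not None and now - stored < expiry_ms:
            hits += 1                     # primed and not yet expired
        else:
            cached_at[key] = now          # miss: fetch from origin, re-prime
    return hits / requests


print(f"1 PoP:  {simulate_hit_ratio(1000, 100, 12, 1):.0%}")  # 80%
print(f"2 PoPs: {simulate_hit_ratio(1000, 100, 12, 2):.0%}")  # 60%
```

With this particular traffic pattern the simulation agrees with the formula: each page is primed twice per day per PoP, so two PoPs cost four misses out of every ten requests for a page. Other request patterns would give different numbers.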
What About at Larger Scale?
What about a 2000 page website with, say, 3m pages viewed per month (100,000 per day) and a 12 hour cache expiry?
- 1 PoP = Theoretical Cache Hit Ratio of 96%
- 10 PoPs = Theoretical Cache Hit Ratio of 60%
- 100 PoPs = No content cached.
The above assumes an even distribution of pages to each PoP, an even distribution of content served throughout the day, and an even distribution of each type of page served.
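The three figures can be reproduced with the same formula extended by the number of PoPs. This is a sketch with names of my own choosing; the one addition to the formula in the text is clamping the result at zero, because with 100 PoPs the priming misses exceed the requests each page receives and the raw formula goes negative.

```python
def theoretical_hit_ratio(requests, pages, expiry_hours, pops, priming_pages=1):
    """(R/P - (PP*NP*24/ET)) / (R/P), clamped so it never drops below zero."""
    requests_per_page = requests / pages                        # R/P
    priming_misses = priming_pages * pops * 24 / expiry_hours   # PP*NP*24/ET
    ratio = (requests_per_page - priming_misses) / requests_per_page
    return max(0.0, ratio)


for pops in (1, 10, 100):
    ratio = theoretical_hit_ratio(100_000, 2000, 12, pops)
    print(f"{pops:>3} PoPs: {ratio:.0%}")
# Prints 96%, 60% and 0% for 1, 10 and 100 PoPs respectively.
```

At 100 PoPs each page sees 50 requests a day but would need 200 priming requests, so under this model nothing is ever served from cache.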
What we find in practice is that there will be a weighting of different pages served. Category and Home pages are served more often than individual product and specific content pages. Graphically, this will look like the following:
There may be a similar shaped graph for the distribution of content through PoPs depending on the number and location of the PoPs. And, more than likely, the traffic served will peak and trough at various times of the day.
So it is the intersection of these curves which really determines the cache hit ratio, rather than the simple linear equations I have outlined above. However, the number of PoPs, the number of pages served and the variety of different pages served are still key drivers of the cache hit ratio for a website. As a general rule:
- Fewer PoPs = Higher hit rate
- More Pages Served = Higher hit rate
- Fewer types of page served = Higher hit rate
- Greater concentration of pages served at particular times of day = Higher hit rate
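To get a feel for how uneven page popularity changes the picture, the linear model can be loosely extended so that each page has its own request count. This is my own extension, not a formula from the text above: it assumes each page costs at most NP*24/ET priming misses per day, that a page with fewer requests than that never gets a hit, and it uses a 1/rank popularity curve purely as an example of a skewed distribution.

```python
def hit_ratio(requests_per_page, expiry_hours, pops):
    """Hit ratio when each page has its own daily request count."""
    priming_cap = pops * 24 / expiry_hours   # most misses a page can incur
    total = sum(requests_per_page)
    misses = sum(min(r, priming_cap) for r in requests_per_page)
    return 1 - misses / total


def even_traffic(requests, pages):
    """Every page receives the same number of requests."""
    return [requests / pages] * pages


def skewed_traffic(requests, pages):
    """Assumed 1/rank popularity: a few pages take most of the traffic."""
    weights = [1 / rank for rank in range(1, pages + 1)]
    scale = requests / sum(weights)
    return [w * scale for w in weights]


even = even_traffic(100_000, 2000)
skewed = skewed_traffic(100_000, 2000)
for pops in (1, 10, 100):
    print(f"{pops:>3} PoPs: even={hit_ratio(even, 12, pops):.0%} "
          f"skewed={hit_ratio(skewed, 12, pops):.0%}")
```

With even traffic this reproduces the 96%, 60% and 0% figures. With the skewed traffic the popular pages still get primed at 100 PoPs, so the overall ratio stays above zero, although the exact numbers depend entirely on the assumed popularity curve.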
When is More PoPs Better?
More PoPs is better when you have fewer types of page, lots of pages served and a very widely distributed website audience. Only in this instance can you get the PoP cache filled enough to give you the hit rate which means the short distance between the PoP and the end user can come into play. If there is a PoP close to a user and it’s not filled with the content the user wants, there might as well not be a PoP there in the first place!
Do You Want it All? More PoPs and a Higher Hit Rate – Yes Please
Is it possible to have a wide distribution of PoPs and a high cache hit rate? Yes it is. For this you need a shared cache among the PoPs so that if the requested content is not present at the PoP closest to the user, the content can be fetched from one of the other nodes rather than from the origin servers. While this means the content for a first request is not always served from the PoP closest to the user, at least the origin infrastructure is not tasked with producing and sending content which is already out there (somewhere!). This would look like:
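As a rough sketch of that lookup order in code: check the local PoP, then the other PoPs, and only then go back to origin. The `Origin` and `PoP` classes, the PoP names and the peer list are all hypothetical, and there is no expiry; real CDNs implement shared or tiered caching in their own ways.

```python
class Origin:
    """Stand-in origin server that counts how often it is asked for content."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, url):
        self.fetches += 1
        return f"content of {url}"


class PoP:
    """Toy PoP that checks its own cache, then its peers, then the origin."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}
        self.peers = []

    def get(self, url):
        if url in self.cache:
            return self.cache[url], "local hit"
        for peer in self.peers:
            if url in peer.cache:
                # Another PoP already holds the content: copy it locally.
                self.cache[url] = peer.cache[url]
                return self.cache[url], f"shared hit from {peer.name}"
        # Nobody has it yet, so the origin has to produce it.
        body = self.origin.fetch(url)
        self.cache[url] = body
        return body, "origin fetch"


origin = Origin()
sydney = PoP("sydney", origin)
london = PoP("london", origin)
sydney.peers = [london]
london.peers = [sydney]

print(sydney.get("/home")[1])   # origin fetch
print(london.get("/home")[1])   # shared hit from sydney
print(london.get("/home")[1])   # local hit
print(origin.fetches)           # 1
```

Two PoPs were primed, but the origin only had to serve the page once.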
What’s the Right PoPs Answer? 42?
So when it comes to the right number of PoPs to run your website through, you should consider the volume of traffic your site serves, the depth or breadth of the pages and objects you serve and the geographic spread of your users.
Assuming your HTML and static asset cache control headers are all correctly deployed, a shared cache is probably the best piece you can add to your caching network for both origin offload and performance.