tag Why does Windows Azure not scale?




get-azurermvm tags (5)

I am trying to scale websites on Widows Azure. So far I‘ve tested Wordpress, Ghost (Blog) and a plain HTML site and it’s all the same: If I scale them up (add instances), they don’t get any faster. I am sure I must do something wrong… This is what I did:

  1. I created a new shared website, with a plain HTML Bootstrap template on it. http://demobootstrapsite.azurewebsites.net/
  2. Then I installed ab.exe from the Apache Project on a hosted bare metal server (4 Cores, 12 GB RAM, 100 MBit)

I ran the test two times. The first time with a single shared instance and the second time with two shared instances using this command:

ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/

This means ab.exe is going to create 10000 requests with 100 parallel threads.

I expected the response times of the test with two shared instances to be significantly lower than the response times with just one shared instance. But the mean time per request even rised a bit from 1452.519 ms with one shared instance to 1460.631 ms with two shared instances. Later I even ran the site on 8 shared instances with no effect at all. My first thought was that maybe the shared instances are the problem. So I put the site on a standard VM and ran the test again. But the problems remain the same. Also adding more instances didn’t make the site any faster (even a bit slower).

Later I‘ve whatched a Video with Scott Hanselman and Stefan Schackow in which they‘ve explained the Azure Scaling features. Stefan says that Azure has a kind of „sticky loadbalancing“ which will redirect a client always to the same instance/VM to avoid compatibility problems with statefull applications. So I‘ve checked the WebServer logs and I found a Logfile for every instance with about the same size. Usually that means that every instance was used during the test..

PS: During the test run I‘ve checked the response time oft the website from my local computer (from a different network than the server) and the response times were about 1.5s.

Here are the test results:

###################################### 
1 instance result
###################################### 

PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests


Server Software:        Microsoft-IIS/8.0
Server Hostname:        demobootstrapsite.azurewebsites.net
Server Port:            80

Document Path:          /
Document Length:        16396 bytes

Concurrency Level:      100
Time taken for tests:   145.252 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      168800000 bytes
HTML transferred:       163960000 bytes
Requests per second:    68.85 [#/sec] (mean)
Time per request:       1452.519 [ms] (mean)
Time per request:       14.525 [ms] (mean, across all concurrent requests)
Transfer rate:          1134.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   14   8.1     16      78
Processing:    47 1430  93.9   1435    1622
Waiting:       16  705 399.3    702    1544
Total:         62 1445  94.1   1451    1638

Percentage of the requests served within a certain time (ms)
  50%   1451
  66%   1466
  75%   1482
  80%   1498
  90%   1513
  95%   1529
  98%   1544
  99%   1560
 100%   1638 (longest request)

###################################### 
2 instances result
###################################### 

PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests


Server Software:        Microsoft-IIS/8.0
Server Hostname:        demobootstrapsite.azurewebsites.net
Server Port:            80

Document Path:          /
Document Length:        16396 bytes

Concurrency Level:      100
Time taken for tests:   146.063 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      168800046 bytes
HTML transferred:       163960000 bytes
Requests per second:    68.46 [#/sec] (mean)
Time per request:       1460.631 [ms] (mean)
Time per request:       14.606 [ms] (mean, across all concurrent requests)
Transfer rate:          1128.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   14   8.1     16      78
Processing:    31 1439  92.8   1451    1607
Waiting:       16  712 402.5    702    1529
Total:         47 1453  92.9   1466    1622

Percentage of the requests served within a certain time (ms)
  50%   1466
  66%   1482
  75%   1482
  80%   1498
  90%   1513
  95%   1529
  98%   1544
  99%   1560
 100%   1622 (longest request)


Given the complete absence in the question of the most important detail of such a test, it sounds to me you are merely testing your Internet connection bandwidth. 10 Mb/sec is a very common rate.

No, it doesn't scale.


"Scaling" the website in terms of resources adds more capacity to accept more requests, and won't increase the speed at which a single capacity instance can perform when not overloaded.

For example; assume a Small VM can accept 100 requests per second, processing each request at 1000ms, (and if it was 101 requests per second, each request would start to slow down to say 1500ms) then scaling to more Small VMs won't increase the speed at which a single request can be processed, it just raises us to accepting 200 requests per second under 1000ms each (as now both machines are not overloaded).

For per-request performance; the code itself (and CPU performance of the Azure VM) will impact how quickly a single request can be executed.


Some ideas:

  • Is Azure throttling to prevent a DOS attack? You are making a hell of a lot of requests from one location to a single page.
  • Try Small sized Web Sites rather than shared. Capacity and Scaling might be quite different. Load of 50 requests/sec doesn't seem terrible for a shared service.
  • Try to identify where that time is going. 1.4s is a really long time.
  • Run load tests from several different machines simultaneously, to determine if there's throttling going on or you're affected by sticky load balancing or other network artefacts.
  • You said it's ok under load of about 10 concurrent requests at 50 requests/second. Gradually increase the load you're putting on the server to determine the point at which it starts to choke. Do this across multiple machines too.
  • Can you log on to Web Sites? Probably not ... see if you can replicate the same issues on a Cloud Service Web Role and analyze from there using Performance Monitor and typical IIS tools to see where the bottleneck is, or if it's even on the machine versus Azure network infrastructure.

I usually run logparser against the iis logs that were generated at the time of the load test and calculate the RPS and latency (time-taken field) off that. This helps isolate the slowness from network, to server processing to actual load test tool reporting.





azure-web-roles