strategy - get-azurermvm tags




Why does Windows Azure not scale? (4)

I am trying to scale websites on Widows Azure. So far I‘ve tested Wordpress, Ghost (Blog) and a plain HTML site and it’s all the same: If I scale them up (add instances), they don’t get any faster. I am sure I must do something wrong… This is what I did:

  1. I created a new shared website, with a plain HTML Bootstrap template on it. http://demobootstrapsite.azurewebsites.net/
  2. Then I installed ab.exe from the Apache Project on a hosted bare metal server (4 Cores, 12 GB RAM, 100 MBit)

I ran the test two times. The first time with a single shared instance and the second time with two shared instances using this command:

ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/

This means ab.exe is going to create 10000 requests with 100 parallel threads.

I expected the response times of the test with two shared instances to be significantly lower than the response times with just one shared instance. But the mean time per request even rised a bit from 1452.519 ms with one shared instance to 1460.631 ms with two shared instances. Later I even ran the site on 8 shared instances with no effect at all. My first thought was that maybe the shared instances are the problem. So I put the site on a standard VM and ran the test again. But the problems remain the same. Also adding more instances didn’t make the site any faster (even a bit slower).

Later I‘ve whatched a Video with Scott Hanselman and Stefan Schackow in which they‘ve explained the Azure Scaling features. Stefan says that Azure has a kind of „sticky loadbalancing“ which will redirect a client always to the same instance/VM to avoid compatibility problems with statefull applications. So I‘ve checked the WebServer logs and I found a Logfile for every instance with about the same size. Usually that means that every instance was used during the test..

PS: During the test run I‘ve checked the response time oft the website from my local computer (from a different network than the server) and the response times were about 1.5s.

Here are the test results:

###################################### 
1 instance result
###################################### 

PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests


Server Software:        Microsoft-IIS/8.0
Server Hostname:        demobootstrapsite.azurewebsites.net
Server Port:            80

Document Path:          /
Document Length:        16396 bytes

Concurrency Level:      100
Time taken for tests:   145.252 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      168800000 bytes
HTML transferred:       163960000 bytes
Requests per second:    68.85 [#/sec] (mean)
Time per request:       1452.519 [ms] (mean)
Time per request:       14.525 [ms] (mean, across all concurrent requests)
Transfer rate:          1134.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   14   8.1     16      78
Processing:    47 1430  93.9   1435    1622
Waiting:       16  705 399.3    702    1544
Total:         62 1445  94.1   1451    1638

Percentage of the requests served within a certain time (ms)
  50%   1451
  66%   1466
  75%   1482
  80%   1498
  90%   1513
  95%   1529
  98%   1544
  99%   1560
 100%   1638 (longest request)

###################################### 
2 instances result
###################################### 

PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests


Server Software:        Microsoft-IIS/8.0
Server Hostname:        demobootstrapsite.azurewebsites.net
Server Port:            80

Document Path:          /
Document Length:        16396 bytes

Concurrency Level:      100
Time taken for tests:   146.063 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      168800046 bytes
HTML transferred:       163960000 bytes
Requests per second:    68.46 [#/sec] (mean)
Time per request:       1460.631 [ms] (mean)
Time per request:       14.606 [ms] (mean, across all concurrent requests)
Transfer rate:          1128.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   14   8.1     16      78
Processing:    31 1439  92.8   1451    1607
Waiting:       16  712 402.5    702    1529
Total:         47 1453  92.9   1466    1622

Percentage of the requests served within a certain time (ms)
  50%   1466
  66%   1482
  75%   1482
  80%   1498
  90%   1513
  95%   1529
  98%   1544
  99%   1560
 100%   1622 (longest request)

"Scaling" the website in terms of resources adds more capacity to accept more requests, and won't increase the speed at which a single capacity instance can perform when not overloaded.

For example; assume a Small VM can accept 100 requests per second, processing each request at 1000ms, (and if it was 101 requests per second, each request would start to slow down to say 1500ms) then scaling to more Small VMs won't increase the speed at which a single request can be processed, it just raises us to accepting 200 requests per second under 1000ms each (as now both machines are not overloaded).

For per-request performance; the code itself (and CPU performance of the Azure VM) will impact how quickly a single request can be executed.


Before you load test the websites, you should do a baseline test with a single instance, say with 10 concurrent threads, to check how the website handles when not under load. Then use this base line to understand how the websites behave under load.

For example, if the baseline shows the website responds in 1.5s to requests when not under load, and again with 1.5s under load, then this means the website is able to handle the load easily. If under load the website takes 3-4s using a single instance, then this means it doesn't handle the load so well - try to add another instance and check if the response time improves.



I usually run logparser against the iis logs that were generated at the time of the load test and calculate the RPS and latency (time-taken field) off that. This helps isolate the slowness from network, to server processing to actual load test tool reporting.





azure-web-roles