python - servers - load balancing django nginx
Does Django scale? (20)
I'm building a web application with
Django. The reasons I chose
- I wanted to work with free/open-source tools.
- I like
Pythonand feel it's a long-term language, whereas regarding
RubyI wasn't sure, and
PHPseemed like a huge hassle to learn.
- I'm building a prototype for an idea and wasn't thinking too much about the future. Development speed was the main factor, and I already knew
- I knew the migration to
Google App Enginewould be easier should I choose to do so in the future.
- I heard
Now that I'm getting closer to thinking about publishing my work, I start being concerned about scale. The only information I found about the scaling capabilities of
Django is provided by the
Django team (I'm not saying anything to disregard them, but this is clearly not objective information...).
- What's the "largest" site that's built on
Djangotoday? (I measure size mostly by user traffic)
Djangodeal with 100,000 users daily, each visiting the site for a couple of hours?
- Could a site like
StackOverflowrun on Django?
What's the "largest" site that's built on Django today? (I measure size mostly by user traffic)
In the US, Mahalo. I'm told they handle roughly 10 million uniques a month.
Abroad, the Globo network (a network of news, sports, and entertainment sites in Brazil); Alexa ranks them in to top 100 globally (around 80th currently).
Other notable Django users include PBS, National Geographic, Discovery, NASA (actually a number of different divisions within NASA), and the Library of Congress.
Can Django deal with 100k users daily, each visiting the site for a couple of hours?
Yes -- but only if you've written your application right, and if you've got enough hardware. Django's not a magic bullet.
Could a site like run on Django?
Yes (but see above).
Technology-wise, easily: see soclone for one attempt. Traffic-wise, compete pegs at under 1 million uniques per month. I can name at least dozen Django sites with more traffic than SO.
"What are the largest sites built on Django today?"
There isn't any single place that collects information about traffic on Django built sites, so I'll have to take a stab at it using data from various locations. First, we have a list of Django sites on the front page of the main Django project page and then a list of Django built sites at djangosites.org. Going through the lists and picking some that I know have decent traffic we see:
pownce.com (no longer active): alexa rank about 65k. Mike Malone of Pownce, in his EuroDjangoCon presentation on Scaling Django Web Apps says "hundreds of hits per second". This is a very good presentation on how to scale Django, and makes some good points including (current) shortcomings in Django scalability.
HP had a site built with Django 1.5: ePrint center. However, as for novemer/2015 the entire website was migrated and this link is just a redirect. This website was a world-wide service attending subscription to Instant Ink and related services HP offered (*).
"Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?"
Yes, see above.
"Could a site like run on Django?"
My gut feeling is yes but, as others answered and Mike Malone mentions in his presentation, database design is critical. Strong proof might also be found at www.cnprog.com if we can find any reliable traffic stats. Anyway, it's not just something that will happen by throwing together a bunch of Django models :)
There are, of course, many more sites and bloggers of interest, but I have got to stop somewhere!
Blog post about Using Django to build high-traffic site michaelmoore.com described as a top 10,000 website. Quantcast stats and compete.com stats.
(*) The author of the edit, including such reference, used to work as outsourced developer in that project.
Another example is rasp.yandex.ru, Russian transport timetable service. Its attendance satisfies your requirements.
See further details as mentioned below:
It’s not uncommon to hear people say “Django doesn’t scale”. Depending on how you look at it, the statement is either completely true or patently false. Django, on its own, doesn’t scale.
The same can be said of Ruby on Rails, Flask, PHP, or any other language used by a database-driven dynamic website.
The good news, however, is that Django interacts beautifully with a suite of caching and load balancing tools that will allow it to scale to as much traffic as you can throw at it.
Contrary to what you may have read online, it can do so without replacing core components often labeled as “too slow” such as the database ORM or the template layer.
Disqus serves over 8 billion page views per month. Those are some huge numbers.
These teams have proven Django most certainly does scale. Our experience here at Lincoln Loop backs it up.
We’ve built big Django sites capable of spending the day on the Reddit homepage without breaking a sweat.
Django’s scaling success stories are almost too numerous to list at this point.
It backs Disqus, Instagram, and Pinterest. Want some more proof? Instagram was able to sustain over 30 million users on Django with only 3 engineers (2 of which had no back-end development
Even-though there have been a lot of great answers here, I just feel like pointing out, that nobody have put emphasis on..
It depends on the application
If you application is light on writes, as in you are reading a lot more data from the DB than you are writing. Then scaling django should be fairly trivial, heck, it comes with some fairly decent output/view caching straight out of the box. Make use of that, and say, redis as a cache provider, put a load balancer in front of it, spin up n-instances and you should be able to deal with a VERY large amount of traffic.
Now, if you have to do thousands of complex writes a second? Different story. Is Django going to be a bad choice? Well, not necessarily, depends on how you architect your solution really, and also, what your requirements are.
Just my two cents :-)
Here's a list of some relatively high-profile things built in Django:
The Guardian's "Investigate your MP's expenses" app
Politifact.com (here's a Blog post talking about the (positive) experience. Site won a Pulitzer.
NY Times' Represent app
Peter Harkins, one of the programmers over at WaPo, lists all the stuff they’ve built with Django on his blog
It's a little old, but someone from the LA Times gave a basic overview of why they went with Django.
The Onion's AV Club was recently moved from (I think Drupal) to Django.
I imagine a number of these these sites probably gets well over 100k+ hits per day. Django can certainly do 100k hits/day and more. But YMMV in getting your particular site there depending on what you're building.
There are caching options at the Django level (for example caching querysets and views in memcached can work wonders) and beyond (upstream caches like Squid). Database Server specifications will also be a factor (and usually the place to splurge), as is how well you've tuned it. Don't assume, for example, that Django's going set up indexes properly. Don't assume that the default PostgreSQL or MySQL configuration is the right one.
Furthermore, you always have the option of having multiple application servers running Django if that is the slow point, with a software or hardware load balancer in front.
Finally, are you serving static content on the same server as Django? Are you using Apache or something like nginx or lighttpd? Can you afford to use a CDN for static content? These are things to think about, but it's all very speculative. 100k hits/day isn't the only variable: how much do you want to spend? How much expertise do you have managing all these components? How much time do you have to pull it all together?
I don't think the issue is really about Django scaling.
I really suggest you look into your architecture that's what is going to help you with you scaling needs.If you get that wrong there is not point on how well Django performs. Performance != Scale. You can have a system that has amazing performance but does not scale and vice versa.
Is your application database bound? If it is then your scale issues lie there as well. How are you planning on interacting with the database from Django? What happens when you database cannot process requests as fast as Django accepts them? What happens when your data outgrows one physical machine. You need to account for how you plan on dealing with those circumstances.
Moreover, What happens when your traffic outgrows one app server? how you handle sessions in this case can be tricky, more often than not you'd probably require a shared nothing architecture. Again that depends on your application.
In short Languages is not what determines scale, a language is responsible for performance(again depending on your applications different languages perform differently). It is your design and architecture that makes scaling a reality.
I hope it helps, would be glad to help further if you have questions.
I have been using Django for over a year now, and am very impressed with how it manages to combine modularity, scalability and speed of development. Like with any technology, it comes with a learning curve. However, this learning curve is made a lot less steep by the excellent documentation from the Django community. Django has been able to handle everything I have thrown at it really well. It looks like it will be able to scale well into the future.
BidRodeo Penny Auctions is a moderately sized Django powered website. It is a very dynamic website and does handle a good number of page views a day.
I was at the EuroDjangoCon conference the other week, and this was the subject of a couple of talks - including from the founders of what was the largest Django-based site, Pownce (slides from one talk here). The main message is that it's not Django you have to worry about, but things like proper caching, load balancing, database optimisation, etc.
Django actually has hooks for most of those things - caching, in particular, is made very easy.
I'm sure you're looking for a more solid answer, but the most obvious objective validation I can think of is that Google pushes Django for use with its App Engine framework. If anybody knows about and deals with scalability on a regular basis, it's Google. From what I've read, the most limiting factor seems to be the database back-end, which is why Google uses their own...
If you want to use Open source then there are many options for you. But python is best among them it have many libraries and a super awesome community. These are reasons which might change your mind:
Python is very good but it is a interpreted language which makes it slow. But many accelerator and caching services are there which partly solve this problem.
If you are thinking about rapid development then Ruby on Rails is best among all. The main motto of this(ROR) framework is to give a comfortable experience to the developers. If you compare Ruby and Python both have nearly same syntaxes.
Google App Engine is very good service but it will bind you in its scope, you don't get chance to experiment new things. Instead of it you can use Digital Ocean cloud which will only take $5/Month charge for its simplest droplet. Heroku is another free service where you can deploy your product.
Yes! Yes! What you heard is totally correct but here are some examples which are using other technologies
- Rails: Github, Twitter(previously), Shopify, Airbnb, Slideshare, Heroku etc.
- PHP: Facebook, Wikipedia, Flickr, Yahoo, Tumbler, Mailchimp etc.
Conclusion is a framework or language won't do everything for you. A better architecture, designing and strategy will give you a scalable website. Instagram is biggest example,this small team is managing such huge data. Here is one blog about its architecture must read it.
My experience with Django is minimal but I do remember in The Django Book they have a chapter where they interview people running some of the larger Django applications. Here is a link. I guess it could provide some insights.
It says curse.com is one of the largest Django applications with around 60-90 million page views in a month.
Note that if you're expecting 100K users per day, that are active for hours at a time (meaning max of 20K+ concurrent users), you're going to need A LOT of servers. SO has ~15,000 registered users, and most of them are probably not active daily. While the bulk of traffic comes from unregistered users, I'm guessing that very few of them stay on the site more than a couple minutes (i.e. they follow google search results then leave).
For that volume, expect at least 30 servers ... which is still a rather heavy 1,000 concurrent users per server.
Playing devil's advocate a little bit:
You should check the DjangoCon 2008 Keynote, delivered by Cal Henderson, titled "Why I hate Django" where he pretty much goes over everything Django is missing that you might want to do in a high traffic website. At the end of the day you have to take this all with an open mind because it is perfectly possible to write Django apps that scale, but I thought it was a good presentation and relevant to your question.
Spreading the tasks evenly, in short optimizing each and every aspect including DBs, Files, Images, CSS etc. and balancing the load with several other resources is necessary once your site/application starts growing. OR you make some more space for it to grow. Implementation of latest technologies like CDN, Cloud are must with huge sites. Just developing and tweaking an application won't give your the cent percent satisfation, other components also play an important role.
The problem is not to know if django can scale or not.
The right way is to understand and know which are the network design patterns and tools to put under your django/symfony/rails project to scale well.
Some ideas can be :
- Inversed proxy. Ex : Nginx, Varnish
- Memcache Session. Ex : Redis
- Clusterization on your project and db for load balancing and fault tolerance : Ex : Docker
- Use third party to store assets. Ex : Amazon S3
Hope it help a bit. This is my tiny rock to the mountain.
Today we use many web apps and sites for our needs. Most of them are highly useful. I will show you some of them used by python or django.
The Washington Post’s website is a hugely popular online news source to accompany their daily paper. Its’ huge amount of views and traffic can be easily handled by the Django web framework.
Washington Post - 52.2 million unique visitors (March, 2015)
The National Aeronautics and Space Administration’s official website is the place to find news, pictures, and videos about their ongoing space exploration. This Django website can easily handle huge amounts of views and traffic.
2 million visitors monthly
The Guardian is a British news and media website owned by the Guardian Media Group. It contains nearly all of the content of the newspapers The Guardian and The Observer. This huge data is handled by Django.
The Guardian (commenting system) - 41,6 million unique visitors (October, 2014)
We all know YouTube as the place to upload cat videos and fails. As one of the most popular websites in existence, it provides us with endless hours of video entertainment. The Python programming language powers it and the features we love.
DropBox started the online document storing revolution that has become part of daily life. We now store almost everything in the cloud. Dropbox allows us to store, sync, and share almost anything using the power of Python.
Survey Monkey is the largest online survey company. They can handle over one million responses every day on their rewritten Python website.
Quora is the number one place online to ask a question and receive answers from a community of individuals. On their Python website relevant results are answered, edited, and organized by these community members.
A majority of the code for Bitly URL shortening services and analytics are all built with Python. Their service can handle hundreds of millions of events per day.
Reddit is known as the front page of the internet. It is the place online to find information or entertainment based on thousands of different categories. Posts and links are user generated and are promoted to the top through votes. Many of Reddit’s capabilities rely on Python for their functionality.
Hipmunk is an online consumer travel site that compares the top travel sites to find you the best deals. This Python website’s tools allow you to find the cheapest hotels and flights for your destination.
Yes it can. It could be Django with Python or Ruby on Rails. It will still scale.
There are few different techniques. First, caching is not scaling. You could have several application servers balanced with nginx as the front in addition to hardware balancer(s). To scale on the database side you can go pretty far with read slave in MySQL / PostgreSQL if you go the RDBMS way.
Some good examples of heavy traffic websites in Django could be:
- Pownce when they were still there.
- Discus (generic shared comments manager)
- All the newspaper related websites: Washington Post and others.
You can feel safe.
You can definitely run a high-traffic site in Django. Check out this pre-Django 1.0 but still relevant post here: http://menendez.com/blog/launching-high-performance-django-site/