[python] How to make Django slugify work properly with Unicode strings?
The Mozilla website team has been working on an implementation: https://github.com/mozilla/unicode-slugify (sample code at http://davedash.com/2011/03/24/how-we-slug-at-mozilla/).
What can I do to prevent the slugify filter from stripping out non-ASCII alphanumeric characters? (I'm using Django 1.0.2.)
cnprog.com has Chinese characters in its question URLs, so I looked at their code. They are not using slugify in templates; instead, they call this method in the Question model to get permalinks:
    def get_absolute_url(self):
        # append the raw, unslugified title to the reversed question URL
        return '%s%s' % (reverse('question', args=[self.id]), self.title)
Are they slugifying the URLs or not?
I'm afraid Django's definition of a slug means ASCII, though the Django docs don't state this explicitly. This is the source of the slugify filter in defaultfilters; you can see that the value is converted to ASCII with the 'ignore' error handler, so any character that can't be represented in ASCII is silently dropped:
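    def slugify(value):
        import unicodedata
        # NFKD decomposes accented letters; 'ignore' then drops all non-ASCII
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
        # keep only word characters, whitespace and hyphens; trim and lowercase
        value = unicode(re.sub('[^\w\s-]', '', value).strip().lower())
        # collapse runs of hyphens/whitespace into a single hyphen
        return mark_safe(re.sub('[-\s]+', '-', value))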
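To see the effect concretely, here is a rough Python 3 port of that body. The name `ascii_slugify` is hypothetical, `unicode()` becomes `str`, and `mark_safe` is dropped because this runs outside Django; it is an illustration, not Django's current code. Accented Latin letters degrade to their base letters, while Chinese text disappears entirely.

```python
import re
import unicodedata


def ascii_slugify(value: str) -> str:
    # Hypothetical Python 3 port of the Django 1.0-era filter body quoted above.
    # NFKD splits "é" into "e" + a combining accent; encoding to ASCII with
    # 'ignore' then drops every non-ASCII code point, including all CJK text.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove anything that is not a word char, whitespace or hyphen; trim; lowercase.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of hyphens/whitespace into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(ascii_slugify('Überraschung im Café'))  # uberraschung-im-cafe
print(repr(ascii_slugify('你好世界')))          # ''
```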
Based on that, I'd guess that cnprog.com is not using Django's built-in slugify function. You may wish to adapt the Django snippet above if you want different behaviour.
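One way to adapt it, sketched under the assumption that you're on Python 3 (where `\w` in a `str` pattern already matches Unicode letters and digits): skip the ASCII encode step and normalize with NFKC instead of NFKD, so accented and CJK characters are kept. The name `unicode_slugify` is hypothetical. For reference, Django 1.9 and later accept `allow_unicode=True` in `django.utils.text.slugify`, which follows much the same idea, but that option doesn't exist in 1.0.2.

```python
import re
import unicodedata


def unicode_slugify(value: str) -> str:
    # Hypothetical helper. NFKC folds compatibility forms (e.g. full-width
    # Latin letters) into their normal equivalents but keeps é or 你 intact.
    value = unicodedata.normalize('NFKC', value)
    # In Python 3, \w is Unicode-aware for str patterns, so letters from any
    # script survive; punctuation and symbols are removed.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of hyphens/whitespace into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(unicode_slugify('你好 世界!'))  # 你好-世界
print(unicode_slugify('Café Olé'))   # café-olé
```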
Having said that, the RFC for URLs does state that non-US-ASCII characters (or, more specifically, anything other than alphanumerics and $-_.+!*'()) should be encoded using %hex notation. If you look at the raw GET request your browser actually sends (say, using Firebug), you'll see that the Chinese characters are in fact percent-encoded before being sent; the browser just decodes them so they look pretty in the address bar. I suspect this is why slugify insists on ASCII only, for what it's worth.
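A quick way to see that encoding is Python 3's `urllib.parse.quote`, which UTF-8-encodes the string and %-escapes each byte outside its safe set (by default, unreserved characters plus `/`). The path below is a made-up example, not cnprog's real URL scheme.

```python
from urllib.parse import quote, unquote

# Hypothetical permalink path with a Unicode slug on the end.
path = '/question/42/你好-世界'

# Each CJK character becomes three UTF-8 bytes, each written as %XX;
# '/', '-', letters and digits are left alone.
encoded = quote(path)
print(encoded)  # /question/42/%E4%BD%A0%E5%A5%BD-%E4%B8%96%E7%95%8C

# Decoding restores the original, which is what the address bar shows.
print(unquote(encoded) == path)  # True
```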
You might want to look at https://github.com/un33k/django-uuslug.
It takes care of both "U"s for you: U for unique and U for Unicode.
It will do the job for you hassle-free.
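For the "unique" half, a common pattern is to append a counter until the slug is free. The sketch below is a generic illustration with a hypothetical `unique_slug` helper, not django-uuslug's actual code; the `taken` set stands in for what would really be a database lookup on the model's slug field.

```python
from typing import AbstractSet


def unique_slug(base: str, taken: AbstractSet[str]) -> str:
    # Hypothetical helper: return `base` if unused, otherwise the first of
    # base-2, base-3, ... that is not already in `taken`.
    if base not in taken:
        return base
    n = 2
    while f'{base}-{n}' in taken:
        n += 1
    return f'{base}-{n}'


print(unique_slug('你好-世界', {'你好-世界'}))  # 你好-世界-2
```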