From: Neil Smith
Date: Wed, 12 Mar 2014 12:47:05 +0000 (+0000)
Subject: Moved discussion of accents to cipher breaking
X-Git-Url: https://git.njae.me.uk/?p=cipher-training.git;a=commitdiff_plain;h=f913957d6ba4fad2173987710269dad18ded9077

Moved discussion of accents to cipher breaking
---

diff --git a/slides/caesar-encipher.html b/slides/caesar-encipher.html
index f45a166..3bc519c 100644
--- a/slides/caesar-encipher.html
+++ b/slides/caesar-encipher.html
@@ -120,49 +120,6 @@ if __name__ == "__main__":
 
 ---
 
-# Accents
-
-```python
->>> caesar_encipher_letter('é', 1)
-```
-What does it produce?
-
-What should it produce?
-
-## Unicode, combining codepoints, and normal forms
-
-Text encodings will bite you when you least expect it.
-
-* urlencoding is the other pain point.
-
----
-
-# Five minutes on StackOverflow later...
-
-```python
-def unaccent(text):
-    """Remove all accents from letters.
-    It does this by converting the unicode string to decomposed compatibility
-    form, dropping all the combining accents, then re-encoding the bytes.
-
-    >>> unaccent('hello')
-    'hello'
-    >>> unaccent('HELLO')
-    'HELLO'
-    >>> unaccent('héllo')
-    'hello'
-    >>> unaccent('héllö')
-    'hello'
-    >>> unaccent('HÉLLÖ')
-    'HELLO'
-    """
-    return unicodedata.normalize('NFKD', text).\
-        encode('ascii', 'ignore').\
-        decode('utf-8')
-```
-
----
-
 # Doing all the letters
 ## Test-first developement
 
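
Note on the removed slides: they mention combining codepoints and normal forms without showing them. Below is a minimal sketch, not part of the commit, of what the removed `unaccent` helper relies on. It assumes Python 3, and it decodes with `'ascii'` where the slide used `'utf-8'`; that is a small variation with the same result here, since only ASCII bytes survive the encode step. The names `composed` and `decomposed` are illustrative only.

```python
import unicodedata


def unaccent(text):
    """Strip accents: decompose, then drop every non-ASCII code point."""
    decomposed_text = unicodedata.normalize('NFKD', text)
    return decomposed_text.encode('ascii', 'ignore').decode('ascii')


# 'é' as a single precomposed code point (U+00E9).
composed = '\u00e9'
# The same letter in decomposed form: 'e' followed by a combining accent.
decomposed = unicodedata.normalize('NFD', composed)

print(len(composed), len(decomposed))
print([unicodedata.name(c) for c in decomposed])
print(unaccent('héllö'), unaccent('HÉLLÖ'))
```

One caveat with this approach: a letter that has no decomposition, such as 'ø', is dropped entirely rather than mapped to a base letter, so the result can be shorter than the input.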