From f913957d6ba4fad2173987710269dad18ded9077 Mon Sep 17 00:00:00 2001
From: Neil Smith
Date: Wed, 12 Mar 2014 12:47:05 +0000
Subject: [PATCH] Moved discussion of accents to cipher breaking

---
 slides/caesar-encipher.html | 43 -------------------------------------
 1 file changed, 43 deletions(-)

diff --git a/slides/caesar-encipher.html b/slides/caesar-encipher.html
index f45a166..3bc519c 100644
--- a/slides/caesar-encipher.html
+++ b/slides/caesar-encipher.html
@@ -120,49 +120,6 @@ if __name__ == "__main__":
 
 ---
 
-# Accents
-
-```python
->>> caesar_encipher_letter('é', 1)
-```
-What does it produce?
-
-What should it produce?
-
-## Unicode, combining codepoints, and normal forms
-
-Text encodings will bite you when you least expect it.
-
-* urlencoding is the other pain point.
-
----
-
-# Five minutes on StackOverflow later...
-
-```python
-def unaccent(text):
-    """Remove all accents from letters.
-    It does this by converting the unicode string to decomposed compatibility
-    form, dropping all the combining accents, then re-encoding the bytes.
-
-    >>> unaccent('hello')
-    'hello'
-    >>> unaccent('HELLO')
-    'HELLO'
-    >>> unaccent('héllo')
-    'hello'
-    >>> unaccent('héllö')
-    'hello'
-    >>> unaccent('HÉLLÖ')
-    'HELLO'
-    """
-    return unicodedata.normalize('NFKD', text).\
-        encode('ascii', 'ignore').\
-        decode('utf-8')
-```
-
----
-
 # Doing all the letters
 
 ## Test-first developement
-- 
2.34.1