Removed cipher challenge files
diff --git a/slides/word-segmentation.html b/slides/word-segmentation.html
index 16fcb0ad8c36c77041e790799046dd580e5128fa..9c3b3092babc6ba5770692c391c193d2d9e39446 100644
--- a/slides/word-segmentation.html
+++ b/slides/word-segmentation.html
@@ -36,6 +36,11 @@
color: #ff6666;
text-shadow: 0 0 20px #333;
padding: 2px 5px;
+ }
+ .indexlink {
+ position: absolute;
+ bottom: 1em;
+ left: 1em;
}
.float-right {
float: right;
@@ -54,6 +59,12 @@
---
+layout: true
+
+.indexlink[[Index](index.html)]
+
+---
+
# The problem
Ciphertext is re-split into groups to hide word bounaries.
@@ -118,7 +129,7 @@ Constructor (`__init__`) takes a data file, does all the adding up and taking lo
```python
class Pdist(dict):
def __init__(self, data=[]):
- for key, count in data2:
+ for key, count in data:
...
self.total = ...
def __missing__(self, key):
@@ -138,9 +149,9 @@ def Pwords(words):
```python
>>> 'hello' in Pw.keys() >>> Pwords(['hello'])
True -4.25147684171819
->>> 'inigo' in Pw.keys() >>> Pwords(['hello', 'my'])
+>>> 'inigo' in Pw >>> Pwords(['hello', 'my'])
True -6.995724679281423
->>> 'blj' in Pw.keys() >>> Pwords(['hello', 'my', 'name'])
+>>> 'blj' in Pw >>> Pwords(['hello', 'my', 'name'])
False -10.098177451501074
>>> Pw['hello'] >>> Pwords(['hello', 'my', 'name', 'is'])
-4.25147684171819 -12.195018236240843
@@ -166,9 +177,9 @@ To segment a string:
return the split with highest score
```
-Indexing pulls out letters. `'sometext'[0]` = 's' ; `'keyword'[3]` = 'e' ; `'keyword'[-1]` = 't'
+Indexing pulls out letters. `'sometext'[0]` = 's' ; `'sometext'[3]` = 'e' ; `'sometext'[-1]` = 't'
-Slices pulls out substrings. `'keyword'[1:4]` = 'ome' ; `'keyword'[:3]` = 'som' ; `'keyword'[5:]` = 'ext'
+Slices pulls out substrings. `'sometext'[1:4]` = 'ome' ; `'sometext'[:3]` = 'som' ; `'sometext'[5:]` = 'ext'
`range()` will sweep across the string
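
Note on the corrected examples: the indexing and slicing results in the `+` lines can be checked directly, and the `range()` sweep can be sketched as below. This is only an illustrative sketch. The helper name `splits` is hypothetical and does not appear in the slides, and the slides' own segmentation code may enumerate splits differently.

```python
def splits(text):
    """Return every (head, tail) pair from cutting text at one position.

    Hypothetical helper for illustration; not taken from the slides.
    """
    # range() sweeps the cut position from 1 up to and including len(text)
    return [(text[:i], text[i:]) for i in range(1, len(text) + 1)]


text = 'sometext'

# Indexing pulls out single letters
print(text[0], text[3], text[-1])

# Slices pull out substrings
print(text[1:4], text[:3], text[5:])

# Sweeping a cut position across a short string
print(splits('some'))
```

The last cut leaves an empty tail, which lets a segmenter treat the whole remaining string as a single candidate word.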