Made a few tweaks
diff --git a/slides/word-segmentation.html b/slides/word-segmentation.html
index 35721ab3fea6fc8529a893cb9d98f06ca8eb7b8d..6215255ca3c4825937d0d1177e4a54b64bce6b23 100644
--- a/slides/word-segmentation.html
+++ b/slides/word-segmentation.html
@@ -129,7 +129,7 @@ Constructor (`__init__`) takes a data file, does all the adding up and taking lo
 ```python
 class Pdist(dict):
     def __init__(self, data=[]):
-        for key, count in data2:
+        for key, count in data:
             ...
         self.total = ...
     def __missing__(self, key):
@@ -177,9 +177,9 @@ To segment a string:
 return the split with highest score
 ```
-Indexing pulls out letters. `'sometext'[0]` = 's' ; `'keyword'[3]` = 'e' ; `'keyword'[-1]` = 't'
+Indexing pulls out letters. `'sometext'[0]` = 's' ; `'sometext'[3]` = 'e' ; `'sometext'[-1]` = 't'
-Slices pulls out substrings. `'keyword'[1:4]` = 'ome' ; `'keyword'[:3]` = 'som' ; `'keyword'[5:]` = 'ext'
+Slices pulls out substrings. `'sometext'[1:4]` = 'ome' ; `'sometext'[:3]` = 'som' ; `'sometext'[5:]` = 'ext'
 `range()` will sweep across the string
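
Reviewer note: the corrected examples on the `+` lines can be checked directly, and the closing remark about `range()` can be illustrated with a small sketch. The helper name `splits` below is hypothetical and does not appear in the slides; it only shows one way a `range()` sweep could enumerate the head/tail cuts that a segmenter would then score.

```python
text = 'sometext'

# Indexing pulls out single letters (values match the '+' lines of the diff).
first, fourth, last = text[0], text[3], text[-1]

# Slicing pulls out substrings.
middle, head, tail = text[1:4], text[:3], text[5:]


def splits(s):
    """Hypothetical helper: every cut of s into a non-empty head and a tail.

    range() sweeps the cut position across the string; the final cut
    leaves the whole string as the head and an empty tail.
    """
    return [(s[:i], s[i:]) for i in range(1, len(s) + 1)]


print(first, fourth, last)
print(middle, head, tail)
print(splits('some'))
```

With the old `'keyword'` strings the stated results would not hold (`'keyword'[3]` is `'w'`, for instance), which is presumably why the commit switches every example to `'sometext'`.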