1 <?xml version="1.0" encoding="utf-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4
5 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
6 <head>
7 <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
8
9 <title>File: Readme.rdoc [RDoc Documentation]</title>
10
11 <link type="text/css" media="screen" href="./rdoc.css" rel="stylesheet" />
12
13 <script src="./js/jquery.js" type="text/javascript"
14 charset="utf-8"></script>
15 <script src="./js/thickbox-compressed.js" type="text/javascript"
16 charset="utf-8"></script>
17 <script src="./js/quicksearch.js" type="text/javascript"
18 charset="utf-8"></script>
19 <script src="./js/darkfish.js" type="text/javascript"
20 charset="utf-8"></script>
21 </head>
22
23 <body class="file">
24 <div id="metadata">
25 <div id="home-metadata">
26 <div id="home-section" class="section">
27 <h3 class="section-header">
28 <a href="./index.html">Home</a>
29 <a href="./index.html#classes">Classes</a>
30 <a href="./index.html#methods">Methods</a>
31 </h3>
32 </div>
33 </div>
34
35 <div id="project-metadata">
36
37
38 <div id="fileindex-section" class="section project-section">
39 <h3 class="section-header">Files</h3>
40 <ul>
41
42 <li class="file"><a href="./Readme_rdoc.html">Readme.rdoc</a></li>
43
44 </ul>
45 </div>
46
47
48 <div id="classindex-section" class="section project-section">
49 <h3 class="section-header">Class Index
50 <span class="search-toggle"><img src="./images/find.png"
51 height="16" width="16" alt="[+]"
52 title="show/hide quicksearch" /></span></h3>
53 <form action="#" method="get" accept-charset="utf-8" class="initially-hidden">
54 <fieldset>
55 <legend>Quicksearch</legend>
56 <input type="text" name="quicksearch" value=""
57 class="quicksearch-field" />
58 </fieldset>
59 </form>
60
61 <ul class="link-list">
62
63 <li><a href="./Porter2.html">Porter2</a></li>
64
65 <li><a href="./String.html">String</a></li>
66
67 <li><a href="./TestPorter2.html">TestPorter2</a></li>
68
69 </ul>
70 <div id="no-class-search-results" style="display: none;">No matching classes.</div>
71 </div>
72
73
74 </div>
75 </div>
76
77 <div id="documentation">
78 <h2>The Porter 2 stemmer</h2>
79 <p>
80 This is the Porter 2 stemming algorithm, as described at <a
81 href="http://snowball.tartarus.org/algorithms/english/stemmer.html">snowball.tartarus.org/algorithms/english/stemmer.html</a>.
82 The paper describing the original Porter stemmer is:
83 </p>
84 <p>
85 Porter, 1980, &#8220;An algorithm for suffix stripping&#8221;,
86 <em>Program</em>, Vol. 14, no. 3, pp 130-137
87 </p>
88 <h2>Features of this implementation</h2>
89 <p>
90 This stemmer is written in pure Ruby, making it easy to modify for language
91 variants. For instance, the original Porter stemmer only works for
92 American English and does not recognise British English&#8217;s
93 &#8217;-ise&#8217; as an alternate spelling of &#8217;-ize&#8217;. This
94 implementation has been extended to handle British English correctly.
95 </p>
96 <p>
97 This stemmer also features a comprehensive test set of over 29,000 words,
98 taken from the <a
99 href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Porter
100 2 stemmer website</a>.
101 </p>
102 <h2>Files</h2>
103 <p>
104 Constants for the stemmer are in the <a href="Porter2.html">Porter2</a>
105 module.
106 </p>
107 <p>
108 Procedures that implement the stemmer are added to the <a
109 href="String.html">String</a> class.
110 </p>
111 <p>
112 The stemmer algorithm is implemented in the <a
113 href="String.html#method-i-porter2_stem">String#porter2_stem</a> procedure.
114 </p>
115 <h2>Internationalisation</h2>
116 <p>
117 There isn&#8217;t much, as this is a stemmer that only works for English.
118 </p>
119 <p>
120 The <tt>gb_english</tt> flag to the various procedures allows the stemmer
121 to treat the British English &#8217;-ise&#8217; the same as the American
122 English &#8217;-ize&#8217;.
123 </p>
124 <h2>Longest suffixes</h2>
125 <p>
126 Several places in the algorithm require matching the longest suffix of a
127 word. The regexp engine in Ruby 1.9 seems to handle alternatives in regexps
128 by finding the alternative that matches at the earliest position in the
129 string. As the patterns only match suffixes, the match that starts earliest
130 is also the longest suffix. If the regexp engine changes, this behaviour may
131 change and break the stemmer.
132 </p>
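<p>
The following is a minimal sketch of the behaviour described above, not code
taken from the stemmer itself. The helper <tt>longest_suffix</tt> and the
suffix lists passed to it are hypothetical names used only for illustration.
It anchors an alternation at the end of the word and shows that the match
starting earliest is the longest suffix, even when a shorter alternative is
listed first.
</p>

```ruby
# Hypothetical helper for illustration only; it is not part of the stemmer's API.
# Builds an alternation of the given suffixes, anchored at the end of the word,
# and returns the matched suffix (or nil if none of them matches).
def longest_suffix(word, suffixes)
  pattern = /(?:#{Regexp.union(suffixes).source})\z/
  word[pattern]
end

# The shorter alternative is deliberately listed first. The engine still tries
# earlier start positions before later ones, so the longer suffix wins.
p longest_suffix("relational", %w[tional ational])   # => "ational"
p longest_suffix("conditional", %w[tional ational])  # => "tional"
p longest_suffix("cat", %w[tional ational])          # => nil
```

<p>
The sketch relies on the end-of-string anchor: without it, the earliest match
would not necessarily be a suffix at all.
</p>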
133 <h2>Usage</h2>
134 <p>
135 Call the <a
136 href="String.html#method-i-porter2_stem">String#porter2_stem</a> or <a
137 href="String.html#method-i-stem">String#stem</a> method on a string to
138 return its stem:
139 </p>
140 <pre>
141 &quot;consistency&quot;.stem # =&gt; &quot;consist&quot;
142 &quot;knitting&quot;.stem # =&gt; &quot;knit&quot;
143 &quot;articulated&quot;.stem # =&gt; &quot;articul&quot;
144 &quot;nationalize&quot;.stem # =&gt; &quot;nation&quot;
145 &quot;nationalise&quot;.stem # =&gt; &quot;nationalis&quot;
146 &quot;nationalise&quot;.stem(true) # =&gt; &quot;nation&quot;
147 </pre>
148 <h2>Author</h2>
149 <p>
150 The Porter 2 stemming algorithm was developed by <a
151 href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Martin
152 Porter</a>. This implementation is by <a href="http://www.njae.me.uk">Neil
153 Smith</a>.
154 </p>
155
156 </div>
157
158 <div id="validator-badges">
159 <p><small><a href="http://validator.w3.org/check/referer">[Validate]</a></small></p>
160 <p><small>Generated with the <a href="http://deveiate.org/projects/Darkfish-Rdoc/">Darkfish
161 Rdoc Generator</a> 1.1.6</small>.</p>
162 </div>
163 </body>
164 </html>
165