<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />

<title>File: README.rdoc [porter2stemmer 1.0.0]</title>

<link type="text/css" media="screen" href="./rdoc.css" rel="stylesheet" />

<script src="./js/jquery.js" type="text/javascript"
charset="utf-8"></script>
<script src="./js/thickbox-compressed.js" type="text/javascript"
charset="utf-8"></script>
<script src="./js/quicksearch.js" type="text/javascript"
charset="utf-8"></script>
<script src="./js/darkfish.js" type="text/javascript"
charset="utf-8"></script>
</head>

<body class="file">
<div id="metadata">
<div id="home-metadata">
<div id="home-section" class="section">
<h3 class="section-header">
<a href="./index.html">Home</a>
<a href="./index.html#classes">Classes</a>
<a href="./index.html#methods">Methods</a>
</h3>
</div>
</div>

<div id="project-metadata">


<div id="fileindex-section" class="section project-section">
<h3 class="section-header">Files</h3>
<ul>

<li class="file"><a href="./README_rdoc.html">README.rdoc</a></li>

</ul>
</div>


<div id="classindex-section" class="section project-section">
<h3 class="section-header">Class Index
<span class="search-toggle"><img src="./images/find.png"
height="16" width="16" alt="[+]"
title="show/hide quicksearch" /></span></h3>
<form action="#" method="get" accept-charset="utf-8" class="initially-hidden">
<fieldset>
<legend>Quicksearch</legend>
<input type="text" name="quicksearch" value=""
class="quicksearch-field" />
</fieldset>
</form>

<ul class="link-list">

<li><a href="./Porter2.html">Porter2</a></li>

<li><a href="./String.html">String</a></li>

</ul>
<div id="no-class-search-results" style="display: none;">No matching classes.</div>
</div>


</div>
</div>

<div id="documentation">
<h1>porter2stemmer</h1>
<h2>The Porter 2 stemmer</h2>
<p>
This is the Porter 2 stemming algorithm, as described at <a
href="http://snowball.tartarus.org/algorithms/english/stemmer.html">snowball.tartarus.org/algorithms/english/stemmer.html</a>.
The original paper is:
</p>
<p>
Porter, 1980, &#8220;An algorithm for suffix stripping&#8221;,
<em>Program</em>, Vol. 14, no. 3, pp. 130-137
</p>
<h2>Features of this implementation</h2>
<p>
This stemmer is written in pure Ruby, making it easy to modify for language
variants. For instance, the original Porter stemmer only works for
American English and does not recognise British English&#8217;s
&#8216;-ise&#8217; as an alternate spelling of &#8216;-ize&#8217;. This
implementation has been extended to handle British English correctly.
</p>
<p>
This stemmer also features a comprehensive test set of over 29,000 words,
taken from the <a
href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Porter
2 stemmer website</a>.
</p>
<h2>Files</h2>
<p>
Constants for the stemmer are in the <a href="Porter2.html">Porter2</a>
module.
</p>
<p>
Procedures that implement the stemmer are added to the <a
href="String.html">String</a> class.
</p>
<p>
The stemmer algorithm is implemented in the <a
href="String.html#method-i-porter2_stem">String#porter2_stem</a> procedure.
</p>
<h2>Internationalisation</h2>
<p>
There isn&#8217;t much, as this is a stemmer that only works for English.
</p>
<p>
The <tt>gb_english</tt> flag to the various procedures allows the stemmer
to treat the British English &#8216;-ise&#8217; the same as the American
English &#8216;-ize&#8217;.
</p>
<h2>Longest suffixes</h2>
<p>
Several places in the algorithm require matching the longest suffix of a
word. The regexp engine in Ruby 1.9 seems to handle alternatives in regexps
by finding the alternative that matches at the earliest position in the
string. As we&#8217;re only talking about suffixes anchored to the end of
the word, that earliest match is also the longest suffix. If the regexp
engine changes, this behaviour may change and break the stemmer.
</p>
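<p>
A minimal sketch of the behaviour described above, assuming a made-up
suffix list: <tt>SUFFIX</tt> and <tt>longest_suffix</tt> are illustrative
names and are not part of this gem. The shortest alternative is listed
first, yet the anchored match still returns the longest suffix, because the
engine reports the earliest position at which any alternative matches.
</p>
<pre>
```ruby
# Hypothetical suffix list for illustration only; these are not the
# stemmer's own tables. The shortest alternative is deliberately first.
SUFFIX = /(tion|ation|ization)\z/

# Return the suffix picked by the regexp engine, or nil if none matches.
def longest_suffix(word)
  # String#[] with a regexp and a capture index returns that capture group.
  word[SUFFIX, 1]
end

puts longest_suffix("nationalization")  # prints "ization"
puts longest_suffix("station")          # prints "ation"
p longest_suffix("knit")                # prints nil
```
</pre>
<p>
The end-of-string anchor <tt>\z</tt> is what makes this work. Without it,
the same pattern would report <tt>ation</tt> for
<tt>nationalization</tt>, as that alternative matches earlier in the word.
</p>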
<h2>Usage</h2>
<p>
Call the <a
href="String.html#method-i-porter2_stem">String#porter2_stem</a> or <a
href="String.html#method-i-stem">String#stem</a> method on a string to
return its stem:
</p>
<pre>
&quot;consistency&quot;.stem       # =&gt; &quot;consist&quot;
&quot;knitting&quot;.stem          # =&gt; &quot;knit&quot;
&quot;articulated&quot;.stem       # =&gt; &quot;articul&quot;
&quot;nationalize&quot;.stem       # =&gt; &quot;nation&quot;
&quot;nationalise&quot;.stem       # =&gt; &quot;nationalis&quot;
&quot;nationalise&quot;.stem(true) # =&gt; &quot;nation&quot;
</pre>
<h2>Author</h2>
<p>
The Porter 2 stemming algorithm was developed by <a
href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Martin
Porter</a>. This implementation is by <a href="http://www.njae.me.uk">Neil
Smith</a>.
</p>
<h2>Contributing to porter2stemmer</h2>
<ul>
<li><p>
Check out the latest master to make sure the feature hasn&#8217;t been
implemented or the bug hasn&#8217;t been fixed yet
</p>
</li>
<li><p>
Check out the issue tracker to make sure someone already hasn&#8217;t
requested it and/or contributed it
</p>
</li>
<li><p>
Fork the project
</p>
</li>
<li><p>
Start a feature/bugfix branch
</p>
</li>
<li><p>
Commit and push until you are happy with your contribution
</p>
</li>
<li><p>
Make sure to add tests for it. This is important so I don&#8217;t break it
in a future version unintentionally.
</p>
</li>
<li><p>
Please try not to mess with the Rakefile, version, or history. If you want
to have your own version, or a change there is otherwise necessary, that is
fine, but please isolate it in its own commit so I can cherry-pick around it.
</p>
</li>
</ul>
<h2>Copyright</h2>
<p>
Copyright &#169; 2011 Neil Smith. See LICENSE.txt for further details.
</p>

</div>

<div id="validator-badges">
<p><small><a href="http://validator.w3.org/check/referer">[Validate]</a></small></p>
<p><small>Generated with the <a href="http://deveiate.org/projects/Darkfish-Rdoc/">Darkfish
Rdoc Generator</a> 1.1.6</small>.</p>
</div>
</body>
</html>