File: README.rdoc [porter2stemmer 1.0.0]
porter2stemmer

The Porter 2 stemmer

This is the Porter 2 stemming algorithm, as described at snowball.tartarus.org/algorithms/english/stemmer.html. The original Porter stemmer, which Porter 2 revises, is described in:

Porter, 1980, “An algorithm for suffix stripping”, Program, Vol. 14, no. 3, pp. 130–137

Features of this implementation

This stemmer is written in pure Ruby, which makes it easy to modify for language variants. For instance, the original Porter stemmer works only for American English: it does not recognise the British English ‘-ise’ as an alternative spelling of ‘-ize’. This implementation has been extended to handle British English correctly.

This stemmer also comes with a comprehensive test set of over 29,000 words, taken from the Porter 2 stemmer website.
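A test set of that kind pairs each vocabulary word with its expected stem. The sketch below shows one way to check a stemmer against two parallel lists; `stem_mismatches` is a hypothetical helper, not part of this gem, and the stemmer is passed in as a block so the sketch runs on its own.

```ruby
# Hypothetical helper: compares a stemmer (given as a block) against parallel
# lists of words and expected stems, returning [word, expected, actual] for
# every word the stemmer gets wrong.
def stem_mismatches(words, expected_stems)
  unless words.length == expected_stems.length
    raise ArgumentError, "word and stem lists differ in length"
  end

  mismatches = []
  words.zip(expected_stems).each do |word, expected|
    actual = yield(word)
    mismatches << [word, expected, actual] unless actual == expected
  end
  mismatches
end
```

Assuming the word and stem lists are stored one entry per line (the Snowball site's `voc.txt` and `output.txt` are assumed names here), they could be loaded with `File.readlines(path, chomp: true)` and checked with a block such as `{ |w| w.porter2_stem }`. An empty result means every word stemmed as expected.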

Files

Constants for the stemmer are in the Porter2 module.

The methods that implement the stemmer are added to the String class.
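The layout described above, with constants kept in a module and stemming methods added by reopening String, can be sketched as follows. `Porter2Sketch` and `sketch_ends_with_double?` are hypothetical names used only for illustration; the vowel and double lists follow the Porter 2 description and are not copied from the gem's Porter2 module.

```ruby
# Hypothetical stand-in for a constants module.
module Porter2Sketch
  # Letters Porter 2 treats as vowels.
  VOWEL = "aeiouy".freeze
  # The "doubles" defined by the Porter 2 algorithm.
  DOUBLES = %w[bb dd ff gg mm nn pp rr tt].freeze
end

# Reopen String so that stemming helpers can be called directly on words.
class String
  # True if the string ends in one of the Porter 2 doubles.
  def sketch_ends_with_double?
    Porter2Sketch::DOUBLES.any? { |d| end_with?(d) }
  end
end
```

For example, `"hopp".sketch_ends_with_double?` is true and `"hop".sketch_ends_with_double?` is false. Adding methods to String keeps call sites short, at the cost of changing a core class for every library loaded in the same process.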

The stemming algorithm itself is implemented in the String#porter2_stem method.
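Most Porter 2 steps only remove a suffix when it lies inside region R1 or R2. According to the Snowball description, R1 is the part of the word after the first non-vowel that follows a vowel (empty if there is none), and R2 is the same region computed again inside R1. The sketch below computes both regions; `RegionSketch` is a hypothetical name, and it omits the special-case prefixes (such as ‘gener’ and ‘arsen’) and the marking of consonantal ‘y’ that the full algorithm applies first, so it is not the gem's implementation.

```ruby
# Minimal sketch of the Porter 2 R1/R2 regions (hypothetical helper names).
module RegionSketch
  VOWELS = "aeiouy".freeze

  # Index just after the first non-vowel that follows a vowel, searching
  # from position +from+; returns word.length when there is no such letter.
  def self.region_start(word, from = 0)
    (from...(word.length - 1)).each do |i|
      return i + 2 if VOWELS.include?(word[i]) && !VOWELS.include?(word[i + 1])
    end
    word.length
  end

  # Returns [r1, r2] as substrings of the word; either may be empty.
  def self.regions(word)
    r1 = region_start(word)
    r2 = region_start(word, r1)
    [word[r1..-1], word[r2..-1]]
  end
end
```

On the Snowball page's examples this gives `RegionSketch.regions("beautiful")` as `["iful", "ul"]` and `RegionSketch.regions("beauty")` as `["y", ""]`.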

Internationalisation

There is little to internationalise, as this stemmer works only for English.

The gb_english flag accepted by the various stemming methods makes the stemmer treat the British English ‘-ise’ the same as the American English ‘-ize’.
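One way such a flag can work is by swapping in a suffix table that also lists the British spellings. The sketch below uses a subset of the Porter 2 step-3 replacements; the table names, `step3_sketch`, and the extra `"alise" => "al"` entry are assumptions for illustration (the entry is inferred from the `"nationalise".stem(true)` usage example), and the sketch ignores the R1/R2 conditions the real step checks.

```ruby
# Hypothetical subset of the Porter 2 step-3 replacement table.
STEP3_SKETCH = {
  "tional" => "tion", "ational" => "ate", "alize" => "al",
  "icate" => "ic", "iciti" => "ic", "ical" => "ic",
  "ful" => "", "ness" => ""
}.freeze

# Assumed British-spelling entries, used only when gb_english is true.
STEP3_SKETCH_GB = STEP3_SKETCH.merge("alise" => "al").freeze

# Replaces the longest matching suffix from the chosen table.
# Unlike the real step, it does not check that the suffix lies in R1.
def step3_sketch(word, gb_english = false)
  table = gb_english ? STEP3_SKETCH_GB : STEP3_SKETCH
  suffix = table.keys.select { |s| word.end_with?(s) }.max_by(&:length)
  suffix ? word[0...-suffix.length] + table[suffix] : word
end
```

With this table, `step3_sketch("nationalize")` and `step3_sketch("nationalise", true)` both give `"national"`, while `step3_sketch("nationalise")` leaves the word unchanged.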

Longest suffixes

Several places in the algorithm require matching the longest suffix of a word. The regexp engine in Ruby 1.9 seems to handle alternatives in a regexp by returning the match that starts at the earliest position in the string, whichever alternative produces it. Because the patterns here match suffixes, that earliest-starting match is also the longest suffix. If the regexp engine's behaviour changes, this could break the stemmer.
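A small sketch of the behaviour described above: `regexp_suffix` builds an end-anchored alternation and relies on the engine returning the leftmost match, while `longest_suffix` picks the longest matching suffix explicitly, without depending on how the engine treats alternatives. Both names are hypothetical helpers, not part of the gem.

```ruby
# Relies on the regexp engine: the leftmost end-anchored match wins,
# which is the longest matching suffix regardless of alternative order.
def regexp_suffix(word, suffixes)
  pattern = /(#{suffixes.map { |s| Regexp.escape(s) }.join('|')})\z/
  m = pattern.match(word)
  m && m[1]
end

# Engine-independent: choose the longest suffix the word ends with.
# Returns nil when no suffix matches.
def longest_suffix(word, suffixes)
  suffixes.select { |s| word.end_with?(s) }.max_by(&:length)
end
```

For `"rational"` with `["tional", "ational"]`, both helpers return `"ational"` even though `"tional"` is listed first, because the match starting at the ‘a’ is found before the one starting at the ‘t’.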

Usage

Call the String#porter2_stem or String#stem method on a string to return its stem. Passing true sets the gb_english flag, so that ‘-ise’ is treated like ‘-ize’:

  require 'porter2stemmer'

  "consistency".stem       # => "consist"
  "knitting".stem          # => "knit"
  "articulated".stem       # => "articul"
  "nationalize".stem       # => "nation"
  "nationalise".stem       # => "nationalis"
  "nationalise".stem(true) # => "nation"

Author

The Porter 2 stemming algorithm was developed by Martin Porter. This implementation is by Neil Smith.

Contributing to porter2stemmer

Copyright

Copyright © 2011 Neil Smith. See LICENSE.txt for further details.

Generated with the Darkfish Rdoc Generator 1.1.6.