This is the Porter 2 stemming algorithm, as described at snowball.tartarus.org/algorithms/english/stemmer.html. The original paper is:
Porter, 1980, “An algorithm for suffix stripping”, Program, Vol. 14, no. 3, pp. 130-137.
This stemmer is written in pure Ruby, making it easy to modify for language variants. For instance, the original Porter stemmer only works for American English and does not recognise the British English ‘-ise’ as an alternative spelling of ‘-ize’. This implementation has been extended to handle British English correctly.
This stemmer also features a comprehensive test set of over 29,000 words, taken from the Porter 2 stemmer website.
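A test set like this pairs each input word with its expected stem. As a minimal sketch of how such a list can drive a regression check, assuming the words and stems have already been read into two arrays of equal length, the hypothetical helper `stem_mismatches` below (not part of the gem) reports every word whose computed stem differs from the expected one. Any callable can be passed as the stemmer; the toy lambda in the usage lines only strips a final ‘s’ and stands in for the real algorithm.

```ruby
# Hypothetical helper, not part of the gem.
# Compares a stemmer's output against expected stems, assuming two
# equally long arrays: input words and their expected stems.
# `stemmer` is any object that responds to #call (e.g. a lambda).
def stem_mismatches(words, expected, stemmer)
  raise ArgumentError, "lists differ in length" unless words.size == expected.size

  # Collect [word, expected_stem, actual_stem] for every disagreement.
  words.zip(expected).each_with_object([]) do |(word, stem), failures|
    actual = stemmer.call(word)
    failures << [word, stem, actual] unless actual == stem
  end
end

# Toy stand-in that only strips a trailing "s" (NOT Porter 2).
toy = ->(w) { w.sub(/s\z/, "") }
p stem_mismatches(%w[cats dogs running], %w[cat dog run], toy)
# prints [["running", "run", "running"]]
```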
Constants for the stemmer are in the Porter2 module.

Procedures that implement the stemmer are added to the String class.

The stemmer algorithm is implemented in the String#porter2_stem procedure.
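The Snowball description of Porter 2 states most of its suffix rules in terms of two regions of a word, R1 and R2. Below is a simplified sketch of computing them, reopening String in the same way the stemmer's own procedures are added to it. The method names `r1_sketch` and `r2_sketch` are hypothetical, and the sketch leaves out the full algorithm's special-case prefixes and its marking of some ‘y’s as consonants.

```ruby
# Hypothetical helpers (not part of the gem) that reopen String,
# mirroring how the stemmer adds its procedures to String.
class String
  # R1: the part of the word after the first non-vowel that follows a
  # vowel; empty if there is no such non-vowel. Simplified: every "y"
  # counts as a vowel and no special-case prefixes are handled.
  def r1_sketch
    match = /[aeiouy][^aeiouy]/.match(self)
    match ? self[match.end(0)..-1] : ""
  end

  # R2: the same rule applied again, inside R1.
  def r2_sketch
    r1_sketch.r1_sketch
  end
end

puts "beautiful".r1_sketch     # prints iful
puts "beautiful".r2_sketch     # prints ul
puts "animadversion".r2_sketch # prints adversion
```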
There isn’t much internationalisation support, as this stemmer only works for English.
The gb_english flag to the various procedures allows the stemmer to treat the British English ‘-ise’ the same as the American English ‘-ize’.
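As an illustration of how such a flag can work, here is a sketch for one suffix rule only: replacing a final ‘-alize’ with ‘-al’. The helper `replace_alize` is hypothetical rather than the gem's code, and it skips the region (R1) check that the full rule applies. With the flag set, the pattern simply accepts either ‘s’ or ‘z’.

```ruby
# Hypothetical sketch, not the gem's implementation.
# One suffix rule ("alize" -> "al"); gb_english widens the pattern so
# the British "-alise" spelling is handled the same way.
def replace_alize(word, gb_english = false)
  pattern = gb_english ? /ali[sz]e\z/ : /alize\z/
  word.sub(pattern, "al")
end

puts replace_alize("nationalize")       # prints national
puts replace_alize("nationalise")       # prints nationalise
puts replace_alize("nationalise", true) # prints national
```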
Several places in the algorithm require matching the longest suffix of a word. The regexp engine in Ruby 1.9 seems to handle alternatives in regexps by finding the alternative that matches at the first position in the string. As we’re only talking about suffixes, that first match is also the longest suffix. If the regexp engine changes, this behaviour may change and break the stemmer.
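The behaviour described above can be sketched with two hypothetical helpers (not part of the gem). One relies on the regexp engine returning the leftmost match of an alternation anchored at the end of the string; the other checks each suffix explicitly and so does not depend on the engine at all. In the example, ‘ness’ is listed before ‘iness’, yet the regexp version still returns the longer suffix.

```ruby
# Hypothetical helpers, not part of the gem.
# Regexp version: the alternation is anchored at \z, so the engine's
# leftmost match is the longest listed suffix, whatever the order of
# the alternatives. Returns nil if no suffix matches.
def longest_suffix_by_regexp(word, suffixes)
  word[/(?:#{Regexp.union(suffixes)})\z/]
end

# Engine-independent version: test every suffix explicitly and keep
# the longest one that ends the word (nil if none does).
def longest_suffix_explicit(word, suffixes)
  suffixes.select { |s| word.end_with?(s) }.max_by(&:length)
end

suffixes = %w[ness iness s]
puts longest_suffix_by_regexp("happiness", suffixes) # prints iness
puts longest_suffix_explicit("happiness", suffixes)  # prints iness
```

The explicit version does more work per word, but it would keep working if the engine's behaviour ever changed.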
Call the String#porter2_stem or String#stem methods on a string to return its stem:
  "consistency".stem          # => "consist"
  "knitting".stem             # => "knit"
  "articulated".stem          # => "articul"
  "nationalize".stem          # => "nation"
  "nationalise".stem          # => "nationalis"
  "nationalise".stem(true)    # => "nation"
The Porter 2 stemming algorithm was developed by Martin Porter. This implementation is by Neil Smith.
Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet.

Check out the issue tracker to make sure someone hasn’t already requested it and/or contributed it.

Fork the project.

Start a feature/bugfix branch.

Commit and push until you are happy with your contribution.

Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.
Copyright © 2011 Neil Smith. See LICENSE.txt for further details.
Generated with the Darkfish RDoc Generator 1.1.6.