From: Neil Smith
Date: Thu, 30 Jun 2011 15:43:26 +0000 (+0100)
Subject: Final pre-release commit
X-Git-Tag: v1.0.0~1
X-Git-Url: https://git.njae.me.uk/?p=porter2stemmer.git;a=commitdiff_plain;h=b8f204e08d491bd5185d3d9e94ed98366a359af7

Final pre-release commit
---

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2150b26
--- /dev/null
+++ b/README.md
@@ -0,0 +1,58 @@
+The Porter 2 stemmer
+====================
+This is the Porter 2 stemming algorithm, as described at
+http://snowball.tartarus.org/algorithms/english/stemmer.html
+The original paper is:
+
+Porter, 1980, "An algorithm for suffix stripping", _Program_, Vol. 14,
+no. 3, pp 130-137
+
+Features of this implementation
+===============================
+This stemmer is written in pure Ruby, making it easy to modify for language variants.
+For instance, the original Porter stemmer only works for American English and does
+not recognise British English's '-ise' as an alternate spelling of '-ize'. This
+implementation has been extended to handle British English correctly.
+
+This stemmer also features a comprehensive test set of over 29,000 words, taken from the
+[Porter 2 stemmer website](http://snowball.tartarus.org/algorithms/english/stemmer.html).
+
+Files
+=====
+Constants for the stemmer are in the Porter2 module.
+
+Procedures that implement the stemmer are added to the String class.
+
+The stemmer algorithm is implemented in the String#porter2_stem procedure.
+
+Internationalisation
+====================
+There isn't much, as this is a stemmer that only works for English.
+
+The `gb_english` flag to the various procedures allows the stemmer to treat the British
+English '-ise' the same as the American English '-ize'.
+
+Longest suffixes
+================
+Several places in the algorithm require matching the longest suffix of a word. The
+regexp engine in Ruby 1.9 seems to handle alternatives in regexps by finding the
+alternative that matches at the first position in the string. As we're only talking
+about suffixes, that first match is also the longest suffix. If the regexp engine changes,
+this behaviour may change and break the stemmer.
+
+Usage
+=====
+Call the String#porter2_stem or String#stem methods on a string to return its stem:
+    "consistency".stem       # => "consist"
+    "knitting".stem          # => "knit"
+    "articulated".stem       # => "articul"
+    "nationalize".stem       # => "nation"
+    "nationalise".stem       # => "nationalis"
+    "nationalise".stem(true) # => "nation"
+
+Author
+======
+The Porter 2 stemming algorithm was developed by
+[Martin Porter](http://snowball.tartarus.org/algorithms/english/stemmer.html).
+This implementation is by [Neil Smith](http://www.njae.me.uk).
+
diff --git a/pkg/porter2stemmer-1.0.0.gem b/pkg/porter2stemmer-1.0.0.gem
new file mode 100644
index 0000000..71bdd5f
Binary files /dev/null and b/pkg/porter2stemmer-1.0.0.gem differ
diff --git a/rdoc/Porter2.html b/rdoc/Porter2.html
new file mode 100644
index 0000000..3df1a20
--- /dev/null
+++ b/rdoc/Porter2.html
@@ -0,0 +1,249 @@
+ + + + + + + Module: Porter2 + + + + + + + + + + + +
+
+
+


+
+
+ +
+
+

In Files

+ +
+ + +
+ +
+ + + + + + + + + + + + +
+ +
+ + +
+

Files

+ +
+ + +
+


+
+
+
+ + + +
+ + +
+
+ +
+

Porter2

+ +
+

+Constants for the Porter 2 stemmer +

+ +
+ + + +
+

Constants

+
+ +
C
+ +

+A non-vowel +

+ + +
V
+ +

+A vowel: a e i o u y +

+ + +
CW
+ +

+A non-vowel other than w, x, or Y +

+ + +
Double
+ +

+Doubles created when adding a suffix: these are undoubled when stemmed +

+ + +
Valid_LI
+ +

+A valid letter that can come before ‘li’ (or ‘ly’) +

+ + +
SHORT_SYLLABLE
+ +

+A specification for a short syllable. +

+

+A short syllable in a word is either: +

+
  1. +a vowel followed by a non-vowel other than w, x or Y and preceded by a +non-vowel, or +

  2. +a vowel at the beginning of the word followed by a non-vowel. +
+

+(The original document is silent on whether sequences of two or more +non-vowels make a syllable long. But as this specification is only used to +find sequences of non-vowel - vowel - non-vowel - end-of-word, this +ambiguity does not have an effect.) +

+ + +
STEP_2_MAPS
+ +

+Suffix transformations used in porter2_step2. (ogi, li endings dealt with +in procedure) +

+ + +
STEP_3_MAPS
+ +

+Suffix transformations used in porter2_step3. (ative ending dealt with in +procedure) +

+ + +
STEP_4_MAPS
+ +

+Suffix transformations used in porter2_step4. (ion ending dealt with in +procedure) +

+ + +
SPECIAL_CASES
+ +

+Special-case stemmings +

+ + +
STEP_1A_SPECIAL_CASES
+ +

+Special case words to stop processing after step 1a. +

+ + +
+
+ + + + + + + + +
+ + +
+ +


+ +
+ +
+


+

Generated with the Darkfish + Rdoc Generator 1.1.6.

+
+ + + + diff --git a/rdoc/README_rdoc.html b/rdoc/README_rdoc.html new file mode 100644 index 0000000..d7ac8d3 --- /dev/null +++ b/rdoc/README_rdoc.html @@ -0,0 +1,204 @@ + + + + + + + + File: README.rdoc [porter2stemmer 1.0.0] + + + + + + + + + + +
+
+
+


+
+
+ +
+ + +
+

Files

+ +
+ + +
+


+
+
+
+ + + +
+ + +
+
+ +
+

porter2stemmer

+

The Porter 2 stemmer

+

+This is the Porter 2 stemming algorithm, as described at snowball.tartarus.org/algorithms/english/stemmer.html +The original paper is: +

+

+Porter, 1980, “An algorithm for suffix stripping”, +Program, Vol. 14, no. 3, pp 130-137 +

+

Features of this implementation

+

+This stemmer is written in pure Ruby, making it easy to modify for language +variants. For instance, the original Porter stemmer only works for +American English and does not recognise British English’s +’-ise’ as an alternate spelling of ’-ize’. This +implementation has been extended to handle British English correctly. +

+

+This stemmer also features a comprehensive test set of over 29,000 words, +taken from the Porter +2 stemmer website. +

+

Files

+

+Constants for the stemmer are in the Porter2 +module. +

+

+Procedures that implement the stemmer are added to the String class. +

+

+The stemmer algorithm is implemented in the String#porter2_stem procedure. +

+

Internationalisation

+

+There isn’t much, as this is a stemmer that only works for English. +

+

+The gb_english flag to the various procedures allows the stemmer +to treat the British English ’-ise’ the same as the American +English ’-ize’. +

+

Longest suffixes

+

+Several places in the algorithm require matching the longest suffix of a +word. The regexp engine in Ruby 1.9 seems to handle alternatives in regexps +by finding the alternative that matches at the first position in the +string. As we’re only talking about suffixes, that first match is +also the longest suffix. If the regexp engine changes, this behaviour may +change and break the stemmer. +
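The leftmost-match behaviour described above can be checked with a small stand-alone sketch. The helper name `matched_suffix` is hypothetical and not part of porter2stemmer; it only builds an end-anchored alternation and reports what the engine matched. The alternatives are deliberately listed shortest first to show that their order does not matter once the pattern is anchored at the end of the word.

```ruby
# Hypothetical helper (not in the gem): returns the suffix matched by an
# end-anchored alternation, or nil if no alternative matches.
def matched_suffix(word, alternatives)
  source = alternatives.map { |s| Regexp.escape(s) }.join("|")
  # The engine tries start positions from left to right, so the first
  # position at which any alternative reaches the end of the word wins.
  word[/(?:#{source})\z/]
end

p matched_suffix("ponies", %w[s es ies])     # => "ies"
p matched_suffix("x's'", ["'", "'s", "'s'"]) # => "'s'"
p matched_suffix("dog", %w[s es ies])        # => nil
```

Because every candidate has to end at the same place, the earliest start position is necessarily the longest suffix, which is the property the stemmer relies on.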

+

Usage

+

+Call the String#porter2_stem or String#stem methods on a string to +return its stem +

+
+  "consistency".stem       # => "consist"
+  "knitting".stem          # => "knit"
+  "articulated".stem       # => "articul"
+  "nationalize".stem       # => "nation"
+  "nationalise".stem       # => "nationalis"
+  "nationalise".stem(true) # => "nation"
+
+

Author

+

+The Porter 2 stemming algorithm was developed by Martin +Porter. This implementation is by Neil +Smith. +

+

Contributing to porter2stemmer

+ +

Copyright

+

+Copyright © 2011 Neil Smith. See LICENSE.txt for further details. +

+ +
+ +
+


+


+
+ + + diff --git a/rdoc/String.html b/rdoc/String.html new file mode 100644 index 0000000..37dcde8 --- /dev/null +++ b/rdoc/String.html @@ -0,0 +1,1142 @@ + + + + + + + Class: String + + + + + + + + + + + +
+
+
+


+
+
+ +
+
+

In Files

+ +
+ + +
+ + + +
+ + +
+

Files

+ +
+ + +
+


+
+
+
+ + + +
+ + +
+
+ +
+

String

+ +
+

+Implementation of the Porter 2 stemmer. String#porter2_stem is the +main stemming procedure. +

+ +
+ + + + + + + + + +
+

Public Instance Methods

+ + +
+ + +
+ + porter2_ends_with_short_syllable?() + +
+ +
+ +

+Returns true if the word ends with a short syllable +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 59
+59:   def porter2_ends_with_short_syllable?
+60:     self =~ /#{Porter2::SHORT_SYLLABLE}$/ ? true : false
+61:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_is_short_word?() + +
+ +
+ +

+A word is short if it ends in a short syllable and R1 is empty. +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 65
+65:   def porter2_is_short_word?
+66:     self.porter2_ends_with_short_syllable? and self.porter2_r1.empty?
+67:   end
+
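The two predicates above can be sketched in isolation as follows. This is a minimal illustration rather than the gem's code: the character classes and `SHORT_SYLLABLE_SKETCH` are assumptions reconstructed from the prose definition of Porter2::SHORT_SYLLABLE, and R1 is passed in by hand instead of being computed.

```ruby
# Assumed character classes; the real ones are constants in the Porter2 module.
VOWEL = "[aeiouy]"
NON_VOWEL = "[^aeiouy]"

# Assumed pattern: non-vowel, vowel, non-vowel other than w, x or Y at the end
# of the word, or a two-letter word made of a vowel followed by a non-vowel.
SHORT_SYLLABLE_SKETCH =
  /(?:#{NON_VOWEL}#{VOWEL}[^aeiouywxY]|\A#{VOWEL}#{NON_VOWEL})\z/

def ends_with_short_syllable?(word)
  word.match?(SHORT_SYLLABLE_SKETCH)
end

# r1 is supplied by the caller here; the gem derives it with porter2_r1.
def short_word?(word, r1)
  ends_with_short_syllable?(word) && r1.empty?
end

p short_word?("bed", "")      # => true
p short_word?("embed", "bed") # => false (R1 is not empty)
p short_word?("bead", "")     # => false (no short syllable at the end)
```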
+ +
+ + + + +
+ + +
+ + +
+ + porter2_postprocess() + +
+ +
+ +

+Turn all Y letters into y +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 261
+261:   def porter2_postprocess
+262:     self.gsub(/Y/, 'y')
+263:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_preprocess() + +
+ +
+ +

+Preprocess the word. Remove any initial ’, if present. Then, set +initial y, or y after a vowel, to Y +

+

+(The comment to ‘establish the regions R1 and R2’ in the +original description is an implementation optimisation that identifies +where the regions start. As no modifications are made to the word that +affect those positions, you may want to cache them now. This implementation +doesn’t do that.) +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 25
+25:   def porter2_preprocess    
+26:     w = self.dup
+27: 
+28:     # remove any initial apostrophe
+29:     w.gsub!(/^'*(.)/, '\1')
+30:     
+31:     # set initial y, or y after a vowel, to Y
+32:     w.gsub!(/^y/, "Y")
+33:     w.gsub!(/(#{Porter2::V})y/, '\1Y')
+34:     
+35:     w
+36:   end
+
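A stand-alone sketch of the same preprocessing, written as a plain function so that it can be tried without the gem. The name `preprocess` and the `VOWEL` class are assumptions for illustration; the gem's own version is String#porter2_preprocess and uses Porter2::V.

```ruby
# Assumed vowel class; the gem takes this from Porter2::V.
VOWEL = "[aeiouy]"

# Hypothetical free-function version of the preprocessing step.
def preprocess(word)
  w = word.sub(/\A'+/, "")      # drop any leading apostrophes
  w = w.sub(/\Ay/, "Y")         # an initial y is treated as a consonant
  w.gsub(/(#{VOWEL})y/, '\1Y')  # so is a y that follows a vowel
end

p preprocess("youth")  # => "Youth"
p preprocess("boyish") # => "boYish"
p preprocess("fly")    # => "fly"
p preprocess("'tis")   # => "tis"
```

Marking these letters as Y lets the later steps treat them as non-vowels; porter2_postprocess turns them back into y at the end.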
+ +
+ + + + +
+ + +
+ + +
+ + porter2_r1() + +
+ +
+ +

+R1 is the portion of the word after the first non-vowel after the first +vowel (with words beginning ‘gener-’, ‘commun-’, +and ‘arsen-’ treated as special cases). +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 41
+41:   def porter2_r1
+42:     if self =~ /^(gener|commun|arsen)(?<r1>.*)/
+43:       Regexp.last_match(:r1)
+44:     else
+45:       self =~ /#{Porter2::V}#{Porter2::C}(?<r1>.*)$/
+46:       Regexp.last_match(:r1) || ""
+47:     end
+48:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_r2() + +
+ +
+ +

+R2 is the portion of R1 (porter2_r1) after the first +non-vowel after the first vowel +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 52
+52:   def porter2_r2
+53:     self.porter2_r1 =~ /#{Porter2::V}#{Porter2::C}(?<r2>.*)$/
+54:     Regexp.last_match(:r2) || ""
+55:   end
+
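The R1 and R2 regions can be explored with the following sketch. It assumes simple lowercase vowel and non-vowel classes (the gem uses Porter2::V and Porter2::C), and the function names `r1`, `r2` and `region_after_first_vc` are hypothetical stand-ins for porter2_r1 and porter2_r2.

```ruby
# Assumed character classes; the real ones are Porter2::V and Porter2::C.
VOWEL = "[aeiouy]"
NON_VOWEL = "[^aeiouy]"

# Everything after the first vowel that is followed by a non-vowel,
# or the empty string if there is no such pair.
def region_after_first_vc(text)
  m = text.match(/#{VOWEL}#{NON_VOWEL}(?<rest>.*)\z/)
  m ? m[:rest] : ""
end

def r1(word)
  special = word.match(/\A(?:gener|commun|arsen)(?<rest>.*)\z/)
  special ? special[:rest] : region_after_first_vc(word)
end

# R2 applies the same rule again, this time inside R1.
def r2(word)
  region_after_first_vc(r1(word))
end

p [r1("beautiful"), r2("beautiful")]   # => ["iful", "ul"]
p [r1("beauty"), r2("beauty")]         # => ["y", ""]
p [r1("generously"), r2("generously")] # => ["ously", "ly"]
```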
+ +
+ + + + +
+ + +
+ + +
+ + porter2_stem(gb_english = false) + +
+ +
+ +

+Perform the stemming procedure. If gb_english is true, treat +’-ise’ and similar suffixes as ’-ize’ in American +English. +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 269
+269:   def porter2_stem(gb_english = false)
+270:     preword = self.porter2_tidy
+271:     return preword if preword.length <= 2
+272: 
+273:     word = preword.porter2_preprocess
+274:     
+275:     if Porter2::SPECIAL_CASES.has_key? word
+276:       Porter2::SPECIAL_CASES[word]
+277:     else
+278:       w1a = word.porter2_step0.porter2_step1a
+279:       if Porter2::STEP_1A_SPECIAL_CASES.include? w1a 
+280:         w1a
+281:       else
+282:         w1a.porter2_step1b(gb_english).porter2_step1c.porter2_step2(gb_english).porter2_step3(gb_english).porter2_step4(gb_english).porter2_step5.porter2_postprocess
+283:       end
+284:     end
+285:   end
+
+ +
+ + +
+ Also aliased as: stem +
+ + + +
+ + +
+ + +
+ + porter2_stem_verbose(gb_english = false) + +
+ +
+ +

+A verbose version of porter2_stem that prints the +output of each stage to STDOUT +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 288
+288:   def porter2_stem_verbose(gb_english = false)
+289:     preword = self.porter2_tidy
+290:     puts "Preword: #{preword}"
+291:     return preword if preword.length <= 2
+292: 
+293:     word = preword.porter2_preprocess
+294:     puts "Preprocessed: #{word}"
+295:     
+296:     if Porter2::SPECIAL_CASES.has_key? word
+297:       puts "Returning #{word} as special case #{Porter2::SPECIAL_CASES[word]}"
+298:       Porter2::SPECIAL_CASES[word]
+299:     else
+300:       r1 = word.porter2_r1
+301:       r2 = word.porter2_r2
+302:       puts "R1 = #{r1}, R2 = #{r2}"
+303:     
+304:       w0 = word.porter2_step0 ; puts "After step 0:  #{w0} (R1 = #{w0.porter2_r1}, R2 = #{w0.porter2_r2})"
+305:       w1a = w0.porter2_step1a ; puts "After step 1a: #{w1a} (R1 = #{w1a.porter2_r1}, R2 = #{w1a.porter2_r2})"
+306:       
+307:       if Porter2::STEP_1A_SPECIAL_CASES.include? w1a
+308:         puts "Returning #{w1a} as 1a special case"
+309:         w1a
+310:       else
+311:         w1b = w1a.porter2_step1b(gb_english) ; puts "After step 1b: #{w1b} (R1 = #{w1b.porter2_r1}, R2 = #{w1b.porter2_r2})"
+312:         w1c = w1b.porter2_step1c ; puts "After step 1c: #{w1c} (R1 = #{w1c.porter2_r1}, R2 = #{w1c.porter2_r2})"
+313:         w2 = w1c.porter2_step2(gb_english) ; puts "After step 2:  #{w2} (R1 = #{w2.porter2_r1}, R2 = #{w2.porter2_r2})"
+314:         w3 = w2.porter2_step3(gb_english) ; puts "After step 3:  #{w3} (R1 = #{w3.porter2_r1}, R2 = #{w3.porter2_r2})"
+315:         w4 = w3.porter2_step4(gb_english) ; puts "After step 4:  #{w4} (R1 = #{w4.porter2_r1}, R2 = #{w4.porter2_r2})"
+316:         w5 = w4.porter2_step5 ; puts "After step 5:  #{w5}"
+317:         wpost = w5.porter2_postprocess ; puts "After postprocess: #{wpost}"
+318:         wpost
+319:       end
+320:     end
+321:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step0() + +
+ +
+ +

+Search for the longest among the suffixes, +

+
  • +’ +

  • +’s +

  • +’s’ +
+

+and remove if found. +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 75
+75:   def porter2_step0
+76:     self.sub(/(.)('s'|'s|')$/, '\1')
+77:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step1a() + +
+ +
+ +

+Search for the longest among the following suffixes, and perform the action +indicated. +

+ + + + + +
sses

+replace by ss +

+
ied, ies

+replace by i if preceded by more than one letter, otherwise by ie +

+
s

+delete if the preceding word part contains a vowel not immediately before +the s +

+
us, ss

+do nothing +

+
+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 85
+ 85:   def porter2_step1a
+ 86:     if self =~ /sses$/
+ 87:       self.sub(/sses$/, 'ss')
+ 88:     elsif self =~ /..(ied|ies)$/
+ 89:       self.sub(/(ied|ies)$/, 'i')
+ 90:     elsif self =~ /(ied|ies)$/
+ 91:       self.sub(/(ied|ies)$/, 'ie')
+ 92:     elsif self =~ /(us|ss)$/
+ 93:       self
+ 94:     elsif self =~ /s$/
+ 95:       if self =~ /(#{Porter2::V}.+)s$/
+ 96:         self.sub(/s$/, '') 
+ 97:       else
+ 98:         self
+ 99:       end
+100:     else
+101:       self
+102:     end
+103:   end
+
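The rule table for step 1a can be tried out with this stand-alone sketch. It mirrors the order of the branches above but is written as a free function with a hypothetical name, `step1a`, and a literal vowel class in place of Porter2::V, so treat it as an illustration rather than the gem's implementation.

```ruby
# Hypothetical free-function version of step 1a. Branch order matters:
# the more specific suffixes are tested before the bare "s".
def step1a(word)
  case word
  when /sses\z/          then word.sub(/sses\z/, "ss")
  when /..(?:ied|ies)\z/ then word.sub(/(?:ied|ies)\z/, "i")  # more than one letter before
  when /(?:ied|ies)\z/   then word.sub(/(?:ied|ies)\z/, "ie")
  when /(?:us|ss)\z/     then word
  when /[aeiouy].+s\z/   then word.sub(/s\z/, "")  # vowel not immediately before the s
  else word
  end
end

p step1a("caresses") # => "caress"
p step1a("cries")    # => "cri"
p step1a("ties")     # => "tie"
p step1a("gaps")     # => "gap"
p step1a("gas")      # => "gas"
```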
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step1b(gb_english = false) + +
+ +
+ +

+Search for the longest among the following suffixes, and perform the action +indicated. +

+ + + +
eed, eedly

+replace by ee if the suffix is also in R1 +

+
ed, edly, ing, ingly

+delete if the preceding word part contains a vowel and, after the +deletion: +

+
  • +if the word ends in at, bl or iz: add e, or +

  • +if the word ends with a double: remove the last letter, or +

  • +if the word is short: add e +
+
+

+(If gb_english is true, treat the ‘is’ suffix as +‘iz’ above.) +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 115
+115:   def porter2_step1b(gb_english = false)
+116:     if self =~ /(eed|eedly)$/
+117:       if self.porter2_r1 =~ /(eed|eedly)$/
+118:         self.sub(/(eed|eedly)$/, 'ee')
+119:       else
+120:         self
+121:       end
+122:     else
+123:       w = self.dup
+124:       if w =~ /#{Porter2::V}.*(ed|edly|ing|ingly)$/
+125:         w.sub!(/(ed|edly|ing|ingly)$/, '')
+126:         if w =~ /(at|bl|iz)$/
+127:           w += 'e' 
+128:         elsif w =~ /is$/ and gb_english
+129:           w += 'e' 
+130:         elsif w =~ /#{Porter2::Double}$/
+131:           w.chop!
+132:         elsif w.porter2_is_short_word?
+133:           w += 'e'
+134:         end
+135:       end
+136:       w
+137:     end
+138:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step1c() + +
+ +
+ +

+Replace a suffix of y or Y by i if it is preceded by a non-vowel which is +not the first letter of the word. +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 143
+143:   def porter2_step1c
+144:     if self =~ /.+#{Porter2::C}(y|Y)$/
+145:       self.sub(/(y|Y)$/, 'i')
+146:     else
+147:       self
+148:     end
+149:   end
+
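A small sketch of the step 1c rule, assuming the word has already been preprocessed so that a y after a vowel appears as Y. The function name `step1c` and the `NON_VOWEL` class are assumptions; the gem uses Porter2::C.

```ruby
# Assumed non-vowel class; note that it includes the marker letter Y.
NON_VOWEL = "[^aeiouy]"

# Hypothetical free-function version of step 1c: the leading .+ ensures the
# non-vowel before the final y is not the first letter of the word.
def step1c(word)
  if word.match?(/.+#{NON_VOWEL}[yY]\z/)
    word.sub(/[yY]\z/, "i")
  else
    word
  end
end

p step1c("cry") # => "cri"
p step1c("by")  # => "by"
p step1c("saY") # => "saY"
```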
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step2(gb_english = false) + +
+ +
+ +

+Search for the longest among the suffixes listed in the keys of +Porter2::STEP_2_MAPS. If one is found and that suffix occurs in R1, +replace it with the value found in STEP_2_MAPS. +

+

+(Suffixes ‘ogi’ and ‘li’ are treated as special +cases in the procedure.) +

+

+(If gb_english is true, replace the ‘iser’ and +‘isation’ suffixes with ‘ise’, similarly to how +‘izer’ and ‘ization’ are treated.) +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 160
+160:   def porter2_step2(gb_english = false)
+161:     r1 = self.porter2_r1
+162:     s2m = Porter2::STEP_2_MAPS.dup
+163:     if gb_english
+164:       s2m["iser"] = "ise"
+165:       s2m["isation"] = "ise"
+166:     end
+167:     step_2_re = Regexp.union(s2m.keys.map {|r| Regexp.new(r + "$")})
+168:     if self =~ step_2_re
+169:       if r1 =~ /#{$&}$/
+170:         self.sub(/#{$&}$/, s2m[$&])
+171:       else
+172:         self
+173:       end
+174:     elsif r1 =~ /li$/ and self =~ /(#{Porter2::Valid_LI})li$/
+175:       self.sub(/li$/, '')
+176:     elsif r1 =~ /ogi$/ and self =~ /logi$/
+177:       self.sub(/ogi$/, 'og')
+178:     else
+179:       self
+180:     end
+181:   end
+
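The table-driven part of this step can be sketched on its own. `STEP_2_SAMPLE` is an illustrative subset chosen for this example and is not the contents of Porter2::STEP_2_MAPS; `apply_suffix_map` is a hypothetical helper, and the R1 region is passed in by hand rather than computed.

```ruby
# Illustrative subset only; the real table is Porter2::STEP_2_MAPS.
STEP_2_SAMPLE = { "tional" => "tion", "ization" => "ize", "izer" => "ize" }

# Hypothetical helper: find the suffix from the map that ends the word and
# replace it only if that suffix also lies inside the given region.
def apply_suffix_map(word, region, map)
  pattern = Regexp.union(map.keys.map { |k| /#{Regexp.escape(k)}\z/ })
  suffix = word[pattern]
  return word unless suffix && region.end_with?(suffix)

  word.sub(/#{Regexp.escape(suffix)}\z/, map[suffix])
end

p apply_suffix_map("conditional", "ditional", STEP_2_SAMPLE) # => "condition"
p apply_suffix_map("rational", "ional", STEP_2_SAMPLE)       # => "rational"

# British spellings are handled by adding entries to a copy of the table.
gb_map = STEP_2_SAMPLE.merge("iser" => "ise", "isation" => "ise")
p apply_suffix_map("organisation", "ganisation", gb_map)     # => "organise"
```

Copying the table before adding the British entries keeps the shared constant unchanged between calls, which is the same reason the method above calls dup on STEP_2_MAPS.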
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step3(gb_english = false) + +
+ +
+ +

+Search for the longest among the suffixes listed in the keys of +Porter2::STEP_3_MAPS. If one is found and that suffix occurs in R1, +replace it with the value found in STEP_3_MAPS. +

+

+(Suffix ‘ative’ is treated as a special case in the procedure.) +

+

+(If gb_english is true, replace the ‘alise’ suffix +with ‘al’, similarly to how ‘alize’ is treated.) +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 192
+192:   def porter2_step3(gb_english = false)
+193:     if self =~ /ative$/ and self.porter2_r2 =~ /ative$/
+194:       self.sub(/ative$/, '')
+195:     else
+196:       s3m = Porter2::STEP_3_MAPS.dup
+197:       if gb_english
+198:         s3m["alise"] = "al"
+199:       end
+200:       step_3_re = Regexp.union(s3m.keys.map {|r| Regexp.new(r + "$")})
+201:       r1 = self.porter2_r1
+202:       if self =~ step_3_re and r1 =~ /#{$&}$/ 
+203:         self.sub(/#{$&}$/, s3m[$&])
+204:       else
+205:         self
+206:       end
+207:     end
+208:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step4(gb_english = false) + +
+ +
+ +

+Search for the longest among the suffixes listed in the keys of +Porter2::STEP_4_MAPS. If one is found and that suffix occurs in R2, +replace it with the value found in STEP_4_MAPS. +

+

+(Suffix ‘ion’ is treated as a special case in the procedure.) +

+

+(If gb_english is true, delete the ‘ise’ suffix if +found.) +

+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 218
+218:   def porter2_step4(gb_english = false)
+219:     if self.porter2_r2 =~ /ion$/ and self =~ /(s|t)ion$/
+220:       self.sub(/ion$/, '')
+221:     else
+222:       s4m = Porter2::STEP_4_MAPS.dup
+223:       if gb_english
+224:         s4m["ise"] = ""
+225:       end
+226:       step_4_re = Regexp.union(s4m.keys.map {|r| Regexp.new(r + "$")})
+227:       r2 = self.porter2_r2
+228:       if self =~ step_4_re
+229:         if r2 =~ /#{$&}$/
+230:           self.sub(/#{$&}$/, s4m[$&])
+231:         else
+232:           self
+233:         end
+234:       else
+235:         self
+236:       end
+237:     end
+238:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_step5() + +
+ +
+ +

+Search for the following suffixes and, if found, perform the action +indicated. +

+ + + +
e

+delete if in R2, or in R1 and not preceded by a short syllable +

+
l

+delete if in R2 and preceded by l +

+
+ + + +
+
+     # File lib/porter2stemmer/implementation.rb, line 244
+244:   def porter2_step5
+245:     if self =~ /ll$/ and self.porter2_r2 =~ /l$/
+246:       self.sub(/ll$/, 'l') 
+247:     elsif self =~ /e$/ and self.porter2_r2 =~ /e$/ 
+248:       self.sub(/e$/, '') 
+249:     else
+250:       r1 = self.porter2_r1
+251:       if self =~ /e$/ and r1 =~ /e$/ and not self =~ /#{Porter2::SHORT_SYLLABLE}e$/
+252:         self.sub(/e$/, '')
+253:       else
+254:         self
+255:       end
+256:     end
+257:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + porter2_tidy() + +
+ +
+ +

+Tidy up the word before we get down to the algorithm +

+ + + +
+
+    # File lib/porter2stemmer/implementation.rb, line 7
+ 7:   def porter2_tidy
+ 8:     preword = self.to_s.strip.downcase
+ 9:     
+10:     # map apostrophe-like characters to apostrophes
+11:     preword.gsub!(/‘/, "'")
+12:     preword.gsub!(/’/, "'")
+13: 
+14:     preword
+15:   end
+
+ +
+ + + + +
+ + +
+ + +
+ + stem(gb_english = false) + +
+ +
+ + + + + +
+ + + + +
+ Alias for: porter2_stem +
+ +
+ + +
+ + +
+ + +
+ +


+ +
+ +
+


+


+
+ + + + diff --git a/rdoc/lib/porter2stemmer/constants_rb.html b/rdoc/lib/porter2stemmer/constants_rb.html new file mode 100644 index 0000000..2b525f3 --- /dev/null +++ b/rdoc/lib/porter2stemmer/constants_rb.html @@ -0,0 +1,55 @@ + + + + + + + + File: constants.rb [porter2stemmer 1.0.0] + + + + + + + + + + +
+
+
Last Modified
+
2011-01-09 09:20:05 +0000
+ + +
Requires
+
+
    + +
+
+ + + +
+
+ +
+ +
+

Description

+

+coding: utf-8 +

+ +
+ +
+ + + diff --git a/rdoc/lib/porter2stemmer/implementation_rb.html b/rdoc/lib/porter2stemmer/implementation_rb.html new file mode 100644 index 0000000..754ad35 --- /dev/null +++ b/rdoc/lib/porter2stemmer/implementation_rb.html @@ -0,0 +1,55 @@ + + + + + + + + File: implementation.rb [porter2stemmer 1.0.0] + + + + + + + + + + +
+
+
Last Modified
+
2011-01-08 10:20:57 +0000
+ + +
Requires
+
+
    + +
+
+ + + +
+
+ +
+ +
+

Description

+

+coding: utf-8 +

+ +
+ +
+ + + diff --git a/rdoc/lib/porter2stemmer_rb.html b/rdoc/lib/porter2stemmer_rb.html new file mode 100644 index 0000000..a5be8cb --- /dev/null +++ b/rdoc/lib/porter2stemmer_rb.html @@ -0,0 +1,59 @@ + + + + + + + + File: porter2stemmer.rb [porter2stemmer 1.0.0] + + + + + + + + + + +
+
+
Last Modified
+
2011-03-18 14:12:11 +0000
+ + +
Requires
+
+
    + +
  • porter2stemmer/constants
  • + +
  • porter2stemmer/implementation
  • + +
+
+ + + +
+
+ +
+ +
+

Description

+

+coding: utf-8 +

+ +
+ +
+ + + diff --git a/rdoc/rdoc.css b/rdoc/rdoc.css new file mode 100644 index 0000000..ffe9960 --- /dev/null +++ b/rdoc/rdoc.css @@ -0,0 +1,706 @@ +/* + * "Darkfish" Rdoc CSS + * $Id: rdoc.css 54 2009-01-27 01:09:48Z deveiant $ + * + * Author: Michael Granger + * + */ + +/* Base Green is: #6C8C22 */ + +*{ padding: 0; margin: 0; } + +body { + background: #efefef; + font: 14px "Helvetica Neue", Helvetica, Tahoma, sans-serif; +} +body.class, body.module, body.file { + margin-left: 40px; +} +body.file-popup { + font-size: 90%; + margin-left: 0; +} + +h1 { + font-size: 300%; + text-shadow: rgba(135,145,135,0.65) 2px 2px 3px; + color: #6C8C22; +} +h2,h3,h4 { margin-top: 1.5em; } + +:link, +:visited { + color: #6C8C22; + text-decoration: none; +} +:link:hover, +:visited:hover { + border-bottom: 1px dotted #6C8C22; +} + +pre { + background: #ddd; + padding: 0.5em 0; +} + + +/* @group Generic Classes */ + +.initially-hidden { + display: none; +} + +.quicksearch-field { + width: 98%; + background: #ddd; + border: 1px solid #aaa; + height: 1.5em; + -webkit-border-radius: 4px; +} +.quicksearch-field:focus { + background: #f1edba; +} + +.missing-docs { + font-size: 120%; + background: white url(images/wrench_orange.png) no-repeat 4px center; + color: #ccc; + line-height: 2em; + border: 1px solid #d00; + opacity: 1; + padding-left: 20px; + text-indent: 24px; + letter-spacing: 3px; + font-weight: bold; + -webkit-border-radius: 5px; + -moz-border-radius: 5px; +} + +.target-section { + border: 2px solid #dcce90; + border-left-width: 8px; + padding: 0 1em; + background: #fff3c2; +} + +/* @end */ + + +/* @group Index Page, Standalone file pages */ +body.indexpage { + margin: 1em 3em; +} +body.indexpage p, +body.indexpage div, +body.file p { + margin: 1em 0; +} + +.indexpage ul, +.file #documentation ul { + line-height: 160%; + list-style: none; +} +.indexpage ul :link, +.indexpage ul :visited { + font-size: 16px; +} + +.indexpage li, +.file #documentation li { + padding-left: 20px; + 
background: url(images/bullet_black.png) no-repeat left 4px; +} +.indexpage li.module { + background: url(images/package.png) no-repeat left 4px; +} +.indexpage li.class { + background: url(images/ruby.png) no-repeat left 4px; +} +.indexpage li.file { + background: url(images/page_white_text.png) no-repeat left 4px; +} +.file li p, +.indexpage li p { + margin: 0 0; +} + +/* @end */ + +/* @group Top-Level Structure */ + +.class #metadata, +.file #metadata, +.module #metadata { + float: left; + width: 260px; +} + +.class #documentation, +.file #documentation, +.module #documentation { + margin: 2em 1em 5em 300px; + min-width: 340px; +} + +.file #metadata { + margin: 0.8em; +} + +#validator-badges { + clear: both; + margin: 1em 1em 2em; +} + +/* @end */ + +/* @group Metadata Section */ +#metadata .section { + background-color: #dedede; + -moz-border-radius: 5px; + -webkit-border-radius: 5px; + border: 1px solid #aaa; + margin: 0 8px 16px; + font-size: 90%; + overflow: hidden; +} +#metadata h3.section-header { + margin: 0; + padding: 2px 8px; + background: #ccc; + color: #666; + -moz-border-radius-topleft: 4px; + -moz-border-radius-topright: 4px; + -webkit-border-top-left-radius: 4px; + -webkit-border-top-right-radius: 4px; + border-bottom: 1px solid #aaa; +} +#metadata #home-section h3.section-header { + border-bottom: 0; +} + +#metadata ul, +#metadata dl, +#metadata p { + padding: 8px; + list-style: none; +} + +#file-metadata ul { + padding-left: 28px; + list-style-image: url(images/page_green.png); +} + +dl.svninfo { + color: #666; + margin: 0; +} +dl.svninfo dt { + font-weight: bold; +} + +ul.link-list li { + white-space: nowrap; +} +ul.link-list .type { + font-size: 8px; + text-transform: uppercase; + color: white; + background: #969696; + padding: 2px 4px; + -webkit-border-radius: 5px; +} + +/* @end */ + + +/* @group Project Metadata Section */ +#project-metadata { + margin-top: 3em; +} + +.file #project-metadata { + margin-top: 0em; +} + +#project-metadata 
.section { + border: 1px solid #aaa; +} +#project-metadata h3.section-header { + border-bottom: 1px solid #aaa; + position: relative; +} +#project-metadata h3.section-header .search-toggle { + position: absolute; + right: 5px; +} + + +#project-metadata form { + color: #777; + background: #ccc; + padding: 8px 8px 16px; + border-bottom: 1px solid #bbb; +} +#project-metadata fieldset { + border: 0; +} + +#no-class-search-results { + margin: 0 auto 1em; + text-align: center; + font-size: 14px; + font-weight: bold; + color: #aaa; +} + +/* @end */ + + +/* @group Documentation Section */ +#description { + font-size: 100%; + color: #333; +} + +#description p { + margin: 1em 0.4em; +} + +#description li p { + margin: 0; +} + +#description ul { + margin-left: 1.5em; +} +#description ul li { + line-height: 1.4em; +} + +#description dl, +#documentation dl { + margin: 8px 1.5em; + border: 1px solid #ccc; +} +#description dl { + font-size: 14px; +} + +#description dt, +#documentation dt { + padding: 2px 4px; + font-weight: bold; + background: #ddd; +} +#description dd, +#documentation dd { + padding: 2px 12px; +} +#description dd + dt, +#documentation dd + dt { + margin-top: 0.7em; +} + +#documentation .section { + font-size: 90%; +} +#documentation h3.section-header { + margin-top: 2em; + padding: 0.75em 0.5em; + background-color: #dedede; + color: #333; + font-size: 150%; + border: 1px solid #bbb; + -moz-border-radius: 3px; + -webkit-border-radius: 3px; +} + +#constants-list > dl, +#attributes-list > dl { + margin: 1em 0 2em; + border: 0; +} +#constants-list > dl dt, +#attributes-list > dl dt { + padding-left: 0; + font-weight: bold; + font-family: Monaco, "Andale Mono"; + background: inherit; +} +#constants-list > dl dt a, +#attributes-list > dl dt a { + color: inherit; +} +#constants-list > dl dd, +#attributes-list > dl dd { + margin: 0 0 1em 0; + padding: 0; + color: #666; +} + +/* @group Method Details */ + +#documentation .method-source-code { + display: none; +} + 
+#documentation .method-detail { + margin: 0.5em 0; + padding: 0.5em 0; + cursor: pointer; +} +#documentation .method-detail:hover { + background-color: #f1edba; +} +#documentation .method-heading { + position: relative; + padding: 2px 4px 0 20px; + font-size: 125%; + font-weight: bold; + color: #333; + background: url(images/brick.png) no-repeat left bottom; +} +#documentation .method-heading :link, +#documentation .method-heading :visited { + color: inherit; +} +#documentation .method-click-advice { + position: absolute; + top: 2px; + right: 5px; + font-size: 10px; + color: #9b9877; + visibility: hidden; + padding-right: 20px; + line-height: 20px; + background: url(images/zoom.png) no-repeat right top; +} +#documentation .method-detail:hover .method-click-advice { + visibility: visible; +} + +#documentation .method-alias .method-heading { + color: #666; + background: url(images/brick_link.png) no-repeat left bottom; +} + +#documentation .method-description, +#documentation .aliases { + margin: 0 20px; + line-height: 1.2em; + color: #666; +} +#documentation .aliases { + padding-top: 4px; + font-style: italic; + cursor: default; +} +#documentation .method-description p { + padding: 0; +} +#documentation .method-description p + p { + margin-bottom: 0.5em; +} +#documentation .method-description ul { + margin-left: 1.5em; +} + +#documentation .attribute-method-heading { + background: url(images/tag_green.png) no-repeat left bottom; +} +#documentation #attribute-method-details .method-detail:hover { + background-color: transparent; + cursor: default; +} +#documentation .attribute-access-type { + font-size: 60%; + text-transform: uppercase; + vertical-align: super; + padding: 0 2px; +} +/* @end */ + +/* @end */ + + + +/* @group Source Code */ + +div.method-source-code { + background: #262626; + color: #efefef; + margin: 1em; + padding: 0.5em; + border: 1px dashed #999; + overflow: hidden; +} + +div.method-source-code pre { + background: inherit; + padding: 0; + color: 
white; + overflow: auto; +} + +/* @group Ruby keyword styles */ + +.ruby-constant { color: #7fffd4; background: transparent; } +.ruby-keyword { color: #00ffff; background: transparent; } +.ruby-ivar { color: #eedd82; background: transparent; } +.ruby-operator { color: #00ffee; background: transparent; } +.ruby-identifier { color: #ffdead; background: transparent; } +.ruby-node { color: #ffa07a; background: transparent; } +.ruby-comment { color: #b22222; font-weight: bold; background: transparent; } +.ruby-regexp { color: #ffa07a; background: transparent; } +.ruby-value { color: #7fffd4; background: transparent; } + +/* @end */ +/* @end */ + + +/* @group File Popup Contents */ + +.file #metadata, +.file-popup #metadata { +} + +.file-popup dl { + font-size: 80%; + padding: 0.75em; + background-color: #dedede; + color: #333; + border: 1px solid #bbb; + -moz-border-radius: 3px; + -webkit-border-radius: 3px; +} +.file dt { + font-weight: bold; + padding-left: 22px; + line-height: 20px; + background: url(images/page_white_width.png) no-repeat left top; +} +.file dt.modified-date { + background: url(images/date.png) no-repeat left top; +} +.file dt.requires { + background: url(images/plugin.png) no-repeat left top; +} +.file dt.scs-url { + background: url(images/wrench.png) no-repeat left top; +} + +.file dl dd { + margin: 0 0 1em 0; +} +.file #metadata dl dd ul { + list-style: circle; + margin-left: 20px; + padding-top: 0; +} +.file #metadata dl dd ul li { +} + + +.file h2 { + margin-top: 2em; + padding: 0.75em 0.5em; + background-color: #dedede; + color: #333; + font-size: 120%; + border: 1px solid #bbb; + -moz-border-radius: 3px; + -webkit-border-radius: 3px; +} + +/* @end */ + + + + +/* @group ThickBox Styles */ +#TB_window { + font: 12px Arial, Helvetica, sans-serif; + color: #333333; +} + +#TB_secondLine { + font: 10px Arial, Helvetica, sans-serif; + color:#666666; +} + +#TB_window :link, +#TB_window :visited { color: #666666; } +#TB_window :link:hover, +#TB_window 
:visited:hover { color: #000; } +#TB_window :link:active, +#TB_window :visited:active { color: #666666; } +#TB_window :link:focus, +#TB_window :visited:focus { color: #666666; } + +#TB_overlay { + position: fixed; + z-index:100; + top: 0px; + left: 0px; + height:100%; + width:100%; +} + +.TB_overlayMacFFBGHack {background: url(images/macFFBgHack.png) repeat;} +.TB_overlayBG { + background-color:#000; + filter:alpha(opacity=75); + -moz-opacity: 0.75; + opacity: 0.75; +} + +* html #TB_overlay { /* ie6 hack */ + position: absolute; + height: expression(document.body.scrollHeight > document.body.offsetHeight ? document.body.scrollHeight : document.body.offsetHeight + 'px'); +} + +#TB_window { + position: fixed; + background: #ffffff; + z-index: 102; + color:#000000; + display:none; + border: 4px solid #525252; + text-align:left; + top:50%; + left:50%; +} + +* html #TB_window { /* ie6 hack */ +position: absolute; +margin-top: expression(0 - parseInt(this.offsetHeight / 2) + (TBWindowMargin = document.documentElement && document.documentElement.scrollTop || document.body.scrollTop) + 'px'); +} + +#TB_window img#TB_Image { + display:block; + margin: 15px 0 0 15px; + border-right: 1px solid #ccc; + border-bottom: 1px solid #ccc; + border-top: 1px solid #666; + border-left: 1px solid #666; +} + +#TB_caption{ + height:25px; + padding:7px 30px 10px 25px; + float:left; +} + +#TB_closeWindow{ + height:25px; + padding:11px 25px 10px 0; + float:right; +} + +#TB_closeAjaxWindow{ + padding:7px 10px 5px 0; + margin-bottom:1px; + text-align:right; + float:right; +} + +#TB_ajaxWindowTitle{ + float:left; + padding:7px 0 5px 10px; + margin-bottom:1px; + font-size: 22px; +} + +#TB_title{ + background-color: #6C8C22; + color: #dedede; + height:40px; +} +#TB_title :link, +#TB_title :visited { + color: white !important; + border-bottom: 1px dotted #dedede; +} + +#TB_ajaxContent{ + clear:both; + padding:2px 15px 15px 15px; + overflow:auto; + text-align:left; + line-height:1.4em; +} + 
+#TB_ajaxContent.TB_modal{ + padding:15px; +} + +#TB_ajaxContent p{ + padding:5px 0px 5px 0px; +} + +#TB_load{ + position: fixed; + display:none; + height:13px; + width:208px; + z-index:103; + top: 50%; + left: 50%; + margin: -6px 0 0 -104px; /* -height/2 0 0 -width/2 */ +} + +* html #TB_load { /* ie6 hack */ +position: absolute; +margin-top: expression(0 - parseInt(this.offsetHeight / 2) + (TBWindowMargin = document.documentElement && document.documentElement.scrollTop || document.body.scrollTop) + 'px'); +} + +#TB_HideSelect{ + z-index:99; + position:fixed; + top: 0; + left: 0; + background-color:#fff; + border:none; + filter:alpha(opacity=0); + -moz-opacity: 0; + opacity: 0; + height:100%; + width:100%; +} + +* html #TB_HideSelect { /* ie6 hack */ + position: absolute; + height: expression(document.body.scrollHeight > document.body.offsetHeight ? document.body.scrollHeight : document.body.offsetHeight + 'px'); +} + +#TB_iframeContent{ + clear:both; + border:none; + margin-bottom:-1px; + margin-top:1px; + _margin-bottom:1px; +} + +/* @end */ + +/* @group Debugging Section */ + +#debugging-toggle { + text-align: center; +} +#debugging-toggle img { + cursor: pointer; +} + +#rdoc-debugging-section-dump { + display: none; + margin: 0 2em 2em; + background: #ccc; + border: 1px solid #999; +} + + + +/* @end */