X-Git-Url: https://git.njae.me.uk/?a=blobdiff_plain;f=doc%2FString.html;fp=doc%2FString.html;h=0000000000000000000000000000000000000000;hb=c8c08d5fafba205c5a8e4138edb0df059a63de36;hp=f04ae9aa7771b3591944e48aa30427256ac4f35d;hpb=aa5699112116010af665e83447bd7d1a17d0e2f7;p=porter2stemmer.git diff --git a/doc/String.html b/doc/String.html deleted file mode 100644 index f04ae9a..0000000 --- a/doc/String.html +++ /dev/null @@ -1,1144 +0,0 @@ - - - - - - - Class: String - - - - - - - - - - - -

String

- -
-

-Implementation of the Porter 2 stemmer. String#porter2_stem is the -main stemming procedure. -

- -
- - - - - - - - - -
-

Public Instance Methods

- - -
- - -
- - porter2_ends_with_short_syllable?() - -
- -
- -

-Returns true if the word ends with a short syllable -

- - - -
-
-    # File lib/porter2_implementation.rb, line 59
-59:   def porter2_ends_with_short_syllable?
-60:     self =~ /#{Porter2::SHORT_SYLLABLE}$/ ? true : false
-61:   end
-
- -
- - - - -
- - -
- - -
- - porter2_is_short_word?() - -
- -
- -

-A word is short if it ends in a short syllable, and R1 is null -

- - - -
-
-    # File lib/porter2_implementation.rb, line 65
-65:   def porter2_is_short_word?
-66:     self.porter2_ends_with_short_syllable? and self.porter2_r1.empty?
-67:   end
-
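The two predicates above depend on constants from the Porter2 module that are not shown in this file. The following is a minimal standalone sketch, assuming the vowel set is a, e, i, o, u, y and that a short syllable follows the published Porter 2 definition. The names `V`, `C`, `SHORT_SYLLABLE`, `r1_of`, `ends_with_short_syllable?` and `short_word?` are illustrative, not the library's own.

```ruby
# Assumed definitions; the real Porter2::V, Porter2::C and
# Porter2::SHORT_SYLLABLE are not shown in this file.
V = "[aeiouy]"
C = "[^aeiouy]"
# Either non-vowel, vowel, non-vowel other than w, x or Y,
# or a vowel at the start of the word followed by a non-vowel.
SHORT_SYLLABLE = "(?:#{C}#{V}[^aeiouywxY]|^#{V}#{C})"

# Simplified R1 (the gener-/commun-/arsen- special cases are omitted).
def r1_of(word)
  m = word.match(/#{V}#{C}(?<r1>.*)$/)
  m ? m[:r1] : ""
end

def ends_with_short_syllable?(word)
  word.match?(/#{SHORT_SYLLABLE}$/)
end

# A word is short if it ends in a short syllable and R1 is empty.
def short_word?(word)
  ends_with_short_syllable?(word) && r1_of(word).empty?
end

puts short_word?("bed")    # short syllable, empty R1
puts short_word?("embed")  # short syllable, but R1 is "bed"
```

Under these assumptions, "bed", "shed" and "shred" count as short words, while "bead", "embed" and "beds" do not.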
- -
- - - - -
- - -
- - -
- - porter2_postprocess() - -
- -
- -

-Turn all Y letters into y -

- - - -
-
-     # File lib/porter2_implementation.rb, line 261
-261:   def porter2_postprocess
-262:     self.gsub(/Y/, 'y')
-263:   end
-
- -
- - - - -
- - -
- - -
- - porter2_preprocess() - -
- -
- -

-Preprocess the word. Remove any initial ’, if present. Then, set -initial y, or y after a vowel, to Y -

-

-(The comment to ‘establish the regions R1 and R2’ in the -original description is an implementation optimisation that identifies -where the regions start. As no modifications are made to the word that -affect those positions, you may want to cache them now. This implementation -doesn’t do that.) -

- - - -
-
-    # File lib/porter2_implementation.rb, line 25
-25:   def porter2_preprocess    
-26:     w = self.dup
-27: 
-28:     # remove any initial apostrophe
-29:     w.gsub!(/^'*(.)/, '\1')
-30:     
-31:     # set initial y, or y after a vowel, to Y
-32:     w.gsub!(/^y/, "Y")
-33:     w.gsub!(/(#{Porter2::V})y/, '\1Y')
-34:     
-35:     w
-36:   end
-
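As an illustration of the Y-marking described above, here is a small standalone sketch of preprocessing and its inverse. It assumes the vowel pattern is `[aeiouy]` (Porter2::V is not shown in this file); `preprocess` and `postprocess` are hypothetical free functions rather than the String methods documented here.

```ruby
V = "[aeiouy]" # assumed value of Porter2::V

def preprocess(word)
  w = word.gsub(/^'*(.)/, '\1')   # drop any leading apostrophes
  w = w.gsub(/^y/, "Y")           # initial y acts as a consonant
  w.gsub(/(#{V})y/, '\1Y')        # so does y after a vowel
end

def postprocess(word)
  word.gsub(/Y/, "y")             # undo the marking
end

%w[youth boyish say cry 'cause].each do |w|
  puts "#{w} -> #{preprocess(w)}"
end
```

Marking consonantal y as Y lets the later steps treat every remaining lowercase y as a vowel without further checks.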
- -
- - - - -
- - -
- - -
- - porter2_r1() - -
- -
- -

-R1 is the portion of the word after the first non-vowel after the first -vowel (with words beginning ‘gener-’, ‘commun-’, -and ‘arsen-’ treated as special cases). -

- - - -
-
-    # File lib/porter2_implementation.rb, line 41
-41:   def porter2_r1
-42:     if self =~ /^(gener|commun|arsen)(?<r1>.*)/
-43:       Regexp.last_match(:r1)
-44:     else
-45:       self =~ /#{Porter2::V}#{Porter2::C}(?<r1>.*)$/
-46:       Regexp.last_match(:r1) || ""
-47:     end
-48:   end
-
- -
- - - - -
- - -
- - -
- - porter2_r2() - -
- -
- -

-R2 is the portion of R1 (porter2_r1) after the first -non-vowel after the first vowel -

- - - -
-
-    # File lib/porter2_implementation.rb, line 52
-52:   def porter2_r2
-53:     self.porter2_r1 =~ /#{Porter2::V}#{Porter2::C}(?<r2>.*)$/
-54:     Regexp.last_match(:r2) || ""
-55:   end
-
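The region definitions can be tried out in isolation. This is a minimal sketch, assuming `V` and `C` are the plain vowel and non-vowel character classes (the real Porter2 constants are not shown here); `r1_of` and `r2_of` are illustrative names.

```ruby
V = "[aeiouy]"  # assumed value of Porter2::V
C = "[^aeiouy]" # assumed value of Porter2::C

# R1: everything after the first non-vowel that follows a vowel,
# with three prefixes treated as special cases.
def r1_of(word)
  if (m = word.match(/^(?:gener|commun|arsen)(?<r1>.*)/))
    m[:r1]
  else
    m = word.match(/#{V}#{C}(?<r1>.*)$/)
    m ? m[:r1] : ""
  end
end

# R2: the same rule applied again, inside R1.
def r2_of(word)
  m = r1_of(word).match(/#{V}#{C}(?<r2>.*)$/)
  m ? m[:r2] : ""
end

%w[beautiful beauty animadversion generous].each do |w|
  puts "#{w}: R1=#{r1_of(w).inspect} R2=#{r2_of(w).inspect}"
end
```

Both functions return an empty string when no vowel followed by a non-vowel is found, which matches the `|| ""` fallback in the listings above.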
- -
- - - - -
- - -
- - -
- - porter2_stem(gb_english = false) - -
- -
- -

-Perform the stemming procedure. If gb_english is true, treat -‘-ise’ and similar suffixes in the same way as ‘-ize’ in American -English. -

- - - -
-
-     # File lib/porter2_implementation.rb, line 269
-269:   def porter2_stem(gb_english = false)
-270:     preword = self.porter2_tidy
-271:     return preword if preword.length <= 2
-272: 
-273:     word = preword.porter2_preprocess
-274:     
-275:     if Porter2::SPECIAL_CASES.has_key? word
-276:       Porter2::SPECIAL_CASES[word]
-277:     else
-278:       w1a = word.porter2_step0.porter2_step1a
-279:       if Porter2::STEP_1A_SPECIAL_CASES.include? w1a 
-280:         w1a
-281:       else
-282:         w1a.porter2_step1b(gb_english).porter2_step1c.porter2_step2(gb_english).porter2_step3(gb_english).porter2_step4(gb_english).porter2_step5.porter2_postprocess
-283:       end
-284:     end
-285:   end
-
- -
- - -
- Also aliased as: stem -
- - - -
- - -
- - -
- - porter2_stem_verbose(gb_english = false) - -
- -
- -

-A verbose version of porter2_stem that prints the -output of each stage to STDOUT -

- - - -
-
-     # File lib/porter2_implementation.rb, line 288
-288:   def porter2_stem_verbose(gb_english = false)
-289:     preword = self.porter2_tidy
-290:     puts "Preword: #{preword}"
-291:     return preword if preword.length <= 2
-292: 
-293:     word = preword.porter2_preprocess
-294:     puts "Preprocessed: #{word}"
-295:     
-296:     if Porter2::SPECIAL_CASES.has_key? word
-297:       puts "Returning #{word} as special case #{Porter2::SPECIAL_CASES[word]}"
-298:       Porter2::SPECIAL_CASES[word]
-299:     else
-300:       r1 = word.porter2_r1
-301:       r2 = word.porter2_r2
-302:       puts "R1 = #{r1}, R2 = #{r2}"
-303:     
-304:       w0 = word.porter2_step0 ; puts "After step 0:  #{w0} (R1 = #{w0.porter2_r1}, R2 = #{w0.porter2_r2})"
-305:       w1a = w0.porter2_step1a ; puts "After step 1a: #{w1a} (R1 = #{w1a.porter2_r1}, R2 = #{w1a.porter2_r2})"
-306:       
-307:       if Porter2::STEP_1A_SPECIAL_CASES.include? w1a
-308:         puts "Returning #{w1a} as 1a special case"
-309:         w1a
-310:       else
-311:         w1b = w1a.porter2_step1b(gb_english) ; puts "After step 1b: #{w1b} (R1 = #{w1b.porter2_r1}, R2 = #{w1b.porter2_r2})"
-312:         w1c = w1b.porter2_step1c ; puts "After step 1c: #{w1c} (R1 = #{w1c.porter2_r1}, R2 = #{w1c.porter2_r2})"
-313:         w2 = w1c.porter2_step2(gb_english) ; puts "After step 2:  #{w2} (R1 = #{w2.porter2_r1}, R2 = #{w2.porter2_r2})"
-314:         w3 = w2.porter2_step3(gb_english) ; puts "After step 3:  #{w3} (R1 = #{w3.porter2_r1}, R2 = #{w3.porter2_r2})"
-315:         w4 = w3.porter2_step4(gb_english) ; puts "After step 4:  #{w4} (R1 = #{w4.porter2_r1}, R2 = #{w4.porter2_r2})"
-316:         w5 = w4.porter2_step5 ; puts "After step 5:  #{w5}"
-317:         wpost = w5.porter2_postprocess ; puts "After postprocess: #{wpost}"
-318:         wpost
-319:       end
-320:     end
-321:   end
-
- -
- - - - -
- - -
- - -
- - porter2_step0() - -
- -
- -

-Search for the longest among the suffixes
  • -'
  • -'s
  • -'s'
-and remove it if found. -

- - - -
-
-    # File lib/porter2_implementation.rb, line 75
-75:   def porter2_step0
-76:     self.sub(/(.)('s'|'s|')$/, '\1')
-77:   end
-
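A quick way to see the effect of step 0 is to run the same pattern over a few possessive forms. The sketch below uses a hypothetical free function `step0` and the non-destructive `sub`, so the argument is left unmodified.

```ruby
# Remove a trailing ', 's or 's' as long as at least one
# character precedes it. The alternation lists the longest
# suffix first.
def step0(word)
  word.sub(/(.)('s'|'s|')$/, '\1')
end

%w[dog's dogs' dog's' dog].each do |w|
  puts "#{w} -> #{step0(w)}"
end
```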
- -
- - - - -
- - -
- - -
- - porter2_step1a() - -
- -
- -

-Search for the longest among the following suffixes, and perform the action -indicated. -

- - - - - -
sses

-replace by ss -

-
ied, ies

-replace by i if preceded by more than one letter, otherwise by ie -

-
s

-delete if the preceding word part contains a vowel not immediately before -the s -

-
us, ss

-do nothing -

-
- - - -
-
-     # File lib/porter2_implementation.rb, line 85
- 85:   def porter2_step1a
- 86:     if self =~ /sses$/
- 87:       self.sub(/sses$/, 'ss')
- 88:     elsif self =~ /..(ied|ies)$/
- 89:       self.sub(/(ied|ies)$/, 'i')
- 90:     elsif self =~ /(ied|ies)$/
- 91:       self.sub(/(ied|ies)$/, 'ie')
- 92:     elsif self =~ /(us|ss)$/
- 93:       self
- 94:     elsif self =~ /s$/
- 95:       if self =~ /(#{Porter2::V}.+)s$/
- 96:         self.sub(/s$/, '') 
- 97:       else
- 98:         self
- 99:       end
-100:     else
-101:       self
-102:     end
-103:   end
-
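The rules of step 1a can be exercised with a standalone version of the same cascade. This sketch assumes `V` is `[aeiouy]`; `step1a` is written as a hypothetical free function rather than the String method documented here.

```ruby
V = "[aeiouy]" # assumed value of Porter2::V

def step1a(word)
  if word =~ /sses$/
    word.sub(/sses$/, "ss")
  elsif word =~ /..(ied|ies)$/          # more than one letter before
    word.sub(/(ied|ies)$/, "i")
  elsif word =~ /(ied|ies)$/
    word.sub(/(ied|ies)$/, "ie")
  elsif word =~ /(us|ss)$/
    word
  elsif word =~ /#{V}.+s$/              # vowel not immediately before s
    word.sub(/s$/, "")
  else
    word
  end
end

%w[misses cries ties gas this gaps kiwis].each do |w|
  puts "#{w} -> #{step1a(w)}"
end
```

The `.+` between the vowel and the final s is what keeps "gas" and "this" intact while still reducing "gaps" and "kiwis".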
- -
- - - - -
- - -
- - -
- - porter2_step1b(gb_english = false) - -
- -
- -

-Search for the longest among the following suffixes, and perform the action -indicated. -

- - - -
eed, eedly

-replace by ee if the suffix is also in R1 -

-
ed, edly, ing, ingly

-delete if the preceding word part contains a vowel and, after the -deletion: -

-
  • -if the word ends at, bl or iz: add e, or
  • -if the word ends with a double: remove the last letter, or
  • -if the word is short: add e
-

-(If gb_english is true, treat the ‘is’ suffix as -‘iz’ above.) -

- - - -
-
-     # File lib/porter2_implementation.rb, line 115
-115:   def porter2_step1b(gb_english = false)
-116:     if self =~ /(eed|eedly)$/
-117:       if self.porter2_r1 =~ /(eed|eedly)$/
-118:         self.sub(/(eed|eedly)$/, 'ee')
-119:       else
-120:         self
-121:       end
-122:     else
-123:       w = self.dup
-124:       if w =~ /#{Porter2::V}.*(ed|edly|ing|ingly)$/
-125:         w.sub!(/(ed|edly|ing|ingly)$/, '')
-126:         if w =~ /(at|bl|iz)$/
-127:           w += 'e' 
-128:         elsif w =~ /is$/ and gb_english
-129:           w += 'e' 
-130:         elsif w =~ /#{Porter2::Double}$/
-131:           w.chop!
-132:         elsif w.porter2_is_short_word?
-133:           w += 'e'
-134:         end
-135:       end
-136:       w
-137:     end
-138:   end
-
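The step 1b rules interact with R1, doubles and short words, so a worked sketch may help. It follows the rule list as documented (endings at, bl, iz) and assumes definitions for the vowel classes, the doubles and the short syllable, since the Porter2 constants are not shown in this file. All names are illustrative, and `r1_of` omits the gener-/commun-/arsen- special cases.

```ruby
V = "[aeiouy]"                                   # assumed
C = "[^aeiouy]"                                  # assumed
DOUBLE = "(?:bb|dd|ff|gg|mm|nn|pp|rr|tt)"        # assumed
SHORT_SYLLABLE = "(?:#{C}#{V}[^aeiouywxY]|^#{V}#{C})"  # assumed

def r1_of(word)
  m = word.match(/#{V}#{C}(?<r1>.*)$/)
  m ? m[:r1] : ""
end

def short_word?(word)
  word.match?(/#{SHORT_SYLLABLE}$/) && r1_of(word).empty?
end

def step1b(word, gb_english = false)
  if word =~ /(eed|eedly)$/
    # Only rewrite when the suffix lies inside R1.
    return word unless r1_of(word) =~ /(eed|eedly)$/
    return word.sub(/(eed|eedly)$/, "ee")
  end
  # The part before the suffix must contain a vowel.
  return word unless word =~ /#{V}.*(ed|edly|ing|ingly)$/

  stem = word.sub(/(ed|edly|ing|ingly)$/, "")
  if stem =~ /(at|bl|iz)$/ || (gb_english && stem =~ /is$/)
    stem + "e"
  elsif stem =~ /#{DOUBLE}$/
    stem.chop
  elsif short_word?(stem)
    stem + "e"
  else
    stem
  end
end

%w[hopping hoping luxuriated troubled agreed feed sing].each do |w|
  puts "#{w} -> #{step1b(w)}"
end
```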
- -
- - - - -
- - -
- - -
- - porter2_step1c() - -
- -
- -

-Replace a suffix of y or Y by i if it is preceded by a non-vowel which is -not the first letter of the word. -

- - - -
-
-     # File lib/porter2_implementation.rb, line 143
-143:   def porter2_step1c
-144:     if self =~ /.+#{Porter2::C}(y|Y)$/
-145:       self.sub(/(y|Y)$/, 'i')
-146:     else
-147:       self
-148:     end
-149:   end
-
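A short sketch of the step 1c rule, assuming `C` is the non-vowel class `[^aeiouy]` and using a hypothetical free function `step1c`:

```ruby
C = "[^aeiouy]" # assumed value of Porter2::C

# Replace a final y or Y by i when it follows a non-vowel
# that is not the first letter of the word.
def step1c(word)
  if word =~ /.+#{C}(y|Y)$/
    word.sub(/(y|Y)$/, "i")
  else
    word
  end
end

%w[cry happy by say].each do |w|
  puts "#{w} -> #{step1c(w)}"
end
```

The leading `.+` is what enforces "not the first letter": "by" has nothing before the b, so it is left alone.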
- -
- - - - -
- - -
- - -
- - porter2_step2(gb_english = false) - -
- -
- -

-Search for the longest among the suffixes listed in the keys of -Porter2::STEP_2_MAPS. If one is found and that suffix occurs in R1, -replace it with the value found in STEP_2_MAPS. -

-

-(Suffixes ‘ogi’ and ‘li’ are treated as special -cases in the procedure.) -

-

-(If gb_english is true, replace the ‘iser’ and -‘isation’ suffixes with ‘ise’, similarly to how -‘izer’ and ‘ization’ are treated.) -

- - - -
-
-     # File lib/porter2_implementation.rb, line 160
-160:   def porter2_step2(gb_english = false)
-161:     r1 = self.porter2_r1
-162:     s2m = Porter2::STEP_2_MAPS.dup
-163:     if gb_english
-164:       s2m["iser"] = "ise"
-165:       s2m["isation"] = "ise"
-166:     end
-167:     step_2_re = Regexp.union(s2m.keys.map {|r| Regexp.new(r + "$")})
-168:     if self =~ step_2_re
-169:       if r1 =~ /#{$&}$/
-170:         self.sub(/#{$&}$/, s2m[$&])
-171:       else
-172:         self
-173:       end
-174:     elsif r1 =~ /li$/ and self =~ /(#{Porter2::Valid_LI})li$/
-175:       self.sub(/li$/, '')
-176:     elsif r1 =~ /ogi$/ and self =~ /logi$/
-177:       self.sub(/ogi$/, 'og')
-178:     else
-179:       self
-180:     end
-181:   end
-
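Porter2::STEP_2_MAPS is not shown in this file, so the sketch below uses a small assumed subset of suffix mappings taken from the published Porter 2 description, plus an assumed set of valid letters before ‘li’. It illustrates two points: because every alternative is anchored at the end of the word, the leftmost regexp match is also the longest suffix, and a suffix that is found but does not lie in R1 leaves the word unchanged. All names are illustrative.

```ruby
V = "[aeiouy]"            # assumed
C = "[^aeiouy]"           # assumed
VALID_LI = "[cdeghkmnrt]" # assumed value of Porter2::Valid_LI
# Assumed subset of the real suffix map.
STEP_2_SUBSET = {
  "ational" => "ate", "tional" => "tion",
  "ization" => "ize", "izer" => "ize"
}.freeze

# Simplified R1 (special-case prefixes omitted).
def r1_of(word)
  m = word.match(/#{V}#{C}(?<r1>.*)$/)
  m ? m[:r1] : ""
end

def step2(word, gb_english = false)
  map = STEP_2_SUBSET.dup
  map.merge!("iser" => "ise", "isation" => "ise") if gb_english
  r1 = r1_of(word)
  suffix_re = Regexp.union(map.keys.map { |s| /#{Regexp.escape(s)}$/ })
  if (m = word.match(suffix_re))
    suffix = m[0]
    r1.end_with?(suffix) ? word[0...-suffix.length] + map[suffix] : word
  elsif r1.end_with?("li") && word =~ /#{VALID_LI}li$/
    word.sub(/li$/, "")
  elsif r1.end_with?("ogi") && word.end_with?("logi")
    word.sub(/ogi$/, "og")
  else
    word
  end
end

%w[relational conditional rational quickli analogi].each do |w|
  puts "#{w} -> #{step2(w)}"
end
```

Here "rational" is left alone because its R1 is only "ional", so the matched suffix "ational" is not wholly inside R1.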
- -
- - - - -
- - -
- - -
- - porter2_step3(gb_english = false) - -
- -
- -

-Search for the longest among the suffixes listed in the keys of -Porter2::STEP_3_MAPS. If one is found and that suffix occurs in R1, -replace it with the value found in STEP_3_MAPS. -

-

-(Suffix ‘ative’ is treated as a special case in the procedure.) -

-

-(If gb_english is true, replace the ‘alise’ suffix -with ‘al’, similarly to how ‘alize’ is treated.) -

- - - -
-
-     # File lib/porter2_implementation.rb, line 192
-192:   def porter2_step3(gb_english = false)
-193:     if self =~ /ative$/ and self.porter2_r2 =~ /ative$/
-194:       self.sub(/ative$/, '')
-195:     else
-196:       s3m = Porter2::STEP_3_MAPS.dup
-197:       if gb_english
-198:         s3m["alise"] = "al"
-199:       end
-200:       step_3_re = Regexp.union(s3m.keys.map {|r| Regexp.new(r + "$")})
-201:       r1 = self.porter2_r1
-202:       if self =~ step_3_re and r1 =~ /#{$&}$/ 
-203:         self.sub(/#{$&}$/, s3m[$&])
-204:       else
-205:         self
-206:       end
-207:     end
-208:   end
-
- -
- - - - -
- - -
- - -
- - porter2_step4(gb_english = false) - -
- -
- -

-Search for the longest among the suffixes listed in the keys of -Porter2::STEP_4_MAPS. If one is found and that suffix occurs in R2, -replace it with the value found in STEP_4_MAPS. -

-

-(Suffix ‘ion’ is treated as a special case in the procedure.) -

-

-(If gb_english is true, delete the ‘ise’ suffix if -found.) -

- - - -
-
-     # File lib/porter2_implementation.rb, line 218
-218:   def porter2_step4(gb_english = false)
-219:     if self.porter2_r2 =~ /ion$/ and self =~ /(s|t)ion$/
-220:       self.sub(/ion$/, '')
-221:     else
-222:       s4m = Porter2::STEP_4_MAPS.dup
-223:       if gb_english
-224:         s4m["ise"] = ""
-225:       end
-226:       step_4_re = Regexp.union(s4m.keys.map {|r| Regexp.new(r + "$")})
-227:       r2 = self.porter2_r2
-228:       if self =~ step_4_re
-229:         if r2 =~ /#{$&}/
-230:           self.sub(/#{$&}$/, s4m[$&])
-231:         else
-232:           self
-233:         end
-234:       else
-235:         self
-236:       end
-237:     end
-238:   end
-
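The R2 condition in step 4 can be illustrated with a few words. Porter2::STEP_4_MAPS is not shown in this file, so this sketch assumes a small subset of suffixes that are simply deleted; the function and constant names are illustrative, and the region helpers omit the special-case prefixes.

```ruby
V = "[aeiouy]"  # assumed
C = "[^aeiouy]" # assumed
# Assumed subset of the step 4 suffixes, all deleted when in R2.
STEP_4_SUBSET = %w[ement ment ent al ance].freeze

def r1_of(word)
  m = word.match(/#{V}#{C}(?<r1>.*)$/)
  m ? m[:r1] : ""
end

def r2_of(word)
  m = r1_of(word).match(/#{V}#{C}(?<r2>.*)$/)
  m ? m[:r2] : ""
end

def step4(word)
  r2 = r2_of(word)
  # 'ion' is deleted only when in R2 and preceded by s or t.
  return word.sub(/ion$/, "") if r2.end_with?("ion") && word =~ /[st]ion$/

  suffix_re = Regexp.union(STEP_4_SUBSET.map { |s| /#{s}$/ })
  m = word.match(suffix_re)
  m && r2.end_with?(m[0]) ? word[0...-m[0].length] : word
end

%w[replacement adoption cement].each do |w|
  puts "#{w} -> #{step4(w)}"
end
```

For "cement" the longest matching suffix is "ement", but R2 is only "t", so nothing is removed.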
- -
- - - - -
- - -
- - -
- - porter2_step5() - -
- -
- -

-Search for the following suffixes and, if found, perform the action -indicated. -

- - - -
e

-delete if in R2, or in R1 and not preceded by a short syllable -

-
l

-delete if in R2 and preceded by l -

-
- - - -
-
-     # File lib/porter2_implementation.rb, line 244
-244:   def porter2_step5
-245:     if self =~ /ll$/ and self.porter2_r2 =~ /l$/
-246:       self.sub(/ll$/, 'l') 
-247:     elsif self =~ /e$/ and self.porter2_r2 =~ /e$/ 
-248:       self.sub(/e$/, '') 
-249:     else
-250:       r1 = self.porter2_r1
-251:       if self =~ /e$/ and r1 =~ /e$/ and not self =~ /#{Porter2::SHORT_SYLLABLE}e$/
-252:         self.sub(/e$/, '')
-253:       else
-254:         self
-255:       end
-256:     end
-257:   end
-
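The two step 5 rules can be checked with a compact sketch. It assumes the same vowel classes and short-syllable pattern as the published Porter 2 description (the Porter2 constants are not shown in this file); names are illustrative and the region helpers omit the special-case prefixes.

```ruby
V = "[aeiouy]"  # assumed
C = "[^aeiouy]" # assumed
SHORT_SYLLABLE = "(?:#{C}#{V}[^aeiouywxY]|^#{V}#{C})" # assumed

def r1_of(word)
  m = word.match(/#{V}#{C}(?<r1>.*)$/)
  m ? m[:r1] : ""
end

def r2_of(word)
  m = r1_of(word).match(/#{V}#{C}(?<r2>.*)$/)
  m ? m[:r2] : ""
end

def step5(word)
  r1 = r1_of(word)
  r2 = r2_of(word)
  if word.end_with?("ll") && r2.end_with?("l")
    word.chop                                  # ll -> l
  elsif word.end_with?("e") &&
        (r2.end_with?("e") ||
         (r1.end_with?("e") && word !~ /#{SHORT_SYLLABLE}e$/))
    word.chop                                  # drop final e
  else
    word
  end
end

%w[generate controll rate agree].each do |w|
  puts "#{w} -> #{step5(w)}"
end
```

The final e of "rate" survives because it is only in R1 and follows the short syllable "rat", whereas the e of "generate" lies in R2.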
- -
- - - - -
- - -
- - -
- - porter2_tidy() - -
- -
- -

-Tidy up the word before we get down to the algorithm -

- - - -
-
-    # File lib/porter2_implementation.rb, line 7
- 7:   def porter2_tidy
- 8:     preword = self.to_s.strip.downcase
- 9:     
-10:     # map apostrophe-like characters to apostrophes
-11:     preword.gsub!(/‘/, "'")
-12:     preword.gsub!(/’/, "'")
-13: 
-14:     preword
-15:   end
-
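A minimal sketch of the same normalisation as a hypothetical free function `tidy`, using `tr` to map both typographic quote characters to a plain apostrophe in one pass:

```ruby
# Strip whitespace, lowercase, and normalise curly quotes
# to ASCII apostrophes.
def tidy(word)
  word.to_s.strip.downcase.tr("‘’", "''")
end

puts tidy("  Dog’s ")
puts tidy(:Running)
```

Because of the `to_s` call, symbols and other objects can be passed as well as strings.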
- -
- - - - -
- - -
- - -
- - stem(gb_english = false) - -
- -
- - - - - -
- - - - -
- Alias for: porter2_stem -
- -
- - -
- - -
- - -
- -


Generated with the Darkfish - Rdoc Generator 1.1.6.

-
- - - -