X-Git-Url: https://git.njae.me.uk/?a=blobdiff_plain;f=rdoc%2FREADME_rdoc.html;fp=rdoc%2FREADME_rdoc.html;h=d7ac8d3ae28fca9691d04997df7f96ea6875327a;hb=b8f204e08d491bd5185d3d9e94ed98366a359af7;hp=0000000000000000000000000000000000000000;hpb=c8c08d5fafba205c5a8e4138edb0df059a63de36;p=porter2stemmer.git diff --git a/rdoc/README_rdoc.html b/rdoc/README_rdoc.html new file mode 100644 index 0000000..d7ac8d3 --- /dev/null +++ b/rdoc/README_rdoc.html @@ -0,0 +1,204 @@ +<?xml version="1.0" encoding="utf-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head> + <meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> + + <title>File: README.rdoc [porter2stemmer 1.0.0]</title> + + <link type="text/css" media="screen" href="./rdoc.css" rel="stylesheet" /> + + <script src="./js/jquery.js" type="text/javascript" + charset="utf-8"></script> + <script src="./js/thickbox-compressed.js" type="text/javascript" + charset="utf-8"></script> + <script src="./js/quicksearch.js" type="text/javascript" + charset="utf-8"></script> + <script src="./js/darkfish.js" type="text/javascript" + charset="utf-8"></script> +</head> + +<body class="file"> + <div id="metadata"> + <div id="home-metadata"> + <div id="home-section" class="section"> + <h3 class="section-header"> + <a href="./index.html">Home</a> + <a href="./index.html#classes">Classes</a> + <a href="./index.html#methods">Methods</a> + </h3> + </div> + </div> + + <div id="project-metadata"> + + + <div id="fileindex-section" class="section project-section"> + <h3 class="section-header">Files</h3> + <ul> + + <li class="file"><a href="./README_rdoc.html">README.rdoc</a></li> + + </ul> + </div> + + + <div id="classindex-section" class="section project-section"> + <h3 class="section-header">Class Index + <span class="search-toggle"><img src="./images/find.png" + height="16" width="16" alt="[+]" 
+ title="show/hide quicksearch" /></span></h3> + <form action="#" method="get" accept-charset="utf-8" class="initially-hidden"> + <fieldset> + <legend>Quicksearch</legend> + <input type="text" name="quicksearch" value="" + class="quicksearch-field" /> + </fieldset> + </form> + + <ul class="link-list"> + + <li><a href="./Porter2.html">Porter2</a></li> + + <li><a href="./String.html">String</a></li> + + </ul> + <div id="no-class-search-results" style="display: none;">No matching classes.</div> + </div> + + + </div> + </div> + + <div id="documentation"> + <h1>porter2stemmer</h1> +<h2>The Porter 2 stemmer</h2> +<p> +This is the Porter 2 stemming algorithm, as described at <a +href="http://snowball.tartarus.org/algorithms/english/stemmer.html">snowball.tartarus.org/algorithms/english/stemmer.html</a>. +The original paper is: +</p> +<p> +Porter, 1980, “An algorithm for suffix stripping”, +<em>Program</em>, Vol. 14, no. 3, pp 130-137 +</p> +<h2>Features of this implementation</h2> +<p> +This stemmer is written in pure Ruby, making it easy to modify for language +variants. For instance, the original Porter stemmer only works for +American English and does not recognise British English’s +’-ise’ as an alternate spelling of ’-ize’. This +implementation has been extended to correctly handle British English. +</p> +<p> +This stemmer also features a comprehensive test set of over 29,000 words, +taken from the <a +href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Porter +2 stemmer website</a>. +</p> +<h2>Files</h2> +<p> +Constants for the stemmer are in the <a href="Porter2.html">Porter2</a> +module. +</p> +<p> +Procedures that implement the stemmer are added to the <a +href="String.html">String</a> class. +</p> +<p> +The stemmer algorithm is implemented in the <a +href="String.html#method-i-porter2_stem">String#porter2_stem</a> procedure. +</p> +<h2>Internationalisation</h2> +<p> +There isn’t much, as this is a stemmer that only works for English. 
+</p> +<p> +The <tt>gb_english</tt> flag to the various procedures allows the stemmer +to treat the British English ’-ise’ the same as the American +English ’-ize’. +</p> +<h2>Longest suffixes</h2> +<p> +Several places in the algorithm require matching the longest suffix of a +word. The regexp engine in Ruby 1.9 seems to handle alternatives in regexps +by finding the alternative that matches at the earliest position in the +string. As we’re only talking about suffixes, that earliest match is +also the longest suffix. If the regexp engine changes, this behaviour may +change and break the stemmer. +</p> +<h2>Usage</h2> +<p> +Call the <a +href="String.html#method-i-porter2_stem">String#porter2_stem</a> or <a +href="String.html#method-i-stem">String#stem</a> methods on a string to +return its stem. +</p> +<pre> + "consistency".stem # => "consist" + "knitting".stem # => "knit" + "articulated".stem # => "articul" + "nationalize".stem # => "nation" + "nationalise".stem # => "nationalis" + "nationalise".stem(true) # => "nation" +</pre> +<h2>Author</h2> +<p> +The Porter 2 stemming algorithm was developed by <a +href="http://snowball.tartarus.org/algorithms/english/stemmer.html">Martin +Porter</a>. This implementation is by <a href="http://www.njae.me.uk">Neil +Smith</a>. +</p> +<h2>Contributing to porter2stemmer</h2> +<ul> +<li><p> +Check out the latest master to make sure the feature hasn’t been +implemented or the bug hasn’t been fixed yet +</p> +</li> +<li><p> +Check out the issue tracker to make sure someone hasn’t already +requested it and/or contributed it +</p> +</li> +<li><p> +Fork the project +</p> +</li> +<li><p> +Start a feature/bugfix branch +</p> +</li> +<li><p> +Commit and push until you are happy with your contribution +</p> +</li> +<li><p> +Make sure to add tests for it. This is important so I don’t break it +in a future version unintentionally. +</p> +</li> +<li><p> +Please try not to mess with the Rakefile, version, or history. 
If you want +to have your own version, or it is otherwise necessary, that is fine, but +please isolate the change to its own commit so I can cherry-pick around it. +</p> +</li> +</ul> +<h2>Copyright</h2> +<p> +Copyright © 2011 Neil Smith. See LICENSE.txt for further details. +</p> + + </div> + + <div id="validator-badges"> + <p><small><a href="http://validator.w3.org/check/referer">[Validate]</a></small></p> + <p><small>Generated with the <a href="http://deveiate.org/projects/Darkfish-Rdoc/">Darkfish + Rdoc Generator</a> 1.1.6</small>.</p> + </div> +</body> +</html> +