= porter2stemmer

==The Porter 2 stemmer
This is an implementation of the Porter 2 stemming algorithm, as described at
http://snowball.tartarus.org/algorithms/english/stemmer.html
The original Porter stemmer, which Porter 2 revises, was published in:

Porter, 1980, "An algorithm for suffix stripping", _Program_, Vol. 14, no. 3, pp 130-137

==Features of this implementation
This stemmer is written in pure Ruby, making it easy to modify for language variants.
For instance, the original Porter stemmer only works for American English and does
not recognise British English's '-ise' as an alternative spelling of '-ize'. This
implementation has been extended to handle British English correctly.

This stemmer also features a comprehensive test set of over 29,000 words, taken from the
{Porter 2 stemmer website}[http://snowball.tartarus.org/algorithms/english/stemmer.html].

==Files
Constants for the stemmer are in the Porter2 module.

Methods that implement the stemmer are added to the String class.

The stemmer algorithm is implemented in the String#porter2_stem method.
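
The layout described above, with constants in a module and methods added to String, can be sketched as follows. This is only an illustration of the pattern: the module name Porter2Sketch, its VOWEL constant and the sketch_starts_with_vowel? method are hypothetical and are not taken from the gem's source.

```ruby
# Hypothetical constants module, standing in for the role the Porter2 module plays.
module Porter2Sketch
  # Character class for vowels, kept as a string so it can be embedded in regexps.
  VOWEL = "[aeiouy]"
end

# Reopen String so the helper can be called directly on any word.
class String
  # Hypothetical helper: true if the word begins with a vowel.
  def sketch_starts_with_vowel?
    match?(/\A#{Porter2Sketch::VOWEL}/)
  end
end

puts "apple".sketch_starts_with_vowel?  # prints true
puts "knit".sketch_starts_with_vowel?   # prints false
```

Keeping the constants in their own module avoids adding extra names to String beyond the methods themselves.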

==Internationalisation
There isn't much, as this is a stemmer that only works for English.

The +gb_english+ flag to the various methods allows the stemmer to treat the British
English '-ise' the same as the American English '-ize'. Without the flag, '-ise' is
not recognised as a variant of '-ize'.
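
One way a flag like +gb_english+ could work is to rewrite British '-ise' spellings to their '-ize' forms before the suffix rules run. The sketch below assumes that approach. The helper sketch_normalise_ise is hypothetical, and the gem may implement the flag quite differently.

```ruby
# Hypothetical helper, not part of the gem: fold British '-ise', '-iser' and
# '-isation' endings into their '-ize' forms when the flag is set.
def sketch_normalise_ise(word, gb_english = false)
  # With the flag off, the word is returned untouched.
  return word unless gb_english
  # \1 re-inserts whichever ending followed the 'is'.
  word.sub(/is(e|er|ation)\z/, 'iz\1')
end

puts sketch_normalise_ise("nationalise")         # prints nationalise
puts sketch_normalise_ise("nationalise", true)   # prints nationalize
puts sketch_normalise_ise("organisation", true)  # prints organization
```

A rewrite this naive would also change words such as "exercise", so a real implementation would need to be more selective.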

==Longest suffixes
Several places in the algorithm require matching the longest suffix of a word. The
regexp engine in Ruby 1.9 seems to handle alternatives in regexps by returning the
alternative that matches at the earliest position in the string. As we're only matching
suffixes, that earliest match is also the longest suffix. If the regexp engine changes,
this behaviour may change and break the stemmer.
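
The behaviour described above can be checked with a small experiment. This is a minimal sketch using an illustrative pattern, not one of the stemmer's own regexps: the constant SUFFIXES and the sample word are chosen only for the demonstration.

```ruby
# The shorter alternative is deliberately listed first.
SUFFIXES = /(ness|fulness)$/

word = "hopefulness"
match = word.match(SUFFIXES)

# The engine tries each start position from left to right. At index 4
# "ness" fails but "fulness" succeeds, so the longer suffix is returned
# even though it is listed second.
puts match[0]        # prints fulness
puts match.begin(0)  # prints 4
```

Order only matters when two alternatives can match at the same start position. For example, <tt>"abc"[/a|ab/]</tt> returns "a", because the first alternative that succeeds wins.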

==Usage
Call the String#porter2_stem or String#stem methods on a string to return its stem:
  "consistency".stem # => "consist"
  "knitting".stem # => "knit"
  "articulated".stem # => "articul"
  "nationalize".stem # => "nation"
  "nationalise".stem # => "nationalis"
  "nationalise".stem(true) # => "nation"

==Author
The Porter 2 stemming algorithm was developed by
{Martin Porter}[http://snowball.tartarus.org/algorithms/english/stemmer.html].
This implementation is by {Neil Smith}[http://www.njae.me.uk].

== Contributing to porter2stemmer

* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
* Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
* Fork the project
* Start a feature/bugfix branch
* Commit and push until you are happy with your contribution
* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or a change is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.

== Copyright

Copyright (c) 2011 Neil Smith. See LICENSE.txt for
further details.