Algorithm::RabinKarp

Rabin-Karp streaming hash

Algorithm::RabinKarp Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Norman Nunley, Jr
  • Publisher web site: http://search.cpan.org/~nnunley/


Algorithm::RabinKarp Description

Rabin-Karp streaming hash

Algorithm::RabinKarp is an implementation of Rabin and Karp's streaming hash, as described in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer, Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using their second equation:

    $H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] ** $k ) + $c[$k+1] ) * $k

Each hash value encodes information about the next k values in the stream (hence k-gram). This means that for any stream of n integer values (or characters), you get back n - k + 1 hash values.

For best results, you will want to create a code generator that filters your data to remove all unnecessary information. For example, in a large English document, you should probably remove all white space and fold all capitalization to a single case.

SYNOPSIS

    my $text = "A do run run run, a do run run";
    my $kgram = Algorithm::RabinKarp->new($window, $text);

or

    my $kgram2 = Algorithm::RabinKarp->new($window, $fh);

or

    my $kgram3 = Algorithm::RabinKarp->new($window, sub {
        ...
        return $num, $position;
    });

    my ($hash, $start_position, $end_position) = $kgram->next;

    my @values = $kgram->values;

    my %occurrences; # a dictionary of all kgrams.
    for my $value (@values) {
        my ($hash, @pos) = @$value;
        push @{$occurrences{$hash}}, \@pos;
    }

    my $needle = Algorithm::RabinKarp->new(6, "needle");
    open my $fh, '<', ...;
    my $haystack = Algorithm::RabinKarp->new(6, $fh);
    my $needle_hash = $needle->next;

    while (my ($hay_hash, @pos) = $haystack->next) {
        warn "Possible match for 'needle' at @pos"
            if $needle_hash eq $hay_hash;
    }

Requirements:

  • Perl
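To make the k-gram idea concrete, here is a minimal pure-Perl sketch of the two steps the description mentions: filtering the input, then hashing every window of k characters. It does not use Algorithm::RabinKarp and does not reproduce the module's equation; it uses a textbook polynomial rolling hash instead. The names normalize and kgram_hashes, the base 256 and the modulus 1_000_003 are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::RabinKarp):
# fold case and strip all white space before hashing.
sub normalize {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\s+//g;
    return $text;
}

# Hypothetical helper (not part of Algorithm::RabinKarp):
# returns one [hash, start, end] triple per k-gram, using a standard
# polynomial rolling hash. Base and modulus are assumed values.
sub kgram_hashes {
    my ($k, $text) = @_;
    my ($base, $mod) = (256, 1_000_003);
    my @codes = map { ord($_) } split //, $text;
    return () if @codes < $k;

    # base ** ($k - 1) mod $mod: the weight of the character leaving the window
    my $high = 1;
    $high = ($high * $base) % $mod for 2 .. $k;

    # Hash of the first window, computed from scratch.
    my $hash = 0;
    $hash = ($hash * $base + $codes[$_]) % $mod for 0 .. $k - 1;

    my @out = ([$hash, 0, $k - 1]);
    for my $i ($k .. $#codes) {
        # Remove the outgoing character, shift, then add the incoming one.
        # Perl's % yields a non-negative result for a positive modulus.
        $hash = ($hash - $codes[$i - $k] * $high) % $mod;
        $hash = ($hash * $base + $codes[$i]) % $mod;
        push @out, [$hash, $i - $k + 1, $i];
    }
    return @out;
}

my $text   = normalize("A do run run run, a do run run");
my @hashes = kgram_hashes(3, $text);
printf "%d characters, %d hashes\n", length($text), scalar(@hashes);
```

Each update costs constant time regardless of k, which is what makes the hash usable on a stream. Identical k-grams always produce identical hashes, while different k-grams can collide, so a hash match is only a possible match, as the warning in the synopsis suggests.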

