Heritrix

Open source, extensible, web-scale, and archival-quality web crawler

Heritrix Ranking & Summary


  • License: GPL
  • Price: FREE
  • Publisher Name: Heritrix Team
  • Operating Systems: Mac OS X
  • File Size: 39.1 MB


Heritrix Description

Heritrix is an open source, flexible, robust, extensible, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.

Heritrix (sometimes spelled heretrix, or misspelled or mis-said as heratrix/heritix/heretix/heratix) is an archaic word for heiress (a woman who inherits).

What's New in This Release:

Bug:
  • List of classes is not present in select menu for DecideRules
  • WARC metadata records should declare MIME-type 'application/warc-fields' (rather than 'text/anvl')
  • bottleneck in StatisticsTracker.saveSourceStats?
  • META http-equiv refresh content containing only a number misinterpreted as a URI

Improvement:
  • ${HOSTNAME} in arc suffix is only replaced completely
  • update to BDB-JE 3.3.74
  • Update 'public suffix list' (effective_tld_names.dat)

