Heritrix: Open source, extensible, web-scale, and archival-quality web crawler
Heritrix Ranking & Summary
- License:
- GPL
- Price:
- FREE
- Publisher Name:
- Heritrix Team
- Publisher web site:
- Operating Systems:
- Mac OS X
- File Size:
- 39.1 MB
Heritrix Description
Heritrix is an open source, flexible, robust, extensible, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.

Heritrix (sometimes spelled heretrix, or misspelled or mis-said as heratrix/heritix/heretix/heratix) is an archaic word for heiress (a woman who inherits).

What's New in This Release:

Bug:
- List of classes is not present in select menu for DecideRules
- WARC metadata records should declare MIME-type 'application/warc-fields' (rather than 'text/anvl')
- bottleneck in StatisticsTracker.saveSourceStats?
- META http-equiv refresh content containing only a number misinterpreted as a URI

Improvement:
- ${HOSTNAME} in arc suffix is only replaced completely
- update to BDB-JE 3.3.74
- Update 'public suffix list' (effective_tld_names.dat)
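The META refresh bug listed above is easier to picture with a small example. The following is a minimal Java sketch, not Heritrix's actual extractor code: the class name `MetaRefreshSketch` and the method `extractUrl` are hypothetical and exist only for illustration. It assumes the common `delay; url=target` form of the `content` attribute and treats a value that contains only a number (a bare delay) as having no link to follow.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Heritrix.
final class MetaRefreshSketch {

    // Returns the target URL of a <meta http-equiv="refresh"> content value,
    // or Optional.empty() when the value holds only a delay such as "5".
    static Optional<String> extractUrl(String content) {
        if (content == null) {
            return Optional.empty();
        }
        String value = content.trim();
        int sep = value.indexOf(';');
        if (sep < 0) {
            // No separator: the value is just a delay, so there is no URI.
            return Optional.empty();
        }
        String rest = value.substring(sep + 1).trim();
        // Drop an optional, case-insensitive "url=" prefix.
        if (rest.regionMatches(true, 0, "url", 0, 3)) {
            String afterUrl = rest.substring(3).trim();
            if (afterUrl.startsWith("=")) {
                rest = afterUrl.substring(1).trim();
            }
        }
        // Drop matching surrounding quotes, if any.
        if (rest.length() >= 2) {
            char first = rest.charAt(0);
            char last = rest.charAt(rest.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                rest = rest.substring(1, rest.length() - 1).trim();
            }
        }
        return rest.isEmpty() ? Optional.empty() : Optional.of(rest);
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("5"));
        System.out.println(extractUrl("5; url=http://example.com/next"));
    }
}
```

With this sketch, a content value of `5` yields no URL, while `5; url=http://example.com/next` yields the target address, which is the distinction the bug report describes.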
Heritrix Related Software