PDFMiner

PDF parser and interpreter written entirely in Python

PDFMiner Ranking & Summary


  • License: Freeware
  • Price: FREE
  • Publisher Name: Yusuke Shinyama
  • Publisher web site: http://www.unixuser.org/~euske/
  • Operating Systems: Mac OS X 10.0 or later
  • File Size: 1.8 MB


PDFMiner Description

PDFMiner is a suite of programs for analyzing text data in PDF documents. It includes a PDF parser, a PDF renderer (currently limited to rendering text), and a couple of tools for extracting text. Unlike other PDF-related tools, PDFMiner reports the exact location of text on a page, along with other layout information such as font name and font size, which can be useful when analyzing a document.

Here are some key features of "PDFMiner":

  • Written entirely in Python.
  • PDF-1.7 specification support.
  • Support for non-ASCII languages and vertical writing scripts.
  • Support for various font types (Type1, TrueType, Type3, and CID).
  • Basic encryption (RC4).
  • PDF to HTML conversion (with a sample converter web app).
  • Outline (TOC) extraction.
  • Tagged contents extraction.

Requirements:

  • Python 2.5 or later

What's New in This Release:

  • Fixed rectangle handling. Able to extract image boundaries.
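
As a rough illustration of the layout information mentioned in the description (text position, font name, font size), here is a minimal sketch. It assumes the community-maintained pdfminer.six fork on Python 3, not the release listed on this page, and the helper names describe_char and iter_chars are hypothetical, introduced only for this example.

```python
from collections import namedtuple

# Plain record for one glyph: its text, font name, font size, and
# bounding box (x0, y0, x1, y1) in PDF points.
CharInfo = namedtuple("CharInfo", "text fontname size bbox")


def describe_char(char):
    """Copy the layout attributes of one LTChar-like object into a CharInfo."""
    return CharInfo(char.get_text(), char.fontname, char.size, tuple(char.bbox))


def iter_chars(pdf_path):
    """Yield (page_number, CharInfo) for every character found in a PDF."""
    # Assumption: pdfminer.six is installed. It is imported here, not at
    # module level, so describe_char() stays usable without it.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_number, page in enumerate(extract_pages(pdf_path), start=1):
        for element in page:
            # Skip figures, lines, rectangles and other non-text objects.
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # Lines also contain LTAnno objects (virtual spaces and
                    # newlines), which carry no font or position data.
                    if isinstance(obj, LTChar):
                        yield page_number, describe_char(obj)
```

Looping over iter_chars with the path to a PDF of your own and printing each CharInfo would list every character with its font and position. Coordinates follow the PDF convention, with the origin at the bottom-left corner of the page.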

