Converting PDF to text in Linux

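The contents of a PDF file can be converted to a plain text file with the command-line tool "pdftotext". It takes the name of the PDF file, optionally followed by the name of the text file to write, with any options placed before the file names.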
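As a rough sketch of the general form, based on the synopsis in the pdftotext man page (the option list below is a small selection, not the complete set):

```shell
# General form:
#   pdftotext [options] PDF-file [text-file]
#
# A few commonly used options:
#   -f <int>   first page to convert
#   -l <int>   last page to convert
#   -layout    keep the original physical layout as far as possible
#
# If text-file is left out, the text is written to a file with the same
# name as the PDF but ending in .txt; a text-file of "-" sends it to stdout.

# Print the tool's own usage summary, if pdftotext is installed.
if command -v pdftotext >/dev/null 2>&1; then
    pdftotext -h 2>&1 | head -n 20 || true
fi
```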



Let us say we have a PDF file named temp.pdf and want to convert pages 1 to 4 of it into a text file named output.txt. The -f and -l options set the first and last page to convert.
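A minimal sketch of that conversion, assuming temp.pdf sits in the current directory (the existence check around the command is an addition for safety, not part of pdftotext itself):

```shell
# Convert pages 1 to 4 of temp.pdf into output.txt.
#   -f 1   start at page 1
#   -l 4   stop after page 4
if [ -f temp.pdf ]; then
    pdftotext -f 1 -l 4 temp.pdf output.txt
else
    echo "temp.pdf not found in the current directory" >&2
fi
```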



The file output.txt will contain all the text from pages 1 to 4 of temp.pdf, but none of the images. By default, the formatting is not maintained in the text file.

To maintain the formatting as well, we need to pass the option -layout.
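The same conversion with the layout preserved might look like this, again assuming temp.pdf is in the current directory (the existence check is an addition for safety):

```shell
# Convert pages 1 to 4 of temp.pdf, keeping the physical layout
# (columns, indentation) as close to the PDF as plain text allows.
if [ -f temp.pdf ]; then
    pdftotext -f 1 -l 4 -layout temp.pdf output.txt
else
    echo "temp.pdf not found in the current directory" >&2
fi
```

Opening output.txt in a pager or editor with a fixed-width font makes it easier to see the preserved columns.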



Now the text file will have almost the same layout as the PDF file.

2 comments:

  1. I am using PDF to text in C#; it is very easy to use, and the conversion is very accurate.
  2. This comment has been removed by a blog administrator.