How To: read PDF files from the command line

Hi!

On a previous post I showed a fantastic Android application called Termux

The app is basically a mini-Linux command-line distro, full of software and things to do.
I use it mostly to write, I connect my usb keyboard and I magically have all the almighty terminal power on my SmartPhone.

With the same text editors you can also read text, actually for that specific matter there are plenty of pagers too.

But these software can’t read all file extensions out of the box, one of these extension are PDF files. So how to read PDF files from the command line?

There are 2 ways to achieve this task, both have the original pdf file converted in another format and both these tools are part of the poppler package:

pdftotext converts a PDF file to a simple text file
pdftohtml convert PDF to html

pdftotext can be used with many pagers, editors and browsers as does pdftohtml, however the latter works best with browsers.

As both software can get a pdf file from a URL I’ll use a PDF from the internet (George Orwell 1984, under public domain in Australia) so that you can copy and paste all these command to get the same result as I do.

The software used:

Converter: pdftotext, pdftohtml (both from the poppler package as said)
Pagers: more, less, most, vimpager
Browsers: lynx, w3m, elinks
Editors: vim, micro, ne, vile, neovim

All these software works out of the box, anything else not listed here may behave in a strange way.

pdftotext

use it this way:

pdftotext file.pdf -

This way the output is redirected to stdout and printed on your terminal. Try it out yourself:

pdftotext http://www.planetebook.com/ebooks/1984.pdf -

Press Ctrl+C to interrupt the command.

As you can see it works! At this point it can be piped to another application.

Pagers:

more

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |more

less

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |less

You can also color it up by using tput (also works with more)

tput setaf 3; tput setab 4; pdftotext http://www.planetebook.com/ebooks/1984.pdf - |less

most

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |most

vimpager

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vimpager

Browsers:

Lynx

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |lynx -stdin

w3m

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |w3m

Elinks

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |elinks

Press \ to toggle source or rendered document, it may render without new lines (press v on w3m).

Editors:

micro

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |micro

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |ne

vile

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vile

vim

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |vim -

neovim

pdftotext http://www.planetebook.com/ebooks/1984.pdf - |nvim -

pdftohtml

Use it this way:

pdftohtml -stdout -i file.pdf

Always remember the -i option, if you don’t all the pdf images will be saved on your storage to later be accessible from the html (this means tens or even hundreds of images saved on the current directory).

As said this works and looks great with Browsers because they render html (remember to press either \ or v to switch to rendered mode), therefore I’m listing only the browsers here. Anyway the process is the same as above.

Browsers:

lynx

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |lynx -stdin

w3m

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |w3m

elinks

pdftohtml -stdout -i http://www.planetebook.com/ebooks/1984.pdf |elinks

That’s basically how to use these 2 poppler components.
On a future post I’m going to show you how to do the same with epub files.

TA SALÜDE

	manero on Gallium Hud · the almost defin…
	linanw on How to install ffmpeg on Mac O…
	Michael on Gallium Hud · the almost defin…
	Lyøn Panther on How To: read PDF files from th…
	Tizer on Stream youtube videos from Sma…
	BK on How To: read EPUB files from t…
	HanFox on Gallium Hud · the almost defin…
	Ik on Manage Screen brightness on Li…
	Richárd István Thier on Getting Wine MIDI to work on…
	abdullah on How to install Netbeans on Mac…
	wilwad on How to install ffmpeg on Mac O…
	Albin Larsson on Getting Wine MIDI to work on…
	manero on Introduzione
	manero on MuPDF ·· apply a patch for cus…
	Tiffany on PC#14 – coming soon…

manero's blog

How To: read PDF files from the command line

pdftotext

Pagers:

Browsers:

Editors:

pdftohtml

Browsers:

One thought on “How To: read PDF files from the command line”

Leave a comment Cancel reply

pdftotext

Pagers:

Browsers:

Editors:

pdftohtml

Browsers:

Share this:

One thought on “How To: read PDF files from the command line”

Leave a comment Cancel reply