Hi!

This post follows a previous one on how to read PDF files from the command line using poppler.

The method used for PDF files was to convert them to text or HTML and then pipe the output to a pager, browser or editor.

The same method can be used for EPUB files with a program called epub2txt.
But a better way is not to convert them at all, and instead to extract the content of the EPUB file with unzip.

In fact, as written on Wikipedia:

An EPUB file is a ZIP archive that contains, in effect, a website—including HTML files, images, CSS style sheets, and other assets. It also contains metadata.

Therefore the conversion process can be skipped.

To try these commands out I need an EPUB file. Gutenberg.org has plenty of them; I’m going to download Plato – The Republic:

wget https://www.gutenberg.org/ebooks/1497.epub.images -O plato_the_republic.epub
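Since an EPUB is just a ZIP archive, a quick sanity check is to print its mimetype entry: the EPUB container format requires a file named mimetype whose whole content is the string application/epub+zip. Below is a minimal sketch; is_epub is my own hypothetical helper, not part of unzip.

```shell
# is_epub: hypothetical helper, not a real command.
# An EPUB container stores a file named "mimetype" whose whole content is
# "application/epub+zip"; unzip -p prints that single entry to stdout.
is_epub() {
    [ "$(unzip -p "$1" mimetype 2>/dev/null)" = "application/epub+zip" ]
}

# Example, after the wget command above:
# is_epub plato_the_republic.epub && echo "valid EPUB container"
```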

You can find many other famous books on Gutenberg.org.


Unzip

The commands are quite simple. To list what’s inside an archive, run:

unzip -l plato_the_republic.epub
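If you only want the file names, unzip can switch to zipinfo mode with -Z, where -1 prints one name per line with no header or totals. That makes it easy to spot which entries are the actual pages. A small sketch; epub_pages is a hypothetical name, and the pattern assumes the pages end in .htm, .html or .xhtml.

```shell
# epub_pages: hypothetical helper that lists only the HTML entries of an EPUB.
# "unzip -Z -1" runs unzip in zipinfo mode and prints one file name per line;
# grep then keeps names ending in .htm, .html or .xhtml (case-insensitive).
epub_pages() {
    unzip -Z -1 "$1" | grep -Ei '\.x?html?$'
}

# Example:
# epub_pages plato_the_republic.epub
```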

The output is a table with the length (in bytes), date, time and name of every file in the archive.

At this point I can extract the content to stdout with the -p option, so that it appears on the command line.
The best way to render this text is with a browser such as w3m: images and other files are printed as raw text, but the HTML is properly rendered.

unzip -p plato_the_republic.epub|w3m
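Depending on how w3m is built and configured, data read from stdin may be shown as plain text with the raw tags visible. The -T option sets the content type explicitly, and -dump writes the rendered page to stdout instead of opening the interactive viewer, so any pager can be used. A sketch with hypothetical function names:

```shell
# epub_view: hypothetical helper; opens the whole archive in w3m's viewer,
# telling w3m to treat the data on stdin as HTML.
epub_view() {
    unzip -p "$1" | w3m -T text/html
}

# epub_dump: hypothetical helper; renders the HTML to plain text on stdout.
epub_dump() {
    unzip -p "$1" | w3m -T text/html -dump
}

# Examples:
# epub_view plato_the_republic.epub
# epub_dump plato_the_republic.epub | less
```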

In this case there are no images at all, not even the cover, so the text looks nice, with just some strange stuff at the beginning (the contents of the non-HTML files, such as the metadata).

However, if your file has images, a huge wall of incomprehensible text (the binary image data) will appear before the HTML file that contains the actual book’s words.

Try it for yourself:

wget https://www.gutenberg.org/ebooks/74.epub.images -O mark_twain_the_adventures_of_tom_sawyer.epub
unzip -p mark_twain_the_adventures_of_tom_sawyer.epub |w3m

To avoid this you can either print only the files with an html extension:

unzip -p mark_twain_the_adventures_of_tom_sawyer.epub "*.h*" | w3m
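The "*.h*" pattern relies on the pages having an .htm or .html extension. Some EPUBs store their pages as .xhtml files, which that pattern does not match. A slightly more general sketch, assuming those three extensions; -C makes unzip match the patterns case-insensitively, and epub_html is a hypothetical name.

```shell
# epub_html: hypothetical helper; prints only the HTML/XHTML pages to stdout.
# -C: match the patterns case-insensitively (so .HTML is also caught).
# stderr is discarded to hide "filename not matched" warnings for patterns
# that match nothing in this particular archive.
epub_html() {
    unzip -p -C "$1" '*.htm' '*.html' '*.xhtml' 2>/dev/null
}

# Example:
# epub_html mark_twain_the_adventures_of_tom_sawyer.epub | w3m -T text/html
```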

Or exclude the unwanted files from being printed to stdout:

unzip -p mark_twain_the_adventures_of_tom_sawyer.epub -x "*.j*" | w3m
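The "*.j*" pattern only skips JPEG files. Other EPUBs may also contain PNG or GIF images, SVG files or embedded fonts, all of which would show up as noise. A sketch that excludes a longer list of extensions; the list is an assumption, so adjust it to what unzip -l shows for your file. epub_skip_media is a hypothetical name.

```shell
# epub_skip_media: hypothetical helper; prints every entry except common
# image and font files (the extension list is an assumption, extend as needed).
# Everything after -x is treated as an exclusion pattern.
epub_skip_media() {
    unzip -p -C "$1" -x '*.jpg' '*.jpeg' '*.png' '*.gif' '*.svg' \
        '*.ttf' '*.otf' '*.woff' 2>/dev/null
}

# Example:
# epub_skip_media mark_twain_the_adventures_of_tom_sawyer.epub | w3m -T text/html
```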

epub2txt

This works similarly to the method shown in the previous post about PDF files; check that page for some pagers, editors and browsers that can display the text.

To use it:

epub2txt plato_the_republic.epub |less
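If epub2txt is not installed on a given machine, the unzip approach can serve as a fallback. A sketch, assuming epub2txt takes the file name as its only argument and writes to stdout, as in the command above; epub_text is a hypothetical name.

```shell
# epub_text: hypothetical helper; uses epub2txt when it is on the PATH,
# otherwise renders the archive's HTML pages to plain text with w3m.
epub_text() {
    if command -v epub2txt >/dev/null 2>&1; then
        epub2txt "$1"
    else
        unzip -p -C "$1" '*.htm' '*.html' '*.xhtml' 2>/dev/null |
            w3m -T text/html -dump
    fi
}

# Example:
# epub_text plato_the_republic.epub | less
```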

Bye!