Things on a Heap

A Collection of Programming Ramblings by chjdev

Read Scientific Papers on Your Kindle

Reading scientific papers on your Kindle (or another eBook reader) usually sucks. The text is typically available only as a PDF or PS file, formatted for printing on A4 or US Letter. A two-column layout is also very common, which complicates things further. In this post I show you a simple way to get these papers onto your eBook reader for comfortable reading.

Step 1: Preprocessing with BRISS

First we will preprocess the file a bit so that the next step is easier and more likely to succeed. Using the cool little BRISS tool, we will crop out the unnecessary parts and leave only the main text area. The idea is to get rid of line numbers, notes in the margin (e.g. the arXiv line in our test document), etc.

BRISS is a graphical tool. You can use the menu to load the PDF or just start it from the terminal:

briss Text\ Understanding\ from\ Scratch.pdf

You will be prompted to enter the range of pages that will be analyzed to find the main text body. Usually it’s fine to just leave it at the default. BRISS then tries to find the main text area.

Tweak the boxes until they only cover the relevant text and crop the PDF by clicking Action > Crop PDF. We now have a PDF document with all possibly misleading fluff cut out and can move on to the next step.
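Since the next step works on the cropped file, a quick check that it actually exists can save a confusing error later. Here is a minimal sketch of that. It assumes BRISS saved its output next to the input as `<name>_cropped.pdf`, which is the naming seen in this post; `cropped_name` and `have_cropped` are made-up helper names, not part of BRISS.

```shell
# Hypothetical helper: derive the cropped file's name from the input PDF,
# assuming the "<name>_cropped.pdf" pattern used in this post.
cropped_name() {
    printf '%s_cropped.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: succeed only if the cropped file exists and is not empty.
have_cropped() {
    [ -s "$(cropped_name "$1")" ]
}
```

You could then run something like `have_cropped "Text Understanding from Scratch.pdf" || echo "crop it in BRISS first"` before moving on.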

Step 2: Optimizing with k2pdfopt

To optimize the cropped PDF for our Kindle we’ll use the k2pdfopt tool. It has a plethora of options suiting many needs, but the default modes usually work fine.

./k2pdfopt -ppgs -dev kpw -mode 2col Text\ Understanding\ from\ Scratch_cropped.pdf

And that’s it, now you have a Kindle-optimized PDF!
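If you have cropped a whole stack of papers, the same command can be wrapped in a small loop. The following is only a sketch: `optimize_cropped` and the `K2PDFOPT` variable are names I made up for illustration, and the flags are exactly the ones used above. If your k2pdfopt build asks for confirmation before converting, expect it to do so for every file.

```shell
# Path to the k2pdfopt binary; override it if yours lives elsewhere.
K2PDFOPT=${K2PDFOPT:-./k2pdfopt}

# Hypothetical helper: run the command from this post on every
# "*_cropped.pdf" file in the current directory.
optimize_cropped() {
    for pdf in ./*_cropped.pdf; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$pdf" ] || continue
        "$K2PDFOPT" -ppgs -dev kpw -mode 2col "$pdf" || return 1
    done
}
```

Call `optimize_cropped` from the directory that holds the cropped PDFs. The loop stops at the first file that k2pdfopt fails on, so a broken PDF does not go unnoticed.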

Warning: the default modes include the -n flag, which enables native PDF output. This is the preferable mode, since it produces smaller, better files by using native PDF instructions instead of rendering the pages to bitmaps. However, some devices (at least the 1st gen Paperwhite) may crash when opening files generated with this option because they run out of memory. This forced me to factory reset my device a couple of times during my first experiments.

Solution: either disable native output by specifying -n-, which leads to bigger, uglier files, or install Ghostscript (if you haven’t already) and include the -ppgs option. This post-processes the file with Ghostscript and fixes the issue.
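The choice between the two workarounds can also be made by a script. This sketch picks -ppgs when Ghostscript seems to be installed and falls back to -n- otherwise. It assumes the Ghostscript executable is called `gs`, which may not hold on your system; `native_output_fix` is a made-up helper name.

```shell
# Hypothetical helper: print "-ppgs" when Ghostscript is on the PATH,
# and "-n-" otherwise. The executable name "gs" is an assumption;
# pass a different name as the first argument if yours differs.
native_output_fix() {
    if command -v "${1:-gs}" >/dev/null 2>&1; then
        printf '%s\n' '-ppgs'
    else
        printf '%s\n' '-n-'
    fi
}
```

It could be used as `./k2pdfopt "$(native_output_fix)" -dev kpw -mode 2col Text\ Understanding\ from\ Scratch_cropped.pdf`, so the same line works with and without Ghostscript.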
