Read Scientific Papers on Your Kindle
Reading scientific papers on your Kindle (or another eBook reader) usually sucks. The text is usually only available as a PDF or PS file, formatted for printing on A4 or US Letter paper. A two-column layout is also very common, which complicates things further. In this post I show you a simple way to get these papers onto your eBook reader for comfortable reading.
Step 1: Preprocessing with BRISS
First we preprocess the file a bit so that the next step is more likely to succeed. Using the cool little BRISS tool, we crop out unnecessary parts and leave only the main text area. The idea is to get rid of line numbers, notes in the margin (e.g. the arXiv line in our test document), etc.
BRISS is a graphical tool. You can use the menu to load the PDF or just start it from the terminal:
briss Text\ Understanding\ from\ Scratch.pdf
You will be prompted to enter the range of pages that will be analyzed to find the main text body. Usually it’s fine to just leave it at the default. BRISS now tries to find the main text area.
Tweak the boxes until they cover only the relevant text, then crop the PDF by clicking Action > Crop PDF. We now have a PDF document with all possibly misleading fluff cut out and can move on to the next step.
Step 2: Optimizing with k2pdfopt
To optimize the cropped PDF for our Kindle we’ll use the k2pdfopt tool. It has a plethora of options suiting many needs, but the default modes usually work fine.
./k2pdfopt -ppgs -dev kpw -mode 2col Text\ Understanding\ from\ Scratch_cropped.pdf
And that’s it: you now have a Kindle-optimized PDF!
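If you have a whole folder of cropped papers, the command above can be wrapped in a small loop. This is only a minimal sketch: the function name convert_all and the K2PDFOPT and DRY_RUN variables are made up for illustration and are not part of k2pdfopt, and it assumes BRISS saved its output with the _cropped.pdf suffix seen above. The flags are exactly the ones used in this post.

```shell
#!/bin/sh
# Sketch: run the k2pdfopt call from this post on every *_cropped.pdf in a directory.
# K2PDFOPT and DRY_RUN are hypothetical helper variables, not k2pdfopt features.
K2PDFOPT="${K2PDFOPT:-./k2pdfopt}"

convert_all() {
    dir="$1"
    for pdf in "$dir"/*_cropped.pdf; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -e "$pdf" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only show what would be run.
            printf '%s\n' "$K2PDFOPT -ppgs -dev kpw -mode 2col $pdf"
        else
            "$K2PDFOPT" -ppgs -dev kpw -mode 2col "$pdf"
        fi
    done
}

# Example: DRY_RUN=1 convert_all ~/papers
```

Setting DRY_RUN=1 first is a cheap way to check which files would be picked up before anything is converted.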
Warning: the default modes include the -n flag, which enables native PDF output. This is the preferable mode, since it leads to smaller, better files: it uses native PDF instructions instead of rendering the pages to bitmaps. However, some devices (at least the 1st gen Paperwhite) may crash when opening files generated with this option, because they run out of memory. This forced me to factory reset my device a couple of times during my first experiments.
Solution: either disable native output by specifying -n-, which leads to bigger, uglier files, or install Ghostscript (if you haven’t already) and include the -ppgs option. This post-processes the file using Ghostscript and fixes the issue.
Then head over to GitHub and open an issue, please!
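The choice between the two workarounds can be sketched as a small helper. It assumes the Ghostscript executable is called gs, which is the usual name on Linux and macOS but still an assumption about your system; pick_native_fix is a made-up name for illustration. Whether k2pdfopt can actually locate your Ghostscript install is a separate matter that this check does not cover.

```shell
#!/bin/sh
# Sketch: choose a workaround flag for the native-output crash.
# Prints -ppgs if the given Ghostscript command is on PATH, otherwise -n-.
# The command name defaults to "gs", which is an assumption about your system.
pick_native_fix() {
    gs_cmd="${1:-gs}"
    if command -v "$gs_cmd" >/dev/null 2>&1; then
        printf '%s\n' '-ppgs'   # keep native output, post-process with Ghostscript
    else
        printf '%s\n' '-n-'     # fall back to bitmap output
    fi
}

# Example: ./k2pdfopt "$(pick_native_fix)" -dev kpw -mode 2col paper_cropped.pdf
```

The helper takes the command name as an argument, so it can be pointed at a differently named Ghostscript binary.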