Converting eBooks to Sony Reader format

2007/03/22

Since yesterday, I have made good progress on my issues with creating content for the PRS500 and making it readable. There are several ways to proceed:

The simplest is to download Book Designer. It is free for non-commercial use, and the current version, 5.0 Alpha, does the job very well. It lets you load source in text, HTML, Lit, PDF, PalmDoc (pdb/prc), rb and a few other formats and process them into the native LRF format, plus a few others I do not really care about. The result is a nice, readable, well-formatted LRF file with three font sizes and metadata. As an added benefit, because the author is Russian, the program does not assume that the English alphabet is the only one in existence and lets you select the encoding. The result is quite good: most of the extended Czech/Slovak characters are there; some are missing and displayed as a space (namely ř, ě, ľ …), but the text is readable. A possibly better option is that, with English as the language and the default encoding, the software “downscales” the extended characters to their closest English counterparts (ř -> r, ě -> e), which results in the familiar computer Czech/Slovak. I am very comfortable with this second option, and will work on getting a correct font for the first.
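The “downscaling” of extended characters described above can be approximated in a few lines of Python. This is only a sketch of the general idea, not how Book Designer does it internally (I have not seen its code); the function name `downscale` is made up for illustration. It relies on Unicode decomposition, which splits a letter such as ř into a plain r plus a combining mark that can then be dropped.

```python
import unicodedata


def downscale(text: str) -> str:
    """Replace accented letters with their closest unaccented base letters."""
    # NFD decomposition turns e.g. "ř" into "r" followed by a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else untouched.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(downscale("Příliš žluťoučký kůň"))  # prints "Prilis zlutoucky kun"
```

Letters that have no Unicode decomposition (for example ø or đ) pass through unchanged, so this covers Czech and Slovak well but is not a general transliterator.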

If you want to read more about the program, go here and here, as long as you can read Russian. I found out that even after 22 years of not using Russian, I can still read and understand it reasonably well …

The program is also useful for creating Palm books as well as Microsoft Reader Lit books; I have not tried that yet. The user interface of Book Designer is not exactly Apple-made: it is extremely technical and geekish, looking as if it was designed by an engineer for engineers 🙂 (here is what it looks like). But it is the functionality that counts. Thank you, whoever made this possible :-).

If you actually want to understand how the LRF format works and how a book is formatted at a very low level, read the format spec and then download BBeBinder from Google Code. It is a C# 2.0 project that aims to create something similar to Book Designer, but as an open-source, GPL-ed application. It is a very early version (0.2), but in the true spirit of open source, it actually (mostly) works. I have downloaded it and looked inside the code. The solution contains a BBeB encoding/decoding library and the main program, which is nicely designed with extensibility in mind. Using plugins, it allows adding further input formats (currently it works well for text files and some HTML, and I had mixed results with others).

If both of my projects were not in the C# space (which is causing me to be slightly over-see-sharped at the moment), I would not mind volunteering a few hours for this, to make sure that Central European encoding is handled OK :-).

Sony eBook Reader – software updated

2007/03/21

After a few weeks, I got back to actually using my PRS500 and finished the books loaded on the device. When I connected the Reader, both the CONNECT software and the PRS500 first downloaded updates. The firmware in the Reader went from version 1.0.00.21081 to 1.0.02.0130 and should bring longer battery life and stability improvements. It had better, because so far the battery life is nowhere near the proclaimed 7500 page turns … Time will tell.

The desktop software update actually improved the user experience a lot: it went from “terrible” to “almost acceptable”. The new version adds a full-screen preview mode, and interaction with CONNECT is much more pleasant: it does not freeze any more, it has an “Eject” button to disconnect the Reader, and (although this may be a subjective feeling) communication with the device is much faster. The program looks more and more like iTunes 🙂 – but I have no problem with that …

The last thing to do is to verify that Sony did not install some rootkit, trojan or other pest on my machine. Since the last debacle with the rootkit DRM, I simply do not trust them any more. The good news is that the update did not require a reboot, which lessens the likelihood of a rootkit …

I have played a bit with formatting content for reading. So far I am mostly using TXT books because no conversion is required. To get the best results, try to avoid hard line breaks: leave long lines, one line per paragraph, and separate the paragraphs with an empty line. The TXT format can be presented in all three sizes and is usually quite acceptable. The problem with TXT-based books is the lack of ANY formatting (beyond paragraphs), no metadata (author, year, tags), no images … If your book is not in English, you may have problems with extended characters. So far I have not found an encoding that would present the extended Slovak/Czech characters correctly.
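The advice about hard line breaks can be automated. The sketch below assumes the source text already separates paragraphs with blank lines and only wraps lines inside them; the function name `unwrap_paragraphs` is mine, not part of any Sony tool. It joins each paragraph into one long line and keeps a single empty line between paragraphs.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one long line."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    # One line per paragraph, separated by a single empty line.
    return "\n\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    sample = "First line of a paragraph\nwrapped at 40 columns.\n\nSecond paragraph.\n"
    print(unwrap_paragraphs(sample))
```

Books that mark paragraphs by indentation instead of blank lines would be merged into one giant paragraph by this approach, so check the source layout first.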

To get any kind of formatting, you need to use the PDF, RTF or native LRF formats (the LRF format is also known as BBeB, BroadBand eBook). PDF files need to be specially formatted to be readable on the Reader, ideally with a font size of 12 to 20 points and a page size of about 8.8 x 13 cm. More detailed instructions on PDF formatting from Sony are here. With PDF, the device allows only two font sizes. The Feedbooks web site offers some preformatted books for the Reader.
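Many PDF tools want the page size in points (1/72 inch) rather than centimetres. A small conversion sketch, with the 8.8 x 13 cm figure taken from the paragraph above and the helper name `cm_to_points` invented for illustration:

```python
POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def cm_to_points(cm: float) -> float:
    """Convert a length in centimetres to PDF points (1/72 inch)."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


if __name__ == "__main__":
    width_pt = cm_to_points(8.8)
    height_pt = cm_to_points(13.0)
    print(f"Page size: {width_pt:.1f} x {height_pt:.1f} pt")
```

That works out to roughly 249 x 369 points, which can be entered as a custom page size in whatever tool produces the PDF.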

For RTF, the size of the page does not matter so much; what is important is the font size. The best results are with 16 to 20 points, and the device allows three sizes. The reformatting can be done in Word or OpenOffice. Unfortunately, the same issue with extended characters is present in RTF rendering as well.

To get the best results, you need to convert the eBook from TXT, PDF or RTF to the LRF format. LRF books can have metadata, a table of contents and nicer formatting with three sizes. I have started to evaluate a few tools for creating LRF content and will get back to it.

A few more good resources on the Net: the discussion forum on MobileRead as well as the Wiki. If you are into Shakespeare, here are his works in PRS500 format.



More on Sony Reader

2007/01/27

I have been living with my Sony eReader for almost a month now and have managed to read about two full books. If not for the time crunch on the biometric project, I would probably have read much more than just two books; nevertheless, I do have some real-life experience with the device.

First: the claimed battery life of 7500 page turns is BS. I have observed a real battery life of about 15-20 hours, so unless you manage to turn about 500 pages per hour, it just does not add up. The battery was good for about one full book (about 400 letter pages, or about 900 “small” pages on the Reader): read over a two-week period, it dropped from full charge to one segment. It is possible that this depends on reading style; if you are turning pages and reading continuously, it lasts longer. Browsing through books seems to consume considerable energy. I think that 15-20 hours of reading time is more than enough; it will keep you busy on a flight to Australia :-). It is certainly a huge improvement over PDA-based readers, where the battery life is 4-5 hours max. My two-year-old PocketPC is now down to about 3.5 hours. I am still using it because a) I have lots of books in PalmDoc and CHM formats and b) I like to read in bed …
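The arithmetic behind that complaint is easy to spell out. The sketch below uses only the numbers from this post (7500 claimed page turns, 15-20 hours of observed battery life); the function name is made up.

```python
CLAIMED_PAGE_TURNS = 7500


def required_turns_per_hour(claimed_turns: int, battery_hours: float) -> float:
    """Page turns per hour needed to use up the claimed turns in the observed time."""
    return claimed_turns / battery_hours


if __name__ == "__main__":
    for hours in (15, 20):
        rate = required_turns_per_hour(CLAIMED_PAGE_TURNS, hours)
        seconds_per_page = 3600 / rate
        print(f"{hours} h battery: {rate:.0f} turns/hour, "
              f"one page every {seconds_per_page:.1f} s")
```

In other words, the claim only matches my observed battery life if a page is turned every 7 to 10 seconds, which is not reading.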

I have looked at the content of the SD card as it appears in the file system. The Reader does not seem to do any conversion of the files involved: the copied PDF and RTF files are byte-for-byte identical to the originals. The structure of the SD card is:

\Sony Reader\
\Sony Reader\books
\Sony Reader\database
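The byte-for-byte comparison of the copied files with their originals can be repeated with a short script. This is a sketch under assumptions: the function name and both directory arguments are hypothetical, and it simply matches files by name between a folder of originals and the books folder on the card.

```python
import filecmp
from pathlib import Path


def unchanged_copies(original_dir, card_books_dir):
    """Map each file name found in both folders to True if the copies are identical."""
    results = {}
    for original in Path(original_dir).iterdir():
        if not original.is_file():
            continue
        copy = Path(card_books_dir) / original.name
        if copy.is_file():
            # shallow=False forces a real content comparison,
            # not just a check of size and modification time.
            results[original.name] = filecmp.cmp(original, copy, shallow=False)
    return results
```

Called with, say, a local ebooks folder and the `Sony Reader\books` folder of the mounted card (both paths depend on your machine), it returns a dictionary such as `{"FILENAME.pdf": True}`.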

The books folder contains all files copied to the SD card. The database folder contains a single XML file, cache.xml, which holds a directory of the books. I do not know yet whether the file is indeed a directory or what its name says, a cache. A simple experiment is to copy a few files over and test whether the Reader finds them and updates the cache. The book metadata has the following format:

<text author="AUTHOR NAME" page="0" part="0" scale="0" sourceid="37" id="269"
date="Sat, 30 Dec 2006 22:42:12 GMT" mime="text/plain"
path="Sony Reader/books/FILENAME.txt" size="339623"
title="BOOK TITLE">
<layouts>
<layout part="0" scale="0">
...

I am not sure how the layouts (one or more encoded binary chunks, for the various scales) work. The encoded JPEG thumbnail is, on the other hand, pretty obvious.

The source of the information appearing in the metadata depends on the source format. For PDF and RTF files, the Connect software seems to read the document properties; for TXT files, I have not figured it out yet.
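Reading the book entries out of cache.xml takes only a few lines of Python. Treat the following as a sketch under assumptions: the sample is built from the fragment shown above, the `<cache>` root element and the empty `<layout>` are placeholders I added to make it well-formed, and the real file may use a different root or XML namespaces, in which case the element lookup would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for cache.xml; the <cache> root is an assumption.
SAMPLE = """<cache>
  <text author="AUTHOR NAME" page="0" part="0" scale="0" sourceid="37" id="269"
        date="Sat, 30 Dec 2006 22:42:12 GMT" mime="text/plain"
        path="Sony Reader/books/FILENAME.txt" size="339623"
        title="BOOK TITLE">
    <layouts>
      <layout part="0" scale="0"></layout>
    </layouts>
  </text>
</cache>"""


def list_books(xml_text):
    """Return one dictionary per <text> entry with its basic metadata."""
    root = ET.fromstring(xml_text)
    books = []
    for entry in root.iter("text"):
        books.append({
            "title": entry.get("title"),
            "author": entry.get("author"),
            "path": entry.get("path"),
            "mime": entry.get("mime"),
            "size": int(entry.get("size", "0")),
            # Number of stored layouts (one per scale, as far as I can tell).
            "layouts": len(entry.findall("./layouts/layout")),
        })
    return books


if __name__ == "__main__":
    for book in list_books(SAMPLE):
        print(book["title"], "by", book["author"], "-", book["size"], "bytes")
```

The same function could be pointed at the real file by reading `Sony Reader\database\cache.xml` from the card and passing its contents in, which would also be a quick way to see whether newly copied books show up.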

So getting good content onto the Reader requires that you

a) get the metadata right and
b) convert the content into the most readable format.

Which is, unfortunately, the proprietary BBeB format. The readability of BBeB is by far the best; it offers nicer text, better sizing and an overall much better reading experience. The good news is that it is possible to create content in this format, and the first few tools are appearing on the net. I will try them out and post the results here as soon as I am done. Until then, you can download free ebooks (Project Gutenberg and others) from Manybooks.net directly in the eReader format.

This site is a very good source of information on the eReader and things around it.


Got the Sony reader :-)

2006/12/24

Despite Sony’s attempt to ignore non-US markets and leave Canadians in the cold (pun intended, even if it is still above zero), I have become the proud owner of this wonderful gadget for all eBook fans. I still do not have it physically, so I cannot put it under the tree. It is coming next week, when my friend, who is now enjoying a vacation down south, returns. He was not only kind enough to buy one for me too, but he also wrote a wonderful review, which he is going to post on the Net. I will add a link as soon as it happens. Until then, here is the summary, in his very own words:

“So in short, I am really impressed by the Sony eReader. Everything from case construction, functionality, file format support, screen resolution, and most importantly the actual readability of the unit all hit the target. You really have to see it to believe it. The only negative comments are the lag-times between screen refreshes, and the somewhat unintuitive menu navigation. As far as future functionality (wish list), I can imagine that full color support will be the next thing on the list, as well as searching and annotation of documents, and maybe even support for wireless connectivity for good measure.”


Too many ebooks, too little time

2006/12/08

http://manybooks.net/. ‘nough said.