The Magazine is a PDF File
You may like it or not, but should your software house be committed to develop a magazine iPad app, the magazine will be with high probability given to you as a PDF file. As there is no way to “escape” from it, at the end you will need to develop your own pdf reader or integrate some free or commercial external library.
The reason why pdf is still the dominant format in the e-publishing world is clear: most of the publishers are porting their existing printed publications on the iPad, and for obvious budget reasons they want to reuse all the investment done in the creation of their issues. You will not be able to escape from the pdf format dictatorship with the exception of two cases: the publication is brand new and only digital, so there are no previous investments to drive the final choice, or the publisher has large budgets and/or is a strong user experience (UX) believer and accepts to allocate the extra budget to recreate a different format for its publications. Both cases are not so uncommon with those publishers that already did the effort to bring their products to the web (with the notable exception of those that did it in Flash!), but the large part of the small and medium publishers will
still be locked to the pdf format.
Unfortunately the pdf is not the best way to port a magazine in the iPad. And this for several reasons:
• printed magazines page size is usually larger than the iPad screen: this means that when the page fits to the screen, all characters appear smaller and then something readable in the printed paper could become unreadable without zoom; but zoom is not always efficient and in particular it’s not loved by readers that may lose their “orientation” inside the page.
• printed magazines pages have not the same aspect ratio of the iPad screen: this means that a page that fits in the screen will be bordered by top/bottom or left/right empty stripes.
• often printed page layouts are optimized for facing pages, e.g. a panorama picture which is spread between two pages; when the device is kept in portrait orientation, these graphical details will be lost, instead if the device is kept in landscape you will be able to appreciate the two-pages layout but characters will be too small to be read comfortably.
• as these files are not optimized for digital, normally the outlines (table of contents) and annotations (links to pages or external resources) are not exported; this means that even if your pdf reader code is aware of this information, in the majority of cases it is not available and then you will need to define a different way to provide it.
• the official pdf format supports multimedia content; unfortunately the iOS is not able to manage it, so all interactive content must be provided outside the pdf file.
The page rendering is achieved in iOS (and OSX too) through the Quartz 2D API, provided within the Core Graphics framework (shorted with CG). Quartz 2D is the two-dimensional drawing engine on which are based many (but not all) of the drawing capabilities of iOS. The
PDF API is a subset of the huge CG API. This API is “old fashioned” and is not based on Objective-C but on pure
old C; besides all memory management rules will follow the Core Foundation (CF) rules which are different from Obj-C one: this means that special attention must be provided to avoid memory leaks, as each PDF page manipulation can take several megabytes and leaks will easily trigger the memory watchdog, thus force quitting your app.
be immediate to render a PDF page, by following these basic steps:
1. get the CG reference to the pdf page to be drawn;
2. get the current graphics context for the view that will contain the page;
3. instruct Quartz to draw the pdf page to the context.
As you can see, apart the required steps needed by the drawing model of Quartz, the full rendering is accomplished by the system and you don’t need to have any knowledge of the data format of a pdf file. So for you the pdf rendering processor is just a black box, and this is clear when you
see that all CG data structures are in fact opaque and their inner contents can be accessed only via API.
extreme zoom-ins), it is impossible to render the full zoomed page in a canvas much higher than the device screen:
here we have pixels, not vectors, and it would be immediate to crash the app because all the memory has gone away for one page only. So you will be forced to introduce tiling techniques that will limit the effective rendering to the visible part of the page, not always an easy task.
More difficult is document parsing: this is required if you want to extract outlines, annotations, do some text search and highlight. In such case apart a few meta data extraction functions, what the API gives you is a set of functions that will allow you to explore the data structures inside the document. You will not be able to get any information from the file if you don’t explore the data tree correctly and if you don’t follow the specs of the PDF document.
This is worsened by the many versions the PDF specs got in the years and by the fact that many publishers still use old software that exports the content in the old formats.
I have developed a general purpose PDF explorer, this was part of a commitment of a client that asked me to develop a general purpose PDF reader; but as it is really hard to apply all the specs of the PDF official reference, my suggestion is to concentrate on the most used features and test them with many documents. As I said before, CG navigates the data tree but it doesn’t interpret it for us!
The last section of this part, long explanation but required given the importance of the topic, is how to provide multimedia content on top of a PDF file: all in all the iPad is a so versatile device that we cannot limit ourselves to simple page rendering. By adding extra content to the printed page you can leverage the device characteristics and still taking benefit on the investment done in the magazine creation.
There are many reasons to justify this choice: e.g. a printed advertisement can offer a video instead of a static picture, or a printed link to a web page can be replaced by an active link to a web view, or finally we can show the current weather using an html5 widget. As I previously said it is not recommended to introduce all this content inside the pdf file: it will not be rendered by Quartz and you will still be forced to traverse the data tree to extract the CG object reference for further manipulation. Finally not all publishers are aware of these functionalities or their digital publishing software is too old to fully support them.
So the best solution is based on the “overlay technique”.
This methodology consists in representing the pages in two layers:
• the bottom layer (“rendering layer”) will contain the PDF rendering, so it will contain the bitmap image of the page;
• the top layer (“overlay layer”) will draw all overlays and is sensible to user touches.
The overlay layer is typically made of UIKit components, so we’ll add a UIWebView for html widgets, we’ll introduce a UIScrollView to display a gallery of sliding images, or we’ll add a Media Player view for video execution. Typically the overlay descriptions are provided on a separate file, e.g. an xml, json or plist, and they will be packed together with the pdf file and all assets (movies, images, html files, music
files) in a zip file.
The app will download the zip file, will unpack it and then for each page it will use the pdf page to fill the rendering layer, and the overlay information associated to that page to build the overlay layer.
Note that this technique can be applied also in the other rendering techniques we’ll talk about in the next paragraphs, in such case it allows to overcome many of the pdf format limitations. The major requirement for the deve loper is to define a suitable format, follow all page zoom
and rotations with a corresponding overlay transformation and finally provide the publisher with the instruments and
guidelines required to easily create such overlays.