Document Viewer Integration in Plone

by Nathan Van Gheem last modified Apr 29, 2012 11:11 AM
Presenting the collective.documentviewer package for Plone, which lets you display PDFs and other office documents.

collective.documentviewer integrates the great New York Times Document Viewer into Plone.

Features

  • OCR
  • Searchable on OCR text
  • Works with many different document types
  • plone.app.async integration with task monitor
  • Configuration options
  • PDF Group view for displaying groups of PDFs

Installation

Document viewer depends on an extensive set of system packages, all of which must be installed for it to work correctly. It is also recommended that you install and set up plone.app.async along with this package.
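
Before installing, it can help to check which command-line tools are already on the server. The sketch below is only an illustration: the executable names in ASSUMED_TOOLS are my guesses at what docsplit and its helpers are called on a typical system, not a list taken from the package, so adjust them to match the package's own installation notes.

```python
import shutil

# Assumed executable names; the package's installation notes are the
# authoritative list. Edit this to match your system.
ASSUMED_TOOLS = ["docsplit", "gm", "pdftotext", "tesseract", "soffice"]


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; by default it uses the standard library's shutil.which.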

How it works

The docsplit tool is used to generate images and text files for each PDF. The viewer simply displays those images and text files, so it's easy to style and customize. The downside is that 4 files are generated for every PDF page. For sites with a lot of large PDFs, even with blob storage, that's a lot of extra data for the ZODB to manage. That is why basic file storage is also available.
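
To make the conversion step a bit more concrete, here is a rough sketch of the kind of docsplit commands involved and of how quickly the file count grows. It is not the package's actual code: the function names, the image sizes and the image format are placeholders I picked for illustration, and the flags should be double-checked against `docsplit --help` on your system (in particular, `--pages all` for per-page text files is my assumption of how the CLI is invoked).

```python
def docsplit_commands(pdf_path, output_dir, sizes=("1000x", "700x", "180x"),
                      image_format="png", ocr=False):
    """Build (but do not run) docsplit command lines for one PDF.

    Sizes and format are illustrative placeholders, not package defaults.
    """
    images = [
        "docsplit", "images",
        "--size", ",".join(sizes),
        "--format", image_format,
        "--output", output_dir,
        pdf_path,
    ]
    text = [
        "docsplit", "text",
        "--pages", "all",  # assumed: one text file per page
        "--ocr" if ocr else "--no-ocr",
        "--output", output_dir,
        pdf_path,
    ]
    return images, text


def estimated_file_count(page_count, files_per_page=4):
    """Files generated for a PDF, using the 4-files-per-page figure above."""
    return page_count * files_per_page


if __name__ == "__main__":
    for command in docsplit_commands("report.pdf", "converted"):
        print(" ".join(command))
    print(estimated_file_count(250), "files for a 250-page PDF")
```

The commands are returned as lists so they can be passed to subprocess.run without shell quoting issues. A single 250-page PDF already means 1000 generated files, which is why the storage choice matters.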

The OCR text is also indexed locally with the PDF (using repoze.catalog) and globally with the Plone catalog. This is done because document viewer needs a custom index in order to search text within the PDF.
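
The idea behind the local index can be shown with a toy example. The class below is a plain-Python stand-in that I wrote for illustration; it does not use repoze.catalog or mirror its API, and the name PageTextIndex is made up. It only shows why a per-document index keyed by page makes "which pages mention this?" cheap to answer.

```python
import re
from collections import defaultdict


class PageTextIndex:
    """Toy per-document index mapping each word to the pages containing it."""

    def __init__(self):
        self._pages_by_word = defaultdict(set)

    def index_page(self, page_number, text):
        """Record every word of one page's extracted text."""
        for word in re.findall(r"\w+", text.lower()):
            self._pages_by_word[word].add(page_number)

    def search(self, query):
        """Return sorted page numbers that contain every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        page_sets = [self._pages_by_word.get(word, set()) for word in words]
        return sorted(set.intersection(*page_sets))


if __name__ == "__main__":
    index = PageTextIndex()
    index.index_page(1, "Annual budget report")
    index.index_page(2, "Budget summary and notes")
    print(index.search("budget notes"))
```

A real catalog adds ranking, persistence and stemming on top of this, but the lookup shape is the same: the viewer asks for pages matching a term inside one document, while the Plone catalog answers which documents match across the site.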

Configuration Options

After product activation, there will be a control panel item, "Document Viewer Settings."

  • Image sizes -- Customize the size of images generated for the viewer
  • Storage type -- Allows you to set up file storage for your PDF data
  • OCR -- off by default, because OCR can take quite some time (and without plone.app.async installed, some users could end up being very unhappy)
  • Detect Text -- detect whether the PDF already contains text and, if so, skip OCR
  • Auto select layout -- automatically select the document viewer layout for PDFs and any enabled document types
  • Auto layout type -- the additional file types for which the document viewer layout should be enabled
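
How the OCR, Detect Text and layout options interact can be sketched in a few lines. This is my reading of the option descriptions above, not the package's code: ViewerSettings and both function names are hypothetical, and apart from OCR being off by default, the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ViewerSettings:
    """Hypothetical stand-in for the control panel settings."""
    ocr: bool = False                 # off by default, as described above
    detect_text: bool = True          # assumed default
    auto_select_layout: bool = True   # assumed default
    auto_layout_file_types: tuple = ("pdf",)  # assumed default


def should_run_ocr(settings, pdf_has_text):
    """OCR runs only if enabled, and is skipped when text was detected."""
    if not settings.ocr:
        return False
    if settings.detect_text and pdf_has_text:
        return False
    return True


def should_use_viewer_layout(settings, file_type):
    """PDFs and any enabled file types get the viewer layout automatically."""
    if not settings.auto_select_layout:
        return False
    return file_type == "pdf" or file_type in settings.auto_layout_file_types


if __name__ == "__main__":
    settings = ViewerSettings(ocr=True, auto_layout_file_types=("pdf", "excel"))
    print(should_run_ocr(settings, pdf_has_text=True))
    print(should_use_viewer_layout(settings, "excel"))
```

Read this way, turning on Detect Text is a cheap way to keep OCR enabled for scanned documents without paying for it on PDFs that already carry text.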

A Tour

With screenshots, I will go over the various features.

Settings

Make sure to customize any settings before you start using it. Also, make sure to activate any office formats you'd like to be able to use:

Document Viewer Settings

The Viewer

Document Viewer Viewer

Document Viewer Pages

Document Viewer Search

Group View

A view is also added for folders and collections to display groups of PDFs and search within the groups.

Document Viewer Group View

Converting Office Documents

Office documents can also be converted for display in the viewer.

Document Viewer Excel Doc

Async Integration

How plone.app.async integration is managed.

You can view the current status of your async conversion task at any time by clicking the "Document Viewer Convert" button. If no conversion is in progress, you can reinitiate one from the same place.

Where you can initiate conversions and see status.

You can also monitor all tasks currently in the queue:

Document Viewer Async tasks

What's left

  • It's not internationalized at all. Apologies to non-English Plone users.
  • Better mobile viewer
  • If you're converting a lot of documents at once with plone.app.async, there seems to be an issue with conflict errors. Unfortunately, this could cause your document to be converted more than once.