
An accessible conversion solution for PDFs?


melat0nin


Hi all

 

I work at a governmental institution (The Scottish Law Commission - www.scotlawcom.gov.uk) where we regularly consult the public and others on our work.  We produce PDFs of our consultation papers, which have nice formatting etc.

 

At present when we consult we use shitty Word questionnaires which people fill in and email back, and which need manual processing to get any useful information/data from.

 

I'm looking to move to an online survey system, which will hopefully streamline matters.

 

Here's my question - I'd like users to be able to have a copy of the consultation paper with them when they are doing the survey, so they can read the relevant parts when answering the questions.  Not everyone gets a paper copy so some will have to use an electronic copy.  I don't want to force people to use PDF and I'm trying to work out a better way of making this happen.

 

Our budget is tiny so I'm looking for something free if possible.

 

None of the DOC-to-HTML converters I've tried give perfect results (Word's HTML is awful, OOo has nicer HTML but the file sizes are still massive, neither gives a carbon copy of the PDF output, etc etc...).

 

My idea is that it would be cool to have the PDF split into images (PNG, presumably), which would be direct facsimiles of the PDF pages, so formatting would not be affected.  Then, if these were inserted into an HTML page with bookmarks at each image file (i.e. page), the survey software could link directly to that relevant page.

 

I'm thinking something like www.url.com/phptopng.php?file=/downloads/paper1.pdf#page10
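A rough sketch of the HTML side of that idea, in Python rather than PHP. It assumes the PDF has already been split into one PNG per page by some other tool, and the function name `build_page_index` and the file names are made up for illustration. Each image gets an `id` such as `page10`, so a survey question can link to `paper1.html#page10`.

```python
from html import escape


def build_page_index(title, image_paths):
    """Return an HTML document with one anchored <img> per PDF page.

    image_paths is assumed to be in page order; page numbers start at 1.
    """
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
    ]
    for number, path in enumerate(image_paths, start=1):
        src = escape(path, quote=True)
        alt = escape(f"Page {number} of {title}", quote=True)
        # The id is the bookmark target, e.g. paper1.html#page10
        parts.append(f'<p id="page{number}"><img src="{src}" alt="{alt}"></p>')
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical file names for a three-page paper.
    pages = [f"paper1-{n:02d}.png" for n in range(1, 4)]
    print(build_page_index("Paper 1", pages))
```

One caveat: page images keep the formatting but carry no readable text, so the `alt` attribute here only names the page. Screen reader users would still need the PDF or an HTML version.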

 

Does anyone know of any scripts that can achieve this? I've looked through Hotscripts.com and various others but come up with nothing workable.  I thought of Scribd but that involves Flash which is just replacing one plugin (PDF) with another - ideally I want it to be as open and accessible as possible.

 

Any thoughts would be much appreciated :)


Why not incorporate whatever is in the PDF into the survey system, instead of continuing to focus on the PDF?

 

Thanks for the reply.  The reason is that there can be large chunks of text/discussion (potentially pages) on the topic a given question relates to, so it's not really practical.  Plus this would require duplicating paragraph numbering, footnoting etc. to ensure parity with the original document, which would be a nightmare!


What you could do is create a system into which you enter the content. From there you could export it as a PDF (created dynamically) and send it out to people, or let people download it. During the survey you could also link to relevant pages that dynamically fetch the content from the database and display it.

 

Doing that, you focus on storing the raw content, which can be used for a variety of purposes in differing contexts. It would be technically easier to implement than having it all revolve around a PDF that was generated in some other way (e.g. via MS Word or whatever). Without knowing how you currently do it, I also think it would be administratively easier.



That is a really interesting proposal.  Unfortunately it's quite a technically-illiterate workplace and everything is done in Word with templates and whatnot, and changing the workflow to that extent would cause uproar.

 

It's something we'll have to think about.  Converting to HTML in OOo might be the only workable solution as it produces the highest quality output of everything I've tried.


The idea is that the entire infrastructure will focus on the content instead of the PDF file, which is just one representation of the content. So you move to a kind of higher abstraction level.

 

I don't know if you're a programmer, or just an employee they sent out to do some research. I'll assume programmer.

 

So in the database you may have some tables called "surveys", "questions" and "content" (and other tables that are irrelevant right now). There is a has-many relationship between a survey and its questions (1:n), and a one-to-one relationship between a question and a content object.
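To make the relationships concrete, here is a minimal sketch of those three tables using Python's built-in `sqlite3`. Only the table names come from the description above; the column names and the helpers `create_database` and `content_for_question` are assumptions for illustration. The 1:n link is a foreign key on `questions`, and the one-to-one link is a foreign key on `content` with a `UNIQUE` constraint.

```python
import sqlite3

# Column names are illustrative assumptions; only the table names
# "surveys", "questions" and "content" come from the post.
SCHEMA = """
CREATE TABLE surveys (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE questions (
    id        INTEGER PRIMARY KEY,
    survey_id INTEGER NOT NULL REFERENCES surveys(id),  -- survey 1:n questions
    prompt    TEXT NOT NULL
);
CREATE TABLE content (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL UNIQUE REFERENCES questions(id),  -- 1:1
    body        TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Open a database, enable foreign key checks and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def content_for_question(conn, question_id):
    """Return the content body linked to a question, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM content WHERE question_id = ?", (question_id,)
    ).fetchone()
    return row[0] if row else None
```

A survey page could then call something like `content_for_question(conn, 1)` to show the relevant discussion next to the question it belongs to.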

 

The content objects would have an output strategy. This allows you to abstract the representation to an arbitrary format (assuming someone has implemented a strategy for that format, obviously). The strategies we would have here are an HTML strategy for output in a browser and a PDF strategy for exporting. Of course, just one content object would not be sufficient for the entire PDF consultation paper, so you could make a composite content object that is simply a container of content objects. Because it's a composite it can still use the strategies, but it can also add things of its own, e.g. a table of contents. The composite would then represent the survey in its entirety, while each leaf represents the content for a single question.
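A minimal sketch of how the strategy and composite ideas could fit together, in Python. All class and method names are made up for illustration, and `PlainTextStrategy` is only a stand-in for a real PDF strategy, which would need a PDF library.

```python
from abc import ABC, abstractmethod
from html import escape


class OutputStrategy(ABC):
    """One implementation per output format (HTML, PDF, ...)."""

    @abstractmethod
    def render_leaf(self, title, body): ...

    @abstractmethod
    def render_composite(self, title, toc, parts): ...


class HtmlStrategy(OutputStrategy):
    def render_leaf(self, title, body):
        return f"<section><h2>{escape(title)}</h2><p>{escape(body)}</p></section>"

    def render_composite(self, title, toc, parts):
        items = "".join(f"<li>{escape(entry)}</li>" for entry in toc)
        sections = "".join(parts)
        return f"<article><h1>{escape(title)}</h1><ol>{items}</ol>{sections}</article>"


class PlainTextStrategy(OutputStrategy):
    """Stand-in for a PDF strategy, to keep the sketch dependency-free."""

    def render_leaf(self, title, body):
        return f"{title}\n{body}"

    def render_composite(self, title, toc, parts):
        contents = "\n".join(f"- {entry}" for entry in toc)
        return "\n\n".join([title, contents, *parts])


class LeafContent:
    """Content for a single question."""

    def __init__(self, title, body):
        self.title = title
        self.body = body

    def render(self, strategy):
        return strategy.render_leaf(self.title, self.body)


class CompositeContent:
    """Container of content objects, e.g. the whole consultation paper."""

    def __init__(self, title, children=None):
        self.title = title
        self.children = list(children or [])

    def render(self, strategy):
        # The composite adds a table of contents built from its children.
        toc = [child.title for child in self.children]
        parts = [child.render(strategy) for child in self.children]
        return strategy.render_composite(self.title, toc, parts)
```

Because leaves and composites share the same `render(strategy)` call, the survey page can render one leaf as HTML while the export renders the whole composite with a different strategy.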

 

This allows for a very flexible system that can have multiple output methods, and it allows you to represent the consultation paper either partially or in its entirety.

 

It would of course require some staff retraining, but while the system might seem complex, it shouldn't be complex from an end user's perspective.

 

 

References:

http://en.wikipedia.org/wiki/Strategy_pattern

http://en.wikipedia.org/wiki/Composite_pattern

http://en.wikipedia.org/wiki/Object_composition
