PDFs and Data
PDFs and Data: making PDFs play nicely with data and open data.
PDFs are the world's standard for documents that people can trust. They look great, they preserve design, and they print everywhere.
But PDFs can do much more than that. They can contain data, maps, links, and files, even videos and animations. And yet they almost never do. When was the last time you used a PDF like that?
We've been mentioning PDFs to people who work with data for the last few months. The responses are often negative. We understand. We too have retyped data from an embedded table, traced points from an embedded graph, removed line breaks from copied paragraphs of text, and struggled to follow links or get back to the original source of a document.
But we've also had some really positive responses. For many people, PDF is the format they trust to make sure their presentations always look the same, that their documents always have the same content on the same number of pages, and that the files in their archives will remain trustworthy and readable well into the future.
Because we know how important PDFs are to a lot of people, we recently kicked off a project with Adobe. We want to learn and share how people are using features that are already in the open PDF standard in ways that play much more nicely with data.
Documents still matter
For many challenges, large databases, APIs, and linked data are the answer. But they are not the answer to every question, and they are rarely the whole answer to any single question.
Documents are still printed, posted on noticeboards, submitted to courts and governments, and kept for archival purposes. They remain the trusted version of most of the documents that make the world work today. A file stored on an archival DVD will last longer, and change less, than almost all data portals in existence today. A PDF/A file is more likely to be openable and retain its formatting than almost any alternative.
But if we can't reliably extract text, an image, a graph, or a table from a PDF today, what chance will we have in the future?
Tell us what you think
So we're looking for examples of how people work well with PDFs and data. We want to document and share things that work well. And we want to document and share the barriers to change, too.
We've already spoken to a lot of people, and we've set up a W3C Community Group as a place for global discussions. We're currently working on a showcase project to show how PDFs and data can play a much better role in the planning system in England and Wales. We'll be working aloud and sharing what we find as we go.
But for now we want to hear from even more of you.
- What do you like about PDFs?
- What are the best examples and use cases you've seen?
- What do you wish was better?
- Where do you see the future of the format? How can we help?
Join the Discussion
We've established the PDF Open Data Community Group to be the forum for PDF and data work. We chose to co-ordinate this activity within the W3C as an open forum for communities to come together.
By working in the open, and allowing anyone to contribute, we can ensure the methods we discover have broad applicability.
ODI Leeds Head of Data