Can Blockchain Solve Your Document And Digital Signature Headaches?

Oracle

The purchase order looks legitimate, but does it have all the proper approvals? Many lawyers reviewed this draft contract, so is this the latest version? Can we prove that this essential document hasn’t been tampered with before I sign it? Can we prove that these two versions of a document are absolutely identical?

Blockchain might be able to help solve these kinds of everyday trust issues related to documents, especially when they are PDFs, data files created using the Portable Document Format. Blockchain technology is best known for securing financial transactions, including powering new financial instruments such as Bitcoin. But blockchain’s ability to increase trust will likely find enterprise use cases in common, non-financial information exchanges such as these document workflows.

Joris Schellekens, a software engineer and PDF expert at iText Software in Ghent, Belgium, recently presented his ideas for blockchain-supported documents at Oracle Code Los Angeles. Oracle Code is a series of free events around the world created to bring developers together to share fresh thinking and collaborate on ideas like these.

The PDF’s Power and Limitations

The PDF file format was created in the early 1990s by Adobe Systems, as a way to share richly formatted documents whose visual layout, text, and graphics would look the same, no matter which software created them or where they were viewed or printed. The PDF specification became an international standard in 2008.

Early on, Adobe and other companies implemented security features into PDF files, including password protection, encryption, and digital signatures. In theory, the digital signatures should be able to prove who created, or at least who encrypted, a PDF document.

However, depending on the hashing algorithm used, it’s not so difficult to subvert those protections to, for example, change a date/time stamp, or even the document content, says Schellekens. His company, iText Software, markets a software development kit and APIs for creating and manipulating PDFs.

“The PDF specification contains the concept of an ID tuple,” or an immutable sequence of data, says Schellekens. “This ID tuple contains timestamps for when the file was created and when it was revised. However, the PDF spec is vague about how to implement these when creating the PDF.”

Even in the case of an unaltered PDF, the protections apply to the entire document, not to various parts of it. Consider a document that must be signed by multiple parties. Since not all certificate authorities store their private keys with equal vigilance, you might lack confidence about who really modified the document (e.g. signed it), at which times, and in which order. Or, you might not be confident that there were no modifications before or after someone signed it.

A related challenge: Signatures to a digital document generally must be made serially, one at a time; the PDF specification doesn’t allow for a document to be signed in parallel by several people (as is common with contract reviews and signatures) and then merged together.

Blockchain has the potential to solve such document problems, and several others besides. For example, the Port of Antwerp is testing PDF files for digitized documents and workflow for receiving ocean cargo containers, and for documenting who has authorized access to those containers. Those records are stored in a blockchain, and if cargo is stolen, the port can determine who had the most recent access to the container record.

These kinds of use cases will depend on an enterprise-grade, production-ready blockchain infrastructure—in particular, a cloud-based system to help customers quickly experiment with use cases and deploy the ones that work into production. Oracle is planning to release a cloud-based platform, Oracle Blockchain Cloud Service, with an advanced blockchain service for enterprises and public sector organizations.

Blockchain: An Immutable Record

Blockchain is a relatively new system for linking together a group of records, called blocks, into a chain in which each block is cryptographically tied to the one before it, protecting the whole against tampering. Essential to blockchain’s functionality is the decades-old concept of a hash, which is a calculated number used to represent the contents of a large, complex piece of data such as a PDF of a signed contract.

A hash can be used to compare two pieces of data to see if they are the same—even if all you know is the hash, and you don’t have the pieces of data themselves. Thus, if you create hashes of two versions of a contract’s PDF, and the hashes don’t match, the two PDFs are not identical. If the hashes do match, the odds are very, very good that they are identical. (There is always a small chance that two different PDFs could have the same hash.)
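
As a minimal sketch of that comparison, the Python snippet below hashes three byte strings with SHA-256 from the standard library. The byte strings are stand-ins invented for illustration (real code would read the bytes of actual PDF files), and the helper name `fingerprint` is hypothetical. The article does not say which hash algorithm Schellekens has in mind; SHA-256 is simply an assumption here.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in byte strings; in practice these would be the bytes of PDF files.
contract_v1 = b"%PDF-1.7 ... payment due in 30 days ..."
contract_v1_copy = b"%PDF-1.7 ... payment due in 30 days ..."
contract_v2 = b"%PDF-1.7 ... payment due in 90 days ..."

# Identical content always yields identical hashes.
same = fingerprint(contract_v1) == fingerprint(contract_v1_copy)

# Changing even one character yields a different hash.
changed = fingerprint(contract_v1) != fingerprint(contract_v2)

print(same, changed)
```

Only the short hex strings need to be exchanged to make the comparison; neither party has to hand over the document itself.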

Where does blockchain come in? Calculate the hash for the original version of the contract PDF and store that hash value in a block in the blockchain. Authorized and interested parties can verify for themselves that copies of the contract PDF are legitimate by comparing their version’s hash to the hash stored in the blockchain. When there are multiple copies, such as those with different signatures, interested parties can use the blockchain to assess not only their validity but also their sequence, by looking at the timestamps associated with each document’s metadata.

The benefit here is that blockchain stores other information along with the document’s hash, such as an accurate date/time stamp, or the identity of the person who stored the hash. That makes it easier to see if documents are the latest version, for example, or to store multiple versions of a document, containing parallel revisions. In this way, limitations of the PDF format can be overcome, or more precisely, neatly sidestepped by using blockchain, not the PDF metadata, to verify the file.
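
The idea of storing a timestamp and a submitter next to each hash can be sketched as follows. This is a toy, in-memory stand-in and not a real blockchain client: the `HashRegistry` class, its method names, and the identity strings are all hypothetical, and a production system would write these records to an actual ledger rather than a Python list.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    doc_hash: str        # SHA-256 of the document bytes
    submitter: str       # who registered the hash (hypothetical identity string)
    timestamp: datetime  # when the hash was registered


class HashRegistry:
    """Toy append-only list standing in for a blockchain ledger."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def register(self, data: bytes, submitter: str) -> Record:
        """Store the document's hash with the submitter and the current UTC time."""
        digest = hashlib.sha256(data).hexdigest()
        record = Record(digest, submitter, datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def lookup(self, data: bytes) -> Optional[Record]:
        """Return the record whose hash matches this document, if any."""
        digest = hashlib.sha256(data).hexdigest()
        for record in self._records:
            if record.doc_hash == digest:
                return record
        return None

    def history(self) -> List[Record]:
        """Return all records in the order they were registered."""
        return list(self._records)


registry = HashRegistry()
registry.register(b"contract draft", "alice")
registry.register(b"contract draft, signed by bob", "bob")

found = registry.lookup(b"contract draft")             # a registered version
unknown = registry.lookup(b"contract draft (edited)")  # never registered
order = [record.submitter for record in registry.history()]
```

Because the version history lives in the registry rather than in the PDF metadata, two parallel revisions can each be registered and later told apart by who submitted them and when.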

Specifically, blockchain strengthens existing solutions related to the core elements of document verification, Schellekens says:

  • Integrity: Ensuring that the copy of a document has the exact same content as the original.
  • Authentication: Proving who created or changed the document, in a way that PDFs can’t.
  • Non-repudiation: Not allowing someone to deny that they created or changed the document, as long as each iteration is stored in the blockchain.

In addition, because the document verification lives in the blockchain and not in the document itself, as is the case with PDF’s built-in security measures, it’s more difficult for anyone to tamper with the document and avoid detection of that tampering.
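
To illustrate why tampering with a chained record is hard to hide, here is a minimal sketch of a hash-linked chain in which every block stores the hash of the block before it. The function names and block layout are assumptions made for this example. Real blockchain platforms add consensus, replication across many parties, and digital signatures, none of which is modeled here.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: Dict[str, str]) -> str:
    """Hash a block's document hash and previous-block hash in a stable order."""
    payload = json.dumps(
        {"doc_hash": block["doc_hash"], "prev_hash": block["prev_hash"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, str]], document: bytes) -> None:
    """Append a block that records the document's hash and links to the last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    block = {
        "doc_hash": hashlib.sha256(document).hexdigest(),
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def chain_is_valid(chain: List[Dict[str, str]]) -> bool:
    """Check every link and every block's own hash."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict[str, str]] = []
append_block(chain, b"contract v1")
append_block(chain, b"contract v2")
valid_before = chain_is_valid(chain)

# Simulate tampering: replace the first block's document hash with a forged one.
chain[0]["doc_hash"] = hashlib.sha256(b"forged contract").hexdigest()
valid_after = chain_is_valid(chain)

print(valid_before, valid_after)
```

In this sketch, a forger who also recomputed the first block's own hash would still break the link stored in the second block, so every later block would have to be rewritten as well.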

Best of All Worlds

Combining the built-in security features of PDF documents with those of blockchain could create the best of both worlds.

The PDF industry-standard specification ensures that the document will look identical, no matter what computer it is accessed on, or which software creates or reads it, or which printers output it onto hard copy. The PDF specification also provides for security, such as encryption, to add a layer of protection.

PDF also includes extensibility. Not only can it contain metadata, but it can also contain program code, written in the JavaScript language. This makes PDF documents robust enough to contain not only a visual layout, text, and graphics, but also program logic, animations, and embedded data that can be processed by the appropriate applications.

Blockchain provides external security, timestamps, authentication of the document’s creator (or at least, authentication of whoever checks the document into the blockchain), non-repudiation, as well as verification that the document has not been tampered with.

“A strength is that you can have not only digital signatures, like you do in PDF,” says Schellekens, “and you can add extra functionality in that, such as time stamping, and seeing that someone has modified the file.”

Is blockchain appropriate for every critical PDF document? Not at this point. The technology is still new, and organizations without experience with blockchain servers may not know how to implement or use blockchain. However, Schellekens is a believer in its value for authenticating document records, because blockchain doesn’t rely upon any external authorities, such as digital certificates issued by a third-party certificate authority.

The trust risks of blockchain are also easy to calculate, which may not be the case with other security methods.

“How trustworthy is blockchain?” asks Schellekens. “That’s one of the beautiful things about blockchain: You can prove, from a mathematical perspective, how hard it is for someone to decrypt your cryptographic keys or generate a hash that matches your PDF exactly. With other security, you have no idea how safe you are—until you discover that an employee ran off with the private key on a USB stick.”

Alan Zeichick is principal analyst at Camden Associates, a tech consultancy in Phoenix, Arizona, specializing in software development, enterprise networking, and cybersecurity. Follow him @zeichick.