WYSINWYG or Improper Redaction Techniques

This afternoon I noticed in Joe Hall’s weblog that the Library of Congress had published the comments submitted thus far on the inquiry on orphan works. Joe commented that he was having difficulty copying and pasting from the PDF documents because of the DRM on the files, ostensibly used to help protect the redaction of the personal information contained in the files.

[Image: Library of Congress booboo]

Unfortunately, I was able to quickly confirm a suspicion of mine. It was trivial for me to expose the “redacted” information in the documents, which had not been redacted at all, but merely covered with white boxes. I have posted an example image to the right; click the thumbnail to enlarge.

Thus, the personal information of everyone who submitted a comment to the L.O.C. is exposed, over 700 people at present. (No, I won’t be posting the method I used here, but it does not involve any specialized level of skill.)

This is not the first time something like this has happened. In digital formats, redacted information cannot merely be obscured. It must be destroyed in the public version of a document; otherwise it remains recoverable.
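To make the difference between obscuring and destroying concrete, here is a toy sketch in Python. It is not the method I used on the LoC files, and it does not touch real PDFs. The `Page`, `TextRun`, and `Box` types, the helper functions, and the sample address are all hypothetical stand-ins for a document that stores its text separately from whatever is drawn on top of it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle on a page (hypothetical toy type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass
class TextRun:
    text: str
    box: Box


@dataclass
class Page:
    runs: list = field(default_factory=list)    # stored text content
    covers: list = field(default_factory=list)  # opaque boxes drawn on top


def extract_text(page):
    # Extraction reads the stored runs; boxes drawn over them change nothing.
    return [run.text for run in page.runs]


def redact_by_covering(page, area):
    # Improper: the page looks redacted, but the text is still in the file.
    page.covers.append(area)


def redact_by_removing(page, area):
    # Proper: delete every run that overlaps the area, then mark the spot.
    page.runs = [run for run in page.runs if not run.box.overlaps(area)]
    page.covers.append(area)


def sample_page():
    # Made-up content for illustration only.
    return Page(runs=[
        TextRun("Comment on orphan works", Box(0, 0, 200, 10)),
        TextRun("jane@example.org", Box(0, 20, 120, 30)),
    ])


area = Box(0, 20, 120, 30)

covered = sample_page()
redact_by_covering(covered, area)
print(extract_text(covered))   # the address is still present

removed = sample_page()
redact_by_removing(removed, area)
print(extract_text(removed))   # only the comment text remains
```

Both pages would look the same on screen, since each ends up with a box over the address. Only the second one is actually missing the data.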

This is an amusing coincidence for me, as I am currently working on an upcoming presentation that talks about precisely these types of issues.

N.B.: I got Joe’s permission to post his data as the example document, and the appropriate people at the Library of Congress are being notified about the issue. Hopefully it will be resolved soon.

UPDATE 4/01: The LoC has now removed the documents, and replaced them with copies that fix the issue.