Security Aspects of Fuzzy Hashing

Baier, Harald; Breitinger, Frank

Security Aspects of Fuzzy Hashing

Files

08.pdf (533.15 KB)

Date

2011-07-21

Authors

Baier, Harald

Breitinger, Frank

Abstract

Hash functions are widely spread in computer science and used to map arbitrary large data to bit strings of a fixed length called a fingerprint. Cryptographic hash functions like SHA-1 or MD5 are established in various fields, but have one `drawback' due to several security requirements: if one bit of the input is changed, this results in a completely different fingerprint. In computer forensics cryptographic hashing represents the backbone in identifying files in sets of known-to-be-good (known as whitelisting) or known-to-be-bad files (known as blacklisting). Therefore, most investigators use hash databases, which are either created by themselves or maintained by institutions. Once they are in possession of such a database, the processing is straightforward: take a storage medium, compute all hash values, and perform look ups in the database. An active adversary, however, can destroy this straightforward proceeding by changing one bit of each file. Conditioned by the pseudo-randomness of the cryptographic hash function, this leads to completely different hash values. A 'similarity hash function' for files, i.e. a fuzzy-hashing function, would be the perfect counterpart for this attack. In 2006, Jesse Kornblum presented an implementation for this idea in [Kor06] named context triggered piecewise hashing (CTPH). Since then, he has been quoted numerous times (e.g. [Hur09]) and his approach has been improved with respect to performance (e.g. [SLC09]). However, there is no security analysis in terms of the reliability of the results. In our research, we address some security aspects of CTPH: 1. Even if two files have absolutely different content, we show how to efficiently achieve high match scores of the respective CTPH values of both files. This approach may be used to circumvent CTPH whitelisting. 2. We show how to efficiently manipulate a file by a fixed number of modification. The CTPH of the manipulated version of the file differs significantly from the CTPH of the original, although both versions are perceptually identical. This approach may be used to circumvent CTPH blacklisting. Besides these two anti-forensic issues, we also discovered weaknesses in the randomness of Kornblum's pseudo-random function. We showed how to improve Kornblum's pseudo-random function with respect to efficiency and randomness.

URI

http://hdl.handle.net/2003/28934
http://dx.doi.org/10.17877/DE290R-1917

Collections

Presentations

Full item page

Security Aspects of Fuzzy Hashing

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Alternative Title(s)

Abstract

Description

Table of contents

Keywords

Subjects based on RSWK

Citation

URI

Collections