Security Aspects of Fuzzy Hashing

Baier, Harald; Breitinger, Frank

Authors:	Baier, Harald Breitinger, Frank
Title:	Security Aspects of Fuzzy Hashing
Language (ISO):	en
Abstract:	Hash functions are widely spread in computer science and used to map arbitrary large data to bit strings of a fixed length called a fingerprint. Cryptographic hash functions like SHA-1 or MD5 are established in various fields, but have one `drawback' due to several security requirements: if one bit of the input is changed, this results in a completely different fingerprint. In computer forensics cryptographic hashing represents the backbone in identifying files in sets of known-to-be-good (known as whitelisting) or known-to-be-bad files (known as blacklisting). Therefore, most investigators use hash databases, which are either created by themselves or maintained by institutions. Once they are in possession of such a database, the processing is straightforward: take a storage medium, compute all hash values, and perform look ups in the database. An active adversary, however, can destroy this straightforward proceeding by changing one bit of each file. Conditioned by the pseudo-randomness of the cryptographic hash function, this leads to completely different hash values. A 'similarity hash function' for files, i.e. a fuzzy-hashing function, would be the perfect counterpart for this attack. In 2006, Jesse Kornblum presented an implementation for this idea in [Kor06] named context triggered piecewise hashing (CTPH). Since then, he has been quoted numerous times (e.g. [Hur09]) and his approach has been improved with respect to performance (e.g. [SLC09]). However, there is no security analysis in terms of the reliability of the results. In our research, we address some security aspects of CTPH: 1. Even if two files have absolutely different content, we show how to efficiently achieve high match scores of the respective CTPH values of both files. This approach may be used to circumvent CTPH whitelisting. 2. We show how to efficiently manipulate a file by a fixed number of modification. The CTPH of the manipulated version of the file differs significantly from the CTPH of the original, although both versions are perceptually identical. This approach may be used to circumvent CTPH blacklisting. Besides these two anti-forensic issues, we also discovered weaknesses in the randomness of Kornblum's pseudo-random function. We showed how to improve Kornblum's pseudo-random function with respect to efficiency and randomness.
URI:	http://hdl.handle.net/2003/28934 http://dx.doi.org/10.17877/DE290R-1917
Issue Date:	2011-07-21
Is part of:	SPRING - SIDAR Graduierten-Workshop über Reaktive Sicherheit, 21.-22. März 2011, Bochum, Deutschland
Appears in Collections:	Presentations

Files in This Item:

File	Description	Size	Format
08.pdf	DNB	533.15 kB	Adobe PDF	View/Open

This item is protected by original copyright

View License

Show full item record

This item is protected by original copyright rightsstatements.org