Abstract:
Example apparatus, methods, and computers support data de-duplication indexing. One example apparatus includes a processor, a memory, and an interface to connect the processor, memory, and a set of logics. The set of logics includes an establishment logic to instantiate one-to-many de-duplication data structures, a manipulation logic to update the de-duplication data structure(s), a key logic to generate a key from a block of data to be de-duplicated, and a similarity logic to make a similarity determination for the block. The similarity determination identifies the block as a unique block, a duplicate block, or a block that meets a similarity threshold with respect to a stored de-duplicated block accessible through the de-duplication data structure(s). The similarity determination involves comparing the block to be de-duplicated to a stored block available to the apparatus using a byte-by-byte approach, a hash approach, a delta hash approach, and/or a sampling sequence approach.
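
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `DedupeIndex`, `make_key`, `similarity`, and `Verdict`, the choice of a leading-byte sample as the key, SHA-256 as the hash approach, and the 0.75 threshold are all assumptions made for illustration. The sketch shows a one-to-many index (one key mapping to several stored blocks), a hash comparison to detect duplicates, and a byte-by-byte comparison to test the similarity threshold.

```python
import hashlib
from collections import defaultdict
from enum import Enum


class Verdict(Enum):
    UNIQUE = "unique"
    DUPLICATE = "duplicate"
    SIMILAR = "similar"


def make_key(block: bytes, sample_len: int = 4) -> bytes:
    """Hypothetical key logic: use a short sample from the start of the block."""
    return block[:sample_len]


def similarity(a: bytes, b: bytes) -> float:
    """Byte-by-byte approach: fraction of positions holding the same byte."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


class DedupeIndex:
    """Illustrative one-to-many index: each key maps to a list of stored blocks."""

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold  # assumed similarity threshold
        self.index = defaultdict(list)  # key -> [(digest, block), ...]

    def classify(self, block: bytes) -> Verdict:
        """Similarity logic: duplicate, similar, or unique."""
        digest = hashlib.sha256(block).digest()
        verdict = Verdict.UNIQUE
        for stored_digest, stored in self.index.get(make_key(block), []):
            if stored_digest == digest:  # hash approach
                return Verdict.DUPLICATE
            if similarity(block, stored) >= self.threshold:
                verdict = Verdict.SIMILAR
        return verdict

    def add(self, block: bytes) -> Verdict:
        """Manipulation logic: store the block unless it is a duplicate."""
        verdict = self.classify(block)
        if verdict is not Verdict.DUPLICATE:
            entry = (hashlib.sha256(block).digest(), block)
            self.index[make_key(block)].append(entry)
        return verdict
```

In this sketch a block is compared only against stored blocks that share its key, so the key narrows the candidate set before the more expensive comparison runs. A real system would presumably store similar blocks as deltas against the matched block; here they are stored whole to keep the example short.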