Bloom Filter vs Homomorphic Encryption: Which approach protects the biometric data and satisfies ISO/IEC 24745?

Bloom filter (BF) and homomorphic encryption (HE) are popular modern techniques used to design biometric template protection (BTP) schemes that aim to protect the sensitive biometric information during storage and the comparison process. However, in practice, many BTP schemes based on BF or HE violate at least one of the privacy requirements of the international standard ISO/IEC 24745: irreversibility, unlinkability and confidentiality. In this paper, we investigate the state-of-the-art BTP schemes based on these two approaches and assess their relative strengths and weaknesses with respect to the three requirements of ISO/IEC 24745. The results of our investigation showed that the choice between BF and HE depends on the setting where the BTP scheme will be deployed and the level of trustworthiness of the parties involved in processing the protected template. As a result, HE enhanced by verifiable computation techniques can satisfy the privacy requirements of ISO/IEC 24745 in a trustless setting.


I. INTRODUCTION
A biometric template is a compact representation of a physiological or behavioral biometric characteristic such as face, iris or voice. The biometric characteristic itself is not a secret since, in a human-to-human interaction, humans recognize each other from their actual characteristics. However, in a human-to-machine interaction, a biometric template becomes a numerical equivalent of the human characteristic that a machine can understand. Thus, a biometric template reflects the identity of an individual and allows that individual to be recognized by the system. Given that systems are subject to various types of security threats, a biometric template must be well protected.
The literature [1], [2] defines biometric template protection (BTP) schemes as the branch of biometrics that tackles the problem of preserving biometric templates while maintaining the recognition performance. There exist different approaches to designing BTP schemes that try to satisfy the privacy requirements of the international standard ISO/IEC 24745 [3]: irreversibility, unlinkability and confidentiality. Among those approaches, Bloom filter (BF) based BTPs process the template in the transformed domain, while homomorphic encryption (HE) based BTPs process the template in the encrypted domain. Both approaches have common and exclusive properties that address the BTP challenges and tradeoffs. Several surveys investigate either Bloom filters [4], [5] or homomorphic encryption [6]-[8] and their applications in general. However, none of them examines these two approaches from a biometrics point of view.
In this paper, we investigate the differences between BF-based BTP schemes and HE-based BTP schemes. We analyze the state of the art in both approaches by studying their core functionalities and how they are exploited in the design of BTP schemes. As both approaches seem promising, we compare their advantages and disadvantages along several dimensions: fulfillment of the privacy requirements of ISO/IEC 24745, application usability, protected template flexibility, template size and runtime efficiency. We conclude by reflecting on which of BF or HE has the potential to satisfy the three requirements of ISO/IEC 24745 in a trustless setting.

II. BACKGROUND
In this section, we discuss Bloom filter and homomorphic encryption as technologies we are about to investigate in the context of biometric recognition. We also provide the privacy requirements recommended by ISO/IEC 24745 [3].

A. Bloom Filter
A standard Bloom filter (BF) is an efficient data structure used to verify whether an element belongs to a set or not. Let $S = \{x_1, \dots, x_n\}$, where $x_i \in \{0,1\}^*$, denote a set of $n$ elements to be represented. A BF consists of an array of $m$ bits, initially all set to zero. The filter uses $k$ independent hash functions $h_1, \dots, h_k$, where $h_i : \{0,1\}^* \to \{0, 1, \dots, m-1\}$, that are assumed to be uniformly random. To insert an element $x \in S$ into the BF, the bit at index $h_i(x)$ is set to one for all $1 \le i \le k$. To verify whether an element $y$ belongs to $S$, the bit at index $h_i(y)$ must be activated (set to one) for all $i \in [1, k]$. Hence, if at least one of these bits is not activated, then $y$ certainly does not belong to $S$; otherwise $y$ probably belongs to $S$, since the bits could have been activated by elements of $S$ distinct from $y$. An extensive study on the selection of the optimal parameters ($k$, $n$ and $m$) of a BF is provided in [9], and [10] provides a tool to estimate them and observe how they vary.
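The insertion and membership test described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `BloomFilter`, the method names, and the use of salted SHA-256 as a stand-in for the $k$ independent uniform hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # all bits start at zero

    def _indexes(self, item: bytes):
        # Assumption: h_i(x) is emulated by hashing the function number i
        # together with the element, then reducing modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def insert(self, item: bytes) -> None:
        # Set the bit at index h_i(x) for every 1 <= i <= k.
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "certainly not in S"; True means "probably in S".
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(m=64, k=3)
for element in (b"alice", b"bob"):
    bf.insert(element)
print(bf.might_contain(b"alice"))  # an inserted element is always reported
```

A `False` answer is definitive, whereas a `True` answer may be a false positive whose probability depends on the choice of $k$, $n$ and $m$.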
BFs are used in biometrics not only because they are space-efficient but also because they are invariant to the order of element insertion: the BF of a set of elements S is identical to the BF of any permutation of S. This property removes the need for the inconvenient alignment of features and thus allows an alignment-free technique. The BFs used in biometrics differ from the standard ones in two respects. First, they use a single hash function, which is a binary-to-integer mapping. Second, instead of verifying element membership, they calculate the weighted Hamming distance between the BFs of two sets. Two BFs are close if the distance is small, in which case their corresponding sets are likely to overlap.
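To make the biometric variant concrete, the sketch below maps the columns of a small binary feature block to BF indexes with a binary-to-integer mapping and compares two BFs. The function names are hypothetical, and the particular weighting used here (Hamming distance divided by the total number of activated bits) is an assumption chosen for illustration; the text above only states that a weighted Hamming distance is used.

```python
from typing import List


def block_to_bf(block: List[List[int]]) -> List[int]:
    """Map a binary block (rows x columns) to a Bloom filter of 2**rows bits."""
    n_rows = len(block)
    bf = [0] * (2 ** n_rows)
    for column in zip(*block):
        # Single "hash function": read the column as a binary number.
        index = int("".join(str(bit) for bit in column), 2)
        bf[index] = 1
    return bf


def dissimilarity(bf_a: List[int], bf_b: List[int]) -> float:
    """Hamming distance weighted by the number of activated bits (assumed form)."""
    hamming = sum(a != b for a, b in zip(bf_a, bf_b))
    weight = sum(bf_a) + sum(bf_b)
    return hamming / weight if weight else 0.0


block = [[1, 0, 1],
         [0, 1, 1]]       # columns read top-down: 10, 01, 11
shuffled = [[0, 1, 1],
            [1, 1, 0]]    # same columns in a different order

print(block_to_bf(block))                                         # [0, 1, 1, 1]
print(dissimilarity(block_to_bf(block), block_to_bf(shuffled)))   # 0.0
```

Because the BF only records which column values occur, reordering the columns leaves the filter unchanged, which is the alignment-free property mentioned above.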

B. Homomorphic Encryption
Homomorphic encryption (HE) allows computation over encrypted data without decryption: $E(x) * E(y) = E(x \bullet y)$, where $E(x)$ (resp. $E(y)$) is an encryption of $x$ (resp. $y$), $*$ is an operation in the encrypted domain and $\bullet$ is an operation in the plaintext domain. The operations $*$ and $\bullet$ can be either an addition, a multiplication or both, depending on the type of HE scheme. There are three types of HE schemes: partially HE (PHE), somewhat HE (SWHE) and fully HE (FHE). PHE schemes (e.g. Paillier [11], ElGamal [12]) support only one operation an unlimited number of times, with a plaintext space that is either binary or integer. SWHE schemes (e.g. BGN [13]) support a limited number of operations, usually a limited number of multiplications and an arbitrary number of additions, and also operate on a binary or integer plaintext space. FHE schemes (e.g. BFV [14], [15], BGV [16], CKKS [17]) support an unlimited number of both operations and are fundamentally based on Gentry's construction [18], which enables refreshing ciphertexts to prevent them from reaching the allowed limit in each operation, so that they remain decryptable. Unlike the classical PHEs and SWHEs, which have a limited choice of plaintext space, the state-of-the-art FHEs support binary (e.g. BFV), integers (e.g. BGV), real numbers and complex numbers (e.g. CKKS). Moreover, they offer a new style of operations, called single-instruction multiple-data (SIMD), that significantly contributes to speeding up FHEs. For instance, they allow encryption of a vector of plaintexts, packing of a vector of ciphertexts into a single ciphertext, permutations within the same ciphertext and automorphisms of a ciphertext. Although the practical improvements in accelerating FHE schemes are considerable, this is still an active area of research.
HE offers flexibility in processing encrypted data; however, it comes with a significant cost that impacts both storage and runtime. HE ciphertexts are large, which implies that encrypted biometric templates are large as well. Biometric recognition performed in the plaintext domain is significantly faster than biometric recognition performed in the encrypted domain, since comparison measures require several multiplications, which are resource-demanding operations under HE. The impact that HE has on memory space and runtime is undesirable in biometric recognition systems, which try to minimize both to meet the usability requirement. However, this optimization should not come at the expense of their security.
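As an illustration of a PHE scheme, the sketch below implements a toy version of Paillier, for which multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is meant only to show the homomorphic property: the function names, the fixed tiny primes and the simplification $g = n + 1$ are assumptions of this example, and the parameters are far too small to be secure.

```python
import math
import random


def keygen(p: int = 293, q: int = 433):
    """Toy Paillier key pair; the tiny primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


n, priv = keygen()
cx, cy = encrypt(n, 17), encrypt(n, 25)

# Multiplying ciphertexts adds the plaintexts: E(x) * E(y) = E(x + y).
c_sum = (cx * cy) % (n * n)
assert decrypt(n, priv, c_sum) == 17 + 25

# Raising a ciphertext to a public constant scales the plaintext.
c_scaled = pow(cx, 3, n * n)
assert decrypt(n, priv, c_scaled) == 3 * 17
```

Here the encrypted-domain operation is a modular multiplication and the plaintext-domain operation is an addition, so only one plaintext operation is supported, as expected from a partially homomorphic scheme.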

C. Privacy Requirements of ISO/IEC 24745
The international standard ISO/IEC 24745 [3] establishes requirements and guidelines on how biometric information should be protected throughout its entire lifecycle: storage, transfer and processing. The standard highlights the importance of binding a biometric reference to the corresponding subject identity as well as the privacy protection of subjects' biometric information during processing. In this work, we focus on the following ISO/IEC 24745 privacy requirements:
Irreversibility: for a fixed pre-defined usage (such as recognition), the raw biometric data must be transformed into an irreversible representation that precisely fits the task of the pre-defined usage.
Unlinkability: there must be no relationship between the stored biometric templates, neither across applications nor across databases.
Confidentiality: the biometric template must be preserved and not exposed to unauthorized parties trying to gain unauthorized access.

III. BLOOM FILTER BASED BTP SCHEMES
Cancelable biometric systems [19]-[21], which apply non-invertible transformations (such as cryptographic hash functions) to preserve the biometric template, suffer from a significant degradation in recognition performance because these transformations hurt the biometric accuracy. BF-based BTP schemes overcome this drawback by taking advantage of the invariant property of BFs to conceal a distorted version of the raw biometric sample in a BF-based template, and thus achieve diffusion of the statistical properties of biometric features while maintaining their distinctiveness.
a) First Category BF-based BTPs: [22] introduced the first BF-based BTP scheme (which we call the first category BF-based BTP scheme and illustrate in Figure 1) as a form of cancelable biometric system that preserves the recognition performance by circumventing the feature alignment problem during the comparison process. This is achieved because BFs are invariant with respect to the insertion order of elements: the BF of a set of elements S is identical to the BF of any permutation of S. This first category of BF-based BTPs was tested on irises [22]-[25], faces [26] and fingerprints [27] to demonstrate that the approach applies to different biometric modalities, as long as they can be expressed as binary feature vectors.
The early security assessment of the first category of BF-based BTPs was carried out by [28], who confirmed the irreversibility of their templates but questioned their unlinkability. In particular, the authors showed that two BF-based templates $T_1 = \{BF_{M_{B_i}}(K_1)\}_{1}^{k}$ and $T_2 = \{BF_{M_{B_i}}(K_2)\}_{1}^{k}$, generated from the same iriscode $M$ using different keys $K_1 \neq K_2$, are determined to conceal the same iriscode with a probability of 96%, assuming that the biometric samples are uniformly random. Later, [29] extended the unlinkability analysis and considered the non-uniformity of biometric samples inherited from the acquisition noise to determine whether two templates, generated with different iriscodes and different keys, are from the same iris. Their attack is a brute-force search over the possible keys $K$ per block that retains the key with the lowest dissimilarity score. In other terms, for each block $B_i$ it searches for a candidate key $\hat{K}$ by activating the BF at index $j \oplus \hat{K}$ if and only if the BF at index $j$ is activated. Hence, the distributions of the dissimilarity scores of the original BF-based templates and of those for which the key $K$ has been chosen from the lowest dissimilarity score overlap and have a similar error rate. Then, [29] analyzed the irreversibility of a 1st category BF-based template without a key ($K = 0$) and proposed two attacks that try to reconstruct an approximation of the unprotected template only by extracting partial information from the protected template. The first attack consists of reconstructing a block by replacing all its columns with the same column, computed by averaging the activated indexes of the BF of the protected template. The second attack requires a training set of the form $(M_{ID}, T_{ID})$, where $T_{ID}$ is the protected template concealing the iriscode $M_{ID}$. The attack consists of reconstructing the iriscode of a protected template from the test set by replacing each block with the block corresponding to the nearest BF belonging to the protected templates of the training set.
This attack assumes that K = 0, which implies that it takes into account neither the variability of the key among different subjects nor the effect of the key for the same subject. As reported by the authors, the experimental results show that both attacks are ineffective.
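The brute-force linkability search can be sketched on a single block as follows. This is a simplified illustration under stated assumptions, not the attack of [29] as published: the function names are hypothetical, columns are given directly as integers, the two templates are built from identical (noise-free) columns, and a plain Hamming distance stands in for the dissimilarity score.

```python
def protect_block(columns, key, n_bits):
    """1st-category step: XOR each column (as an integer) with the key,
    then activate the corresponding BF bit."""
    bf = [0] * (2 ** n_bits)
    for col in columns:
        bf[col ^ key] = 1
    return bf


def shift_bf(bf, key_guess):
    """Activate index j XOR key_guess if and only if index j is activated."""
    shifted = [0] * len(bf)
    for j, bit in enumerate(bf):
        if bit:
            shifted[j ^ key_guess] = 1
    return shifted


def hamming(bf_a, bf_b):
    return sum(a != b for a, b in zip(bf_a, bf_b))


def brute_force_link(bf_a, bf_b):
    """Try every key guess and keep the one with the lowest dissimilarity."""
    scores = {g: hamming(shift_bf(bf_a, g), bf_b) for g in range(len(bf_a))}
    best = min(scores, key=scores.get)
    return best, scores[best]


columns = [1, 4, 6]                                # one block, 3-bit columns
bf_1 = protect_block(columns, key=3, n_bits=3)     # application 1
bf_2 = protect_block(columns, key=5, n_bits=3)     # application 2
guess, score = brute_force_link(bf_1, bf_2)
print(guess, score)                                # 6 0
```

In this toy case the search does not recover either key but their XOR (3 XOR 5 = 6), which is already enough to align the two filters and obtain a zero dissimilarity, so the two templates can be linked to the same block.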
b) Second Category BF-based BTPs: In order to address the linkability vulnerability of 1st category BF-based BTPs, [30] proposed a technique called structure-preserving feature re-arrangement to replace the XOR with the key before computing the BF, which yields the second category BF-based BTP scheme that we illustrate in Figure 1. This technique permutes the rows of a feature block according to a keyed random permutation in order to diffuse the statistical properties of a biometric feature vector while preserving the biometric performance. Later, [31] used the same technique with a minor addition: after the row-wise permutation, a circular shift is applied within each column. However, this circular shifting does not contribute to the dissipation of the biometric information and might instead lead to some accuracy loss, since different columns might result in the same column after shifting.
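The row-wise keyed permutation can be sketched as below. This is only an illustration of the idea: the function names are hypothetical, and seeding Python's `random.Random` with the key is an assumption standing in for whatever keyed permutation an actual scheme would use (it is not a cryptographic construction).

```python
import random


def permute_rows(block, key):
    """Re-arrange the rows of a feature block with a key-derived permutation."""
    order = list(range(len(block)))
    random.Random(key).shuffle(order)  # assumption: key seeds the permutation
    return [block[i] for i in order]


def block_to_bf(block):
    """Read each column as a binary number and activate that BF index."""
    bf = [0] * (2 ** len(block))
    for column in zip(*block):
        bf[int("".join(map(str, column)), 2)] = 1
    return bf


block = [[1, 0, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]

bf_plain = block_to_bf(block)
bf_app1 = block_to_bf(permute_rows(block, key=2024))  # application-specific key
bf_app2 = block_to_bf(permute_rows(block, key=7))

# Distinct columns stay distinct under a row permutation, so the number of
# activated bits is preserved while their positions depend on the key.
assert sum(bf_app1) == sum(bf_app2) == sum(bf_plain)
```

Since a row permutation is a bijection on column values, no two distinct columns collapse into one, which is consistent with the claim that the re-arrangement preserves the biometric performance.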
[32] studied the unlinkability of any BTP scheme from an information theory perspective and proposed a linkability evaluation procedure (Section 5 in [32]). This procedure helps to assess whether two protected templates of a given BTP scheme conceal the same or different biometric instances. This is determined only by observing the score resulting from the BTP's comparison measure and comparing it with the prior mated score distribution and the prior unmated score distribution. The same work defined three degrees of unlinkability: fully unlinkable, semi unlinkable, and fully linkable templates. [32] tested their analysis framework on an HE-based BTP that uses the Euclidean distance and reported that it is fully unlinkable, while the BF-based BTP in [30] lies between fully unlinkable and semi unlinkable. Note that this procedure works only if the comparison score is known; however, for HE-based BTPs this score can be hidden [33] so that only the comparison outcome is revealed. Hence, this procedure studied the unlinkability of the underlying unprotected template instead of the one protected by HE.
Fig. 1. Overview of the 1st category and the 2nd category BF-based BTP schemes. The BF-based template is the set of the per-block BFs $BF_{B_i}$, and two templates $T_1$ and $T_2$ are compared via a dissimilarity score computed over their BFs. Step 1 and Step 3 are common to both categories. In Step 2, each block is transformed via a XOR with the user's key in the 1st category (resp. via a row-wise random permutation in the 2nd category). Note that the key is user-specific and should differ from one application to another to avoid cross-matching over databases. The original scheme [22] uses the same key for all blocks while [29], who assessed its security, proposed to use a different key per block, as depicted in this figure.

IV. HOMOMORPHIC ENCRYPTION BASED BTP SCHEMES
Homomorphic Encryption (HE) has been the centerpiece of many privacy-preserving schemes, in particular biometric recognition in the encrypted domain [34]-[36], as it allows processing of encrypted templates without decryption. The use of an IND-CPA secure HE scheme guarantees unlinkability, irreversibility and confidentiality under the assumption that the underlying mathematical problem is hard. Unlike classical BTP schemes, HE-based BTPs provide template protection even for remote biometric recognition, since an encrypted template can be sent over an unprotected public channel and only the party holding the private key is able to decrypt it; hence the importance of key management in the design of HE-based BTPs. HE thus allows a distributed comparison between the client and the server where only the party with the disclosure right is entitled to learn the recognition outcome. Therefore, in this survey, we classify HE-based BTPs according to their key management approach: either single key HE, where the template is encrypted with the public key of one of the parties and is decryptable with its private key, or threshold HE, where the template is encrypted using a joint public key between the client and the server and is decryptable only by using both of their partial private keys.
Single key HE-based BTPs: The choice of a suitable HE scheme for designing an HE-based BTP scheme depends on the comparison measure, which produces either a similarity score or a dissimilarity score. Some comparison measures (such as the Hamming distance) can be efficiently implemented under encryption using only a PHE scheme, while others that consume more multiplications (such as the cosine similarity) can benefit from the SIMD operations of an SWHE scheme or an FHE scheme to improve their efficiency under encryption. The design of an HE-based BTP scheme also depends on the recognition protocol architecture, the parties involved (such as the client, the authentication server and the database server, where the latter two are sometimes combined into a single server), and which party has the right to learn the recognition outcome, which determines how the key management is handled.
For applications such as access control, the client is entitled to learn the recognition outcome. For instance, schemes such as [37]-[39] encrypt the template with the client's public key and store the encrypted template in the database of the server, which computes the comparison measure under encryption and sends the encrypted final score to the client. In other applications, such as remote authentication to a service, the authentication server is entitled to learn the recognition outcome. For example, schemes such as [40]-[44] differentiate between an authentication server and a database server with the assumption that the two do not collude. In these schemes, the template is encrypted with the authentication server's public key and stored on the database server. This time the database server performs the comparison under encryption and sends the encrypted final score to the authentication server. In both cases, the party entitled to learn the recognition outcome decrypts the encrypted final score and then compares it with the system's threshold: if the score exceeds the threshold, the party counts it as a match, otherwise as a no match. Hence, the comparison is not fully in the encrypted domain, as the comparison with the threshold is performed after decryption, and the entitled party learns more than it needs to learn, namely the final score in addition to the recognition outcome.
In some schemes, such as [36], [45], the template is encrypted with the client's public key although the authentication server is the entitled party. For the comparison measure, [45] uses the support vector machine (SVM) classifier while [36] uses the squared Euclidean distance (SED). During the enrollment of a given individual, in [45] the classifier is trained on several biometric samples of that individual and the encrypted template is formed by encrypting the classifier's parameters using the client's public key while in [36] the encrypted template is simply the encrypted feature vector.
During the comparison, in [45] the client sends an encrypted freshly extracted feature vector to the authentication server, which multiplies it feature-wise with the encrypted template and a random value in order to blind the individual products. Subsequently, the server sends these blinded products to the client, who decrypts and adds them and then sends the result back to the server, which cancels out the blinding to learn the final score on which it bases its decision. Similarly, in [36] the server computes a blinded SED under encryption and sends the encrypted blinded final score to the client, who decrypts it and sends it back. Then, the server removes the blinding from the blinded final score and performs the comparison with the threshold. Again, in these cases the final score is revealed to the server and thus the comparison with the threshold is performed outside the encrypted domain.
Threshold HE-based BTPs: The encryption of the template with the authentication server's public key, even if the encrypted template is stored on the database server, is unsafe: if the authentication server intercepts the communication between the client and the database server or illegally obtains the encrypted template, it is able to decrypt the encrypted template and learn the clear template that is supposed to be protected. HE-based BTP schemes such as [33], [35] use a threshold variant of HE to encrypt the template in order to address this limitation introduced by the use of a single key HE scheme. A threshold HE encrypted template cannot be decrypted by either the client or the server on its own; instead, both of them need to participate in the decryption process, which gives both parties better control over the biometric data flow.
In general, the exposure of the final score, whether to the client or to the server, leaks the closeness between freshly processed biometric data (probe) and the static previously processed biometric data (template), as well as the quality of a user's biometric modality. Taking advantage of the fact that HE allows processing under encryption, [33] shows that the final score can be hidden. Moreover, [33] performs the comparison with the threshold under encryption and then reveals only the recognition outcome, match or no match, at the moment of decryption.
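The blinding round trip between server and client can be sketched with an additively homomorphic scheme. The example below is a hedged illustration rather than the protocol of [36] or [45]: it uses a toy Paillier instance with insecure parameters, the helper names (`server_blind`, `server_unblind`) are hypothetical, the encrypted score is simply encrypted directly as a stand-in for a score computed under encryption, and the decision rule assumes a dissimilarity score (a match when the score does not exceed the threshold).

```python
import math
import random


def keygen(p: int = 293, q: int = 433):
    """Toy Paillier keys (client side); parameters are NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def server_blind(n: int, enc_score: int):
    """Server adds a random mask r under encryption: E(s) * E(r) = E(s + r)."""
    r = random.randrange(1, n)
    return (enc_score * encrypt(n, r)) % (n * n), r


def server_unblind(n: int, blinded_score: int, r: int) -> int:
    """Server removes its own mask from the value returned by the client."""
    return (blinded_score - r) % n


n, client_priv = keygen()
threshold = 50                         # hypothetical system threshold
enc_score = encrypt(n, 37)             # stand-in for an encrypted distance

enc_blinded, r = server_blind(n, enc_score)        # server
blinded = decrypt(n, client_priv, enc_blinded)     # client sees only s + r mod n
score = server_unblind(n, blinded, r)              # server
is_match = score <= threshold                      # decided in the clear
print(score, is_match)                             # 37 True
```

The client only ever decrypts the masked value, while the server ends up with the final score in the clear, which illustrates why the comparison with the threshold takes place outside the encrypted domain in such designs.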

V. BF-BASED BTP SCHEMES VS HE-BASED BTP SCHEMES
Both approaches have pros and cons and satisfy the efficiency-security tradeoff differently, which makes a binary decision between them difficult. Table I summarizes and compares BF-based BTP schemes and HE-based BTP schemes with respect to the privacy requirements of ISO/IEC 24745, supported modalities and their nature (rows 7, 8 and 9), biometric recognition protocol (rows 10, 11 and 12), template's characteristics (rows 13 and 14) and performance of the overall BTP (rows 15 and 16). Note that malleability refers to whether the protected template can be inconspicuously altered. A BF-based template can be modified by flipping activated/deactivated bits, while an HE-based template can be modified by injecting ciphertexts into the encrypted template, since HE is malleable by nature. Therefore, a verification mechanism needs to be applied along with BTP schemes to check the validity of the protected template and monitor the correctness of comparison operations.

VI. CONCLUSION
In this paper, we investigated existing BF-based BTPs and HE-based BTPs with regard to the fulfillment of the privacy requirements of ISO/IEC 24745. While both approaches preserve the biometric accuracy, they present advantages and disadvantages that vary according to the efficiency-security tradeoff. The choice of one approach over the other depends on the setting where the BTP scheme is intended to be deployed and the level of trustworthiness of the parties involved in processing the protected template. In both approaches, the protected template needs to be treated with caution since, according to [46] and [34], serious biometric leakage can happen if the parties do not follow the recognition protocol as prescribed. Unlike BF-based BTPs, HE-based BTPs are better able to deal with this kind of misbehavior since they can be combined with secure and verifiable computation techniques to monitor the flow of the computation and thus satisfy the privacy requirements of ISO/IEC 24745 in a trustless setting.