CB-cPIR: Code-Based Computational Private Information Retrieval

CB-cPIR introduces a code-based single-server computational private information retrieval scheme that enhances security against subquery attacks by using high-weight secret vectors and dual queries, achieving lower communication and computational costs compared to lattice-based schemes like XPIR and SimplePIR.

Privacy-Preserving Machine Learning, Efficiency, Robustness, AI in Security

Camilla Hollanti, Neehar Verma

Aalto University

Generated by grok-3

Background Problem

The paper addresses the challenge of private information retrieval (PIR), where a user aims to retrieve data from a database without revealing the identity of the requested file to the server. Traditional information-theoretic PIR schemes assume non-colluding servers, but in the worst-case scenario of full collusion (or a single honest-but-curious server), privacy can only be achieved by downloading the entire database, which is impractical for large datasets. To overcome this, computational PIR (cPIR) schemes rely on cryptographic hardness assumptions to ensure privacy with reduced communication overhead. The work builds on the HHW scheme, a pioneering code-based cPIR approach, which was vulnerable to subquery attacks due to discernible rank differences in query submatrices. The key problem solved is designing a secure single-server cPIR scheme that resists such attacks while maintaining efficiency.

Method

The CB-cPIR scheme modifies the HHW code-based cPIR approach by constructing queries that prevent rank-based distinguishability attacks. Its core idea is to use high-weight secret vectors instead of low-weight standard basis vectors to mask the desired file index, and to concatenate two independent queries (Q1 and Q2) for added security. The steps are: (1) Sample public information (generator matrices for random linear codes) and secret information (information sets, codeword matrices, subspaces, and error matrices); (2) Construct two queries using high-weight vectors β and β + e_i (where e_i is the standard basis vector for the desired file index i), combined with error and codeword matrices; (3) Concatenate these queries into a single query sent to the server; (4) The server responds with a matrix product of the database and query; (5) The user decodes the response using erasure decoding and subspace projection to recover the desired file. This method circumvents the subquery attack by ensuring submatrix ranks are indistinguishable, leveraging the hardness of decoding random linear codes for security.

Experiment

The experiments evaluate CB-cPIR against lattice-based cPIR schemes (XPIR and SimplePIR) on communication costs, computational complexity, and PIR rate, using various parameter sets for security levels (e.g., 113-133 bits for CB-cPIR). Datasets are conceptual, representing databases with varying file sizes and numbers (e.g., 10,000 files or 1GB total size). The setup compares CB-cPIR with parameters like (q=2^61-1, n=100, k=50, s=6, v=2) against XPIR (n=1024, log(q)=60) and SimplePIR (q=2^32, p=495, n=1024), focusing on scenarios with large and small files via square and iterative database extensions. Results show CB-cPIR outperforms XPIR and SimplePIR in communication overhead (e.g., higher PIR rates as file sizes increase) and computational costs (e.g., fewer multiplications over comparable fields). The setup is reasonable for highlighting efficiency, though it lacks real-world implementation data or optimized lattice-based comparisons, which could skew results. The effectiveness matches expectations for communication and computation savings, but the halved PIR rate compared to HHW (due to dual queries) and sensitivity to parameter choices for security are notable drawbacks.

Further Thoughts

The CB-cPIR scheme’s reliance on the hardness of decoding random linear codes positions it as a promising post-quantum secure solution, especially as quantum computing threatens lattice-based schemes like XPIR. However, I’m intrigued by the potential impact of future advances in decoding algorithms—could they undermine the security foundation of CB-cPIR as Shor’s algorithm does for classical cryptography? Additionally, the iterative database approach for small files connects to broader data management challenges in distributed systems; it might be worth exploring how CB-cPIR integrates with federated learning environments where privacy and efficiency are paramount. Comparing this to recent works on homomorphic encryption for PIR, such as those leveraging SEAL or CKKS schemes, could reveal whether code-based methods offer unique advantages in specific low-bandwidth, high-privacy scenarios like IoT or edge computing. Finally, the lack of a proof-of-concept implementation raises questions about real-world latency and scalability—future research should prioritize this to validate the theoretical gains.