Session 15

Confidentiality and Data Protection

3 June 2026
16:30 – 18:00
ŠIBENIK VI

Presentation title
Revisiting Data Protection in the European Statistical System: Current Challenges and Opportunities
The growing demand for timely, granular, and integrated statistics has placed unprecedented pressure on traditional frameworks of statistical confidentiality and data protection.

Recent regulatory developments, including the revision of Regulation (EC) No 223/2009, explicitly encourage the reuse of administrative data, the integration of multiple data sources, and, under certain conditions, access to privately held data. While these developments expand the analytical potential of official statistics, they also intensify disclosure risks, particularly in contexts involving linked microdata, high-dimensional datasets, and increased data sharing across institutions and borders. This creates a fundamental tension: how to protect individual and organizational confidentiality while fully exploiting the informational value of data.

This work examines how national statistical systems are adapting their confidentiality protection strategies in response to these pressures, with a particular focus on the evolving role of anonymization algorithms within broader data governance frameworks. Rather than imposing a single technical solution, the current regulatory environment promotes a risk-based approach to statistical disclosure control (SDC), requiring statistical authorities to demonstrate that residual disclosure risks are proportionate to the public value of the data released. As a result, traditional rule-based anonymization techniques—such as suppression, recoding, and aggregation—are increasingly complemented by quantitative risk assessment, stochastic perturbation methods, and, in specific cases, differential privacy mechanisms.
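To make "quantitative risk assessment" concrete, here is a minimal sketch of one common frequency-based measure: a record is flagged as at risk when its combination of key (quasi-identifying) variables occurs fewer than a threshold number of times in the file. The threshold, the key variables and the function names are illustrative assumptions, not the procedure used by Statistics Portugal or any other statistical system.

```python
from collections import Counter


def key_combination(record, keys):
    """Tuple of the record's values for the chosen key variables."""
    return tuple(record[k] for k in keys)


def records_at_risk(records, keys, threshold=3):
    """Indices of records whose combination of key variables occurs
    fewer than `threshold` times in the file (a k-anonymity-style rule;
    the threshold is an illustrative assumption)."""
    freq = Counter(key_combination(r, keys) for r in records)
    return [i for i, r in enumerate(records)
            if freq[key_combination(r, keys)] < threshold]


def risk_share(records, keys, threshold=3):
    """Share of records flagged as at risk: one simple file-level summary."""
    if not records:
        return 0.0
    return len(records_at_risk(records, keys, threshold)) / len(records)
```

For example, in a file with three identical (age 34, North, nurse) records and one (age 71, South, judge) record, only the last record is flagged with the default threshold of 3, giving a risk share of 0.25. A risk-based approach would compare such a figure against the value of the release rather than apply a fixed rule.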

Drawing on comparative insights from several European statistical systems, the work highlights a clear shift from “anonymization as a one-off technical step” toward “confidentiality as a socio-technical system.” In this paradigm, algorithms are only one component, operating alongside controlled access environments, secure research facilities, legal safeguards, and user accreditation procedures. Countries with advanced register-based systems illustrate how strong governance of access and accountability can reduce the need for excessive data distortion, thereby preserving analytical utility while maintaining robust protection.

The work argues that realizing the full potential of data under modern regulatory constraints requires moving beyond the dichotomy of data protection versus data use. Instead, confidentiality should be understood as a dynamic balance between risk management, transparency, and trust. By embedding anonymization algorithms within coherent governance structures and adopting evidence-based risk assessment practices, statistical authorities can protect data subjects effectively while enabling high-quality, policy-relevant statistical outputs.

Main author / Presenter
Ana Dulce Pinto
Statistics Portugal

Ana Dulce Pinto holds a degree from the Faculty of Law of the University of Lisbon. Since May 2018, she has been the Data Protection Officer (DPO) at the National Statistics Institute, I. P., and she has obtained Data Protection Certification from the European Institute of Public Administration. She monitors and drafts national legislation and other regulatory instruments, and is responsible for liaising with external public and private entities. Ana represents Statistics Portugal at national and international meetings and conferences and is a trainer on confidentiality, privacy and data protection. She is also a legal consultant for various cooperation projects with the countries of the CPLP (Community of Portuguese-Speaking Countries), drafting legislation and training statistical staff on the legal frameworks of their respective statistical systems. She has also represented INE in European Commission projects on researchers' access to statistical data.


CO-AUTHOR:

Pedro Campos, Statistics Portugal

Presentation title
Automatic analysis of metadata for the protection of tabular data
This article presents two new features of the R package rtauargus, a tool for confidentiality-preserving data dissemination.

The first feature retrieves the list of published tables from templates such as those provided by Eurostat; the second performs an automatic analysis of the metadata of the published tables from a confidentiality viewpoint.

Cells shared across tables need to be detected and protected consistently. rtauargus can do this, but only if it is given the list of linked tables, and constructing that list requires specific expertise in the relationships between tables. The automatic analysis therefore helps less experienced users, since it tells them which tables must be protected together.

A multi-step process can be followed to ensure the protection of a set of tabulated data. The first step consists of analyzing the full set of tables to be published. The automatic analysis function proceeds as follows:

1. Identification of hierarchies: all breakdown variables that belong to the same hierarchical variable are renamed after that hierarchy;

2. Clustering: tables that need to be processed together are grouped into clusters. These tables generally share the same indicator, the same scope, and at least one common breakdown variable. Each cluster is independent of the others;

3. Detection of nested tables: since table descriptions are provided by data producers rather than confidentiality experts, tables are often described separately even though, from a confidentiality perspective, they are nested. For example, if one table presents turnover by NACE and another presents turnover by NACE × NUTS, protecting the NACE × NUTS table is sufficient. This step therefore analyzes breakdown variables to detect such inclusions;

4. Grouping of nested tables;

5. Final output: a summary table is produced, including a “cluster” column indicating which tables should be processed together.
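The five steps above can be sketched in a few lines of Python. This is a hypothetical illustration, not the rtauargus implementation: the `TableMeta` structure, the `hierarchy_of` mapping and all function names are assumptions. As a simplification, a table is treated as nested in another when both share indicator and scope and its renamed breakdown variables are a strict subset of the other's, which presumes that tables include their margins and that each hierarchy is protected as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata record for one published table."""
    name: str
    indicator: str
    scope: str
    variables: frozenset  # breakdown variables, e.g. frozenset({"NACE_A10"})


def rename_hierarchies(tables, hierarchy_of):
    """Step 1: rename each breakdown variable after its hierarchy."""
    return [TableMeta(t.name, t.indicator, t.scope,
                      frozenset(hierarchy_of.get(v, v) for v in t.variables))
            for t in tables]


def same_family(a, b):
    return (a.indicator, a.scope) == (b.indicator, b.scope)


def cluster_ids(tables):
    """Step 2: link tables of the same indicator and scope that share a
    breakdown variable; connected components become clusters 1, 2, ..."""
    parent = list(range(len(tables)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(tables):
        for j in range(i + 1, len(tables)):
            b = tables[j]
            if same_family(a, b) and a.variables & b.variables:
                parent[find(i)] = find(j)
    numbering = {}
    return [numbering.setdefault(find(i), len(numbering) + 1)
            for i in range(len(tables))]


def covering_table(t, tables):
    """Steps 3-4: the table through which t is protected, i.e. the largest
    table of the same family whose variables strictly include t's
    (ties broken by name), or t itself if no such table exists."""
    supersets = [u for u in tables
                 if same_family(t, u) and t.variables < u.variables]
    if not supersets:
        return t.name
    return max(supersets, key=lambda u: (len(u.variables), u.name)).name


def analyse(tables, hierarchy_of):
    """Step 5: summary rows (table, cluster, table to protect)."""
    renamed = rename_hierarchies(tables, hierarchy_of)
    return [(t.name, c, covering_table(t, renamed))
            for t, c in zip(renamed, cluster_ids(renamed))]
```

For example, with T1 = turnover by NACE_A10, T2 = turnover by NACE_A21 × NUTS2 and T3 = employment by NUTS3, and both NACE levels mapped to "NACE" and both NUTS levels to "NUTS", `analyse` returns `[("T1", 1, "T2"), ("T2", 1, "T2"), ("T3", 2, "T3")]`: T1 and T2 form one cluster, and protecting T2 covers T1.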

The automatic analysis developed allows confidentiality experts to verify their work, particularly in the context of publications involving a large number of tables. However, its primary objective is to be used directly by data producers, enabling them to carry out data protection themselves and thereby relieving methodological experts of this task. This automatic analysis can be used prior to applying rtauargus, but it is also relevant for other protection methods. Indeed, the identification of common cells and hierarchies is required for many tabular data protection methods.

The article will describe the process in detail and illustrate it with use cases.

Main author / Presenter
Clara Baudry
Insee

Clara is a methodologist in statistical disclosure control at Insee. She has been working in this field for two years, after completing a Master's Degree in Data Science for Public Decision Making.

Presentation title
The protection of personal data versus the need to reflect the state of society? Estonian experience in producing statistics
Pursuant to the National Statistics Act (NSA), Statistics Estonia may perform non-program statistical work on the basis of data collected for national statistics and share data for research purposes, based on the requirements set out in the Personal Data Protection Act.

Since 2024, the Data Protection Inspectorate (DPI) has issued instructions to interpret the NSA narrowly: the legal basis for any non-program statistical work (done on request) cannot be the NSA, but must be the legal act of the institution requesting the work (e.g. its statutes). Such work also requires the permission of an ethics committee or the DPI. As a result, Statistics Estonia faces the following new challenges in its operations:

• How to ensure the protection of the fundamental rights of individuals while satisfying the need to obtain information for data-based decisions.

• The implementation of the GDPR still allows different interpretations of the exception granted for producing statistics. For example, although the term “official statistics” is broader, the DPI's current position is that the exception covers only data processing within the official statistical programme.

• One of the quality criteria for statistics is timeliness, yet excessively lengthy and complex approval procedures for obtaining permission to process personal data make timely production very resource-intensive. For example, when data were collected within the framework of some EU statistical work, a few questions necessary for organizing local life were added. Under the new interpretation, these individual questions must undergo an approval procedure in which all data processing stages are described.

• When the new Regulation 223 was negotiated, privacy-enhancing technologies were seen as one of the facilitators for using different types of data in the production of statistics. Estonia is currently investigating ways to build a so-called “black box” that would prevent contact with personal data. At the same time, it is a challenge to explain to the DPI how such a solution ensures risk mitigation.

• There is an increasing demand for state institutions to explain to citizens how their data is handled and to ensure transparency about it. At the same time, the desire to lower barriers to private sector access to data is also growing. The question of how to address the risk of data leaks in the private sector, which is harder to control than state institutions, remains open.

Main author / Presenter
Ilona Reiljan
Statistics Estonia

The authors are co-presenters, each with over 20 years of experience in the public sector. They currently work in the quality and information security team, Ilona Reiljan as its head and Thea Palm as its legal expert. The team covers law, quality, data protection and information security. Mrs Reiljan has worked in different institutions as a certified quality manager and has participated in organizational development projects. Mrs Palm has long experience in drafting legal acts and planning surveillance activities, mainly on internal market issues. In recent years, advising on data protection both inside and outside Statistics Estonia has become a key part of their work in guiding the production of statistics.


CO-AUTHOR:

Thea Palm, Statistics Estonia

Presentation title
The quality challenge in Secure Private Computation: how to check data without seeing them
The evolution of official statistics brings about a shift towards multi-source statistics, i.e., statistics produced from the integration of multiple input data sets.

In several cases of practical interest, the source data sets are held by different organizations and are subject to some protection regime, e.g., because they are personal data subject to the GDPR or business-sensitive information. The traditional approach to such situations is to transmit a copy of the data to a trusted party that “sees” all the data and can therefore check, analyze and process them in the standard way, manually or automatically. Today this is not the only possible approach: recent technologies from the field of Secure Private Computing (SPC) offer an alternative. Solutions based on modern cryptographic methods (Secure Multi-Party Computation) and/or special hardware (Trusted Execution Environments) make it possible to perform a pre-defined computation task without exposing the source data to any entity other than their original holders. This approach, often referred to as “Computing over Encrypted Data”, assumes that the function to be computed is fully specified in advance, from the input data to the final desired output result(s), and then executed entirely by machines. Therefore, statisticians and analysts can no longer access the source data or inspect any intermediate data generated along the process, other than what was pre-defined as the output results. This has important implications for the Quality Assurance (QA) process: the traditional approach to QA, based on data centralization and full data visibility, does not work in SPC settings and must be fundamentally rethought.

In this contribution we elaborate a principled view of the QA process for SPC settings in which statisticians don’t “see” the data directly. We show how the QA process must start from the initial conception of the processing methodology through a close dialogue between the data holders and the statistical office, based on disclosure of detailed meta-data and limited test data samples, not full data sets. During the dialogue, an extensive set of quality checks must be developed to be included in the automated processing code.
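As a hedged illustration of quality checks "included in the automated processing code", the sketch below uses additive secret sharing, one basic building block of Secure Multi-Party Computation, to compute totals across data holders. Each holder runs a pre-agreed validity check on its own records and secret-shares only the resulting aggregates, so the statistical office learns how many records failed the check without seeing any record. The modulus, the check bounds and all names are assumptions; this is not the JOCONDE design, and a production system could instead run such checks inside the protected computation itself.

```python
import secrets

MODULUS = 2**61 - 1  # all share arithmetic is modulo this prime (assumed choice)
KEYS = ("sum", "n_valid", "n_failed")


def share(value, n_parties):
    """Split an integer into n random additive shares summing to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    return sum(shares) % MODULUS


def local_aggregates(records, lower=0, upper=10**6):
    """Run by each data holder on its own data. The pre-agreed quality check
    (hypothetical bounds) counts values that are missing or outside [lower, upper]."""
    valid = [v for v in records if v is not None and lower <= v <= upper]
    return {"sum": sum(valid), "n_valid": len(valid),
            "n_failed": len(records) - len(valid)}


def secure_totals(holders_data, n_parties=3):
    """Each holder secret-shares its aggregates; each compute party only adds
    the random-looking shares it receives; only the pre-defined totals
    (including the count of failed checks) are ever reconstructed."""
    party_acc = [dict.fromkeys(KEYS, 0) for _ in range(n_parties)]
    for records in holders_data:
        agg = local_aggregates(records)
        for k in KEYS:
            for p, s in enumerate(share(agg[k], n_parties)):
                party_acc[p][k] = (party_acc[p][k] + s) % MODULUS
    return {k: reconstruct([acc[k] for acc in party_acc]) for k in KEYS}
```

With two holders holding `[120, 80, None]` and `[300, -5, 50]`, `secure_totals` yields a sum of 550 over 4 valid records and reports 2 failed records; the missing value and the negative value themselves are never revealed.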

During the presentation we will provide examples of QA methods from the two use cases developed in the JOCONDE project (Joint On-demand Computation with No Data Exchange), carried out by Eurostat and Cybernetica. The project’s goal is to develop an SPC-as-a-Service system to be used by members and partners of the European Statistical System.

Main author / Presenter
Fabio Ricciato
Eurostat

Fabio Ricciato received a Laurea in Electrical Engineering (1999) and a PhD in Information and Communication Technologies (2003) from University La Sapienza, Italy. Until 2017 he worked in telecommunications and computer science research across different organisations in Austria, Italy and Slovenia. He served as assistant professor at the Faculty of Engineering, University of Salento, and associate professor at the Faculty of Computer Science, University of Ljubljana, teaching subjects in the fields of telecommunications, computer networks, signal processing and data analysis. He also worked as a middle manager for two Research and Technology Organisations in Vienna, leading units of up to 45 researchers and engineers. In 2018 he joined Eurostat, where he currently serves as Statistical Officer in the unit dealing with Innovation and Digital Transformation. His current work focuses on the reuse of novel data sources for official statistics and the application of privacy-enhancing technologies to multisource statistical production.

Presentation title
Balancing data confidentiality and data use in Singapore
Safeguarding data privacy while unlocking the immense potential of data for innovation, research and policy development represents a critical balancing act.

The core challenge lies in transforming raw sensitive data into a usable asset without compromising trust in the data or its confidentiality. In Singapore, this balance is achieved by focusing on three key areas: first, building trust by ensuring that confidentiality safeguards meet the required legal and organizational standards; second, limiting and structuring the information collected so that no person or entity can be identified; and third, establishing safe processes for authorized users to work with the data so that it stays protected. These principles guide the entire data life cycle, from collection to deletion.

The realization of data’s full potential hinges on trust: without trust, individuals and entities are reluctant to share the information needed for large-scale analysis. When people trust that their data is handled securely and for legitimate statistical purposes, they will provide information, and official statistics maintain credibility. This requires transparency about why the data is collected, how it will be used, and who will have access to it. Clear internal policies and staff training must support these commitments and ensure data remains confidential.

The second strand is reducing the amount of identifying detail held and structuring data in a way that lowers the risk of re-identification. Where meaningful results can be produced from aggregated data rather than exact personal information, broad categories such as age ranges or general locations can replace precise details; this protects privacy while keeping the analysis robust.
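A minimal sketch of this kind of generalization: exact ages are recoded into bands, postal codes are truncated to a broad area, and only counts by the resulting categories are released. The band limits and the two-digit area rule are hypothetical choices for illustration, not the rules actually applied in Singapore.

```python
from collections import Counter

# Hypothetical age bands (inclusive limits); the last band is open-ended.
AGE_BANDS = [(0, 14), (15, 24), (25, 44), (45, 64), (65, 200)]


def age_band(age):
    """Replace an exact age with the label of its band."""
    for lo, hi in AGE_BANDS:
        if lo <= age <= hi:
            return f"{lo}-{hi}" if hi < 200 else f"{lo}+"
    raise ValueError(f"age out of range: {age}")


def generalize(record):
    """Keep only broad categories. Assumption for illustration: the first
    two characters of the postal code identify a broad area."""
    return {"age_band": age_band(record["age"]),
            "area": record["postal_code"][:2]}


def aggregate(records):
    """Counts by (age band, area): released instead of exact personal details."""
    return Counter((g["age_band"], g["area"]) for g in map(generalize, records))
```

For instance, two people aged 23 and 24 in areas starting with "12" and one person aged 67 in an area starting with "56" are published only as a count of 2 in ("15-24", "12") and 1 in ("65+", "56").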

Finally, the third element is providing safe and controlled access to the data. Access should be limited to authorized users for legitimate purposes, supported by systems designed to prevent unintended disclosure. It is important for organizations to be accountable through proper records and regular audits to maintain the trust needed for people to continue sharing their information.

To conclude, these measures allow Singapore to strike a balance between data protection and data use, demonstrating that the two are not competing goals but can work together to achieve desired outcomes. By building trust, keeping only necessary information, and enabling safe and controlled access to the data, organizations can meet their confidentiality obligations while still enabling insights that benefit the public.

Main author / Presenter
Jeremy Heng
Ministry of Manpower

Jeremy Heng is a Senior Assistant Director with the Singapore Ministry of Manpower. He oversees the conduct of official household and establishment surveys in Singapore, such as the Labour Force Survey and Labour Market Survey, and ensures the timeliness, quality and confidentiality of the statistics produced.

Presentation title
Why not a little bit of bias to improve the risk-utility trade-off?
An alternative way to build the transition matrix of the cell key method.

The publication of tabular data requires a good risk-utility trade-off, protecting privacy while preserving data quality. For very large tabular data releases, National Statistical Institutes can turn to perturbative methods rather than suppressive ones, whose main limitation is the computational constraints of existing tools.

The Cell Key Method ([Fraser and Wooton, 2005], [Enderle et al., 2018]) is such a data perturbation technique for tabular data designed to reduce the risk of disclosure of large data releases. Its perturbation mechanism is governed by a transition matrix that defines the probability distributions mapping an original count to the possible perturbed values. The R package ptable ([Enderle, 2023]) implements the construction of such a matrix by solving a maximum entropy optimization problem for each distribution under various constraints. In particular, the resulting distributions are unbiased.
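The lookup mechanism can be sketched as follows. Each microdata record receives a uniform random record key; a cell's key is the fractional part of the sum of its contributors' record keys; and the perturbation is read from the row of the transition matrix for the original count by locating the cell key in that row's cumulative probabilities. The small matrix below is a hand-made, hypothetical example (unbiased rows, maximum deviation 2), not one produced by ptable, and all names are illustrative.

```python
import math
import random
from itertools import accumulate

# Hypothetical transition matrix, hand-made for illustration and NOT produced by
# the ptable package: original count -> [(perturbation, probability), ...].
# Every row sums to 1 and has mean 0 (unbiased); perturbed counts stay >= 0.
# The last row (count 2) is reused for all larger counts.
PTABLE = {
    0: [(0, 1.0)],
    1: [(-1, 0.25), (0, 0.5), (1, 0.25)],
    2: [(-2, 0.05), (-1, 0.15), (0, 0.6), (1, 0.15), (2, 0.05)],
}
MAX_ROW = max(PTABLE)


def assign_record_keys(n_records, seed=2026):
    """Draw one uniform record key in [0, 1) per microdata record, once."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_records)]


def cell_key(record_keys):
    """Cell key: fractional part of the sum of the contributors' record keys."""
    return math.fsum(record_keys) % 1.0


def perturbation(count, ckey):
    """Locate the cell key in the cumulative probabilities of the row for `count`."""
    row = PTABLE[min(count, MAX_ROW)]
    deltas = [d for d, _ in row]
    for d, upper in zip(deltas, accumulate(p for _, p in row)):
        if ckey < upper:
            return d
    return deltas[-1]  # guard against floating-point rounding of the last bound


def perturbed_count(record_keys):
    """Perturbed count of a cell, given the record keys of its contributors."""
    count = len(record_keys)
    return count + perturbation(count, cell_key(record_keys))
```

Because the cell key depends only on which records contribute to a cell, the same cell receives the same perturbation in every table where it appears, which is what makes the method consistent across large releases.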

However, since the support of the perturbation for small counts (especially 1 or 2) is asymmetric (the method does not produce negative counts), the distributions generated by the package introduce little perturbation to these high-risk small counts. To mitigate this, the package offers the option of preventing certain counts from appearing in the final data. Yet this change of support often requires specifying very large variance levels, which can significantly deteriorate data quality, even for large counts.

We propose an alternative approach to constructing transition probability distributions for small counts by allowing a small amount of bias to be introduced during their perturbation. We first present the different adaptations of the optimization program that enable the construction of such distributions. A comparison between these alternatives and the classical solution implemented in the ptable package is then illustrated through the risk-utility trade-off.
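For reference, a generic version of the row-wise maximum-entropy problem, together with the kind of bias relaxation proposed here, can be written as below. The notation (support $S_i$, maximum deviation $D$, second-moment target $V$, bias tolerance $\beta_i$) is ours, and the program is an assumed simplification, not necessarily the exact formulation in ptable or in the authors' work.

```latex
% Row i of the transition matrix: p_{i,d} = probability of adding d to an original count i.
% Support: S_i \subseteq \{\max(-D,-i), \dots, D\}, so that i + d \ge 0;
% excluding a perturbed value i + d from the output removes the matching d from S_i.
\begin{aligned}
\max_{p_{i,\cdot}} \quad & -\sum_{d \in S_i} p_{i,d} \log p_{i,d} \\
\text{s.t.} \quad & \sum_{d \in S_i} p_{i,d} = 1, \qquad p_{i,d} \ge 0, \\
& \sum_{d \in S_i} d \, p_{i,d} = 0 \quad \text{(unbiasedness, classical version)}, \\
& \sum_{d \in S_i} d^{2} \, p_{i,d} = V .
\end{aligned}
% Biased variant for small counts: replace the unbiasedness constraint with
%   \Bigl| \sum_{d \in S_i} d \, p_{i,d} \Bigr| \le \beta_i , \qquad \beta_i \ge 0 \text{ small}.
```

The asymmetry is visible for $i = 1$: since $S_1 \subseteq \{-1, \dots, D\}$, unbiasedness forces $p_{1,-1} = \sum_{d > 0} d \, p_{1,d}$, so a single downward step must balance all upward mass. In the biased variant, $\sum_d d^2 p_{i,d}$ is the mean squared error of the perturbation rather than its variance (the two coincide only when the mean is zero).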

What if adding a small amount of bias would result in a better compromise?

References

[Enderle, 2023] Enderle, T. (2023). R package ptable: Generation of Perturbation Tables for the Cell-Key Method.

[Enderle et al., 2018] Enderle, T., Giessing, S., and Tent, R. (2018). Designing Confidentiality on the Fly Methodology – Three Aspects. In Domingo-Ferrer, J. and Montes, F., editors, Privacy in Statistical Databases, volume 11126, pages 28–42. Springer International Publishing.

[Fraser and Wooton, 2005] Fraser, B. and Wooton, J. (2005). A Proposed Method for Confidentialising Tabular Output to Protect against Differencing. In Monographs of Official Statistics.

Main author / Presenter
Julien Jamme
INSEE

Expert in statistical disclosure control methodologies, department of statistical methods, INSEE (France)

Presentation title
Data protection-compliant methods for access to mobile signal data in Germany
Since 2017, the Federal Statistical Office has been researching the topic of mobile phone data for official statistical purposes in various feasibility studies and publishing these as experimental statistics (EXSTAT: https://www.destatis.de/DE/Service/EXSTAT/_inhalt.html).

The data used in the respective projects has so far been prepared and processed by various mobile phone providers for specific applications and made available to the Federal Statistical Office in anonymised and aggregated form. Due to confidentiality and trade secrets, this data is created in a methodological ‘black box’, in which the Federal Statistical Office has only limited insight into the processing of the data.

In order to open this black box, the Federal Statistical Office cooperated with T-Systems as a mobile phone provider from 2023 to 2025 as part of the project ‘Anonymity in integrated and georeferenced data (AnigeD)’. The aim of the work package is, in particular, to develop anonymisation and processing procedures for the use of anonymised georeferenced mobile phone data. These are necessary in order to meet the quality criteria of transparency, accessibility and comparability in the production of official statistics. In addition, data protection standards must be complied with at all times in order to maintain public confidence in official statistics.

Further objectives include designing and setting up the necessary development environment at the data provider and developing a model process for future cooperation between private (mobile phone) data providers and the Federal Statistical Office, as well as other government institutions where applicable.

In this talk, Destatis will present the results of the project.

Main author / Presenter
Lorenz Ade
DESTATIS (Federal Statistical Office of Germany)

Lorenz Ade studied Politics and Administrative Sciences in Konstanz and Utrecht and has been working as a Research Associate at the Federal Statistical Office of Germany since February 2022. Since November 2023, he has been part of the department "Research on New Digital Data", focusing on the methodological development of standardized processes for the processing and anonymization of mobile network signal data.
