3 June 2026
15:45 – 16:30
ŠIBENIK I
Presentation title
Leveraging AI Models for Real-Time Analysis of Unstructured Field Text Data to Enhance Supervisory Quality in Statistical Surveys.
The quality of statistical data depends fundamentally on effective fieldwork supervision.
However, methodological and logistical issues that arise during data collection are often recorded through informal channels, such as messaging applications. These communications constitute unstructured text data that is difficult to process manually. This delays supervisory intervention, exacerbates non-sampling errors, and threatens the integrity of the final survey results. Therefore, there is an urgent need for real-time tools to support supervisory decision-making.
To enable real-time quality monitoring, this study proposes an innovative methodology based on the analytical power of Artificial Intelligence (AI) models. It leverages Large Language Models (LLMs) and Natural Language Processing (NLP), using tools like Gemini, to systematically process unstructured field communication texts exchanged via platforms such as WhatsApp, transforming them into actionable data within the context of the International Migration Survey in Egypt.
The immediate goal of this processing is to instantly and automatically convert these texts into accurately classified structured data. Key classification categories include sample challenges, methodological ambiguity in the questionnaire, and routine logistical problems. This precise classification provides the foundation necessary for transitioning from identifying problems to implementing solutions.
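The conversion from free text to a classified record can be illustrated with a minimal sketch. It is not the study's method: the abstract describes LLM-based classification, whereas this stand-in uses simple keyword rules only to show what a structured output record might look like. The category keys, keyword lists, the `FieldIssue` record and the example message are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword lists. A real system would replace this rule-based
# stand-in with an LLM call and a reviewed category scheme.
CATEGORY_KEYWORDS = {
    "sample_challenge": ("refused", "not found", "moved away", "vacant"),
    "methodological_ambiguity": ("question", "unclear", "definition", "skip"),
    "logistical_problem": ("transport", "tablet", "battery", "fuel"),
}


@dataclass
class FieldIssue:
    """Structured record derived from one free-text field message."""
    area: str
    category: str
    text: str


def classify(area: str, text: str) -> FieldIssue:
    """Assign the first category whose keywords appear in the message."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return FieldIssue(area, category, text)
    # Messages matching no keyword are kept, flagged for manual review.
    return FieldIssue(area, "unclassified", text)


# Invented example message, for illustration only.
issue = classify("Area A", "Household refused the interview twice")
print(issue.category)
```

Keeping an explicit "unclassified" category means no message is silently dropped, which matters if the records later feed supervisory reports.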
Based on the above, this paper will discuss a system that generates real-time quality reports identifying "hotspots" of problems in the statistical field, with application to international migration surveys. This output serves as the primary tool for ensuring the efficient allocation of supervisory support and resources. To maximize effectiveness, critical mechanisms are activated, including the issuance of immediate corrective memos to address methodological measurement errors.
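One simple way to picture a "hotspot" report is to count classified issues per area and category and flag the combinations that exceed a threshold. The sketch below assumes that reading; the threshold of three reports, the area names and the records are invented for illustration and do not come from the study.

```python
from collections import Counter

# Invented example records: (enumeration area, issue category).
records = [
    ("Area A", "methodological_ambiguity"),
    ("Area A", "methodological_ambiguity"),
    ("Area A", "logistical_problem"),
    ("Area B", "sample_challenge"),
    ("Area A", "methodological_ambiguity"),
]


def find_hotspots(records, threshold=3):
    """Return (area, category) pairs reported at least `threshold` times."""
    counts = Counter(records)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


hotspots = find_hotspots(records)
print(hotspots)
```

A supervisor could use such a list to decide where to send support first, or which questionnaire item needs a corrective memo.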
Waleed Mohammed
Central Agency for Public Mobilization and Statistics
Waleed Ameen Abd Elkhalik Mohammed is a highly accomplished Senior Statistician at the Egyptian National Statistical Office (NSO), CAPMAS, specializing in Social Statistics and Migration Research. Holding a PhD (2018) focused on leveraging statistics to develop social research skills, he combines academic rigor with extensive practical experience dating back to 2006.
His career is marked by significant contributions to developing statistical operations across several key teams, most notably as the NSO’s representative to the UN Inter-Agency and Expert Group on SDG Indicators (IAEG-SDG) since 2015. He is a key contributor to migration data governance, coordinating the Migration Data Analysis Unit and representing Egypt in the regional "THAMM" and MedStat 5 programs. An experienced trainer, he instructs at the National Statistical Training Center (since 2009) and was instrumental in leading data collection teams for the 2017 census, focusing particularly on metadata documentation and statistical quality assurance.
Presentation title
Keeping official statistics relevant: Coherence, Harmonisation, and Inclusion
The data landscape is evolving rapidly, driven by new technologies, new data sources, and changing societal priorities.
To remain trusted and relevant, official statistics must adapt while maintaining quality, consistency, and inclusivity. This presentation explores how the UK Government Statistical Service (GSS) is meeting this challenge through three workstreams: Coherence, Harmonisation, and Equalities and Inclusion.
The GSS Coherence team supports the GSS to improve the statistical coherence and comparability of data across the UK. The team collaborates with Devolved Governments to identify and promote coherence priorities across the UK, highlighting progress and identifying gaps in coherence needs. Complementing this, the GSS Harmonisation team promotes consistency in data collection and presentation. The team maintains and develops over 40 harmonised standards and provides guidance to align definitions, survey questions, and outputs. The Harmonisation team are proactively developing new standards for priority topics such as ethnicity, sex, and gender identity, ensuring data reflects the diversity of the UK population while remaining comparable and coherent. Alongside these efforts, the Centre for Equalities and Inclusion strengthens understanding and insight into populations that are less visible in our data, highlighting and promoting the need to embed equality and inclusion considerations into statistical design and interpretation.
Together, these initiatives demonstrate how coherence, harmonisation, and an inclusive approach ensure that official statistics remain relevant and capable of addressing emerging policy questions in a rapidly changing data environment. The session will highlight recent achievements of the workstreams, explore future challenges, and invite discussion on how we collectively deliver consistent, inclusive, and trusted data.
Fiona Dawe
UK Statistics Authority
Deputy Director, National Statistician's office, UK Statistics Authority
Presentation title
Analysis of socio-economic deprivation: a composite index for measuring inequalities within municipalities.
There are numerous examples in scientific literature of studies on the socio-economic deprivation of households, generally based on indicators calculated at municipal and sub-municipal level using population census results.
With the new Permanent Population and Housing Census (PPHC) defined by the National Institute of Statistics (Istat), the availability of highly detailed spatial data has changed significantly. This has prevented the reproduction of deprivation indicators previously proposed and used by scholars.
However, the richness of information contained in administrative data archives, the development of thematic statistical registers (on topics such as income, employment, and education), and the possibility of linking different sources to the census database and geocoding individual data to the smallest territorial units (which coincide with census enumeration areas) have made it possible to expand and diversify the availability of highly detailed territorial data. This framework offers new opportunities to study the demographic, social and economic phenomena of individuals and households and to measure socio-economic inequalities at the sub-municipal level. In this regard, Istat has recently developed a new index of socio-economic deprivation of households based precisely on the integration of census data, administrative sources, and statistical registers. It is a composite index based on various components of deprivation (economy, employment, education), which can be calculated annually, allowing for sub-municipal analysis in both spatial and temporal dimensions.
The basic measures of deprivation and the composite index are calculated for census areas, but the data are analysed and disseminated by territorial aggregates for reasons of privacy and statistical representativeness. The reliability of the results is guaranteed by the high quality of the data used in terms of accuracy, consistency, timeliness, and relevance.
The results derived from the use of this index are a useful tool for both policymakers and researchers, for example in developing social service policies or income support instruments.
Giancarlo Carbonetti
Istat - National Institute of Statistics
Graduate in Statistical and Economic Sciences, with a PhD in Methodological Statistics. Senior Researcher at Istat for almost 30 years.
He works on statistical methodologies, applied statistics and data quality, and is an expert in estimation methods for small areas, sampling techniques and territorial analysis. He has mainly worked on the Population Census and household surveys. He was involved in the design and production of the 2001 and 2011 censuses.
In the current Permanent Census, he is working, through studies and projects, on the “valorisation of sub-municipal census results” integrated with register data.
Presentation title
Statistical Literacy as Self-Defence: Engaging Young Users Against Misinformation
In the context of the current edition of the European Statistics Competition, 2025-2026, we developed, as part of the ALEA project, a statistical literacy test that the competing teams answered, alongside tests on the use of data from the Statistics Portugal (INE) portal and a Eurostat publication.
The statistical literacy perspective adopted in the test draws on Darrell Huff's work, framing statistical literacy as a form of self-defence against misinformation. The results show that the average score in this test was substantially higher than in the other two tests.
This presentation will report an exploratory study analysing test data for internal consistency using standard reliability metrics (e.g., Cronbach’s Alpha) with a view to developing a measure of this kind of statistical literacy. We will also explore correlations with the other two tests, as well as with student, teacher, and school-level characteristics.
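As a reminder of what the reliability metric measures, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below computes it with the standard library only; the score matrix is invented for illustration and is not data from the competition.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `scores` is a list of rows, one per respondent, each holding that
    respondent's score on every item. Sample variances (n - 1) are used.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented scores: four respondents, three items scored 0 or 1.
example_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(round(cronbach_alpha(example_scores), 3))
```

Values close to 1 indicate that the items move together; when every item gives identical scores the formula returns exactly 1.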
Additionally, Statistics Portugal’s regular engagement with school groups through educational visits provides opportunities to develop complementary applications of the emerging measure (e.g., in Kahoot! quizzes) and to explore its potential for fostering dialogue and critical reflection with young users of official statistics. For instance, following full GDPR compliance and ethical approval, we could randomly assign students to control and experimental groups, assign the latter to a treatment in the form of viewing a video on misinformation, and assess whether the measure differs between the two groups as a result of this treatment.
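The experimental design described above could be sketched as follows. This is only one possible analysis, not one the authors have committed to: it assumes an even random split into two groups and a two-sided permutation test on the difference in mean scores. The function names, the seeds and the number of permutations are illustrative choices.

```python
import random
from statistics import mean


def assign_groups(student_ids, seed=0):
    """Randomly split students into (control, treatment) groups."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Reshuffle group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


control_ids, treatment_ids = assign_groups(range(10))
print(len(control_ids), len(treatment_ids))
```

After the treatment group has watched the video, the literacy scores of both groups would be passed to `permutation_p_value`; a small p-value would suggest the difference is unlikely to arise from the random split alone.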
This study opens up the possibility of transforming Statistics Portugal’s outreach activities, such as school visits and the ALEA project, into a setting for methodological innovation, supporting the development and testing of tools to assess and strengthen young users’ resilience to misinformation, thereby contributing to institutional trust.
This proposal falls under Topic 3 of the conference programme – Quality as a foundation for building and maintaining user trust – in particular, subtopics 3.7 – Enhancing user engagement, communication, and user-centric dissemination – and 3.8 – Data management, data ethics, and combating misinformation.
Tiago Santos
Statistics Portugal
A Senior Statistician at Statistics Portugal’s dissemination unit, Tiago Santos focuses on statistical literacy initiatives and public engagement around official statistics. He also serves as a juror for Portugal’s main diversity, equity, and inclusion institutional certification and as an expert evaluator for the European Commission’s Citizens, Equality, Rights and Values Programme.
With over two decades of experience in survey research and data analysis, Tiago has worked across public administration, academic research, and the non-profit sector. His current work centres on developing statistical literacy tools and fostering critical engagement with official statistics among young users, with a particular focus on trust, quality, and resilience to misinformation.
He holds a pre-Bologna Master’s degree in Sociology from NOVA School of Social Sciences and Humanities, where he taught research methods for several years.
Presentation title
FAIR metadata as a pillar for digital data collection
Official statistics increasingly need to rely on complex, multi-source digital data collection systems that must support quality, transparency and long-term reuse.
Building such systems often requires the integration of legacy systems that have been built in silos, without interoperability in mind. One approach to enhancing connectivity between these systems relies on positioning metadata as an active, central component of the data lifecycle rather than a passive description layer. This paper discusses the results of a pilot study designed to document structural and descriptive enterprise metadata across the full data lifecycle (collect-process-analyse-disseminate) from data sources sitting in different systems, using well-known international metadata standards such as DDI and SDMX and an off-the-shelf application. The main principle focuses on building a rich metadata repository based on FAIR (meta)data principles that will enhance cross-system interoperability and support automated discovery and reuse by both humans and machines. The results of this study demonstrate that adopting FAIR metadata as the structural backbone of a data collection system enables more transparent, scalable, and reusable data infrastructures across domains.
Susana Portillo Cruz
Central Statistics Office
Susana Portillo has worked in official statistics in the Central Statistics Office (CSO) in Ireland since 2007. From 2015 she has been involved in the design and delivery of the CSO quality strategy through the implementation of a solid Quality Management Framework. She currently leads the unit on Metadata and Quality Training ensuring the harmonisation of questionnaire design and quality reporting across the CSO and their documentation conforming to international standards. She also provides support on metadata and quality techniques to the wider Irish National Statistical System.
Prior to her work in quality, Susana led the unit responsible for the data collection of short-term enterprise statistics, using continuous improvement techniques to migrate statistical production to a process approach, achieving efficiencies in timeliness, response burden and cost while maintaining the quality of the production data.
Presentation title
From Metadata to Usage: Practical Uses of DCAT-JSON-LD for Data Catalogue Dissemination
DCAT expressed in JSON-LD is widely used in official statistics as a metadata exchange format, primarily for catalogue harvesting.
National statistical institutes publish DCAT catalogues that are collected by national open data portals and aggregated at the European level. While this approach effectively supports metadata circulation, DCAT is most often used for harvesting purposes and less as a format directly supporting dissemination and usage-oriented workflows.
This poster explores how DCAT-JSON-LD can be used as a pivot format for publishing, aggregating, indexing and consuming data catalogues. The DCAT model provides a standardized structure to describe datasets and their distributions, together with the contextual metadata required for discovery and reuse, such as access modalities, formats, languages and licences. When expressed in JSON-LD, this information can be directly exploited by applications without proprietary transformations or intermediate schemas.
Based on a concrete implementation developed within the Onyxia ecosystem, the poster illustrates how client applications can directly consume DCAT-JSON-LD catalogues to expose datasets and distributions in a coherent and user-oriented way. It supports multiple catalogues, multilingual metadata and strict compliance with existing standards, including the use of IANA media types. The approach demonstrates how catalogues can be aggregated across institutional levels, forming a continuous DCAT chain from data producers to national and European platforms.
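To make the idea of direct consumption concrete, the sketch below reads a tiny DCAT-JSON-LD catalogue and lists its distributions with a title in the requested language. It is not the Onyxia implementation. It assumes one particular compacted JSON-LD shape (prefixed keys such as `dcat:dataset`, literals as `@value`/`@language` objects); real catalogues can be serialised differently, in which case a JSON-LD processor would be needed first. The catalogue content, URLs and helper names are invented.

```python
# Invented example catalogue in one assumed compacted JSON-LD shape.
CATALOGUE = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Catalog",
    "dcat:dataset": [
        {
            "@id": "https://example.org/dataset/population",
            "@type": "dcat:Dataset",
            "dcterms:title": [
                {"@value": "Population by sex", "@language": "en"},
                {"@value": "Population par sexe", "@language": "fr"},
            ],
            "dcat:distribution": [
                {
                    "@type": "dcat:Distribution",
                    "dcat:downloadURL": {
                        "@id": "https://example.org/files/population.csv"
                    },
                    "dcat:mediaType": {
                        "@id": "https://www.iana.org/assignments/media-types/text/csv"
                    },
                }
            ],
        }
    ],
}


def pick_label(values, lang):
    """Return the literal in the requested language, else the first one."""
    for value in values:
        if value.get("@language") == lang:
            return value["@value"]
    return values[0]["@value"] if values else None


def list_distributions(catalogue, lang="en"):
    """Flatten datasets and distributions into simple display rows."""
    rows = []
    for dataset in catalogue.get("dcat:dataset", []):
        title = pick_label(dataset.get("dcterms:title", []), lang)
        for dist in dataset.get("dcat:distribution", []):
            rows.append({
                "dataset": title,
                "url": dist.get("dcat:downloadURL", {}).get("@id"),
                "media_type": dist.get("dcat:mediaType", {}).get("@id"),
            })
    return rows


for row in list_distributions(CATALOGUE, lang="fr"):
    print(row["dataset"], row["url"])
```

Because the media type is an IANA URI rather than a free-text label, a client can filter distributions by format without any catalogue-specific mapping.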
The poster shows how DCAT-JSON-LD catalogues can be indexed and explored using widely adopted search technologies, such as Lucene-based engines. By relying on structured and standardized metadata, datasets and distributions can be filtered and discovered through standard query mechanisms, without redefining domain-specific models. More broadly, such metadata also supports machine-assisted use cases, including AI-based discovery and data selection, by enabling automated processing and integration into AI-oriented workflows.
Finally, the poster emphasizes that data quality and metadata quality are tightly coupled, and that standardization plays a central role in ensuring interoperability. While DCAT is deliberately pragmatic and flexible, this flexibility can lead to heterogeneous practices. The poster highlights the need for shared conventions and discusses current limitations of the DCAT model, such as the lack of standardized descriptions for dataset variables. Producing high-quality, standard-compliant DCAT-JSON-LD is therefore a key factor for sustainable and future-proof dissemination of official statistics.
Dylan Decrulle
Insee
I am a software developer at INSEE in the Technical Innovation Unit. I work on Onyxia, an open-source platform for data science environments. I previously contributed to the development of INSEE’s survey systems based on active metadata using the DDI standard. I have a strong interest in open-source software and contribute to projects developed and shared in this ecosystem.
Presentation title
Beyond DCAT: Metadata Needs for Statistical Data
This presentation compares metadata standards with a focus on DCAT and its applicability to statistical datasets.
DCAT is widely used across data portals and provides a generic framework for dataset description. However, statistical data often require more detailed metadata than DCAT can offer. These include information about the statistical products (reference metadata), process quality documentation, variable-level descriptions, and controlled vocabularies such as codelists. Such elements are essential for understanding and reusing statistical data but are not natively supported by DCAT. The presentation discusses these limitations and explores how richer metadata requirements can be addressed. Additionally, it considers whether DCAT’s class-property structure can be adapted for internal metadata management. This involves modeling different data classes with specific properties to support more complex metadata. Practical examples and modeling approaches will be presented to illustrate key findings.
Magdalena Six
Statistics Austria
Magdalena Six has extensive experience in quality management at Statistics Austria. For the past year, she has been leading the central metadata management project, in which the concept for metadata management was developed and is currently being implemented.
Presentation title
Administrative Data Integration as a Response to Declining Survey Response Rates: Implications for Quality in Official Statistics
National Statistical Offices are increasingly confronted with declining survey response rates, which undermine the accuracy, representativeness, and reliability of official statistics.
Factors such as respondent fatigue, increased privacy concerns, urban mobility, and rising data collection costs have reduced the effectiveness of traditional survey-based data collection methods. In this context, administrative data integration has emerged as a strategic alternative and complementary approach for sustaining the production of high-quality official statistics.
This topic examines how the integration of administrative data can mitigate the adverse effects of declining survey response rates while maintaining the core quality principles of official statistics. It explores both the opportunities presented by administrative data, such as improved coverage, enhanced timeliness, reduced respondent burden, and cost efficiency, and the quality challenges associated with its use, including inconsistencies in definitions, incomplete coverage, limited metadata, and data governance and confidentiality concerns.
The discussion emphasizes the need for robust quality assurance frameworks, strong institutional coordination, and clear data governance arrangements to ensure that administrative data are fit for statistical purposes. It also highlights the evolving role of National Statistical Offices as data integrators and quality stewards within the National Statistical System. Ultimately, the topic underscores that while administrative data integration offers a viable response to declining survey participation, its successful adoption depends on deliberate strategies to safeguard accuracy, coherence, comparability, and public trust in official statistics.
Mothusi Ditlou
Statistics Botswana
Mothusi TC Ditlou is a professional Statistician in Botswana with extensive experience in official statistics, data quality management, and evidence-based decision-making. He currently serves at Statistics Botswana, where he has contributed significantly to the production, coordination, and dissemination of high-quality statistical data used for national planning and policy formulation.
With a strong background in survey operations, administrative data use, and field coordination, Mothusi has played a key role in strengthening data collection systems and promoting statistical integrity at both national and satellite office levels. He is particularly passionate about improving data quality, modernising statistical processes, and enhancing the use of statistics for sustainable development.
Beyond his technical work, Mothusi is actively involved in professional and labour advocacy, demonstrating leadership, accountability, and commitment to public service. He is recognised for his diligence, teamwork, and dedication to advancing the credibility and relevance of official statistics in Botswana.
Presentation title
Coherent population forecasting 2024 - 2053 for the Republic of Croatia using the demography package in R
Based on Croatian population data, mortality rates, and fertility rates for the period 2001–2024 published by the Croatian Bureau of Statistics (CBS), stochastic population forecasts for the next 30 years are calculated using the demography package and adapted R code written by Rob Hyndman.
The paper presents the forecasting procedure and results and further demonstrates the applicability of R packages in official statistics. The forecasting approach integrates a large amount of data released by the CBS.
Lidija Gligorova
Croatian Bureau of Statistics
The author holds a Bachelor of Science degree in Mathematics, Statistics, and Computer Science. She has been working in the CBS Sampling Unit since its establishment in 1994, where she is responsible for sampling, imputation, and statistical analysis.