REGULAR

Session 40

Innovating Data Collection Methods and Sources

5 June 2026
10:30 – 11:45
ŠIBENIK VI

Presentation title
Ensuring high quality through use of innovative methods, such as AI, to improve industry and occupation coding for online surveys
As the UK transitions its key labour market survey to an online-first approach, new challenges have arisen, such as the coding of industry and occupation data.

These topics have traditionally been collected in face-to-face surveys with interviewer intervention to probe responses and assist with correct coding. For this new challenge, the Transformed Labour Force Survey (TLFS) team at the UK’s Office for National Statistics are exploring multiple solutions to enable collection and processing of high-quality industry and occupation data. These include 1) the Classification Index Matching Service Multi-Digit Algorithm (CIMS MDA), a post-processing tool that uses a hybrid deterministic and machine learning model for automated coding to mimic clerical coder behaviour; 2) a layered traditional Search-As-You-Type questionnaire approach that offers dynamic look-up lists to reduce ambiguity, taking into account respondent apathy and respondents’ understanding of the large and complex coding frames; and 3) Survey Assist, a generative AI tool that creates dynamic questions for the respondent based on their responses and assigns codes. Alongside this, we are developing more user-friendly versions of the coding frames using evidence collected through censuses, surveys and further user research.

Each of these methods entails unique challenges, such as reliance on external data to train models, ethics, sustainability and environmental impacts that must be taken into account when designing a final solution. Alongside this the solution must consider trade-offs to balance risk, burden, cost and data quality. This is a pathfinder project for using a combination of techniques to solve complex technical challenges in the online survey space. If challenges can be successfully navigated, learnings and techniques may extend the application of these methods to other complex variables.

The TLFS team will present findings from new qualitative and quantitative testing of these innovative solutions through 2025 and 2026. This includes an unmoderated test of an AI prototype questionnaire, findings from cognitive testing and quantitative testing of the Search-As-You-Type method with a modified activities frame, and evidence from different phases of the application of the CIMS MDA tool. Quality of the innovations is being assessed against a clerically coded evaluation dataset, using a quality framework that considers both the statistical quality and the operational acceptability of these independent and integrated solutions. Use of this framework will ensure a thorough quality assessment of the solutions, covering respondent experience, accuracy and sustainability, and therefore their acceptability for application within the Transformed Labour Force Survey.
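One simple way to picture an assessment against a clerically coded evaluation dataset is an agreement rate computed at each level of a hierarchical code. The sketch below is illustrative only and is not the TLFS quality framework: the function name `agreement_by_digits`, the four-digit code length and the example codes are all assumptions made for this example.

```python
def agreement_by_digits(auto_codes, clerical_codes, max_digits=4):
    """Share of cases where the automated code matches the clerical code
    on the first k digits, for k = 1..max_digits.

    A missing automated code (None) counts as a non-match at every level.
    """
    if len(auto_codes) != len(clerical_codes):
        raise ValueError("code lists must have the same length")
    if not auto_codes:
        raise ValueError("at least one coded case is required")

    n = len(auto_codes)
    rates = {}
    for k in range(1, max_digits + 1):
        matches = sum(
            1
            for a, c in zip(auto_codes, clerical_codes)
            if a is not None and len(a) >= k and a[:k] == c[:k]
        )
        rates[k] = matches / n
    return rates


# Made-up four-digit codes, for illustration only.
clerical = ["2136", "2231", "5223", "9272"]
automated = ["2136", "2232", "5241", None]  # None: the tool assigned no code

rates = agreement_by_digits(automated, clerical)
for digits, rate in rates.items():
    print(f"{digits}-digit agreement: {rate:.0%}")
```

Reporting agreement by digit level separates near-misses within the correct broad group from codes that are wrong at the top of the hierarchy.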

Main author / Presenter
Debbie Curtis
Office for National Statistics

Dr Debbie Curtis is the Head of Research and Development, and Responsible Business Owner for industry and occupation coding within the UK's Transformed Labour Force Survey (TLFS). Her current role focusses on the transformation of the existing Labour Force Survey, with particular focus on complex variables. Debbie has over 18 years' experience of working within Social Surveys in the UK's Office for National Statistics, with previous roles leading the COVID Infection Survey research and design, and acting as the Responsible Business Owner for social survey quality. Prior to her work within this field, Debbie worked as a physicist within the nuclear industry, where she completed her PhD on Advancements in Nuclear Waste Assay.


CO-AUTHOR:

Clare Cheesman, Office for National Statistics

Presentation title
Methodological and Quality-Management Aspects of the Transition to New Data Sources for Price Statistics
The production of price statistics, i.e. the estimation of consumer price indices (CPIs), is undergoing a fundamental transformation driven by the availability of new data sources.

National Statistical Offices (NSOs) are progressively moving away from traditional price collection methods, which are based on in-store visits and manual recording of prices, towards the use of large-scale transactional data. Scanner data obtained directly from retailers and e-shop transaction data from e-commerce platforms offer substantial advantages in terms of economic accuracy, as they reflect prices actually paid by consumers and capture real purchasing behaviour rather than shelf prices alone. Although these new data sources are more economically accurate, the transformation brings major challenges for NSOs. The transition to new data sources represents not only a methodological change but also a significant challenge in terms of quality management. This paper showcases both the methodological and the quality management aspects of transforming the production environment for the estimation of consumer price indices using new data sources at the Statistical Office of the Slovak Republic. The integration of new data sources requires robust governance frameworks to ensure statistical quality, transparency, and sustainability. The key elements of this transformation include the development of new infrastructure, which needs to be automated for regular big data transfers, and the establishment of legal and institutional arrangements with private data providers, which must define data ownership. Other aspects of data collection are confidentiality, frequency, and secure data transfer mechanisms. Ensuring data accuracy and reliability becomes a central task, requiring systematic validation procedures and plausibility checks.
Unlike traditionally collected data, transactional data are typically not designed for official statistics, making thorough data understanding and continuous quality monitoring essential. Internally, NSOs must embed new processes into their quality management frameworks by documenting workflows, metadata, and data quality monitoring. The other important side of the transformation is a change in the methodologies for processing data and estimating CPIs. The new data allow the use of weighted approaches. However, the use of weights can introduce bias into the estimated CPIs or into the selection of consumer baskets. Overall, the paper argues that the successful adoption of scanner and e-shop transaction data for the compilation of CPIs depends as much on effective quality management and institutional readiness as on methodological innovation.
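To illustrate what a weighted approach changes, the sketch below compares an unweighted Jevons index (geometric mean of price relatives) with a Törnqvist index, which weights each price relative by the product's average expenditure share in the two periods. These two formulas are chosen here only as familiar examples; the abstract does not say which index formulas the Statistical Office of the Slovak Republic uses, and the prices and quantities are made up.

```python
import math


def jevons(p0, p1):
    """Unweighted index: geometric mean of the price relatives p1/p0."""
    logs = [math.log(b / a) for a, b in zip(p0, p1)]
    return math.exp(sum(logs) / len(logs))


def tornqvist(p0, p1, q0, q1):
    """Weighted index: each log price relative is weighted by the
    average of the product's expenditure shares in the two periods."""
    e0 = [p * q for p, q in zip(p0, q0)]  # base-period expenditure
    e1 = [p * q for p, q in zip(p1, q1)]  # current-period expenditure
    s0 = [e / sum(e0) for e in e0]
    s1 = [e / sum(e1) for e in e1]
    log_index = sum(
        0.5 * (a + b) * math.log(y / x)
        for a, b, x, y in zip(s0, s1, p0, p1)
    )
    return math.exp(log_index)


# Two products sold in both periods (matched items only); values are made up.
p0, p1 = [1.0, 2.0], [1.1, 2.0]    # prices: product 1 rises by 10%
q0, q1 = [100.0, 10.0], [90.0, 10.0]  # quantities: product 1 dominates sales

j = jevons(p0, p1)
t = tornqvist(p0, p1, q0, q1)
print(f"Jevons:    {j:.4f}")
print(f"Törnqvist: {t:.4f}")
```

Because the product whose price rose accounts for most of the expenditure, the weighted index lies above the unweighted one in this example. The choice of weights therefore directly affects the measured price change.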

Main author / Presenter
Peter Knížat
Statistical Office of the Slovak Republic / Comenius University

Peter Knížat holds a PhD from the University of Economics in Bratislava. His research areas are the analysis of new data sources for official statistics, econometric models and spatial regressions. In his professional career, Peter has held various roles in the private and public sectors, where he used advanced quantitative techniques for credit risk modelling, portfolio analytics, price statistics, nowcasting and econometric modelling. He currently works as an Assistant Professor at the Comenius University in Bratislava, where he teaches graduate courses on financial mathematics, statistics and mathematical modelling in management. Peter also works as a senior advisor at the Statistical Office of the Slovak Republic. Peter is a member of the editorial board of the Slovak Statistics and Demography scientific journal. He is also a member of the ESS Innovation Network, which is organised by Eurostat.


CO-AUTHORS:

Petra Mazureková, Statistical Office of the Slovak Republic
Helena Glaser-Opitzová, Statistical Office of the Slovak Republic

Presentation title
Redesigning for Quality: How Questionnaire Redesign Improves Data Quality Over Time in Quantitative Business Surveys
Questionnaire design is a key lever to reduce non-sampling errors in official statistics, particularly in complex and recurrent surveys.

Well-designed wording, reconciliation questions and information reminders can help respondents better understand task requirements, prevent reporting errors, avoid subsequent extra burden and reduce the need for editing. Such investments are especially valuable in surveys in which the same respondents are asked to participate repeatedly over time.

This paper examines the long-term effects of questionnaire redesign in the context of quantitative business surveys, with a focus on error dynamics and response burden across multiple survey waves. Drawing on multi-year data collected before and after the redesign, we assess how design innovations (such as enhanced guidance, built-in validation tools, auto-filled reminders, and user-friendly interfaces) support more accurate reporting, enhance participation and reduce perceived burden.

Adopting a multi-year perspective allows us to investigate two critical dimensions. First, we analyse error patterns in quantitative variables to determine whether respondents exhibit persistent inaccuracies or demonstrate progressive improvement over time. Second, we examine whether participation patterns change following the adoption of a less burdensome design, and whether repeated participation is associated with reduced reporting inaccuracy, suggesting a progressive facilitation effect linked to the improved questionnaire design.

Findings highlight that design enhancements can support accuracy improvement. However, the benefits of redesign are not uniformly distributed across respondent profiles. Our results indicate that structural characteristics of respondents, as well as previous exposure to the survey, play a central role in shaping response behavior and quality outcomes.

By combining repeated observations over time with response indicators, this study contributes to the broader discussion on the importance of integrating user-centred design principles and long-term monitoring to achieve sustained improvements in data quality and respondent engagement.

Main author / Presenter
Barbara M.R. Lorè
Istat

Barbara Maria Rosa Lorè is a researcher at the Italian National Institute of Statistics (Istat), currently working in the Directorate for Data Collection. Her professional activity focuses on survey methodology, questionnaire design, and the evaluation of measurement quality in official statistics. She has extensive experience in designing and optimising questionnaires for business, agricultural, and social surveys, with particular attention to non-sampling errors, response burden, and usability across data collection modes. She designs and conducts qualitative and quantitative tests, including cognitive interviewing and web-based experiments, to support evidence-based questionnaire redesign. She has presented her work at several international conferences, including ESRA, UNECE, QUEST, AAPOR, and WAPOR.


CO-AUTHORS:

Sabrina Barcherini, Istat
Valeria Mastrostefano, Istat

Presentation title
Measuring Activity Limitations as Subjective Well-Being: Quality Challenges and Evidence from GALI
Self-reported indicators increasingly inform official well-being reporting through multidimensional, people-centred dashboards used for policymaking.

This makes measurement comparability a central quality requirement. In interviewer-administered surveys, seemingly small design choices such as the number of items used to capture a concept, question placement, and survey mode can translate into substantial differences in estimates, threatening time-series continuity and cross-country comparability.

This paper examines these quality risks using the Global Activity Limitation Indicator (GALI; EU Statistics on Income and Living Conditions (EU-SILC) PH030) as a case study. While GALI is a health-related self-report, it is a core standardised key social variable in EU household surveys under IESS Regulation (EU) 2019/1700. It supports Beyond GDP monitoring of social inclusion, health, and equality, and serves as a key input for the Healthy Life Years (HLY) indicator; consequently, design-related measurement differences can propagate to high-visibility policy statistics.

To establish a robust empirical baseline, we draw on evidence from a bridging (parallel) test implemented under production conditions, where alternative GALI question formats were fielded in parallel, while also varying questionnaire placement and using mixed modes (telephone and face-to-face interviewing). The evidence shows strong sensitivity of estimates to question design. Across 2017–2019, the single-item GALI format classified 39%, 38% and 31% of respondents as long-term limited, whereas a multi-step three-item approach—separating presence of limitation, duration (≥ six months) and severity—classified 20%, 21% and 25% over the same period. These differences align with cognitive testing and fieldwork observations: bundling multiple concepts in one item increases cognitive burden and the risk of misclassification, particularly in interviewer-administered modes.
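The size of the format effect can be read directly from the reported shares. The short calculation below uses only the percentages quoted above; pairing the three values with the years 2017, 2018 and 2019 in order is an assumption based on how the figures are listed.

```python
# Share of respondents classified as long-term limited (percent),
# as quoted in the abstract; the year-by-year pairing is assumed.
single_item = {2017: 39, 2018: 38, 2019: 31}
three_item = {2017: 20, 2018: 21, 2019: 25}

# Gap between the two formats in percentage points, per year.
gap_pp = {year: single_item[year] - three_item[year] for year in single_item}

for year, gap in gap_pp.items():
    print(f"{year}: single-item {single_item[year]}% vs "
          f"three-item {three_item[year]}% (gap {gap} percentage points)")
```

Under that pairing, the gap is 19, 17 and 6 percentage points respectively.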

Building on this baseline, Slovenia has undertaken further methodological work on GALI, including cross-survey coherence analyses across EU surveys where GALI is measured using a three-item approach, with harmonised populations, reference periods, weighting schemes and data collection modes for the 2022–2025 period. These analyses aim to identify survey-vehicle and context effects relevant for quality assurance and will be reported in the contribution. Finally, a parallel implementation in EU-SILC 2026 is planned to support a controlled assessment of potential discontinuities when transitioning between alternative recommended formulations.

The findings underline a key message for official statistics: output harmonisation does not guarantee measurement equivalence when survey design, placement or mode differ. We conclude with practical recommendations for managing the quality of self-reported indicators used in policy-relevant well-being reporting.

Main author / Presenter
Manca Šuštar
Statistical Office of the Republic of Slovenia

I began my career abroad at Intel, where I worked in a research and analytics department focusing on end-customer research related to product performance, brand awareness, marketing, events, and sales. I then continued my professional path at the Centre for Social Informatics at the Faculty of Social Sciences, where I worked on two projects. In the first, I was responsible for the support and development of the 1KA online survey tool. In the second, I participated in the research project “Probability-Based Web Panels in Official Statistics for Persons and Households”, which examined the potential of probability web panels for official statistics. For the past four years, I have been working at the Statistical Office of the Republic of Slovenia in the Demography Statistics and Level of Living Section, contributing to the EU-SILC survey. My work includes survey design and preparation, data analysis, and support for data users and researchers.


CO-AUTHORS:

Ana Božič Verbič, Statistical Office of the Republic of Slovenia
Martina Stare, Statistical Office of the Republic of Slovenia

Presentation title
Outbound Campaigns and Business Survey Response: A Quasi-Experimental Evaluation
Istat relies on a centralized Contact Center, operated by an outsourced provider, to conduct outbound campaigns aimed at reducing nonresponse in business surveys.

While telephone reminders have proven effective in increasing response rates, they entail substantial costs and may increase respondent burden if not appropriately targeted. At present, the selection of units for outbound contact is largely driven by operational considerations rather than by objective criteria that explicitly account for different types of nonrespondents and heterogeneity in response propensities.

To improve the efficiency of outbound campaigns under budget constraints, Istat has launched a project based on the application of quasi-experimental designs to evaluate the impact of telephone follow-ups in business surveys. The study aims to estimate the causal effect of outbound contacts on survey response rates and to identify subpopulations of enterprises for which telephone follow-up is most effective.

Because the available data sources are observational, assignment to the exposure of interest—namely, receiving a telephone follow-up—is not random. As a result, treated and untreated groups often differ systematically with respect to both observed and unobserved characteristics that are associated with response behavior. These selection mechanisms may lead to biased estimates of treatment effects if not properly addressed.

To mitigate these limitations, we adopt a quasi-experimental framework that constructs comparable treatment and control groups using non-random assignment mechanisms, thereby enabling an unbiased assessment of the effect of outbound contacts. Specifically, we propose an extension of standard propensity score methods to estimate treatment effects within predefined firm profiles (subgroups) in observational settings. Unlike conventional approaches that focus on achieving overall covariate balance, the proposed method aims to improve balance within subgroups of substantive interest, allowing for a more detailed investigation of treatment effect heterogeneity.

This approach supports subgroup analyses across key domains commonly used in business surveys, such as economic activity, firm size (e.g., employment or turnover classes) and geographical location. By accounting for structural differences across enterprise segments, the method provides more precise monitoring of the differential impacts of telephone follow-ups across the business population.
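The idea of estimating effects within firm profiles can be sketched with a deliberately simplified stand-in for the proposed method, whose details the abstract does not give. Here the propensity score is the share of contacted units within each cell of a discrete covariate, estimated separately inside each profile, and the effect is a normalised inverse-probability-weighted difference in response rates. The field names `profile`, `cell`, `treated` and `responded`, the function name and the data are all hypothetical.

```python
from collections import defaultdict


def subgroup_effects(units):
    """Estimate the effect of a follow-up call on response, per firm profile.

    units: list of dicts with keys 'profile', 'cell', 'treated' (0/1)
    and 'responded' (0/1).
    """
    # Step 1: propensity score per (profile, cell) = share of units contacted.
    counts = defaultdict(lambda: [0, 0])  # [number treated, number of units]
    for u in units:
        key = (u["profile"], u["cell"])
        counts[key][0] += u["treated"]
        counts[key][1] += 1

    # Step 2: normalised inverse-probability weighting within each profile.
    sums = defaultdict(lambda: {"wy1": 0.0, "w1": 0.0, "wy0": 0.0, "w0": 0.0})
    for u in units:
        n_treated, n_total = counts[(u["profile"], u["cell"])]
        e = n_treated / n_total
        if e in (0.0, 1.0):
            continue  # no overlap in this cell: no comparison is possible
        s = sums[u["profile"]]
        if u["treated"]:
            s["wy1"] += u["responded"] / e
            s["w1"] += 1 / e
        else:
            s["wy0"] += u["responded"] / (1 - e)
            s["w0"] += 1 / (1 - e)

    return {
        profile: s["wy1"] / s["w1"] - s["wy0"] / s["w0"]
        for profile, s in sums.items()
        if s["w1"] > 0 and s["w0"] > 0
    }


def make_units(profile, cell, treated_responses, control_responses):
    """Helper that builds made-up unit records for the example."""
    rows = [{"profile": profile, "cell": cell, "treated": 1, "responded": r}
            for r in treated_responses]
    rows += [{"profile": profile, "cell": cell, "treated": 0, "responded": r}
             for r in control_responses]
    return rows


units = (
    make_units("small", "north", [1, 1], [1, 0])
    + make_units("small", "south", [1, 0], [0, 0])
    + make_units("large", "north", [1, 1], [1, 1])
)

effects = subgroup_effects(units)
for profile, effect in effects.items():
    print(f"{profile}: estimated effect on response rate = {effect:+.2f}")
```

In this toy data the follow-up raises the response rate among small firms but makes no difference among large ones. A per-profile estimate can reveal this kind of heterogeneity where a single pooled estimate would average it away. With richer covariates, a model-based propensity score would replace the cell shares, and unobserved confounding would remain a caveat.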

Adopting an evidence-based approach to the allocation of outbound campaigns allows statistical offices to better balance data quality objectives, budget constraints, and the sustainability of respondent burden. The paper presents an operational application using Istat’s structural business surveys and demonstrates how quasi-experimental evaluation methods can be effectively integrated into survey data collection processes.

Main author / Presenter
Paola Bosso
Istat

Paola Bosso is a technologist and project coordinator specializing in new statistical data sources and digitalization. Her work focuses on Industry 5.0, the integration of ERP systems into official statistics, and machine-to-machine data transmission for industrial production statistics. She also serves as the AI Solutions focal point for the Istat Contact Centre, contributing to the promotion of innovation in public services and data governance.


CO-AUTHORS:

Stefano De Santis, Istat
Davide Di Cecco, Istat
Giovanni Gualberto Di Paolo, Istat

Presentation title
Teaching the Dinosaur New Tricks: Reviving Cognitive Interviewing with AI

(For Topic 2.2: Innovating Methodological and Quality Assurance Frameworks)

Why This Method Remains Indispensable

Cognitive interviewing is essential for questionnaire quality—ensuring surveys collect valid, reliable data (Cross-Cultural Survey Guidelines, n.d.; Willis & Artino, 2013). The method is especially valuable when introducing new questions on sensitive topics or deploying unfamiliar platforms. Our non-discrimination survey presented both: new questions exploring gender-based discrimination and a new digital platform. Pilot testing systematically verified whether respondents understood questions as intended, revealing interpretation differences both among respondents and between respondents and researchers. Additionally, it identified structural problems such as confusing layouts (Willis, 2005).

The Challenge: Time and Resources

Yet cognitive interviewing is often sidelined as "too resource-intensive" for today's fast-paced statistical production environment (Boness & Sher, 2020; Willis, 2005). It requires substantial methodologist hours for conducting interviews, systematically analyzing interview data, and synthesizing findings into comprehensive reports. The result? A critical quality tool frequently skipped due to tight deadlines and limited resources, not lack of value (Colbert et al., 2019). Like the dinosaur in our title, cognitive interviewing has earned a reputation as too slow and cumbersome for modern statistical production timelines.

What We Learned: Critical Problems Detected

We conducted cognitive interviews with 12 diverse participants at Israel's Central Bureau of Statistics. The pilot surfaced critical problems requiring questionnaire revision:
Response scales creating measurement error through unclear differentiation
Core concepts interpreted inconsistently, fundamentally affecting validity
Reference period violations (telescoping effects)
Question flow causing respondents to answer about others rather than themselves
Platform-induced attention diversion from content to navigation

Making It Feasible: A Practical Tool

AI-assisted analysis and synthesis reduced report production time by approximately 60%. The technology streamlines the analysis of interview data—consolidating findings, identifying patterns across multiple interviews, and presenting insights from different angles—enabling researchers to reach conclusions more efficiently. This efficiency gain—consistent with recent advances in AI for qualitative data analysis (Morgan, 2023)—makes cognitive interviewing feasible for routine use by freeing methodologists from labor-intensive synthesis work while preserving their time for interpreting findings and improving questionnaire design.

The Bottom Line

Rigorous pretesting remains essential for all surveys—platform constraints simply amplify design problems into major data quality issues. AI doesn't replace cognitive interviewing's insights; it simply makes this essential method faster and more practical for routine implementation. The ancient dinosaur becomes a modern dragon: powerful, practical, and indispensable.

Main author / Presenter
Michal Nir
Central Bureau of Statistics

Michal Nir is a survey methodologist at Israel's Central Bureau of Statistics with 30 years of experience in survey design and implementation. With an M.A. in Organizational Behavior, she specializes in questionnaire quality, question wording, and interviewer training tailored to survey-specific needs. Her work focuses on ensuring surveys meet research objectives while maintaining data quality.
