5 June 2026
09:00 – 10:15
ŠIBENIK VI
Presentation title
Use of Machine Learning techniques to estimate population frame by integrating different administrative sources. Application to tourists movements at borders
In recent years, the integration of diverse data sources has become increasingly relevant to achieve the highest possible accuracy in measuring phenomena that impact various official statistical operations.
In this case, two types of digital devices (inductive loops and cameras) are used to count vehicles, complemented by two surveys that provide information on the number of occupants per vehicle and the time at which they crossed the border. All these sources must be combined to estimate the total number of people crossing the Spanish border and the number of trucks. This information serves as the basis for building the population frame in the official statistics of tourist movements at borders.
The proposed solution is an end-to-end process where a predictive model is trained to estimate the number of occupants per vehicle based on survey data. Then, this model is applied to vehicle counts, resulting in a precise estimation of passenger flows.
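The two-step logic described above (fit an occupancy model on survey data, then apply it to vehicle counts) can be illustrated with a minimal sketch. The abstract does not specify the predictive model, so a simple cell-mean predictor stands in for it here; the field names `vehicle_type`, `hour_band`, `occupants` and `vehicles`, the fallback occupancy and the toy data are all assumptions made for illustration only.

```python
from collections import defaultdict


def fit_occupancy_model(survey_rows):
    """Return the mean occupants per vehicle for each (vehicle_type, hour_band) cell.

    A cell mean is only a stand-in for the (unspecified) ML model.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of occupants, n vehicles]
    for row in survey_rows:
        cell = (row["vehicle_type"], row["hour_band"])
        totals[cell][0] += row["occupants"]
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


def estimate_passengers(model, count_rows, default_occupancy=1.0):
    """Apply predicted occupancy to device vehicle counts and sum the passengers.

    Cells never seen in the survey fall back to `default_occupancy` (an assumption).
    """
    total = 0.0
    for row in count_rows:
        cell = (row["vehicle_type"], row["hour_band"])
        total += model.get(cell, default_occupancy) * row["vehicles"]
    return total


# Hypothetical survey responses (occupants observed per surveyed vehicle).
survey = [
    {"vehicle_type": "car", "hour_band": "day", "occupants": 2},
    {"vehicle_type": "car", "hour_band": "day", "occupants": 3},
    {"vehicle_type": "bus", "hour_band": "day", "occupants": 40},
]
# Hypothetical vehicle counts from loops and cameras.
counts = [
    {"vehicle_type": "car", "hour_band": "day", "vehicles": 100},
    {"vehicle_type": "bus", "hour_band": "day", "vehicles": 2},
]

model = fit_occupancy_model(survey)
passengers = estimate_passengers(model, counts)
print(passengers)
```

In a production setting the cell mean would be replaced by whatever model the process actually trains; the point of the sketch is only the separation between fitting on survey data and predicting on count data.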
Quality measurement is essential throughout the process. Therefore, a set of metrics is generated to evaluate model performance and analyse data integrity. These metrics are compiled into an automatically generated report each time the process is executed. The report includes graphs and tables that facilitate monitoring and maintenance of the executed subprocesses. To further assist survey personnel in reviewing the report, eight Key Performance Indicators (KPIs) have been designed and implemented as alert mechanisms. These indicators address aspects such as input and output quality, execution times, and accuracy of the predictions.
This presentation will cover both the methodological framework and the implementation of the fully automated end-to-end process, as well as the structure and content of the final report and the KPIs.
Sandra Barragán
Statistics Spain (INE)
Sandra Barragán has been an Official Statistician at Statistics Spain since 2016. Her academic journey reflects her passion for statistics: she studied Statistics from undergraduate level through to a doctoral degree, all at the University of Valladolid. Sandra currently holds the position of Head of Unit at the SG for Methodology and Sampling Design. In this capacity, she carries out various projects focused on selective editing, machine learning techniques, and the development of standardized and modular production processes.
CO-AUTHORS:
Presentation title
Intra-EU trade statistical burden reduction at Istat
Under the ESS VISION 2020 programme, the National Statistical Institutes of the European statistical system have undergone a major modernisation process, which benefits from digital transformation and new data sources.
Due to the high statistical burden on respondents in the statistical sector of intra-EU trade, the Intrastat survey was heavily involved in the modernisation process.
In 2022, a new statistical data ecosystem was created by exchanging microdata regarding intra-EU exports. This interoperable statistical system is based on collaboration between national statistical authorities, which contribute to the collection, transfer and reuse of data by adopting a harmonised common framework of statistical concepts and tools.
At the same time, new sources of timely administrative data have been introduced thanks to the digital transformation process promoted by tax authorities. In particular, under the project called ViDA (VAT in the Digital Age), EU tax authorities are implementing a new system for real-time digital reporting, based on electronic invoicing. Italy is one of the few EU Member States where this system is already in place and includes reported invoices for intra-EU acquisitions of goods. Studies and analyses have been conducted on the use of ViDA data for the production of economic statistics, particularly those relating to international trade in goods. Particular attention has been given to the quality of the estimates obtained. The shift from statistical processes based solely on surveys to a mixed process exploiting both administrative data sources, such as electronic invoices, and exchanged statistical microdata represents a significant development in the field of international trade in goods statistics. This new approach reduces the statistical burden on respondents while maintaining data quality. Indeed, using the exchanged microdata makes it possible to minimise asymmetries in European statistics, thereby improving data accuracy and overall confidence in official statistics. However, the success of the new data production system depends on the timeliness and availability of new data sources.
This paper outlines the transition to the new data production system, highlighting the advantages and challenges of the implemented methodology.
Maria Serena Causo
Istat
The presenting author is a senior researcher at ISTAT with long-term experience in external trade statistics. She has been heavily involved in the activities of the ITGS (International Trade in Goods Statistics) working groups, which are promoting the modernisation and harmonisation of trade statistics processes at a European level.
CO-AUTHORS:
Presentation title
Enhancing the quality of maritime statistics with AIS data
There is increasing interest within the statistical community in leveraging Automatic Identification Systems (AIS) data for maritime statistics.
AIS is an automatic tracking system used on ships and extensively employed in the maritime domain for safety and management purposes. AIS signals provide information on vessel identity, geo-localisation, speed and direction of navigation at regular and frequent intervals. Their high temporal and spatial granularity and their timeliness are expected to enhance the quality of statistics in this domain. Nevertheless, AIS data need to be transformed for statistical purposes and cleaned to deal with the errors affecting them. This process is neither immediate nor straightforward, and for this reason their introduction into the production processes of National Statistical Institutes is still generally under study.
Istat has invested in the use of AIS data in recent years and has developed a process for their transformation. Currently, they are used as a comparative data source to resolve critical cases observed in official data, which result from the integration of data from the General Command of Port Authorities and a complete enumeration survey of all ships engaged in commercial activities with a gross tonnage greater than 100. As concrete examples, they are used to identify and resolve duplications and to rectify voyage-related information, such as the vessel's port of origin.
In this paper, we present and analyse the impact of the use of AIS as an auxiliary comparative data source on the quality of official data used for maritime statistics at Istat. In addition, we discuss the first preliminary analyses related to the use of AIS data for improving the timeliness of maritime statistics, an important component of the quality of estimates that we plan to investigate in depth in the near future.
Marco Di Zio
Istat
Marco Di Zio is a research manager and a member of the Istat scientific committee. He has worked and published papers on data integration, data editing and imputation. He is currently involved in Istat projects concerning the use of non-traditional data sources for improving official statistics.
CO-AUTHORS:
Presentation title
Efficiency and quality in official business surveys under industry 5.0: an experimental study using an extended TSE paradigm
A focused operational process leveraging business digitalisation for official statistical purposes cannot be considered exhaustive, but must be framed as part of a broader multi-source strategy for collecting business data.
Within such a strategy, traditional survey-based data collection remains necessary in the medium term—both for enterprises with low levels of digitalisation and for variables not well suited to automated acquisition (e.g. qualitative variables). For highly digitalised companies, however, digital quality management acts as an enabler of standardisation, interoperability, and scalability, transforming data retrieval and transmission into repeatable, auditable, and controlled processes capable of handling increasing data volumes and heterogeneous sources without compromising statistical quality.
Against this backdrop, Istat designed an experimental study to investigate the potential of automation enabled by the Industry 5.0 paradigm. The study focuses on two complementary dimensions: (a) variables that are particularly suitable for automated acquisition because they are generated by integrated digital systems and are not directly accessible via alternative sources (e.g. administrative registers); and (b) enterprises with high levels of digitalisation, representing potential candidates for automated data collection techniques involving AI-based models and machine-to-machine (M2M) transmission.
Within this experimental framework, the work first illustrates the design of the experimental study, with specific reference to the variable “industrial production in volume”. The study is structured in two phases. The first investigates, through structured interviews with major ERP providers operating in the Italian market, the statistical potential of ERP-generated data, focusing on data standards, time granularity, metadata availability, and harmonisation requirements. The second phase involves enterprises participating in the monthly industrial production survey in order to assess compliance with EU methodological standards, the consistency of values produced by automated extraction, and the degree of acceptance by respondents.
Second, the work proposes a quality evaluation model that integrates: (i) multi-source quality assessment methods, following the approach of de Waal et al. (2021), oriented towards estimating output accuracy in a multi-source context; and (ii) an extended version of the traditional Total Survey Error (TSE) paradigm (Groves et al., 2009) and the ESS Quality Framework, aimed at analysing how automation enabled by Industry 5.0 may affect the different dimensions of these paradigms, following the conceptual developments of Puts et al. (2024).
Based on the experimental evidence collected, the analysis formulates general considerations on how traditional quality assessment paradigms for business surveys could be integrated to accommodate multi-source data acquisition practices and digital automation processes.
Diego Distefano
Istituto Nazionale di Statistica (ISTAT)
I hold a degree in Economics and Business Administration from ‘La Sapienza’ University of Rome, and subsequently specialised in Marketing Management. Since 1998, I have been employed at the Italian National Institute of Statistics (ISTAT), where I have worked extensively on structural business statistics concerning medium-sized and large enterprises, as well as on surveys of large multinational enterprises. I have also served as Head of the Office of the Director of Economic Statistics, including annual operational planning and three-year strategic planning.
In recent years, my contributions have focused on the design of strategies aimed at identifying alternative data sources and innovative solutions to optimise the production processes of official business statistics. In particular, my work has concentrated on assessing the readiness of different segments of Italian enterprises for the adoption of digitalisation processes associated with the development of artificial intelligence models and the Industry 5.0 transition.
CO-AUTHORS:
Presentation title
Towards more stable and transparent small-area estimates in Eustat: 2025 methodological update
This work presents the 2025 revision and enhancement of small area estimation models used in the survey on Information Society in Enterprises carried out by Eustat (Basque Statistics Institute).
Since the early 2000s, direct, indirect, and model-based estimators have been developed, in collaboration with the University of the Basque Country (EHU), to generate reliable small-area indicators for business statistics at the regional level. In recent years, unit-level logistic regression models, fitted by pseudo-likelihood using the sampling design weights, have been applied to produce results for sub-regional areas. However, the progressive incorporation of new indicators on digitalisation within the business sector posed methodological challenges, prompting a comprehensive review of the modelling framework.
The paper describes the principal methodological aspects considered in the 2025 revision. First, the modelling strategy assigns an independent model to each indicator, allowing the specific behaviour and data structure of each phenomenon to be captured more accurately. Second, the universe of analysis focuses on establishments with ten or more employees, using data from the 2019–2023 period to ensure consistency with the auxiliary information available. Third, the set of covariates has been expanded with newly available structural and administrative information (such as foreign trade, institutional classification, turnover categories, and additional descriptors of the legal unit), improving the predictive capacity of the models, as measured by the area under the ROC curve adjusted by sampling weights (Iparragirre, Barrio, & Arostegui, 2023).
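As an illustration of what a sampling-weight-adjusted area under the ROC curve can look like, the sketch below uses a generic pairwise estimator in which each (event, non-event) pair contributes the product of the two units' weights. This is an assumption about the general form, not a reproduction of the cited authors' exact estimator, and the toy data are hypothetical.

```python
def weighted_auc(y, scores, weights):
    """Pairwise weighted AUC: weighted share of (event, non-event) pairs
    in which the event unit has the higher predicted score (ties count 0.5)."""
    events = [(s, w) for yi, s, w in zip(y, scores, weights) if yi == 1]
    non_events = [(s, w) for yi, s, w in zip(y, scores, weights) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for s_e, w_e in events:
        for s_n, w_n in non_events:
            if s_e > s_n:
                concordant += w_e * w_n
            elif s_e == s_n:
                concordant += 0.5 * w_e * w_n

    total_pair_weight = sum(w for _, w in events) * sum(w for _, w in non_events)
    return concordant / total_pair_weight


# Hypothetical sample: outcome, model score, and sampling weight per unit.
y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2]
weights = [10, 5, 20, 5]

auc_weighted = weighted_auc(y, scores, weights)
auc_unweighted = weighted_auc(y, scores, [1, 1, 1, 1])
print(auc_weighted, auc_unweighted)
```

With equal weights the function reduces to the usual unweighted AUC, which makes the effect of the design weights easy to inspect.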
A complementary analysis explored the incorporation of contextual territorial characteristics, specifically linguistic composition at the municipal level, as an additional explanatory factor. This information proved relevant in explaining territorial patterns and contributed to improved alignment between model-based estimates and previously published results. The study also highlights the usefulness of calibrating model-based estimates to higher-level domain totals to ensure coherence across statistical outputs.
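One simple way to make small-area estimates add up to a higher-level domain total is ratio benchmarking, sketched below. The abstract does not state which calibration method is used, so this is an assumed stand-in; the area names and figures are hypothetical.

```python
def benchmark_to_total(area_estimates, domain_total):
    """Scale area-level estimates by a common factor so they sum to `domain_total`."""
    current_sum = sum(area_estimates.values())
    if current_sum == 0:
        raise ValueError("cannot benchmark estimates that sum to zero")
    factor = domain_total / current_sum
    return {area: value * factor for area, value in area_estimates.items()}


# Hypothetical model-based estimates for three sub-regional areas,
# and a hypothetical published total for the region containing them.
model_based = {"area_A": 120.0, "area_B": 80.0, "area_C": 50.0}
published_total = 275.0

calibrated = benchmark_to_total(model_based, published_total)
print(calibrated)
```

A single common factor preserves the relative ordering of the areas while restoring coherence with the published total; more elaborate calibration schemes can target several margins at once.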
These changes lead to estimates that are more stable across time and territories. The revision also adds design-based bootstrap confidence intervals, strengthening the transparency and interpretability of the results. Publication plans for 2026 include the release of updated estimates, enhanced dissemination tables incorporating confidence intervals, and an expanded methodological report.
The methodological advancements presented here provide a more robust, flexible, and transparent framework for producing small-area indicators on digitalisation within the business sector, supporting better territorial analysis and evidence-based decision making.
Marta Mas
Eustat - Basque Statistics Institute
Marta Mas holds a Bachelor’s Degree in Statistics from the University of Zaragoza and a Bachelor’s Degree in Market Research and Techniques from the Universitat Oberta de Catalunya (UOC). She has developed her professional career in the field of official statistics in the Basque Country, carrying out various research and development projects at the Basque Statistics Institute (Eustat) and working mainly in the areas of education and employment statistics. She currently works as a statistician in Eustat’s Methodology department and is responsible, among other tasks, for the R&D project on small-area estimation.
CO-AUTHORS: