Session 21

Collaborating With Artificial Intelligence and Machine Learning for Quality

4 June 2026
14:15 – 15:45
ŠIBENIK II

Presentation title
Collaboration for Quality: An Overview of the One Stop Shop for AI and ML for Official Statistics project
Artificial Intelligence and Machine Learning for Official Statistics (AIML4OS) is an ESSnet collaborative project involving 16 countries, which aims to develop a comprehensive set of resources for implementing AI/ML-based solutions in statistical production.

The project launched in April 2024 and is scheduled to run for four years. It draws on the capabilities and expertise of all the countries in the consortium to address the dominant AI-related topics and challenges that national statistical institutes (NSIs) face today.

The broad range of resources being developed includes methodologies, guidelines, sandboxes, labelled data, processes, and frameworks. The project also aims to provide support and guidance for integration and maintenance, and to develop communities around open-source solutions in which ideas and experiences can be shared.

As the project marks its halfway point, effective collaboration across the project has been central to its success and has underpinned the quality and relevance of its outputs. Although the project is structured into thirteen work packages covering specific functions and use cases, collaboration and knowledge transfer between them have been excellent, with examples including the development of shared standards and the comparison of methods across jurisdictions. This presentation will describe how collaboration has helped to deliver on quality, and how the project will serve ESS staff over the coming years.

This presentation will introduce work packages from the project which exemplify the theme of “Collaborating for quality”, with areas of application ranging from functional to technical to methodological.

Main author / Presenter
Eimear Crowley
Central Statistics Office

Eimear Crowley is a Statistician in the Data Science division of the Central Statistics Office in Ireland and the Project Co-ordinator for the AIML4OS project, with responsibility for supporting, coordinating and facilitating the consortium's activities.


CO-AUTHOR:

Brendan O'Dowd, Central Statistics Office

Presentation title
A New Way to Learn Machine Learning and AI: The European AIML4OS Funathon, a Non-competitive Hackathon
The effective adoption of data science, machine learning and artificial intelligence in official statistics is essential for improving productivity and modernizing statistical systems.

A major challenge, however, lies in providing training that leads to lasting skill acquisition. Traditional classroom-based courses, typically lasting 1-3 days, often prove insufficient: participants have limited opportunities for hands-on practice and tend to forget newly learned methods quickly, and such formats do not scale well across organizations.

To address these limitations, the French National Statistical Institute (NSI) introduced a three-part training strategy in 2021. This strategy combines self-paced e-learning through introductory notebooks supported by trainers, online masterclasses for advanced topics, and an annual non-competitive hackathon called the “Funathon.” The Funathon was designed as an inclusive, low-pressure training event aimed at helping participants practice and strengthen their data science skills in a collaborative setting. Its success at the national level led to the organization of a European edition under the AIML4OS project, funded by Eurostat, scheduled for May 27–28, 2026.

The Funathon is a two-day virtual event focused on learning rather than competition. Its non-competitive nature is central to its design, as many statisticians do not feel confident enough to participate in traditional hackathons. By removing competitive pressure, the Funathon encourages participation from individuals with diverse professional backgrounds and skill levels, while fostering teamwork and continuous support from experienced data scientists.

Each edition of the Funathon revolves around a specific thematic topic, such as Airbnb data, global warming, agricultural statistics, or flight data. Organizers prepare several projects with varying levels of difficulty, along with starter notebooks in both Python and R to guide participants and suggest extensions. Participants collaborate using a cloud-based data science platform (SSP Cloud), GitHub for code sharing, and online communication tools that allow organizers to provide real-time assistance. The event concludes with an optional session where teams can present their work.

From 2021 to 2024, each Funathon attracted 150-200 participants organized into 30-40 teams. Satisfaction levels were very high, with most participants expressing a desire to take part again. Key success factors included flexible registration, extensive communication, a clearly defined general theme, and constant technical and methodological support. Main challenges involved onboarding participants to technical tools, selecting appropriate project difficulty levels, and managing the substantial preparatory workload.

Building on this experience, the European AIML4OS Funathon will focus specifically on ML and AI, offering European statisticians practical exposure to modern AI/ML tools and cloud-native environments.

Main author / Presenter
Olivier Meslin
Insee

Data scientist at Insee.

Presentation title
MLUtils: an Official Statistics oriented common interface for Machine Learning
The use of machine learning techniques is becoming increasingly common in Official Statistics, and they will soon be an important part of the production pipelines.

While many R and Python libraries provide a wide variety of machine learning techniques and implementations, they have some drawbacks for use in official statistics production. First, every library has its own syntax, so replacing one method with another may mean rewriting the script. Second, it is usually not straightforward to address some of the problems that arise when machine learning is applied to official statistics, such as model-assisted estimation of aggregates under complex survey designs, the need to enforce restrictions between variables, or the use of models for semicontinuous variables.
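To make the second drawback concrete, the following is a textbook form of a model-assisted (generalised difference) estimator of a population total, given for orientation only. The notation is an assumption introduced here, and the abstract does not state which estimator MLUtils implements: U is the population, s the sample, \pi_k the inclusion probability of unit k, x_k its auxiliary variables, and \hat{m} a fitted prediction model.

```latex
\hat{t}_{y} \;=\; \sum_{k \in U} \hat{m}(x_k) \;+\; \sum_{k \in s} \frac{y_k - \hat{m}(x_k)}{\pi_k}
```

The first sum uses the model's predictions for every unit in the population; the second corrects them with design-weighted residuals from the sample, which is the step that general-purpose machine learning libraries typically do not provide.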

MLUtils is a library, available in R and Python versions, that provides a common interface for the machine learning tasks arising in the standard processes of a statistical office. Its purpose is to standardize the use of machine learning across official statistical processes, while offering functionalities that are often essential in this domain but uncommon in general-purpose machine learning applications. The library follows an object-oriented design and is highly modular, making it straightforward to extend and to incorporate more sophisticated techniques as needs evolve.
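The idea of a common, object-oriented interface can be illustrated with a minimal sketch. This is not the MLUtils API, which the abstract does not describe: the names `Learner`, `MeanLearner`, `LinearLearner` and `impute_missing` are hypothetical and chosen only to show how a production script can stay unchanged when one method replaces another.

```python
from abc import ABC, abstractmethod
from statistics import mean


class Learner(ABC):
    """Hypothetical common interface: every method exposes fit() and predict()."""

    @abstractmethod
    def fit(self, x, y):
        """Estimate the model from observed pairs and return self."""

    @abstractmethod
    def predict(self, x):
        """Return one prediction per element of x."""


class MeanLearner(Learner):
    """Baseline that always predicts the mean of the training target."""

    def fit(self, x, y):
        self.mean_ = mean(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


class LinearLearner(Learner):
    """Least-squares line y = a + b * x for a single numeric feature."""

    def fit(self, x, y):
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, x):
        return [self.intercept_ + self.slope_ * xi for xi in x]


def impute_missing(learner, x_obs, y_obs, x_missing):
    """Production step written once against the interface, not a specific method."""
    return learner.fit(x_obs, y_obs).predict(x_missing)
```

Swapping `MeanLearner()` for `LinearLearner()` in a call to `impute_missing` changes the method without touching the rest of the script, which is the kind of substitution a shared interface is meant to allow.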

MLUtils improves the quality of statistical production in two ways. First, by standardizing the use of machine learning in production workflows, it facilitates reproducibility, reduces implementation effort, and promotes harmonization of methodologies across teams and departments. Second, the techniques implemented in the package make it easy to use new uncertainty quantification measures and quality indicators specifically tailored to official statistics, such as design-based predictive inference.

Main author / Presenter
Luis Sanguiao
INE (Spanish NSI)

Luis Sanguiao-Sande holds a PhD in Mathematics (2006) and is a statistician specializing in time series analysis, seasonal adjustment, and statistical learning methods for official statistics. His work focuses on the integration of administrative and alternative data sources into statistical production. He has published in several journals, such as Journal of Official Statistics and Sankhyā A, and has presented his work at international conferences, including the World Statistics Congress.


CO-AUTHOR:

INE Methodology Unit, INE (Spanish NSI)

Presentation title
Generalising Earth Observation AI/ML pipelines for European statistics
Several statistical institutes, as well as other parties, are developing AI/ML pipelines using Earth Observation data, with promising results.

The goal of our project is to investigate whether, and under what conditions, these AI/ML pipelines can be generalized to other countries or regions (space) and/or timeframes (time), so that other statistical institutes can apply them to their own areas of interest. This would yield substantial efficiency gains (build once, run everywhere). After a thorough selection process, we selected the land cover model of IGN, France, and the crop mapping model of GUS, Poland.

For land cover, IGN produced four models with different configurations (small/big and RGB/IRT), on which quality evaluation experiments have been conducted (photo-interpretation, comparison with external maps, etc.). Each of the participating countries (Austria, Denmark and Italy) successfully installed some of the models and ran them locally on parts of its territory (NUTS2). The results have been analysed and are very promising. To increase processing speed, the pipeline has been installed on the Copernicus Data Space Ecosystem (CDSE) cloud infrastructure. It will be executed on the selected areas and the results analysed; these results are expected in mid-2026.

For crop type, GUS prepared a generic pipeline for preprocessing Sentinel-1 and Sentinel-2 data. Each of the participating countries (Austria, Ireland, Netherlands and Portugal) selected an area of interest (NUTS2). Samples were taken and the required images were manually coded. The countries installed the pipeline locally and used it to train, execute and validate the models. The results have been analysed and are also very promising. The crop type model will likewise be installed and executed on the CDSE cloud infrastructure, with results expected in mid-2026.

Considering the size of Earth Observation data (petabytes) and the computing power required, it is clear that most national statistical institutes will need to run their Earth Observation models on platforms like the CDSE. Neither the land cover nor the crop type model requires sensitive data. However, other models may require it, and European and local legislation prevents such data from leaving the statistical environment, even though the CDSE offers a secure environment (encrypted private buckets). To secure the data and still use the processing power of the CDSE, we are looking into applying Privacy-Preserving Techniques (PPT).

Main author / Presenter
Remco Paulussen
Statistics Netherlands

Remco Paulussen is a senior project manager for Eurostat-funded projects, such as Earth Observation (AIML4OS and GEOS), Modernisation of Agriculture Statistics (MAS), and previously Smart Surveys (SSI) and Big Data (BD1, BD2). These projects are characterised by collaboration among many partners, not only within the statistical community but also beyond it.


CO-AUTHORS:

Fabrizio de Fausti, ISTAT
Anatol Garioud, IGN
Marko Roos, Statistics Netherlands
Przemysław Slesiński, GUS
