CERENA and INESC-ID will organise the “Webinar Series in Spatial Data Sciences …think spatially about your data science problems”, from 26th May to 23rd June 2021. The webinars will take place on Wednesdays at 12:30 PM (WEST).

Have you ever wondered why a given phenomenon happens at a specific location? What tools do we have to model and predict phenomena with an important spatial component? Spatial Data Sciences sheds light on these questions.

Spatial Data Sciences is an interdisciplinary area of knowledge that integrates traditional Data Sciences methods – Machine Learning and Artificial Intelligence – with spatial analysis methodologies such as Geographic Information Systems, Geostatistics and Remote Sensing to understand, characterize and manage big Spatial Data. This set of tools is essential for areas that require, for example, predicting human behavior, spatial consumption patterns, pandemic dynamics and climate. Spatial Data Sciences methods are now an essential part of the IT core business of companies like Google, Uber, and Amazon. This webinar series covers a wide range of methods and applications of Spatial Data Sciences.

**Agenda**

- May 26th – Geostatistical COVID-19 infection risk maps for Portugal. Maria João Pereira (CERENA)
- June 2nd – Geospatial Data Disaggregation with Convolutional Neural Networks. Bruno Martins (INESC-ID)
- June 9th – The role of Volunteered Geographic Information (VGI). Jacinto Estima (INESC-ID)
- June 16th – A space-time model of an alert system for detecting anomalous incidence values of COVID-19. Leonardo Azevedo (CERENA)
- June 23rd – Modeling the geospatial evolution of COVID-19 using spatio-temporal convolutional sequence-to-sequence neural networks. Arlindo Oliveira (INESC-ID)

Seminars are moderated by Amilcar Soares (CERENA).

Attendance is free. Registration is required. The Zoom link will be sent by email to registered participants.

### Abstracts

**Geostatistical COVID-19 infection risk maps for Portugal**

The rapid spread of the SARS-CoV-2 epidemic has simultaneous time and space dynamics. This behaviour results from a complex combination of factors, including social ones, which lead to significant differences in the evolution of the spatiotemporal pattern between and within countries. Usually, spatial smoothing techniques are used to map health outcomes, and the uncertainty of the spatial predictions is rarely assessed. As an alternative, we propose to apply direct block sequential simulation to model the spatial distribution of the COVID-19 infection risk in mainland Portugal. Given the daily infection counts provided by the Portuguese Directorate-General for Health, daily updates of infection rates are calculated by municipality and used as experimental data in the geostatistical simulation. The model considers the uncertainty/error associated with the size of each municipality’s population. Each daily infection risk map is the median model of an ensemble of 100 geostatistical realizations of the infection risk for that day. The same ensemble is also used to calculate the spatial uncertainty of the prediction, measured by the interquartile distance. The risk maps are updated daily and show the regions with greater risk of infection and the critical dynamics of its development over time.
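As a rough illustration of the last step only, the sketch below summarizes an ensemble of realizations into a median map and an interquartile-distance map. It assumes the realizations are already available as a NumPy array of shape `(n_realizations, ny, nx)`; the function name `summarize_ensemble` and the synthetic gamma-distributed ensemble are hypothetical stand-ins, and the direct block sequential simulation itself is not reproduced here.

```python
import numpy as np


def summarize_ensemble(realizations):
    """Return (median_map, iqd_map) computed cell by cell.

    realizations: array-like of shape (n_realizations, ny, nx).
    """
    realizations = np.asarray(realizations, dtype=float)
    # Cell-wise median across realizations: the "median model".
    median_map = np.median(realizations, axis=0)
    # Cell-wise interquartile distance (Q3 - Q1) as an uncertainty measure.
    q25, q75 = np.percentile(realizations, [25, 75], axis=0)
    return median_map, q75 - q25


# Synthetic stand-in for 100 simulated risk maps on a 4 x 5 grid (assumption).
rng = np.random.default_rng(0)
ensemble = rng.gamma(shape=2.0, scale=50.0, size=(100, 4, 5))
risk_map, uncertainty_map = summarize_ensemble(ensemble)
```

Cells with a large interquartile distance are those where the realizations disagree most, so the median risk there should be read with more caution.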

**Geospatial Data Disaggregation with Convolutional Neural Networks**

Demographic and socio-economic statistics are widely available on a variety of subjects. Still, the data are often collected or released for highly aggregated geospatial areas, masking important local hotspots. When conducting spatial analysis, one often needs to disaggregate the source data, transforming statistics for a set of source zones into values for a set of target zones with different geometry and a higher spatial resolution. In this work, we report on a novel dasymetric disaggregation method that uses encoder-decoder convolutional neural networks similar to those used in image segmentation (i.e., models inspired by the popular U-Net) to combine different types of ancillary data when deriving the dasymetric weights. Model training constitutes a particular challenge, given that disaggregation tasks do not entail the direct use of supervision signals in the form of training instances mapping the low-resolution aggregated data into the corresponding high-resolution representations. We propose to address the problem through self-training or co-training, iteratively refining initial estimates from seminal disaggregation heuristics by training a single model over progressively better estimates, or by using the results of one model to support the training of another. We conducted experiments on the disaggregation of socio-demographic variables collected for Continental Portugal, originally available for coarse-grained administrative divisions, into raster cells with a resolution of 200 m. The results show that the proposed approaches outperform baseline methods, including other regression models used to infer the dasymetric weights. Our experiments also highlight the impact of different training strategies, e.g. involving different loss functions and/or regularization schemes.
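To make the role of the dasymetric weights concrete, here is a minimal sketch of the generic redistribution step: each source-zone total is spread over that zone's raster cells in proportion to the cell weights, so zone totals are preserved. The function name `dasymetric_disaggregate` and the toy inputs are hypothetical; in the talk the weights come from the neural network, whereas here they are simply given.

```python
import numpy as np


def dasymetric_disaggregate(zone_totals, zone_ids, weights):
    """Spread each zone total over its cells proportionally to the weights.

    zone_totals: dict mapping zone id -> aggregated value for that zone.
    zone_ids:    2D integer array giving the zone id of every raster cell.
    weights:     2D non-negative array of dasymetric weights.
    """
    zone_ids = np.asarray(zone_ids)
    weights = np.asarray(weights, dtype=float)
    result = np.zeros_like(weights)
    for zone, total in zone_totals.items():
        mask = zone_ids == zone
        n_cells = mask.sum()
        if n_cells == 0:
            continue  # zone not present in the raster
        weight_sum = weights[mask].sum()
        if weight_sum > 0:
            result[mask] = total * weights[mask] / weight_sum
        else:
            # No weight information: fall back to a uniform split.
            result[mask] = total / n_cells
    return result


zone_ids = np.array([[0, 0, 1],
                     [0, 1, 1]])
weights = np.array([[1.0, 3.0, 0.0],
                    [0.0, 2.0, 2.0]])
cells = dasymetric_disaggregate({0: 100.0, 1: 40.0}, zone_ids, weights)
```

Summing `cells` over each zone returns the original totals, which is the mass-preserving property a disaggregation of this kind is expected to keep.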

**The role of Volunteered Geographic Information (VGI)**

Until the early 2000s, geographic information was produced solely by official/governmental mapping agencies. During the first half of the 2000s, a revolution started to take place. A number of enabling technologies (e.g., accurate GPS positioning and Web 2.0, among others) became available, empowering users to start making their own maps and contribute to mapping initiatives. The term for such data, VGI, was coined by Michael Goodchild back in 2008, and since then these data have been growing exponentially. They have helped research, humanitarian mitigation and rescue, and mobile app development, among many others. However, these data have raised concerns, particularly about their quality, given the lack of formal qualifications and expertise of their contributors. In this talk, I will explore the concept of VGI and other related terms, and discuss the main challenges related to this type of data. I will also show some applications and the most recent work we have been developing at INESC-ID.

**A space-time model of an alert system for detecting anomalous incidence values of COVID-19**

In the current pandemic context, predicting anomalous incidence values at the local level is important to devise effective mitigation measures and optimize resource allocation. An anomalous incidence value of COVID-19 is one that exceeds the predicted or expected value at a given spatial location by more than a threshold. Anomalous incidence values differ from their neighbours, as they have distinct spatial and/or temporal behaviour. We propose a methodology to detect anomalous incidence values by accounting for the spatio-temporal behaviour of COVID-19 incidence values in a given region. To this end, we propose a spatio-temporal model to predict local probability distribution functions of COVID-19 incidence values. A genetic programming model is used for the temporal prediction, and the spatial component is modelled with a stochastic sequential simulation. The incidence of COVID-19 in Portugal from September 2020 to March 2021 is used to illustrate the methodology.
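The definition of an anomalous value given above can be sketched as follows. This is only an illustration under stated assumptions: the local distribution at each location is represented by a set of simulated samples, the expected value is taken as their median, and the threshold is a fixed number chosen by the user. The function name `flag_anomalies` and the toy numbers are hypothetical, and neither the genetic programming model nor the sequential simulation is reproduced.

```python
import numpy as np


def flag_anomalies(observed, local_samples, threshold):
    """Flag locations whose observed incidence exceeds the expected value
    by more than `threshold`.

    observed:      array of shape (n_locations,).
    local_samples: array of shape (n_samples, n_locations), samples from the
                   predicted local distribution at each location.
    """
    observed = np.asarray(observed, dtype=float)
    local_samples = np.asarray(local_samples, dtype=float)
    # Expected value per location, taken here as the sample median (assumption).
    expected = np.median(local_samples, axis=0)
    return (observed - expected) > threshold


samples = np.array([[10.0, 100.0],
                    [20.0, 110.0],
                    [30.0, 120.0]])
flags = flag_anomalies(observed=[60.0, 115.0], local_samples=samples, threshold=25.0)
```

Here the first location is flagged because its observed value is 40 above the expected value of 20, while the second is only 5 above its expected value of 110.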

**Modeling the geospatial evolution of COVID-19 using spatio-temporal convolutional sequence-to-sequence neural networks**

We propose the use of a previously developed methodology and official municipality-level data from the Portuguese Directorate-General for Health (DGS), covering the first twelve months of the pandemic, to compute an estimate of the incidence rate at each location in mainland Portugal. The resulting sequence of incidence rate maps was then used as a gold standard to test the effectiveness of different approaches in predicting the spatio-temporal evolution of the incidence rate. Four different methods were tested: a simple cell-level autoregressive moving average (ARMA) model, a cell-level vector autoregressive (VAR) model, a municipality-by-municipality compartmental SIRD model followed by direct block sequential simulation, and a convolutional sequence-to-sequence neural network model based on the STConvS2S architecture. We conclude that the convolutional sequence-to-sequence neural network is the best-performing method for predicting the medium-term future incidence rate using the available information.

More information:

elisapirescosta@tecnico.ulisboa.pt