COnVIDa - COVID19 data monitoring in Spain
Introduction
COnVIDa is a tool developed by the Cybersecurity and Data Science Laboratory at the University of Murcia (Spain) that makes it easy to gather data related to the COVID19 pandemic in Spain from different data sources and to visualize them in a graph. Contact us at convida@listas.um.es.
How to use COnVIDa
To use this tool, we first select the date range for which we want to collect data.
Then we will select the Autonomous Communities and/or Provinces of Spain that interest us.
Finally, we simply select the data items that interest us within each data source (COVID19, INE, Movilidad, MoMo and AEMET), and all selected data will automatically be displayed in the main temporal and regional graphs, as well as in their respective summary tables. Note that moving the mouse over a data item displays a description of it.
COnVIDa offers two types of data visualisation: temporal and regional. In the temporal display (make sure that the panel is activated), the graph shows the daily values of those temporal data items for which information is available (the statistical data of the INE do not make sense here). For example, if we select COVID19 cases, smoking rates, mobility in parks, observed deaths and insolation; in Murcia, Madrid, Cuenca, Granada and Spain as a whole; from 21/02/2020 to 21/01/2021; the X-axis will be divided into the days between these two dates, while the Y-axis will show the values of the selected data items for these geographic locations. As the data may have different scales, variables with large values may make others look insignificant in the overlay, but the graph can be explored interactively in detail using the controls in its upper right-hand corner.
Additionally, it is also possible to change the type of graph, choosing between line graph or bar graph.
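As an illustration of the temporal axis described above, the following minimal Python sketch lists every day between the two example dates. The function name daily_axis is hypothetical and is not taken from the COnVIDa source code; it only shows how such a daily axis can be built.

```python
from datetime import date, timedelta


def daily_axis(start, end):
    """Return every calendar day from start to end, both inclusive.

    Hypothetical helper for illustration; not part of COnVIDa.
    """
    if end < start:
        raise ValueError("end must not precede start")
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


# Date range used in the example above: 21/02/2020 to 21/01/2021.
axis = daily_axis(date(2020, 2, 21), date(2021, 1, 21))
print(len(axis), axis[0].strftime("%d/%m/%Y"), axis[-1].strftime("%d/%m/%Y"))
```

Each element of the list corresponds to one position on the X-axis, so a data item with no value on a given day simply leaves a gap at that position.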
On the other hand, the regional display is subdivided into two panels. On the left, the data are grouped by selected regions and aggregated into boxplots (taking into account the data series for the selected time range). Once the data is plotted, it is possible to easily change the scale of the graph, either linear or logarithmic. The logarithmic scale is useful for simultaneously displaying data series with different orders of magnitude. On the right, a national map is displayed showing the selected regions whose statistical data can be directly compared. Only one type of geographical granularity (the whole country, autonomous communities, or provinces), one measure (the mean, maximum, minimum, or principal percentiles), and one variable can be represented on the map at a time.
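To see why the logarithmic scale mentioned above helps, consider the following Python sketch. The two series and their values are invented for illustration (they are not real COnVIDa data), and to_log10 is a hypothetical helper.

```python
import math

# Invented values with very different orders of magnitude (not real data).
series = {
    "cases": [1000.0, 10000.0, 100000.0],
    "icu": [1.0, 10.0, 100.0],
}


def to_log10(values):
    """Map strictly positive values to their base-10 logarithm."""
    if any(v <= 0 for v in values):
        raise ValueError("a logarithmic axis cannot show zero or negative values")
    return [math.log10(v) for v in values]


# Largest value across all series: it fixes the top of a shared linear axis.
linear_max = max(max(values) for values in series.values())

for name, values in series.items():
    # Fraction of the linear axis height reached by the series maximum.
    linear_height = max(values) / linear_max
    print(name, "linear share:", linear_height, "log10 values:", to_log10(values))
```

On the shared linear axis the smaller series reaches only a tiny fraction of the height of the larger one, whereas after the logarithmic transform both series span a comparable range. Note that zero or negative values cannot be placed on a logarithmic axis, which is why the helper rejects them.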
Finally, each summary table shows, as the name suggests, a statistical summary of each of the selected data items, including: a count of the data, the arithmetic mean of the data, the standard deviation, the minimum, the 25th percentile, the median, the 75th percentile and the maximum value of the series.
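The statistics listed above can be reproduced for any downloaded series with the Python standard library. This is a minimal sketch under two assumptions that the text does not confirm: the standard deviation is the sample standard deviation, and percentiles use linear interpolation between data points. The function name summarize is hypothetical.

```python
import statistics


def summarize(values):
    """Return the eight summary statistics for one data series.

    Missing observations (None) are ignored. Assumes at least two
    valid values, sample standard deviation and linearly interpolated
    percentiles; COnVIDa itself may compute them differently.
    """
    data = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),
        "min": min(data),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(data),
    }


# Invented series with one missing day (not real data).
summary = summarize([1, 2, None, 3, 4, 5])
for name, value in summary.items():
    print(name, value)
```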
As can be seen, two buttons are offered to download either all the data collected according to the criteria specified by the user or the summary table. COnVIDa offers the possibility to download either of these two data tables in CSV, XLS, JSON and HTML formats.
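For readers who want to process the downloaded files programmatically, the sketch below shows how a small table can be written to and read back from two of these formats, CSV and JSON, using only the Python standard library. The rows and column names are invented for illustration and do not necessarily match the layout of the files produced by COnVIDa.

```python
import csv
import io
import json

# Invented rows; the real column layout of COnVIDa downloads may differ.
rows = [
    {"date": "2020-03-01", "region": "Murcia", "cases": 3},
    {"date": "2020-03-02", "region": "Murcia", "cases": 5},
]


def to_csv(table):
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(table[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


def to_json(table):
    """Serialise a list of dictionaries as indented JSON text."""
    return json.dumps(table, ensure_ascii=False, indent=2)


csv_text = to_csv(rows)
json_text = to_json(rows)

# Reading the data back: CSV yields strings, JSON preserves numbers.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
print(from_csv[0]["cases"], from_json[0]["cases"])
```

One practical difference is that CSV stores every value as text, so numeric columns must be converted after loading, while JSON keeps numeric types.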
Data sources
The current version of COnVIDa includes five data sources related to the COVID19 pandemic in Spain. These data sources are:
- COVID19: The first source of relevant information about the COVID19 pandemic in Spain comprises the data items directly associated with the pandemic, as published by Escovid19data on GitHub (which in turn retrieves such data from open data repositories such as those collected manually in a shared online spreadsheet and the Instituto de Salud Carlos III). From here COnVIDa takes the data items of recovered, cases, PCR confirmed, test confirmed, deaths, hospitalized, ICU, cumulative incidence, vaccines, etc.
- INE: The second relevant data source considered in COnVIDa is the INE (Instituto Nacional de Estadística). From this source some notable data items are collected, such as physical activity, body mass index (BMI), tobacco consumption, households by family type, households by occupation density and people over 65 years old living alone.
- Movilidad: The next data source included in COnVIDa has to do with the citizens’ mobility. In this case the original data sources are actually two: Google COVID-19 Community Mobility Reports and Apple COVID‑19 - Mobility Trends Reports. From the first data source we obtain the citizens’ mobility in different spaces such as: i) grocery and pharmacy, ii) parks, iii) residential, iv) retail and recreation, v) transit stations and vi) workplace. In turn, from the second data source we retrieve the citizens’ mobility while driving their vehicles.
- MoMo: Another data source with remarkable information about the pandemic is the mortality monitoring system MoMo handled by the Instituto de Salud Carlos III. From this source we obtain the daily observed deaths for each region, as well as the lower and upper bounds of such series. Additionally, the daily expected deaths per region are also collected, as well as the 1st and 99th percentiles of such series.
- AEMET: Finally, COnVIDa also includes meteorological data from the AEMET (Agencia Estatal de Meteorología). In particular, this tool allows querying daily values for rainfall, maximum pressure, minimum pressure, maximum gust, insolation, maximum temperature, mean temperature, minimum temperature, wind speed, altitude and gust direction.
As stated previously, when passing the mouse over each data item, a description of such item will be automatically displayed.
Source code
COnVIDa has been developed from its very conception as an Open Science project, with the aim and spirit of serving and assisting anyone who might need it in the context of the COVID19 pandemic in Spain. Accordingly, all the project source code is publicly accessible in the following repository, which also includes a developer manual:
https://github.com/CyberDataLab/COnVIDa
Limitations
COnVIDa was born at the Cybersecurity and Data Science Laboratory of the University of Murcia (CyberDataLab) as a selfless response to the critical situation generated by the pandemic. Thus, in spite of the effort and technical capabilities invested, the project has limitations such as its dependence on external sources to collect data (which may fail or contain invalid values), small bugs in the web page, or certain imperfections in the visualisation of the data.
References
Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Javier Pastor-Galindo, Pantaleone Nespoli, Félix Jesús García Clemente and Félix Gómez Mármol. COnVIDa: COVID-19 multidisciplinary data collection and dashboard. Journal of Biomedical Informatics. March 2021.
I will be updating the blog post as improvements are made to the tool. Thank you for your time and attention. Feel free to contact me with any questions or suggestions.
Enrique Tomás