Innovating with new data sources from digital footprints has become an important part of responding to the pandemic. New data sources can provide speedier and newer kinds of insights not available from traditional methods in managing short- and longer-term challenges. Forthcoming NIRAS Data Futures Hub research will show how new digital data sources can add value to international development as a whole. This article provides an overview of its conclusions in respect of responding to the pandemic.
The NIRAS Data Futures Hub, in partnership with DFID’s Frontier Technologies Hub, has been carrying out the Frontier Data Study for the UK’s Department for International Development (DFID) to identify the best emerging data opportunities for DFID and how best to operationalise them.
The full study will be released soon (follow us on LinkedIn or our main web page for news on its release). Our research has identified three categories of new digital data sources that, in terms of the quality of the data they produce, have the most potential to help in international development. With the right approaches to risk management, these can be utilised now to add value in many different contexts:
- Earth Observation (EO) data (such as from satellites and geo-sensors)
- Passive locational data from mobile phones, and
- Artificial Intelligence (AI) techniques to analyse digital datasets.
The Study will provide detailed user guidance for these sources and look in more depth at the strengths and weaknesses of the many options we now have in the new data landscape, including setting out other emerging data opportunities that require significant development before they can be confidently used.
Examples of innovations using the highest-potential data sources that could be applied in the developing world
Prevention and surveillance
- AI-derived data can be used to detect possible outbreaks early and track transmission by analysing patterns across new and traditional data sources (e.g. social media, news reports, airline ticketing, animal and plant disease networks and official proclamations).
- Location data from mobile phones can give crucial information for risk-profiling locations, identifying actions that will reduce disease spread, monitoring social distancing compliance and the factors that influence it, and tracing at-risk people exposed to individuals infected with COVID-19.
- EO data is crucial for understanding the location of transport networks and commercial ports, their real-time use, and where closures need to be made to restrict spread.
- A Lancet publication used AI on social media and news reports from DXY.cn to reconstruct the progression of the COVID-19 outbreak at the patient level.
Diagnosis and treatment
- AI-derived data can speed up patient diagnosis of COVID-19, reducing pressure on overwhelmed hospitals. AI is also being used to identify possible drug targets.
- Chinese hospitals used AI software to read CT lung scans and look for signs of pneumonia caused by coronavirus.
Mitigating economic impacts
- Location data from mobile phones can show the social and economic consequences of movement-restriction measures on different sectors, by monitoring changes in movement within different work areas.
- EO data can monitor changes in key outcomes related to economic activity and human well-being. AI-derived data from EO and passive locational data from mobile phones, combined with other data sources, can be used to predict economic impacts and explore relationships.
- Location data from mobile phones has not yet been widely used for this application, but examples exist from other sectors (e.g. assessing the employment impact of automobile factory closures).
- EO data can monitor changes in trade (by tracking cargo moving through ports) and economic activity within commercial and industrial areas (e.g. by monitoring road traffic and air pollution).
The Data Futures Hub is also about to launch our Corona Virus Knowledge Hub, bringing together our pick of the global innovations underway and a live dashboard of some of the most authoritative sources of data.
Key guidance points from the Frontier Data Study on using these new data sources
There is no magic wand in using digital data sources. The benefits and risks need to be weighed up, and the risks managed according to each need for data in specific contexts and the characteristics of each data source.
Passive locational data from mobile phones (automatically collected by mobile network operators or from smartphone apps) has huge potential, particularly in combination with EO data. By showing when, where, and with whom people move, it can help reduce the spread of COVID-19 and clarify the economic impacts. But it has not yet been used extensively, not least because of privacy concerns. There are signs that governments are willing to relax privacy rules for a range of digital data with the aim of rapidly releasing data for research. While this may help in making data available and ensuring they can be anonymised to protect privacy, the ethical risks need close management. Moreover, a relaxing of the rules may change, and at worst increase, the overall risk environment in the long term.
AI-derived data, such as through machine learning, has already been widely used during the pandemic and has provided some useful insights to inform decision-making. In contrast to passive locational data, there is also huge scope to harvest the data that individuals actively input to smartphones and other devices, such as gathering opinions shared on social media and even targeting information tailored to different population groups. But AI-derived data need to be used with caution because of a range of significant data quality and ethical concerns.
The best insights for informing decision-making, in terms of data quality, are likely to come from combining new data sources with each other and with traditional data such as surveys and admin data (such as hospital records). Ethical risks, however, potentially grow with the increase in the volume, interconnectivity, and variety of data held about individuals and their traceability to individuals. The long-term legacy of ill-thought action in this area needs to be carefully considered, not least in terms of human rights and people’s willingness to share any personal data in the future. Moreover, in developing countries in particular, investments in traditional sources, such as crucial admin data, will need even greater attention as public sector organisations are stretched in dealing with increased demands for their services and data.
Context, official statistics, and focus on the bigger picture are essential
While official statistics can be slow, the Data Futures Hub recommends that data innovation should always be seen in the context of official statistics:
- On the one hand, current weaknesses in traditional official sources in developing countries should not discount them from consideration. These statistics may have inherently better data quality in numerous aspects and are often already integrated, or are more ready for integration, into decision-making by local and national actors. It may be better to focus on innovation for improving the timeliness and other quality aspects of some official statistics rather than innovating with new data sources. This must be weighed up in each case.
- On the other hand, in many cases, new data sources could add significant value in complementing official statistics, if approached correctly. Moreover, innovation in new data sources could inform longer-term, sustainable improvements in national data infrastructures. Official statistics are likely to continue to be the backbone of data systems for development-related decision-making for the foreseeable future, and they need constant improvements in capacity-building.
- In any case, innovating with new data sources benefits hugely from bringing in highly developed approaches to managing data quality, methodological insights, and learning around evidence-based policymaking from official statistics. Data is data: we shouldn’t throw the baby out with the bath water.