Breakthroughs to Provide Access to Reliable Data

During our discussions with Brain Trust members on barriers to accessing reliable data, we identified the following constraint:

Forecasting and planning are compromised by the lack of reliable, rapid, and comprehensive data and of integrated data infrastructure, while data ownership structures lack clarity.

What breakthroughs do you think we can expect by 2040 that could provide transparency on data ownership and improve access to:

  • Quick, reliable data
  • Integrated data infrastructure

@djaffe, @swihera and @mprakhar - We would love to hear your thoughts on this discussion.

Hi @danfortin, @MachineGenes and @taras - Given your vast experience in AI and data, we want to know your point of view on breakthrough solutions for access to reliable data and integrated data infrastructure in relation to climate change.


Hi everyone - this is an excellent observation and insight. And we already have an answer!
I am happy to inform everyone that our company OPT/NET is participating in a new EU Horizon 2020 project called CENTURION, for which the Grant Agreement has just been signed; the project achieved funded status last week!
This multimillion-euro, three-year project will deliver a publicly accessible platform supporting both commercial and public projects in the European Union (and, in the future, worldwide). It is based on the early prototype developed by the AI XPRIZE team OptOSS!
The first iteration of the platform prototype is expected to be available to early (alpha) testers and a select number of committed use cases in late 2021, thanks to the high technological maturity of its main elements. The selected use cases range from climate-induced changes to the land, with flood monitoring applications, to agri-forestry monitoring and maritime activity monitoring, among others.

We are planning a broader announcement in online media in the coming weeks as part of the whole consortium, and a new website will be launched imminently. I will keep everyone posted!
Kind regards,


Hi @taras,

That’s great news, congratulations!

@DanSelz and I would be happy to learn more about the platform and how it works once the announcement is made. Please do keep us posted; we would love to promote it via our alumni newsletter as well. Thanks.


Hi Shashi, over the course of the XPRIZE contest we had difficulty finding reliable public data on soils and water (surface and groundwater) from sources such as the EPA or similar organisations. The data is often very noisy and difficult to harness. We have developed some tools over time, but we still face ownership challenges: our partners (engineering firms) don’t own the data, and they have to obtain their customers’ permission to use it. This will be the case for any initiative related to climate change. The EPA is now open to letting us use their historic data, but it took time and effort to convince them.


Thanks @danfortin for sharing your thoughts on this discussion. To take this further, what do you foresee as an ideal solution to such problems?

Hey @Shashi, I can definitely advise: apart from my AI background, I am a former analyst/senior analyst in the Australian Government, where I worked in climate change for six years on projections (future economic and emissions modelling) and reported from time to time to the UN (UNFCCC). However, as a direct consequence of this experience, my advice is likely to be very pessimistic.

With all due respect, from the perspective of someone who’s been involved in emissions reporting at a government level, the immediate problem is not data infrastructure.

The key problems are:

  • Data: not reliable. Even for something as simple as emissions reporting, one of the big challenges with modelling climate change is that standards of data reporting and quantitative modelling are wildly variable across the globe. This is a major problem that the staff of the United Nations Framework Convention on Climate Change (UNFCCC) continue to struggle with. Even with permanent, ongoing attempts at data quality control, the results are problematic, even among G20 countries that one would expect to be properly resourced. To some extent this is a political problem: some governments are determinedly uninterested in rigorous reporting (typically those that think they can game the UN treaties to their own financial advantage in terms of rebates etc.). Let’s assume we can solve this.

  • Model equations: not really known with certainty. Even though organizations such as the Tyndall Centre in the UK do heroic work on modelling, the reality is that climate involves a large number of massively nonlinear equations operating at multiple spatial and temporal scales simultaneously. Even so, major effects remain poorly understood. (If you want to really ruin a climate modeller’s day, simply ask about the role of clouds in climate. Then, if an assertion is made, ask for proof. Still not well understood. And once that is sorted, enquire about the role of turbulence, at multiple scales of thermodynamics and fluid dynamics.) In the meantime, most of the ‘representative’ models beloved of economists are frankly utter bunk that doesn’t withstand any sort of rigorous scrutiny. But assume we can fix this. Which brings us to the next problem…

  • The Butterfly Effect. Yes, that one. First discovered by Lorenz and colleagues, and christened to describe the wildly uncertain behaviour of extremely simple climate models, in which a slight change in initial conditions causes radical changes in model predictions. It gave birth to aspects of chaos theory: sensitive dependence upon initial conditions. The effect is vastly worse with the much more realistic high-complexity equations above. This imposes a (relatively short) horizon on the validity of prediction. How short? I doubt anyone knows, although again, last I heard, the Tyndall Centre was working hard on the interplay between multiple spatial and temporal scales of time-series data.

  • Assuming a set of representative (highly nonlinear, highly coupled, high-dimensional) equations for global thermodynamics has been agreed by COP (the UNFCCC Conference of the Parties, i.e. the treaty governments concerned), with appropriate spatial and temporal resolution; assuming appropriate sensor data is agreed as input to this predictive model; and assuming a workable predictive horizon has been established… the question then becomes: what global sensor grid resolution is required to make all this happen? And then, how do you build it?
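The sensitive dependence described in the Butterfly Effect point above is easy to demonstrate numerically. Below is a minimal sketch (an illustration, not any production climate code) that integrates Lorenz's classic three-variable system with a simple forward-Euler scheme, using Lorenz's standard parameter values, and tracks how two trajectories that start 10⁻⁸ apart diverge to macroscopic separation:

```python
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz system (standard parameters)."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def trajectory(initial, n_steps=6000):
    """Integrate from `initial` and return the full trajectory as an array."""
    state = np.array(initial, dtype=float)
    states = [state]
    for _ in range(n_steps):
        state = lorenz_step(state)
        states.append(state)
    return np.array(states)

# Two runs whose initial conditions differ by one part in 10^8.
a = trajectory([1.0, 1.0, 1.0])
b = trajectory([1.0, 1.0, 1.0 + 1e-8])
separation = np.linalg.norm(a - b, axis=1)
print(f"initial separation: {separation[0]:.1e}")
print(f"maximum separation: {separation.max():.2f}")
```

Despite the perturbation being far below any realistic measurement precision, the trajectories end up on entirely different parts of the attractor; past that divergence horizon, the model's state carries no usable information about the specific initial condition.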

Assuming all of this is established, then that’s the time to start talking about integrated data infrastructure. And then, perhaps we might start talking about a role for AI etc.

There’s a lot of mathematics and high-performance computer modelling to be done before the priority becomes a data infrastructure/ownership one. Frankly, I think there should be a huge prize for anyone who manages to get an appropriate climate model, with rigorous resolution of the problems listed above, to a level of maturity where it might be regarded as authoritative by COP. Because nobody is there yet.


Thanks @MachineGenes for these deep insights on this topic. To understand this area further, we would like to know: what is the level of innovation activity and investment in R&D in this area?

@Shashi, there’s a lot of innovation activity in this modelling area, by research organisations such as the Tyndall Centre, the Max Planck Institute, etc.

However, although I’ve been out of the loop for a while, I suspect this activity is brittle. Their government-level funding is precarious. Also, wider institutions like the so-called carbon market in the EU are far more fragile than they appear to external onlookers, as carbon pricing relies on (1) all relevant stakeholders agreeing on an artificial pricing structure for greenhouse gases, (2) the global and national economies being sufficiently strong for everyone to agree to sustain this pricing structure without ‘leakage’, and (3) the absence of external shocks that would disrupt this.

The extent to which the post-COVID global economy will still support a carbon market remains to be seen. If we have globally passed the point where an abatement-only strategy can stabilize climate change (and an increasing number of scientists suggest this has already happened), then carbon markets will collapse, as abatement and offsets become worthless. (Ironically, when approaching such a threshold, greenhouse gas sequestration technology such as that of the Carbon XPRIZE challenge becomes almost priceless, as the only way of avoiding collapse in the conventional carbon market.)

As far as I know, conventional VC investment in mathematical modelling of climate change is essentially nil, as the result would be a public good rather than a commercial commodity. So it looks like a good area for an XPRIZE, to create or encourage the creation of that public good?


Hi @cayhap, @RicRothenberg, and @JourneyBinder - Given your vast experience in AI and data, we want to know your point of view on breakthroughs for access to reliable data and integrated data infrastructure in relation to climate change.

As the starter of this thread, I would like to answer this question: you cannot hope that someone will give you reliable data for free. That will not happen unless it is funded by a very generous government, and even then there are limits. Imagine if malicious actors started using high-quality data to reach their evil goals… Also, good data is hard to find, and reliable data is very hard to create. In our CENTURION project we call the result Analysis Ready Data (ARD), packaged as ARD Cubes and ARD Hypercubes.
These will be offered as a product/service to subscribers of the platform we are building.
How do you get the data? The simple answer: for now, you just have to go and collect it yourself, at least until our platform opens up to broader audiences; we are working very hard on this. Bear with us… For example, please see below how we performed an analysis of the humanitarian crisis of the century, the unprovoked Russian war against Ukraine. The data we are collecting comes from many sources, is processed with our AI platform, and is visualised for human analysts to make decisions. There are almost 10,000 geolocated and confirmed events in this dataset.
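To make the data-cube idea concrete, here is a deliberately simplified sketch; the grid resolution, variable names, and ingestion logic are my own illustration, not CENTURION's actual ARD schema. It bins geolocated, timestamped events into a dense (time, latitude, longitude) array that an analyst can then slice along any axis:

```python
import numpy as np

# Hypothetical analysis-ready cube: event counts indexed by
# (month, latitude bin, longitude bin) on a 1-degree global grid.
months, lat_bins, lon_bins = 12, 180, 360
cube = np.zeros((months, lat_bins, lon_bins), dtype=np.int32)

# Illustrative geolocated, timestamped events: (month, lat, lon).
events = [(0, 50.45, 30.52), (0, 49.84, 24.03), (3, 48.62, 22.30)]
for month, lat, lon in events:
    i = int(lat + 90.0)     # latitude  -90..90   -> row 0..179
    j = int(lon + 180.0)    # longitude -180..180 -> col 0..359
    cube[month, i, j] += 1

# A typical analyst query: total events per month across the globe.
monthly_totals = cube.sum(axis=(1, 2))
print(monthly_totals.tolist())  # [2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

The point of pre-binning into a regular cube like this is that downstream queries (per-region trends, time slices, hot-spot maps) become cheap array operations instead of repeated scans over raw, heterogeneous source records.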

As time goes by, we will build a vast dataset covering as many catastrophes as is technically feasible with the funding we can obtain.