A Better Approach to Organizing Data Marketplaces
By Osho Jha, co-founder and CEO of dClimate
Why Build a Decentralized Marketplace for Data?
As our modern economy becomes increasingly reliant on large amounts of data, marketplaces become an ever more important mechanism for organizing that data by bringing consumers and providers together.
Selling data as a good, and analysis of that data as a service, has moved from a bespoke business to a relatively commonplace practice in the industrial world. Data and derived analytical products are abstract goods whose value depends on the ability to structure and organize complementary products, as opposed to concrete goods (physical items or packaged software) whose value is self-contained. By borrowing marketplace infrastructure built for concrete goods, data marketplaces suffer from a lack of flexibility and ultimately limit the ways consumers and providers can generate value from disparate data sets.
Marketplaces are collections of decentralized entities gathering in a common area to exchange goods and services in return for some form of value. In particular, marketplaces facilitate peer-to-peer transactions. Our approach at dClimate is to use the peer-to-peer nature of blockchain technology to increase the efficiency of data marketplaces and facilitate greater value extraction from data.
An API Can’t Fix This
Application Programming Interfaces, or APIs, are a popular and efficient way of serving data to client-side applications. However, it is important to remember that APIs are a tool, not a solution for all problems. To understand the limitations of an API in the context of a marketplace, consider that an API is essentially a centralized server serving data to a client-side application, and where there is centralization, there is an arbiter of norms. Most data providers have some form of API available to clients who want to access data programmatically.
In the case of a data marketplace, how does one enforce norms and standards for data structuring? And if one does set such standards, how does one attract multiple vendors without turning the marketplace into a burden, forcing each provider to weigh the cost of reformatting their own data API against the value they expect to gain by participating? For a marketplace facilitator (in this case dClimate), hosting multiple vendors with differing APIs is also a burden: how do we offer easy-to-use endpoints to new clients who are increasing their data intake without forcing them to stitch together multiple disparate APIs in their applications? The value generated from indexing and organizing data erodes and disappears. So the answer is clear: remove the API layer.
dClimate runs an API, which allows developers working with legacy or traditional systems to easily access the data they need. Like the marketplace, the dClimate API serves ZARR datasets, and legacy formats are being converted into ZARR to fit this architecture. Our post on ZARR and its implementation details the benefits it brings to the end user. Despite these similarities, the dClimate marketplace is architecturally separate from the API. By building a decentralized marketplace on its own ZARR implementation, we can allow decentralized applications to call the data directly from IPFS rather than depending on a centralized API run by dClimate. For context, IPFS is somewhat akin to BitTorrent for data storage. In our case, this helps us implement a peer-to-peer marketplace with easy data sharing whose query performance scales as more nodes spin up: put simply, the more nodes serving a data set, the faster the query becomes. The ZARR implementation boosts this performance further, giving us a decentralized data infrastructure that competes with legacy centralized solutions in industrial use cases.
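To make that access pattern concrete, here is a minimal sketch of opening a ZARR store directly from IPFS via a public HTTP gateway, using the off-the-shelf xarray and fsspec libraries. The content identifier (CID), gateway URL, and the presence of a time coordinate are all placeholder assumptions for illustration, not a real dClimate dataset.

```python
# Minimal sketch: open a Zarr store pinned on IPFS through a public HTTP
# gateway. The CID below is a hypothetical placeholder.
import fsspec
import xarray as xr

CID = "bafyExampleCid"  # hypothetical content identifier for a Zarr store

# Any IPFS gateway works; the more nodes pinning the data, the more
# routes there are to fetch it.
store = fsspec.get_mapper(f"https://ipfs.io/ipfs/{CID}")

# Zarr's chunked layout means only the chunks a query touches are fetched.
ds = xr.open_zarr(store)
print(ds)

# Assuming the store has a "time" coordinate, pull a small slice
# without downloading the full dataset.
subset = ds.sel(time="2021-07-01").compute()
```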
Looking forward, it becomes clear that APIs are a bottleneck to more efficient and robust application development: they limit the decentralization possible in an application and create friction between data consumers and data providers. Where APIs allow quick access to data from a single provider, they become a burden for serving data from multiple providers. APIs also pose security and maintenance problems and are a classic example of sacrificing robustness for speed. Consider the example of multiple vendors serving a mix of free and paid data. How does one whitelist access to the paid data, all under the same API? You could embed this information in an API key, but that approach is messy, requires regular maintenance, and requires a central party such as dClimate to maintain a database of who is paying for what. It also makes it difficult for dClimate to operate without assessing each vendor's data, which puts vendors in a difficult place: they have to expose their highly prized data.
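To see why the API-key approach gets messy, consider a hypothetical sketch of the entitlement check it forces on the marketplace operator. Every name here is illustrative, not part of any dClimate codebase.

```python
# Hypothetical sketch of the centralized entitlement check an API-key
# approach requires. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ApiKeyRecord:
    key: str
    # Which paid datasets, from which vendors, this key may access.
    # The operator must keep this mapping current for every vendor.
    entitlements: set[str] = field(default_factory=set)

# A central store the marketplace operator has to maintain and secure.
ENTITLEMENT_DB: dict[str, ApiKeyRecord] = {}

def authorize(api_key: str, dataset_id: str) -> bool:
    record = ENTITLEMENT_DB.get(api_key)
    # Every request routes through this central chokepoint: the operator
    # must know who paid which vendor for what, exactly the visibility
    # into vendor data and sales that both sides want to avoid.
    return record is not None and dataset_id in record.entitlements
```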
Software Development Kits, or SDKs, fix this problem and allow data serving to be unified under a simple package which can take decryption keys for paid data. An SDK on top of a decentralized marketplace provides ease of access on top of the indexing and organization. We can also implement validation scoring, which allows the consumer to see a grade of a data set's quality without the vendor sharing a sample or exposing any of their live data. SDKs ultimately lead to a frictionless data acquisition and sales process which benefits both data consumers and data providers.
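A rough sketch of what such an SDK's surface could look like is below. This is not a published dClimate package; the class, method names, and scoring field are assumptions made to illustrate the unified free/paid flow.

```python
# Hypothetical sketch of an SDK surface for a decentralized data
# marketplace. Names and behavior are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Listing:
    cid: str                 # IPFS content identifier of the dataset
    validation_score: float  # quality grade computed without exposing raw data
    is_paid: bool

class MarketplaceSDK:
    """One package unifies access to free and paid datasets."""

    def search(self, query: str) -> list[Listing]:
        # Would query the decentralized index; stubbed for the sketch.
        return []

    def open(self, cid: str, decryption_key: Optional[str] = None):
        # Free data: fetch chunks directly from IPFS peers.
        # Paid data: the same path, with chunks decrypted client-side
        # using the key obtained at purchase; no central whitelist needed.
        raise NotImplementedError("sketch only")
```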
Financialization of Data
While the above has explored how marketplaces, and particularly decentralized marketplaces, benefit both data consumers and providers, another foundational goal of a marketplace is to bring liquidity to the market. In the context of data marketplaces, liquidity means facilitating purchases and usage of data, but also facilitating capital for increasing data mass. Because the underlying data infrastructure of a decentralized marketplace is itself decentralized, we have the ability to implement NFT (non-fungible token) structures around a data set. This allows for easily transferring ownership of a data set and financializing data. It has become increasingly obvious that data is the fuel for technological growth; indeed, you will still sometimes hear the cliche that "data is the new oil," and we encourage you to cringe accordingly. NFTs, however, allow for treating data as an asset rather than something abstract that is merely an input to a more valuable asset. Looking at traditional tech, how much of a large social media network's or streaming company's value is tied up in its UI or content versus the data it has on its user base's habits? That data should have its own ascribable value.
Consider an example where a data provider on the dClimate network has a crucial forecasting dataset for New York City flooding. A number of different entities and stakeholders would find that data interesting and pay for it, generating cash flow for the data provider. By having the data asset itself structured as an NFT, the provider has a few interesting options that were not open to them before (a minimal sketch of this lifecycle follows the list):
- They can offer to sell exclusive rights to the data for a higher price and lock the NFT after a sale.
- They can sell their rights to the data to another interested party such as a real estate developer or even a private equity shop that wants that exclusive cash flow.
- They can use a DeFi protocol to securitize and borrow against the cash flows of their data sales to invest in new hardware and software for another model they are working on.
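The sketch below models how the first two options map onto token operations. In practice this logic would live in a smart contract; this Python rendering, along with the CID and owner identifiers, is purely illustrative.

```python
# Hypothetical sketch of a data-NFT lifecycle. Real deployments would be
# implemented as a smart contract; all identifiers here are placeholders.
from dataclasses import dataclass

@dataclass
class DataNFT:
    cid: str              # IPFS content identifier of the underlying dataset
    owner: str            # current rights holder (e.g. a wallet address)
    locked: bool = False  # once locked, access can no longer be resold

    def transfer(self, new_owner: str) -> None:
        # Option 2: sell the rights, and their future cash flows,
        # to another party such as a real estate developer.
        if self.locked:
            raise ValueError("exclusive rights already sold")
        self.owner = new_owner

    def sell_exclusive(self, buyer: str) -> None:
        # Option 1: transfer ownership and lock the token so the
        # dataset cannot be sold to anyone else afterward.
        self.transfer(buyer)
        self.locked = True

    # Option 3 (borrowing against the sale cash flows via a DeFi
    # protocol) would involve a separate lending contract; omitted here.

nft = DataNFT(cid="bafyFloodForecastCid", owner="provider-wallet")
nft.transfer("developer-wallet")  # cash flows now accrue to the new owner
```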
This flexibility brings more value to the data provider than a typical centralized marketplace can offer, all without the intervention of a centralized party managing the marketplace. Additionally, by allowing a smaller data vendor to easily monetize a smaller data set that is an important part of a whole, we can counter the current trend of data being acquired and concentrated by large entities into data monopolies, something much of traditional tech is suffering from.
Conclusion
By building our marketplace in a decentralized and modular way, we are not only helping ease the burdens around accessing climate data but also setting forth new infrastructure for marketplaces as a whole. In our analysis, building specialized data marketplaces on blockchain offers significant benefits for data consumers and data providers alike compared to legacy, centralized marketplace solutions. The robustness offered by the SDK approach to serving data only adds to this ease of use for those looking to access data programmatically, allowing the marketplace to live as part of a programmatic approach to data access and application development. Lastly, by structuring data assets as NFTs, we allow data owners to use their data as a financial asset, which can bring more capital to the data collection process. We believe these ideas and the code behind the marketplace are extensible to many different domains where data access is currently limited and opaque, such as genomics and anonymized transaction data.