Artificial intelligence (AI) is making headlines, but decentralized AI (DeAI) faces a critical challenge: the lack of diverse, secure, and verifiable datasets. While on-chain data is integral to DeAI, it remains too limited to train the models required for truly effective AI. This gap in data availability threatens to hand the future of AI to centralized giants, which benefit from access to vast, often unchecked, data sources.
The strength of traditional AI lies in its appetite for data: the more data a model is trained on, the more capable it tends to become. That same appetite exposes a fundamental weakness. Centralized AI models are often trained on data harvested without explicit consent, raising significant privacy and control concerns.
DeAI, rooted in the principles of blockchain decentralization and transparency, promises a more ethical and open approach to AI. However, the majority of data available on-chain stems from financial transactions and decentralized finance (DeFi), leaving small language models especially underserved. These models require more specific and diverse datasets to perform at competitive levels.
Unlike the limited data available in DeFi, the rich datasets of web2, such as The Pile and Common Crawl, span billions of documents drawn from a wide range of sources, and they have allowed centralized AI models to thrive. Replicating this volume and diversity of data on-chain, however, isn’t feasible on a practical timeline. Some AI companies have faced accusations of misusing such data, but there is a potential way to bring more of it into DeAI systems while making it safer to use.
This is where cryptographic techniques like zero-knowledge proofs (ZKPs) come into play. ZKPs, which are already being leveraged for blockchain scalability and privacy, can enable DeAI to access larger datasets securely. Specifically, two methods—zero-knowledge fully homomorphic encryption (zkFHE) and zero-knowledge TLS (zkTLS)—could be key to unlocking the vast stores of web2 data for DeAI.
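To make the idea of a zero-knowledge proof more concrete, here is a minimal sketch of one classic construction: a Schnorr proof of knowledge made non-interactive with the Fiat-Shamir transform. It is not zkFHE or zkTLS and is not taken from any DeAI project. The group parameters `P`, `Q`, and `G` and all function names are assumptions chosen for illustration, and the numbers are far too small to be secure. The prover convinces a verifier that it knows the secret behind a public key without revealing that secret.

```python
import hashlib
import secrets

# Toy group parameters (assumed for illustration; far too small for real use).
P = 2039  # safe prime: P = 2 * Q + 1
Q = 1019  # prime order of the subgroup generated by G
G = 4     # generator of the order-Q subgroup (a quadratic residue mod P)


def challenge(public_key, commitment):
    """Fiat-Shamir challenge: hash the public transcript into the range [0, Q)."""
    data = f"{P}|{Q}|{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    """Pick a secret in [1, Q-1] and derive the matching public key."""
    secret = secrets.randbelow(Q - 1) + 1
    return secret, pow(G, secret, P)


def prove(secret, public_key):
    """Produce a proof of knowledge of `secret` without revealing it."""
    nonce = secrets.randbelow(Q - 1) + 1      # fresh randomness for every proof
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return commitment, response


def verify(public_key, proof):
    """Check G^response == commitment * public_key^c (mod P)."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


secret, public_key = keygen()
proof = prove(secret, public_key)
print(verify(public_key, proof))  # True: the verifier never sees `secret`
```

The verifier only ever handles the public key and the two proof values. Production systems rely on much larger groups or on general-purpose proof systems, but the prove-then-verify shape is the same.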
zkFHE allows AI computations to be performed on encrypted data without decrypting it. This would enable DeAI models to train on sensitive data—such as medical records—without exposing the original information. zkTLS, on the other hand, facilitates the use of data in web2 while maintaining user privacy. It allows users to prove the possession of certain data, like a credit score or social media activity, without revealing the underlying data itself. This technique could be used to integrate secure data from traditional financial institutions into DeAI models.
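As a rough illustration of computing on encrypted data, the sketch below uses a toy version of the Paillier cryptosystem. This is a deliberate simplification: Paillier is only additively homomorphic (it supports adding ciphertexts and multiplying them by plaintext constants), it is not fully homomorphic, and no zero-knowledge proof is attached. The primes, function names, and example values are all assumptions for demonstration. The example computes a weighted sum, the core operation of a linear model, over features the computing party never sees in the clear.

```python
import math
import secrets

# Two well-known Mersenne primes, used only to keep the toy small.
# Real Paillier keys use randomly generated primes of 1024 bits or more.
P = 2**31 - 1
Q = 2**61 - 1

N = P * Q                                              # public modulus
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                                # private: inverse of LAMBDA mod N


def encrypt(message):
    """Encrypt an integer in [0, N) using the generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m mod N^2 simplifies to 1 + m*N.
    return ((1 + message * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext):
    """Recover the plaintext with the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


def scale_encrypted(ciphertext, constant):
    """Raising a ciphertext to a power multiplies the plaintext by that constant."""
    return pow(ciphertext, constant, N_SQ)


def encrypted_weighted_sum(encrypted_features, weights):
    """Compute sum(w_i * x_i) without ever decrypting the x_i."""
    total = encrypt(0)
    for ciphertext, weight in zip(encrypted_features, weights):
        total = add_encrypted(total, scale_encrypted(ciphertext, weight))
    return total


features = [3, 5, 2]                      # hypothetical sensitive inputs
weights = [2, 1, 4]                       # hypothetical model weights
encrypted = [encrypt(x) for x in features]
result = encrypted_weighted_sum(encrypted, weights)
print(decrypt(result))  # 19, i.e. 3*2 + 5*1 + 2*4
```

Only the holder of the private values can read the result. Fully homomorphic schemes extend this idea to arbitrary computations, at a much higher computational cost.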
By combining zkFHE and zkTLS, DeAI could tap into the wealth of web2 data while still upholding the principles of decentralization and privacy. This would create a more competitive environment, giving DeAI the tools to rival centralized AI models and perhaps even exceed their capabilities.
For example, large language models (LLMs), which currently dominate the AI landscape, require massive datasets for training. zkTLS could enable DeAI developers to access publicly available data from the internet in a privacy-preserving way, helping to create more democratic and transparent models.
Despite the promise of zkFHE and zkTLS, implementing these cryptographic solutions is a significant challenge. The technology is computationally intensive and would require major advances in both hardware and software. Additionally, standardization and interoperability are essential for widespread adoption. However, the potential rewards—both in terms of AI capabilities and the promotion of a fairer, more equitable AI ecosystem—are enormous.
In the race for AI supremacy, data is the key resource. With the adoption of cryptographic techniques like zkFHE and zkTLS, DeAI could gain access to the vast datasets it needs to thrive, ultimately leading to a more democratic and equitable AI future.