Exploring the AI+Web3 Landscape: Opportunities and Challenges from Infrastructure to Business Models

2025-07-07 07:58:50

AI+Web3: Towers and Squares

TL;DR

Web3 projects built around AI concepts have become magnets for capital in both primary and secondary markets.
The opportunities for Web3 in the AI industry lie in using distributed incentives to coordinate potential long-tail supply across data, storage, and computation, while also establishing open-source models and a decentralized market for AI Agents.
In the Web3 industry, AI is mainly applied to on-chain finance (cryptocurrency payments, trading, data analysis) and to assisting development.
The utility of AI + Web3 lies in how the two complement each other: Web3 is expected to counter AI centralization, while AI is expected to help Web3 break out of its niche.

Introduction

Over the past two years, AI development seems to have had its accelerator pressed to the floor. The butterfly effect triggered by ChatGPT has not only opened up a new world of generative artificial intelligence but has also made great waves in the Web3 field.

Backed by AI concepts, financing in the crypto market has picked up significantly. According to statistics, 64 Web3+AI projects completed financing in the first half of 2024, among which the AI-based operating system Zyber365 secured $100 million in Series A funding.

The secondary market is even more prosperous. Data from cryptocurrency aggregation websites shows that in just over a year, the total market value of the AI sector has reached $48.5 billion, with a 24-hour trading volume close to $8.6 billion. The positive impact of mainstream AI technology advancements is evident: after the release of OpenAI's Sora text-to-video model, the average price of the AI sector rose by 151%. The AI effect has also spread to Meme, one of crypto's capital-attracting sectors: GOAT, the first MemeCoin built on the AI Agent concept, quickly gained popularity and reached a valuation of $1.4 billion, sparking an AI Meme frenzy.

Research and discussion on AI + Web3 are equally hot. From AI + DePIN to AI Memecoin to the current AI Agent and AI DAO, the narratives rotate so quickly that even FOMO sentiment can no longer keep up.

AI+Web3, a term loaded with hot money, trends, and fantasies about the future, is inevitably seen as an arranged marriage brokered by capital. Beneath this glamorous exterior, it is hard to tell whether this is a playground for speculators or the eve of a breakthrough.

To answer this question, the key consideration for each side is whether it becomes better with the other involved, and whether it can benefit from the other's model. In this article, we attempt to examine this pattern: how Web3 can play a role at every layer of the AI technology stack, and what new vitality AI can bring to Web3.

Part.1 What opportunities does Web3 have under the AI stack?

Before we dive into this topic, we need to understand the technology stack of large AI models.

In simple terms, the entire process can be explained as follows: a "large model" is like the human brain. In the early stages, this brain belongs to a newborn baby, which needs to observe and absorb a vast amount of external information to understand the world. This is the "data collection" stage. Since computers do not have the multi-sensory capabilities of humans, before training, the large-scale unlabelled information from the outside world needs to be transformed through "preprocessing" into a format that can be understood and used by computers.

After inputting data, the AI constructs a model with understanding and predictive capabilities through "training", which can be seen as the process of a baby gradually understanding and learning about the outside world. The model's parameters are like the language abilities of a baby that are constantly adjusted during the learning process. When the learning content begins to be categorized or when feedback is received from communication with people and corrections are made, it enters the "fine-tuning" phase of the large model.

Once children grow up and start speaking, they can understand meanings and express their feelings and thoughts in new conversations. This stage is similar to the "inference" of large AI models, in which the model predicts and analyzes new language and text inputs. Children use language to express feelings, describe objects, and solve problems, much as large AI models, after completing training, are applied to specific tasks during the inference phase, such as image classification and speech recognition.

The AI Agent is closer to the next form of large models: it can independently execute tasks and pursue complex goals, and it possesses not only the ability to think but also memory, planning, and the ability to use tools to interact with the world.

Currently, in response to AI's pain points at each layer of the stack, Web3 has begun to form a multi-layered, interconnected ecosystem that covers all stages of the AI model process.

1. Basic Layer: The Airbnb of Computing Power and Data

Computing Power

Currently, one of the highest costs in AI is the computing power and energy required to train models and run inference.

Training Meta's LLAMA3 requires 16,000 NVIDIA H100 GPUs (top-tier graphics processing units designed for artificial intelligence and high-performance computing workloads) running for 30 days. The 80GB version of the H100 costs between $30,000 and $40,000, which implies a computing hardware investment of $400 million to $700 million (GPU + network chips), while monthly training consumes 1.6 billion kilowatt-hours, resulting in energy expenses of nearly $20 million per month.
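The figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in the text; the function names are illustrative, and the "implied electricity price" is nothing more than the quoted monthly bill divided by the quoted monthly consumption.

```python
# Back-of-envelope check using only the figures quoted in the text.
GPU_COUNT = 16_000
GPU_PRICE_RANGE_USD = (30_000, 40_000)   # quoted price range, H100 80GB
MONTHLY_ENERGY_KWH = 1.6e9               # quoted monthly consumption
MONTHLY_ENERGY_COST_USD = 20e6           # quoted monthly energy bill


def gpu_capex(count, price_range):
    """Return the (low, high) GPU-only hardware cost in USD."""
    low, high = price_range
    return count * low, count * high


def implied_price_per_kwh(cost_usd, energy_kwh):
    """Electricity price that makes the two quoted energy figures consistent."""
    return cost_usd / energy_kwh


low, high = gpu_capex(GPU_COUNT, GPU_PRICE_RANGE_USD)
price = implied_price_per_kwh(MONTHLY_ENERGY_COST_USD, MONTHLY_ENERGY_KWH)
print(f"GPU-only hardware cost: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
print(f"Implied electricity price: ${price:.4f} per kWh")
```

The GPU-only estimate of $480 million to $640 million falls inside the quoted $400 million to $700 million range, which also covers network chips.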

Freeing up AI computing power is also one of the earliest intersections of Web3 and AI: DePIN (Decentralized Physical Infrastructure Network). Currently, the DePIN Ninja data website lists over 1,400 projects, among which representative GPU computing power sharing projects include io.net, Aethir, Akash, Render Network, and more.

The main logic is that the platform allows individuals or entities with idle GPU resources to contribute computing power permissionlessly and in a decentralized manner. By creating an online marketplace for buyers and sellers similar to Uber or Airbnb, it raises the utilization of underused GPU resources and lets end users obtain efficient computing resources at a lower cost. At the same time, a staking mechanism ensures that resource providers face corresponding penalties if they violate quality control mechanisms or cause network interruptions.
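The marketplace-plus-staking logic described above can be sketched in a few lines. This is a toy in-memory model, not the design of any named project: the class names, the minimum stake, and the 10% slash fraction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_gpu_hour: float
    stake: float
    available: bool = True


class ComputeMarket:
    """Toy marketplace: providers stake to join and are slashed on violations."""

    SLASH_FRACTION = 0.10  # hypothetical penalty, as a fraction of current stake

    def __init__(self, min_stake=100.0):  # hypothetical minimum stake
        self.min_stake = min_stake
        self.providers = {}

    def register(self, provider):
        if provider.stake < self.min_stake:
            raise ValueError("stake below minimum")
        self.providers[provider.name] = provider

    def match(self, max_price):
        """Assign the cheapest available provider within the buyer's budget."""
        candidates = [
            p for p in self.providers.values()
            if p.available and p.price_per_gpu_hour <= max_price
        ]
        if not candidates:
            return None
        best = min(candidates, key=lambda p: p.price_per_gpu_hour)
        best.available = False
        return best

    def report_violation(self, name):
        """Slash part of the stake after a quality or uptime violation."""
        provider = self.providers[name]
        penalty = provider.stake * self.SLASH_FRACTION
        provider.stake -= penalty
        return penalty


market = ComputeMarket()
market.register(Provider("datacenter-a", price_per_gpu_hour=1.2, stake=500.0))
market.register(Provider("miner-b", price_per_gpu_hour=0.8, stake=200.0))
chosen = market.match(max_price=1.0)
penalty = market.report_violation(chosen.name)
```

A real network would add verification of the work performed and on-chain settlement; the sketch only shows why a stake gives providers something to lose.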

The main characteristics of these networks are:

Gathering idle GPU resources: The suppliers are mainly independent small and medium-sized data centers, operators of cryptocurrency mining farms with excess computing power, and mining hardware from networks such as Filecoin and Ethereum (whose GPU miners were left idle after its move to PoS). There are also projects working to bring in lower-threshold devices, such as exolab, which uses local devices like MacBooks, iPhones, and iPads to build a computing power network for running large model inference.

Serving the long-tail market of AI computing power:

a. On the technical side, the decentralized computing power market is better suited to inference. Training relies more on the data processing capabilities of ultra-large GPU clusters, while inference has relatively lower requirements for GPU computing performance. Aethir, for example, focuses on low-latency rendering work and AI inference applications.

b. On the demand side, users with small computing power needs will not train their own large models from scratch. They will only optimize and fine-tune around a few leading large models, and these scenarios are naturally suited to distributed idle computing power resources.

Decentralized ownership: The technological significance of blockchain is that resource owners always retain control over their resources, can adjust them flexibly according to demand, and earn income at the same time.

Data

Data is the foundation of AI. Without data, computation is as rootless as floating duckweed, and the relationship between data and models follows the saying "Garbage in, Garbage out": the quantity and quality of the input data determine the final output quality of the model. For the training of current AI models, data determines the model's language ability, comprehension ability, and even its values and human-like performance. Currently, the data demand dilemma for AI mainly centers on the following four aspects:

Data hunger: AI model training relies on massive data input. Public information shows that GPT-4, trained by OpenAI, has a parameter count on the scale of trillions.
Data Quality: With the integration of AI into various industries, the timeliness, diversity, specialization of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new demands for its quality.
Privacy and compliance issues: Countries and companies are gradually becoming aware of the importance of high-quality datasets and are imposing restrictions on dataset scraping.
High data processing costs: large data volume and complex processing. Public information shows that over 30% of AI companies' R&D costs are used for basic data collection and processing.

Currently, Web3 solutions are reflected in the following four aspects:

Data Collection: Real-world data that can be scraped for free is rapidly running out, and AI companies' spending on data is increasing year by year. However, this spending has not flowed back to the true contributors of the data, as platforms capture all the value the data creates. For example, Reddit has earned a total of $203 million in revenue through data licensing agreements with AI companies.

The vision of Web3 is to allow users who truly contribute to participate in the value creation brought by data, and to acquire more private and valuable data from users at a low cost through distributed networks and incentive mechanisms.

Grass is a decentralized data layer and network that allows users to run Grass nodes, contributing idle bandwidth and relayed traffic to capture real-time data from across the internet, and earn token rewards;
Vana has introduced a unique Data Liquidity Pool (DLP) concept, allowing users to upload private data (such as shopping records, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize specific third parties to use this data;
In PublicAI, users can tag posts on X with #AI or #Web3 and mention @PublicAI to contribute data.

Data Preprocessing: In AI data processing, the collected data is often noisy and contains errors. It must be cleaned and converted into a usable format before training the model, which involves repetitive tasks such as standardization, filtering, and handling missing values. This stage is one of the few manual processes in the AI industry and has given rise to the profession of data annotator. As models demand higher data quality, the entry barrier for data annotators also rises, and this task is inherently suited to the decentralized incentive mechanisms of Web3.

Currently, Grass and OpenLayer are both considering incorporating data annotation as a key step.
Synesis proposed the concept of "Train2earn", emphasizing data quality, where users can earn rewards by providing labeled data, annotations, or other forms of input.
The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
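As a rough illustration of the cleaning steps named above (standardization, filtering, handling missing values), here is a minimal sketch in plain Python. The record layout, the minimum length threshold, and the "unlabeled" default are hypothetical choices, not taken from any of the projects mentioned.

```python
import re


def standardize(text):
    """Collapse whitespace, trim, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def preprocess(records, min_chars=5, default_label="unlabeled"):
    """Clean raw records: standardize text, drop unusable or duplicate rows,
    and fill in missing labels."""
    cleaned, seen = [], set()
    for record in records:
        text = record.get("text")
        if not text:                      # missing or empty text: nothing to keep
            continue
        text = standardize(text)
        if len(text) < min_chars or text in seen:
            continue                      # filter out noise and exact duplicates
        seen.add(text)
        label = record.get("label") or default_label
        cleaned.append({"text": text, "label": label})
    return cleaned


raw = [
    {"text": "  GPU   prices ROSE ", "label": "market"},
    {"text": "gpu prices rose"},              # duplicate after standardization
    {"text": None, "label": "market"},        # missing text
    {"text": "ok"},                           # too short to be useful
    {"text": "New model released"},           # missing label
]
cleaned = preprocess(raw)
```

Rows that come out with the default label are the ones a human annotator would still need to review, which is the manual step the text refers to.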

Data Privacy and Security: It needs to be clarified that data privacy and security are two different concepts. Data privacy involves the handling of sensitive data, while data security protects data information from unauthorized access, destruction, and theft. Thus, the advantages and potential application scenarios of Web3 privacy technology are reflected in two aspects: (1) Training with sensitive data; (2) Data collaboration: Multiple data owners can jointly participate in AI training without sharing the original data.

The currently common privacy technologies in Web3 include:

Trusted Execution Environment (TEE), such as Super Protocol;
Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io, or Inco Network;
Zero-knowledge technology (zk), such as Reclaim Protocol, which uses zkTLS technology to generate zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.

However, the field is still in its early stages, and most projects are still exploring. A current dilemma is that the computing costs are too high, for example:

The zkML framework EZKL takes about 80 minutes to generate a proof for the 1M-nanoGPT model.
According to data from Modulus Labs, the overhead of zkML is more than 1000 times higher than pure computation.

Data Storage: Once the data is available, a place is needed to store it on-chain, along with the LLM produced from that data. Data availability (DA) is the core issue: before the Danksharding upgrade, Ethereum's throughput was 0.08MB per second. However, training AI models and real-time inference typically require a data throughput of 50 to 100GB per second. A gap of this magnitude leaves existing on-chain solutions struggling when facing "resource-intensive AI applications."
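To make the size of this gap concrete, the short calculation below compares the two quoted figures. It assumes the 0.08MB figure is a per-second rate and uses decimal units (1GB = 1,000MB); both are assumptions of the sketch.

```python
MB_PER_GB = 1000  # decimal units, an assumption of this sketch


def throughput_gap(chain_mb_per_s, required_gb_per_s):
    """How many times more throughput the workload needs than the chain offers."""
    return required_gb_per_s * MB_PER_GB / chain_mb_per_s


CHAIN_MB_PER_S = 0.08  # quoted pre-Danksharding figure, read as MB per second

for required in (50, 100):  # quoted requirement range in GB per second
    gap = throughput_gap(CHAIN_MB_PER_S, required)
    print(f"{required} GB/s needs about {gap:,.0f}x the chain's throughput")
```

Under these assumptions the shortfall is roughly 625,000 to 1,250,000 times, about six orders of magnitude.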

0g.AI is a representative project in this category. It is a decentralized storage solution designed for high-performance AI needs. Its key features are high performance and scalability: it supports fast upload and download of large datasets through advanced sharding and erasure coding technologies, with data transfer speeds approaching 5GB per second.
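For readers unfamiliar with the combination of sharding and erasure coding mentioned above, the following is a minimal sketch of the idea using a single XOR parity shard. It is the simplest possible erasure code and is not the scheme used by 0g.AI, whose actual encoding is not described here; all function names are illustrative.

```python
def split_into_shards(data, k):
    """Split data into k equal-size shards, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Return k data shards plus one parity shard (XOR of all data shards)."""
    shards = split_into_shards(data, k)
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards, parity


def recover(shards, parity):
    """Rebuild the single missing shard (marked None) from the survivors."""
    missing = shards.index(None)
    rebuilt = parity
    for i, shard in enumerate(shards):
        if i != missing:
            rebuilt = xor_bytes(rebuilt, shard)
    return rebuilt


data = b"training-dataset-chunk"
shards, parity = encode(data, k=4)
damaged = shards[:2] + [None] + shards[3:]   # simulate losing one storage node
restored = recover(damaged, parity)
```

With one parity shard the data survives the loss of any single shard; production systems typically use codes such as Reed-Solomon that tolerate multiple losses, and sharding lets different nodes serve different pieces in parallel.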

2. Middleware: Model Training and Inference

Decentralized Market for Open-Source Models

The debate over whether AI models should be open-source or closed-source has never disappeared. The collective innovation brought about by open-source is unmatched by closed-source models.
