Q: What are the primary components and functions of the Pastel Inference Layer server, and how do they work together to process inference requests?
The Pastel Inference Layer server has two primary components that work together to process inference requests: Credit Pack Tickets and Inference Requests.
Credit Pack Tickets are new Pastel blockchain tickets that users can create to pre-pay for future inference requests.
- Users pay for Credit Pack Tickets with a specified amount of PSL, which is burned upon the ticket's creation.
- The number of credits a user receives is determined by market prices and must be agreed upon by a majority of Supernodes, which are responsible for fulfilling the inference requests.
- Credit Pack Tickets simplify accounting for inference requests by denominating costs in credits instead of PSL.
- These tickets work differently than other Pastel blockchain tickets because their cost is burned rather than going to Supernodes, and their creation obligates the entire Pastel Network, rather than a single Supernode, to honor future inference requests paid for using those credits.
Inference Requests are REST requests made by the inference client to the Inference Layer server.
- The inference client can be written in Python or JS, and the server is a FastAPI application.
- Examples of inference requests include: text completion, image generation, asking questions about images, document embedding, and semantic search.
- The Inference Layer aims to make the request process robust and secure by:
- Ensuring users receive what they paid for, either a new credit pack ticket or inference results.
- Validating each step in creating a new credit pack or fulfilling an inference request.
- Avoiding price surprises by providing quotes before any PSL or credits are spent.
When a user submits an Inference Request, the Responding Supernode (determined by XOR distance) calculates the request's cost in credits and sends the user an InferenceAPIUsageResponse message. If the user agrees to the proposed cost, they send a confirmation transaction, burning a tiny amount of PSL from their designated tracking address, corresponding to the number of credits being spent . The Responding Supernode verifies this transaction, executes the request using the specified model and parameters, and generates an InferenceAPIOutputResult message, containing the output data. Finally, the user can retrieve the results via the appropriate endpoint. Additionally, the user can audit the request response and result to ensure transparency and accuracy.
Importantly, while the Responding Supernode handles the primary interaction, other Supernodes are not kept in the dark. For example, when creating a new credit pack, the Responding Supernode proposes a price quote, but the other Supernodes need to agree on the pricing terms before finalizing the credit pack. This consensus mechanism ensures that the pricing is fair and economically viable for the entire network.
Q: What are the different types of inference requests supported by the Pastel Inference Layer server, and how are their costs calculated?
The Pastel Inference Layer server supports a variety of inference request types, broadly categorized into those handled by external APIs and those processed by Swiss Army Llama, either locally on the Supernode's CPU or remotely on a GPU-enabled instance. The cost calculation for these requests differs based on the model or service used.
For API-based models, costs are primarily determined by the API provider's pricing model. The server incorporates logic to estimate these costs and then converts them into credit pack credits, ensuring users have a clear understanding of the expense in a unified manner.
- Text Completion: The cost is calculated by analyzing the input prompt using the designated tokenizer for that model. This tokenizer counts the number of tokens in the prompt, which is then used in conjunction with the API provider's pricing (e.g., cost per 1,000 tokens) to determine the overall cost.
- Image Generation: The cost is calculated using the pricing structure of the specific image generation API. Factors considered include the desired resolution of the output image and other parameters specified in the request.
- Ask a Question About an Image: The cost calculation for these requests relies on the API's pricing model, taking into account the resolution of the input image and the complexity of the question.
For models hosted using Swiss Army Llama, the cost calculation differs from API-based models. Instead of relying on external pricing, the system aims to quantify the computational resources used.
- Locally Hosted LLMs: The cost is estimated based on parameters specific to each model outlined in the
model_menu.jsonfile. These parameters consider the model's memory and processing requirements. - Remotely Hosted LLMs (e.g., on vast.ai): The cost estimation for remotely hosted models is expected to evolve. The goal is to measure the total time the GPU-enabled instance is used for the request and then apply the average pricing of services like vast.ai to determine a cost that reflects the actual resource utilization.
- Document Embedding: The cost depends on whether the document is processed locally or remotely. Factors considered include the total number of tokens and sentences in the document.
- Audio Transcription and Embedding: The cost is calculated based on the duration of the audio file in seconds.
The sources provide specific examples of how cost is calculated for different models:
- OpenAI GPT-4o: The cost is determined by the number of input and output tokens, the number of completions requested, and a per-call cost associated with the OpenAI API.
- Local LLM for Text Completion: The calculation involves input tokens, output tokens, number of completions, an estimated compute cost, and an estimated memory cost, all derived from the
model_menu.jsonfile. - Local LLM for Embedding Documents: This cost factors in total tokens, total sentences, whether a query string is included (potentially increasing computational needs), and the estimated compute and memory costs.
It's important to note that the cost calculation for Swiss Army Llama models is in a transitional phase, eventually aiming to reflect the actual time spent on the GPU for more accurate pricing.
Q: What security measures are implemented in the Pastel Inference Layer to prevent malicious actors from altering messages or manipulating the credit pack creation process?
The Pastel Inference Layer incorporates various security measures to prevent message tampering and manipulation during credit pack creation. These measures primarily revolve around a system of message exchange, verification, and validation throughout the process.
To guarantee message integrity and prevent alterations, every step in the credit pack creation process involves messages that undergo rigorous verification and validation.
- Message Hashing and Signing: Relevant fields of each message are combined and hashed. This hash is included in the message "envelope" and signed by all involved parties using their PastelID private keys. This method allows for verification of the message's contents, ensuring no alterations have occurred. Any discrepancy in the hash would invalidate the signatures.
- Timestamp and Block Height Inclusion: Each message includes the current UTC timestamp and Pastel block height. Messages are considered invalid if received more than one minute after the timestamp or more than two blocks away from the specified block height. This mechanism prevents replay attacks and ensures messages are current and haven't been intercepted and delayed.
These measures collectively guarantee that requests and responses haven't been tampered with. For instance, if a user requests a credit pack with 1,000 credits for 30,000 PSL, a malicious Responding Supernode cannot alter the request to provide only 100 credits while still receiving the full 30,000 PSL.
The Pastel Inference Layer also addresses potential price manipulation during credit pack creation.
- Price Quotes and User Agreement: Before any PSL or credits are spent, the Responding Supernode provides a price quote to the user. The user must agree to this price before the process continues, ensuring transparency and informed consent. Additionally, users can set a maximum PSL cost for the credit pack.
- Client-Side Price Validation: The client software automatically validates the price quote using the current market price of PSL and the same rules Supernodes use for pricing. If the offered price deviates significantly from the expected range, the client rejects it, even if below the user-specified maximum. This mechanism protects users from unfair pricing.
- Supernode Consensus on Pricing: While the Responding Supernode proposes the initial price quote, the final price requires agreement from other Supernodes. The Responding Supernode shares the credit pack request details with other Supernodes, and a majority must agree on the proposed pricing before finalization. This consensus-based approach prevents individual Supernodes from setting arbitrary or exploitative prices.
The sources highlight that while the primary interaction is between the user and the Responding Supernode, this doesn't mean other Supernodes are excluded or that the Responding Supernode operates without accountability. The process is designed to ensure network-wide obligation and prevent a single Supernode from acting maliciously or failing to fulfill its obligations.
It's important to note that the sources primarily focus on the technical measures implemented within the Pastel Inference Layer itself. They do not explicitly address potential security vulnerabilities outside the system's scope, such as user error or external attacks on individual devices.
Q: What role does the model_menu.json file play in the cost calculation for Swiss Army Llama models, and how does this impact the economics of the Pastel Inference system?
The model_menu.json file plays a crucial role in determining the cost of inference requests for Swiss Army Llama models, which in turn affects the overall economics of the Pastel Inference system. It acts as a central repository of information about the computational requirements of each model and how these translate into credit costs.
Unlike API-based services where pricing is predetermined, Swiss Army Llama models rely on the model_menu.json file for cost estimation. This file contains specific parameters for each model that quantify the resources consumed during inference.
- Compute and Memory Costs: The
model_menu.jsonfile specifies the estimated compute and memory costs associated with each Swiss Army Llama model. These costs are factored into the final credit cost calculation for various inference types, including text completion, document embedding, and audio embedding.
The sources indicate that the current cost calculation for Swiss Army Llama models is in a transitional phase. The system is moving towards a more precise pricing model based on the actual time spent on the GPU during inference. This time-based approach aims to:
- Reflect Resource Utilization: By accurately measuring the GPU time consumed, the Pastel Inference system can more precisely align the credit cost with the actual resources used for each Swiss Army Llama request.
- Ensure Economic Viability: Accurately measuring and pricing GPU time ensures that the credit system fairly compensates Supernode operators for their computational resources, contributing to the long-term economic viability of the Pastel Inference system.
The use of model_menu.json and the transition towards time-based costing for Swiss Army Llama models have significant implications for the Pastel Inference system's economics:
- Transparency and Fairness: The detailed cost breakdowns provided in the
model_menu.jsonfile, and the eventual shift to time-based costing, promote transparency by providing users with clear insights into how inference costs are calculated. This transparency fosters trust and fairness within the Pastel Network. - Decentralized Model Deployment: The flexibility of deploying Swiss Army Llama models locally or on shared remote GPU instances aligns with the decentralized nature of the Pastel Network. The ability to adjust pricing based on actual resource usage further supports this decentralized model by ensuring equitable compensation for Supernode operators.
- Competition and Innovation: The ability to define costs based on resource utilization can foster competition among Supernode operators. Operators with more efficient setups or access to lower-cost GPU resources could potentially offer more competitive pricing, driving innovation within the Pastel Inference ecosystem.
By using the model_menu.json file and implementing a time-based cost model for Swiss Army Llama, the Pastel Inference system aims to create a sustainable and fair ecosystem for AI inference tasks.
Q: What are the main benefits of using Swiss Army Llama models as opposed to API based models in the Pastel Inference Layer?
The sources emphasize several key advantages of using Swiss Army Llama models compared to API-based models within the Pastel Inference Layer. These benefits stem from Swiss Army Llama's open-source nature and flexible deployment options, aligning with Pastel's core principles of decentralization, censorship resistance, and economic viability.
One of the most significant advantages of Swiss Army Llama models is their contribution to the Pastel Inference Layer's decentralized architecture:
- No Reliance on External APIs: Swiss Army Llama models can operate independently without depending on third-party APIs like OpenAI or Stability. This autonomy eliminates the need for API keys, user accounts, or external authentication, aligning with Pastel's commitment to decentralization.
- Uncensored Model Access: Supernode operators have complete control over the Swiss Army Llama models they deploy. This control allows for using models without content filters or safety restrictions, which are often present in API-based services. Users can access "completely uncensored models" and receive responses to prompts that might be rejected by commercial API providers concerned about content policies.
Swiss Army Llama models offer cost-effectiveness and deployment flexibility, benefiting both users and Supernode operators:
- Potential for Cost Savings: While still in development, the cost model for Swiss Army Llama aims to reflect the actual GPU time used for inference. This approach has the potential to be more cost-effective than API-based pricing, particularly for resource-intensive tasks or large-scale inference requests.
- Flexible Deployment Options: Swiss Army Llama models can be deployed locally on the Supernode's CPU or remotely on a GPU-enabled instance via services like Vast.ai. This flexibility allows operators to optimize for cost and performance based on their resources and user demands.
- Shared GPU Resources: The ability to run Swiss Army Llama on a shared remote GPU instance further enhances cost-effectiveness. Multiple Supernodes can utilize the same GPU, distributing the cost and making powerful AI capabilities more accessible.
Running Swiss Army Llama models provides Supernode operators with a higher degree of control and customization compared to using external APIs:
- Model Selection and Updates: Operators can choose from a variety of Swiss Army Llama models, including those with specific specializations or capabilities. They can also easily update these models as new versions become available, ensuring access to the latest advancements in AI.
- Fine-grained Parameter Control: Swiss Army Llama models often offer more fine-grained control over model parameters compared to some API-based services. This control allows for customizing the inference process to better suit specific use cases or achieve desired output characteristics.
- Integration of Custom Features: The open-source nature of Swiss Army Llama allows for the integration of custom features or modifications tailored to the Pastel Network's needs. This customization potential can lead to unique functionalities or optimizations not available in closed-source API solutions.
While Swiss Army Llama models offer various advantages, there are also challenges:
- Technical Complexity: Setting up, configuring, and maintaining Swiss Army Llama, particularly in a remote GPU-enabled environment, can be technically demanding. Supernode operators require a certain level of technical expertise to effectively manage these deployments.
- Evolving Cost Model: The transition to a time-based cost model for Swiss Army Llama is still underway. While aiming for fairness and accuracy, this transition might require adjustments and refinements as more data on actual GPU usage patterns become available.
Despite these challenges, Swiss Army Llama models represent a strategic choice for the Pastel Inference Layer. They embody the principles of decentralization, offer cost advantages and flexibility, and provide enhanced control and customization options. As the cost model matures and the technology continues to evolve, Swiss Army Llama is poised to play an increasingly important role in shaping the future of AI inference within the Pastel Network.
Q: How does the Pastel Inference Layer ensure that users pay for, and receive, the exact results of their inference requests?
The Pastel Inference Layer incorporates a robust system to ensure users pay for, and receive, the precise results of their inference requests. This system leverages credit pack tickets, a multi-step process with confirmations, and various validation mechanisms to maintain fairness and transparency.
The Pastel Inference Layer uses "credit pack tickets" for inference payments. Here's how they work:
- Pre-Payment and Burning: Users pre-pay for inference requests by creating credit pack tickets. These tickets represent a specific number of "inference credits," purchased by burning PSL tokens. This process avoids per-request payments and simplifies accounting.
- Tracking Address: Each credit pack ticket is linked to a specific PSL address called the "tracking address". This address is crucial for tracking credit usage.
- Authorized PastelIDs: Credit pack tickets can authorize multiple PastelIDs to spend credits. This feature allows multiple users to share a single credit pack while tracking individual usage.
- Confirmation Transactions: When a user initiates an inference request, the Responding Supernode calculates the cost in credits. The user then confirms their agreement by sending a small, calculated amount of PSL to the burn address from their designated tracking address. This confirmation transaction authorizes the deduction of the corresponding credits from their credit pack. This mechanism ensures users only pay for requests they approve at a predetermined cost.
The entire process, from credit pack creation to inference request fulfillment, relies on a secure message exchange system:
- Challenge/Response Authentication: All endpoints involved in the process require a challenge/response system for access. This system ensures that only authorized entities (users or Supernodes) can interact with the endpoints, preventing unauthorized access and potential manipulation.
- Message Hashing and Signing: Every message exchanged between the user and Supernodes includes a hash of its content, signed by the sending party's PastelID private key. This mechanism guarantees the message's integrity, ensuring no tampering has occurred during transmission. Any alteration invalidates the signature, alerting the recipient to potential foul play.
- Timestamp and Block Height: Messages also contain the current UTC timestamp and Pastel block height. This inclusion prevents replay attacks and ensures messages are current, further enhancing security and preventing manipulation by malicious actors.
The Pastel Inference Layer prioritizes transparency and user control over their inference requests and payments:
- Price Quotes and Agreement: Before creating a credit pack, users receive a price quote from the Responding Supernode and must agree to it before proceeding. This process ensures users are fully aware of the cost and can back out if unsatisfied.
- Client-Side Price Validation: The client software automatically validates price quotes using the current market price of PSL and the same pricing rules as Supernodes. This validation protects users from price gouging or manipulation.
- Supernode Consensus: Credit pack pricing, while initially proposed by the Responding Supernode, requires agreement from a majority of other Supernodes in the network. This consensus mechanism prevents individual Supernodes from setting unfair prices.
- Inference Request Cost Confirmation: Before executing an inference request, the Responding Supernode informs the user of the cost in credits. The user confirms their acceptance by sending the corresponding PSL to the burn address. This step ensures users consciously agree to the cost before their credit pack is debited.
- Result Verification: After receiving the inference output, users can audit the details of both the request response and the result itself. This audit trail, coupled with the message signing and validation procedures, allows users to verify they received the correct output for the agreed-upon cost.
By combining credit pack tickets, secure message exchange, and a multi-step confirmation process, the Pastel Inference Layer strives to create a secure and transparent environment for AI inference. The system is designed to ensure that users pay only for the specific inference requests they make and receive the correct output, mitigating the risk of manipulation or unfair practices.
Q: What mechanism ensures users receive results for their inference requests, even if the responding supernode fails?
The sources highlight the Pastel Inference Layer's efforts to ensure users receive their inference results even if the initially responding Supernode fails. This reliability is achieved through message broadcasting and a mechanism allowing other Supernodes to step in and complete the request.
After a user initiates an inference request and the Responding Supernode calculates the cost and sends the InferenceAPIUsageResponse, the Responding Supernode doesn't just keep this information to itself. It takes an additional step:
- Broadcast to Nearby Supernodes: The Responding Supernode broadcasts a combined message containing both the
InferenceAPIUsageRequestand theInferenceAPIUsageResponseto multiple nearby Supernodes (determined by XOR distance to the user's PastelID).
This broadcast serves as a redundancy measure. Even if the Responding Supernode encounters issues, other Supernodes in proximity are now aware of the request, the proposed cost, and the user's intended action. This awareness is key to ensuring continued service.
The redundancy measure doesn't end with the request and response. Once the inference task is executed and the results are ready:
- Result Broadcast: The Responding Supernode, after sending the result to the user, also broadcasts the
InferenceAPIOutputResultmessage to the same nearby Supernodes that received the initial request and response broadcast.
This additional broadcast ensures that multiple Supernodes store a record of the completed inference request and its output. If the initial Responding Supernode becomes unavailable, these other Supernodes can step in.
While the sources don't explicitly detail the exact mechanism of how a user would connect with a different Supernode to retrieve results if the original one failed, the system's design provides a robust foundation for this:
- User Awareness of Failure: If a user cannot contact the Responding Supernode to check the status or retrieve results, this would signal a potential failure.
- Alternative Supernode Selection: The user could then initiate contact with one of the Supernodes that received the broadcast messages. Since these Supernodes have the
InferenceAPIOutputResult, they possess the necessary information about the request and its output. - Result Retrieval: The user could then potentially retrieve the results from this alternative Supernode, ensuring service continuity even if the original Responding Supernode is offline.
It's important to note that the provided source material doesn't directly elaborate on the user-facing process of connecting to an alternative Supernode after a failure. This aspect might involve additional mechanisms or procedures within the Pastel Inference Layer.
Q: How does the Pastel Inference Layer handle situations where the responding Supernode fails to provide a valid price quote for a credit pack?
The sources provide a detailed explanation of how the Pastel Inference Layer handles situations where the responding Supernode doesn't offer a valid price quote for a credit pack. It centers around a system of price validation, both on the client-side and through consensus among Supernodes.
Even before involving other Supernodes, the Pastel Inference Layer incorporates a layer of client-side protection against unreasonable price quotes:
- Estimated Market Price Calculation: The inference client software utilizes the
estimated_market_price_of_inference_credits_in_psl_termsfunction to calculate a fair market price for inference credits. This calculation considers the current PSL market price (fetched from sources like CoinMarketCap and CoinGecko), a target value per credit, and a predetermined profit margin. - Price Difference Calculation: When a user receives a preliminary price quote from a Responding Supernode, the client software uses the
calculate_price_difference_percentagefunction to compare the quoted price to the estimated fair market price. - Automatic Rejection: If the price difference exceeds a certain threshold, even if the total price is below the user's maximum limit, the client software automatically rejects the price quote.
This initial client-side check acts as a safeguard against Supernodes attempting to charge exorbitant prices.
Beyond the client-side checks, the Pastel Inference Layer implements a consensus mechanism among Supernodes to ensure fair credit pack pricing:
- Preliminary Price Quote: The Responding Supernode, after receiving a credit pack purchase request, first calculates and sends a preliminary price quote to the user.
- User Agreement: If the user agrees to the preliminary quote (which has already passed the client-side validation), the Responding Supernode initiates the consensus process.
- Broadcast to Potential Agreers: The Responding Supernode identifies a set of other Supernodes (the
potentially_agreeing_supernodes) and broadcasts the credit pack details, including the proposed price, to them using the/credit_pack_price_agreement_requestendpoint. - Agreement or Rejection: Each
potentially_agreeing_supernodeindependently evaluates the proposed price. They respond to the Responding Supernode's request with either an agreement or rejection message, signing their response with their PastelID. - Quorum and Majority Requirements: The Responding Supernode awaits responses from a minimum percentage (quorum) of the
potentially_agreeing_supernodes. Then, for the price to be considered valid, a higher percentage (majority) of those who responded must have agreed. - Termination on Disagreement: If either the quorum or majority requirements aren't met (meaning a sufficient number of Supernodes disagreed with the proposed price), the entire credit pack creation process terminates. The user is informed of this failure and can retry with a different Responding Supernode.
This consensus mechanism prevents a single Supernode from dictating unfair prices. Even if a Responding Supernode attempts to provide a price quote significantly different from the norm, the broader network acts as a regulating force, ensuring price stability and fairness.
Q: What steps are taken in the Pastel Inference Layer to ensure that users receive a fair price for their inference requests?
The sources explain the mechanisms the Pastel Inference Layer implements to guarantee users get fair prices for inference requests. This system relies on a multi-faceted approach involving client-side price validation, Supernode consensus, a transparent pricing model, and the inherent value proposition of the PSL token.
Before a user even agrees to a price, the Pastel Inference Layer client software performs a check:
- Estimating Fair Market Value: The client uses the function
estimated_market_price_of_inference_credits_in_psl_termsto determine a fair market price for inference credits in PSL. This calculation considers the current market price of PSL obtained from reputable sources like CoinMarketCap and CoinGecko. - Comparing Against Quoted Price: When a user receives a price quote from the Responding Supernode, the client software compares this quoted price against the estimated fair market price.
- Automatic Rejection of Unfair Quotes: If the difference between the quoted price and the estimated fair price exceeds a certain threshold, the client automatically rejects the quote, protecting the user from potentially inflated prices. This rejection happens even if the quoted price remains below the user's predefined maximum price limit.
The Pastel Inference Layer implements a system requiring agreement among multiple Supernodes to guarantee fair credit pack pricing:
- Preliminary Price Quote: The Responding Supernode, upon receiving a request for a new credit pack, calculates and presents a preliminary price quote to the user.
- User Agreement: If the user agrees to this preliminary price, the Responding Supernode initiates a consensus process among other Supernodes. This agreement implies the quote has already passed the client-side validation.
- Seeking Agreement from Other Supernodes: The Responding Supernode identifies a group of Supernodes called the
potentially_agreeing_supernodesand sends them a message containing the credit pack details, including the proposed price. This message is sent using the/credit_pack_price_agreement_requestendpoint. - Supernode Evaluation and Response: Each
potentially_agreeing_supernodeindependently assesses the proposed price and sends back either an agreement or rejection message. These messages are signed with the respective Supernode's PastelID, ensuring authenticity and accountability. - Quorum and Majority Requirements: The Responding Supernode waits for responses from a minimum number of the
potentially_agreeing_supernodes(quorum). For the price to be deemed valid, a larger percentage of the responding Supernodes must agree (majority). - Process Termination on Disagreement: If the quorum or majority requirements aren't met, meaning a sufficient number of Supernodes disagree with the proposed price, the entire credit pack creation process ends. The user receives notification about the failure and is free to retry with a different Responding Supernode.
This consensus mechanism ensures fair pricing by preventing any single Supernode from setting arbitrary prices for credit packs. The distributed nature of the agreement process helps establish a fair market value for inference credits.
The pricing of credit pack tickets is designed to be transparent and directly tied to the value of PSL:
- Stable Pricing in USD Terms: The goal is to maintain the price of inference credits relatively stable in USD terms, even if the PSL market price fluctuates. This stability is achieved by adjusting the PSL price of inference credits according to the USD market value of PSL.
- Cost Coverage and Theoretical Profit Margin: The price of inference credits aims to cover the cost of serving inference requests. A 10% "theoretical profit margin" is factored in to ensure the economic viability of the Pastel inference system.
- Burning Mechanism and PSL Value: While Supernodes don't directly receive the PSL paid for credit packs (it gets burned), they benefit indirectly. This burning reduces the total supply of PSL, theoretically increasing the value of PSL held by Supernode operators and other stakeholders.
By pegging the cost of inference credits to the USD value of PSL, the Pastel Inference Layer aims to:
- Protect users from price volatility in the cryptocurrency markets.
- Guarantee the long-term sustainability of the network by aligning the cost of services with the underlying value of the PSL token.
The Pastel Inference Layer strives to ensure users receive fair prices for their inference requests. This fairness is achieved through a combination of:
- Client-side price validation to automatically reject unfair quotes.
- A consensus mechanism among Supernodes to determine and regulate credit pack pricing.
- A transparent pricing model tied to the inherent value of the PSL token, ensuring price stability and economic viability.
This multi-layered approach aims to establish a fair and transparent marketplace for AI inference services within the decentralized framework of the Pastel Network.
Q: What role do credit pack tickets play in the Pastel Inference Layer and how does their structure differ from previous Pastel blockchain tickets?
The sources explain that credit pack tickets are central to the Pastel Inference Layer's operation, acting as a form of prepaid payment for AI inference requests. These tickets differ significantly in their structure and function from previous blockchain tickets used in other Pastel Network applications like Sense, Cascade, and NFTs.
- Decoupling Payment from Inference: Credit pack tickets allow users to pre-pay for a specific amount of inference "credits" using PSL tokens. This separates the payment process from the act of making inference requests, streamlining the user experience.
- Enabling Microtransactions: When a user makes an inference request, they don't need to pay the full PSL cost upfront. Instead, a tiny fraction of PSL, corresponding to the credit cost of that specific request, is burned from their designated "tracking address."
- Simplifying Blockchain Overhead: This micro-transaction system, based on burning minuscule amounts of PSL tied to specific credit amounts, eliminates the need for complex on-chain tracking of credit balances within each ticket.
- Network-Wide Obligation: The creation of a credit pack ticket doesn't just involve the Responding Supernode. It creates an obligation across the entire Pastel Network to honor future inference requests paid for with credits from that ticket. This distributed responsibility enhances the system's robustness and reliability.
Credit pack tickets represent a departure from previous Pastel blockchain ticket types:
- Payment Burning: Unlike previous tickets where fees were paid to specific Supernodes, the cost of a credit pack ticket is burned. This burning mechanism simplifies the ticket creation process, eliminating the need for registration and activation stages.
- Generic Contract Type: Credit pack tickets utilize a new "contract" ticket type in
pasteld(Pastel Daemon). This new type allows for dynamic modification of ticket content without requiring changes to the underlyingpasteldcode. This flexibility extends to potential future applications beyond credit packs. - External Validation: In contrast to previous tickets that
pastelddirectly validates, the validation of credit pack tickets happens within the Inference Layer server code (written in Python). This external validation is necessary due to the dynamic nature of the "contract" ticket type and allows for greater flexibility in defining and updating the ticket structure.
Credit pack tickets are crucial to the Pastel Inference Layer. They decouple payment from inference requests, streamline micro-transactions, simplify blockchain management, and establish a network-wide obligation to honor prepaid credits. Their structure, based on a new "contract" type and external validation, allows for greater flexibility and opens up possibilities for future applications within the Pastel Network.
Q: What are the primary components and functions of the Pastel Inference Layer server, and how do they work together to process inference requests?
The Pastel Inference Layer server facilitates decentralized AI inference on the Pastel Network. It hinges on two primary components: Credit Pack Tickets and Inference Requests. Let's examine these components and their functions in processing inference requests:
1. Credit Pack Tickets: Pre-Paid Inference Credits
- Purpose: Users pre-pay for AI inference requests by purchasing Credit Pack Tickets with Pastel's native PSL tokens. This pre-payment system simplifies accounting and avoids per-request payments.
- Creation: Users initiate a purchase request with a Supernode (a special node in the Pastel network) for a specific number of credits. The PSL used for the purchase is “burned” (sent to an unusable address), effectively removing it from circulation.
- Pricing: The cost of credits is designed to remain stable in USD terms despite market fluctuations of PSL. The pricing model aims to cover the cost of serving inference requests, with an added 10% theoretical profit margin factored in to ensure the system's economic viability. While Supernodes don't directly receive the PSL, they benefit indirectly from the deflationary pressure the burning mechanism creates.
- Structure and Validation:
- Credit pack tickets utilize a new, generic 'contract' ticket type in the Pastel blockchain, allowing for flexible adaptation to future applications beyond just inference credits.
- Each ticket specifies a PSL address as the "tracking address" to track credit usage and lists authorized PastelIDs permitted to spend credits from the pack. This facilitates credit sharing among multiple users while tracking individual usage.
- Before finalization, the credit pack price, determined by the Responding Supernode, undergoes a consensus process among multiple Supernodes to guarantee fairness.
- To ensure the ticket's validity, the Pastel Inference Layer server performs rigorous checks, including verifying the authenticity of involved Supernodes and performing hash checks on the ticket data.
2. Inference Requests: Utilizing AI Models
-
Mechanism: Users submit REST requests to the Inference Layer server to perform various AI tasks using their preferred models and parameters.
-
Model Options: The system supports two broad categories of models:
- API-based services: These offer access to powerful, proprietary models (like OpenAI, Claude3, Groq, Mistral, OpenRouter) if the specific Supernode supports them. The server includes logic to estimate API costs and converts them into credit pack credits for user clarity.
- "Locally Hosted" LLMs (Swiss Army Llama): These models, running directly on the Supernode's hardware (CPU or a remote GPU-enabled instance rented through services like Vast.ai), bypass external APIs and their content restrictions, enabling fully decentralized and uncensored inference. The cost of these models is initially estimated based on resource parameters outlined in a 'model_menu.json' file but will eventually transition to reflect the actual GPU time used for accurate pricing.
-
Cost and Payment:
- The cost of each inference request, measured in credits, is calculated by the Responding Supernode and communicated to the user upfront.
- To authorize payment, users send a small, calculated amount of PSL (corresponding to the credit cost) from their tracking address to the burn address.
-
Request Flow: Here's a step-by-step breakdown of the inference request process:
- Initiation: Users send a POST request with an
InferenceAPIUsageRequestmessage containing: their PastelID, credit pack ticket TXID, requested model, inference type, model parameters, input data, and other details. The user signs this message with their PastelID to ensure its integrity. - Validation and Costing: The Responding Supernode validates the request (checking the signature, model support, data format) and calculates the cost in credits. The cost calculation varies depending on the model and task, factoring in aspects like token count, audio duration, or image resolution.
- Response: The Supernode sends an
InferenceAPIUsageResponsemessage to the user detailing the proposed cost, remaining credits, and other relevant information. - Confirmation: If the user agrees to the cost, they confirm by sending a tiny PSL transaction (corresponding to the credit cost) from their designated tracking address to the Pastel burn address. This acts as proof of authorization. They then send an
InferenceConfirmationmessage with their PastelID and transaction details. - Execution: The Supernode verifies the confirmation transaction and begins processing the inference request using the chosen model and parameters.
- Output Generation: Once completed, the Supernode generates an
InferenceAPIOutputResultmessage containing the results (serialized and base64-encoded), file type, and a signature from the Supernode's PastelID ensuring authenticity. - Result Retrieval: The user can check the status of their request and, when ready, retrieve the results by sending requests to designated endpoints. The Supernode then sends the
InferenceAPIOutputResult. - Auditing: Users can audit both the request response and the result itself, promoting transparency and allowing verification that the correct output was provided for the agreed cost.
- Initiation: Users send a POST request with an
3. Robustness and Security:
The Pastel Inference Layer incorporates several mechanisms to ensure robustness, security, and prevent manipulation:
- Message Hashing and Signing: Every message exchanged between users and Supernodes undergoes hashing and signing using PastelID private keys. This ensures message integrity and prevents tampering.
- Timestamp and Block Height Inclusion: Messages include timestamps and Pastel block heights to prevent replay attacks and ensure message freshness.
- Challenge-Response Authentication: Access to secure endpoints requires a challenge-response system based on PastelIDs, ensuring only authorized entities interact with the system.
- Redundancy and Fault Tolerance: The Responding Supernode broadcasts the
InferenceAPIUsageRequestandInferenceAPIUsageResponseto nearby Supernodes, enabling others to step in and fulfill the request if the initial Supernode fails. Additionally, if a Responding Supernode fails to store credit pack ticket data after the user has paid, other Supernodes can be called upon to complete the process, safeguarding user funds.
The Pastel Inference Layer, with its pre-paid credit system, diverse model choices, and robust security measures, enables a decentralized and trustworthy platform for AI inference.
Q: How does the Pastel Inference Layer handle the distribution of inference requests to different Supernodes?
The sources provide a detailed explanation of how the Pastel Inference Layer handles the distribution of inference requests across its network of Supernodes. This distribution process is vital for ensuring efficient resource utilization, redundancy, and fault tolerance. Here's a breakdown of the key mechanisms involved:
1. Closest Supernode Selection:
- XOR Distance Calculation: When a user initiates an inference request, the system identifies the "closest" Supernode to handle the request. This "closeness" isn't based on physical proximity but on a concept called XOR distance, calculated between the user's PastelID and the PastelIDs of potential Supernodes. The Supernode with the shortest XOR distance is selected as the "Responding Supernode."
- Model Support Consideration: In cases where a specific AI model or service is requested, the selection process adds an additional layer. The system will only consider Supernodes that have declared support for the desired model. This ensures the request is routed to a Supernode capable of fulfilling it.
- Dynamic Responding Supernode: This selection process ensures that the "Responding Supernode" for a particular user can change over time, depending on factors like network conditions, the addition of new Supernodes, and the specific models required for different inference requests. This dynamic allocation contributes to load balancing across the network.
2. Request Broadcasting for Redundancy:
- Broadcast to Nearby Supernodes: The selected "Responding Supernode" doesn't keep the inference request information to itself. After calculating the request cost and sending an
InferenceAPIUsageResponseback to the user, the Responding Supernode broadcasts both the initialInferenceAPIUsageRequestand theInferenceAPIUsageResponseto multiple nearby Supernodes. These nearby Supernodes are also determined based on XOR distance calculations, but this time, the distance is calculated relative to the user's PastelID. - Purpose of Broadcasting: This broadcasting mechanism serves two primary purposes:
- Redundancy: If the Responding Supernode experiences issues or fails during processing, other nearby Supernodes are already aware of the request and its associated information. This allows them to potentially step in and complete the request, preventing service disruption for the user.
- Result Replication: Once the Responding Supernode successfully completes the inference task, it broadcasts the
InferenceAPIOutputResultmessage to the same set of nearby Supernodes. This ensures that multiple Supernodes store a copy of the results, further enhancing redundancy and data availability.
3. Network-Wide Awareness and Shared Responsibility:
- Credit Pack Obligations: While a single Responding Supernode handles the initial interaction and processing, the Pastel Inference Layer emphasizes a network-wide responsibility for fulfilling requests. When a user purchases a credit pack, the obligation to honor those prepaid credits for future inference tasks falls on the entire network, not just the Supernode that facilitated the purchase.
- Supernode Consensus on Pricing: To prevent individual Supernodes from setting arbitrary or unfair prices for credit packs that could negatively impact the entire network, a consensus mechanism is employed. The Responding Supernode's initial price quote must be agreed upon by a majority of other Supernodes before the credit pack is deemed valid. This consensus-based pricing model promotes fairness and economic stability across the network.
- Distributed Validation and Monitoring: The system incorporates various validation steps at different stages. For instance, before processing an inference request using a credit pack, a Supernode will verify that the Supernodes involved in the pack's creation were active at the time of purchase and that the agreed-upon pricing was reached through legitimate consensus. If discrepancies are found, the credit pack can be flagged as invalid.
4. Fault Tolerance and User Recourse:
- Alternative Supernode Selection: Though not explicitly detailed in the sources, the system's design suggests that if a user cannot connect to the Responding Supernode to check the status of a request or retrieve results, they could potentially initiate contact with one of the nearby Supernodes that received the broadcast messages. Since these Supernodes would have a copy of the
InferenceAPIOutputResult, they could potentially provide the inference output, ensuring service continuity even if the original Responding Supernode is offline. - Retry Mechanisms: The sources highlight specific scenarios where, if the Responding Supernode fails to fulfill its obligations (like storing credit pack ticket data after a user has paid), the user is permitted to request assistance from other Supernodes involved in the original agreement. These "backup" Supernodes can step in and complete the process, ensuring the user doesn't lose their prepaid credits.
In conclusion, the Pastel Inference Layer distributes inference requests across its Supernode network using a combination of XOR distance-based selection, request broadcasting, network-wide obligation sharing, consensus mechanisms, and robust validation procedures. This distributed architecture, coupled with fault-tolerance measures and user recourse options, promotes a robust, scalable, and reliable system for decentralized AI inference.