diff --git a/service-level-indicators/spark-retrieval-success-rate.md b/service-level-indicators/spark-retrieval-success-rate.md index 7b6ba47..7309471 100644 --- a/service-level-indicators/spark-retrieval-success-rate.md +++ b/service-level-indicators/spark-retrieval-success-rate.md @@ -10,6 +10,7 @@ - [Spark Protocol](#spark-protocol) - [Deal Ingestion](#deal-ingestion) - [Deal Eligibility Criteria](#deal-eligibility-criteria) + - [Spark v1.5 DDO Deal Ingestion](#spark-v15-ddo-deal-ingestion) - [Task Sampling](#task-sampling) - [Retrieval Checks](#retrieval-checks) - [Reporting Measurements to Spark-API](#reporting-measurements-to-spark-api) @@ -46,7 +47,7 @@ ## Document Purpose -This document is intended to become the canonical resource that is referenced in [the Storage Providers Market Dashboard](https://github.com/filecoin-project/filecoin-storage-providers-market) wherever the “Spark Retrievability” graphs are shown. A reader of those graphs should be able to read this document and understand the “Spark Retrievability SLO”. The goal of this document is to explain fully and clearly “the rules of the game”. With the “game rules”, we seek to empower market participants - onramps, aggregators and Storage Providers (SPs) - to “decide how they want to play the game”. +This document is intended to become the canonical resource that is referenced in [the Storage Providers Market Dashboard](https://github.com/filecoin-project/filecoin-storage-providers-market) wherever the "Spark Retrievability" graphs are shown. A reader of those graphs should be able to read this document and understand the "Spark Retrievability SLO". The goal of this document is to explain fully and clearly "the rules of the game". With the "game rules", we seek to empower market participants - onramps, aggregators and Storage Providers (SPs) - to "decide how they want to play the game". 
## Versions @@ -85,7 +86,7 @@ We will now go through the Spark protocol in more depth to show exactly how the ## Deal Ingestion -The first step in the Spark protocol is to build a list of all files that should be available for “fast” retrieval. When we say “fast”, we mean that this file is stored unsealed so that it can be retrieved without needing to unseal the data first. +The first step in the Spark protocol is to build a list of all files that should be available for "fast" retrieval. When we say "fast", we mean that this file is stored unsealed so that it can be retrieved without needing to unseal the data first. At least as of January 2025, every night, a service operated by Space Meridian automatically runs a deal ingestion process ([Github](https://github.com/filecoin-station/fil-deal-ingester)) that scans through all recently-made storage deals in the f05 storage market actor and stores them as [Eligible Deals](#eligible-deals) in an off-chain Spark database, hosted by Space Meridian, the independent team that is building Spark. An Eligible Deal is the tuple `(CID, Storage Provider)`, where the CID refers to a payload CID, as opposed to a piece CID or a deal CID. A payload CID is the root CID of some data like a file. An [Eligible Deal](#eligible-deal) indicates that the `Storage Provider` should be able to serve a fast retrieval for the payload `CID`. @@ -105,11 +106,23 @@ Spark considers a deal as eligible for retrieval testing if it meets the followi - `bafk` - `Qm` +### Spark v1.5 DDO Deal Ingestion +With the introduction of Spark v1.5, the ingestion of Direct Data Onboarding (DDO) deals has been streamlined through the new `spark-deal-observer` pipeline. This pipeline allows for real-time ingestion of DDO deals as they are created, ensuring that the Spark Eligible Deal database is always up-to-date. + +The `spark-deal-observer` works by monitoring the Filecoin network for new DDO deals and automatically adding them to the Spark Eligible Deal database. 
This process involves the following steps: + +1. **Event Listening**: The `spark-deal-observer` listens for events related to DDO deals on the Filecoin blockchain. +2. **Data Extraction**: When a new DDO deal is detected, the observer extracts relevant information, including the payload CID and the associated storage provider. +3. **Database Update**: The extracted data is then formatted and inserted into the Spark Eligible Deal database, marking the deal as eligible for retrieval testing. +4. **Real-Time Updates**: This process allows for immediate updates to the Spark system, meaning that new storage providers can be evaluated for their Spark scores without the previous delays associated with nightly batch processes. + +This enhancement significantly reduces the time it takes for new DDO deals to be reflected in the Spark retrieval success rate calculations, providing a more accurate and timely assessment of storage provider performance. + ## Task Sampling Each round of the Spark protocol is approximately 20 minutes. At the start of each round, the [Spark tasking service randomly selects a set of records](https://github.com/filecoin-station/spark-api/blob/f77aa4269ab8c19ff64b9b9ff22462c29a6b8514/api/lib/round-tracker.js#L310) from all [Eligible Deals](#eligible-deal). We refer to each of these records as a [Retrieval Task](#retrieval-task) and, specifically, it is the combination of an Eligible Deal and a Round Id. We will also refer to the set of Retrieval Tasks in the round as the [Round Retrieval Task List](#round-retrieval-task-list). -It is important for the security of the protocol that the Retrieval Tasks in each round are chosen at random. This is to prevent Spark checkers from being able to choose their own tasks that may benefit them, such as if an SP wanted to run lots of tasks against itself. 
We don’t yet use drand for randomness for choosing the Round Retrieval Task List but we would like to introduce that to improve the end-to-end verifiability ([Github](https://github.com/space-meridian/roadmap/issues/182)). +It is important for the security of the protocol that the Retrieval Tasks in each round are chosen at random. This is to prevent Spark checkers from being able to choose their own tasks that may benefit them, such as if an SP wanted to run lots of tasks against itself. We don't yet use drand for randomness for choosing the Round Retrieval Task List but we would like to introduce that to improve the end-to-end verifiability ([Github](https://github.com/space-meridian/roadmap/issues/182)). During each round, the Spark Checkers are able to download the current Round Retrieval Task List. You can see the current Round Retrieval Task List here: [http://api.filspark.com/rounds/current](http://api.filspark.com/rounds/current). @@ -123,33 +136,33 @@ The current approach lies between these two ends of the spectrum. By asking a su **How does the protocol orchestrate the Spark Checkers to perform different tasks from each other at random such that each Task is completed by at least $m$ checkers?** -As previously mentioned, the checkers start each round by downloading the Round Retrieval Task List. For each task that’s in the List, the checker calculates the task’s “key” by SHA256-hashing the task together with the current dRand randomness, fetched from the [dRand network API](https://github.com/filecoin-station/spark-evaluate/blob/a231822d3d78e3d096425a53a300f8c6c82ee01f/lib/drand-client.js#L29-L33). Leveraging the wonderful properties of cryptographic hash functions, these hash values will be randomly & uniformly distributed in the entire space of $2^{256}$ values. Also, by including the randomness in the input alongside the Eligible Deal details, we will get a different hash digest for a given Eligible Deal in each round. 
We can define the `taskKey` as +As previously mentioned, the checkers start each round by downloading the Round Retrieval Task List. For each task that's in the List, the checker calculates the task's "key" by SHA256-hashing the task together with the current dRand randomness, fetched from the [dRand network API](https://github.com/filecoin-station/spark-evaluate/blob/a231822d3d78e3d096425a53a300f8c6c82ee01f/lib/drand-client.js#L29-L33). Leveraging the wonderful properties of cryptographic hash functions, these hash values will be randomly & uniformly distributed in the entire space of $2^{256}$ values. Also, by including the randomness in the input alongside the Eligible Deal details, we will get a different hash digest for a given Eligible Deal in each round. We can define the `taskKey` as ```jsx taskKey = SHA256(payloadCid + miner_id + drand) ``` -Each Spark Checker node can also calculate the SHA256 hash of its Station Id. This is fixed across rounds as it doesn’t depend on any round-specific inputs. In the future, the protocol may add the drand value into the hash input of the nodeKey too. +Each Spark Checker node can also calculate the SHA256 hash of its Station Id. This is fixed across rounds as it doesn't depend on any round-specific inputs. In the future, the protocol may add the drand value into the hash input of the nodeKey too. ```jsx nodeKey = SHA256(station_id) ``` -The checker can then find its $k$ “closest” tasks, using XOR as the distance metric. These $k$ tasks are the Retrieval Tasks the Spark Checker node is eligible to complete. Any other tasks submitted by the checker are dismissed. +The checker can then find its $k$ "closest" tasks, using XOR as the distance metric. These $k$ tasks are the Retrieval Tasks the Spark Checker node is eligible to complete. Any other tasks submitted by the checker are dismissed. 
```jsx dist = taskKey XOR nodeKey ``` -Note that, at the start of each round, the protocol doesn’t know which Spark Checkers will participate as there are no uptime requirements on these checkers. This means the Spark protocol can’t centrally form groups of checkers and assign them a subset of the Round Retrieval Task List. The above approach doesn’t make any assumptions about which checkers are online, but instead relies on the fact that the `nodeKeys` will be evenly distributed around the SHA256 hash ring, so that enough nodes will be assigned each task. +Note that, at the start of each round, the protocol doesn't know which Spark Checkers will participate as there are no uptime requirements on these checkers. This means the Spark protocol can't centrally form groups of checkers and assign them a subset of the Round Retrieval Task List. The above approach doesn't make any assumptions about which checkers are online, but instead relies on the fact that the `nodeKeys` will be evenly distributed around the SHA256 hash ring, so that enough nodes will be assigned each task. -Following the above approach, for each Retrieval Task, there are a set of Checkers who find this task among their $k$ closest and they will attempt the task in the round. We refer to the set of checkers who attempt a Retrieval Task as the “[Committee](#committee)” for that task.. +Following the above approach, for each Retrieval Task, there is a set of Checkers who find this task among their $k$ closest and they will attempt the task in the round. We refer to the set of checkers who attempt a Retrieval Task as the "[Committee](#committee)" for that task. **How many tasks should each checker do per round? What value is given to $k$?** The choice of $k$ is determined by Spark protocol logic that aims to keep the overall number of Spark measurements completed by the total network per round fixed.
This is important because -- We don’t want the number of requests that Storage Providers need to deal with to go up as the number of Spark Checkers in the network increases +- We don't want the number of requests that Storage Providers need to deal with to go up as the number of Spark Checkers in the network increases - There needs to be enough nodes in each committee for its result to be considered reliable In each round, the [Round Retrieval Task List data object](http://api.filspark.com/rounds/current) specifies $k$ in a field called `maxTasksPerNode`. At the start of each round, the [spark-api](https://github.com/filecoin-station/spark-api) service looks at the number of measurements reported by the network in the previous round, compares it against the desired value, and adjusts both `maxTasksPerNode` ($k$) and the length of the Round Retrieval Task List for the new round. @@ -168,7 +181,7 @@ There is an in-depth discussion on how this part of the protocol works here: [ht Here is a summary taken from the blog post: -A Spark Checker’s retrieval test of `(CID, providerID)` is performed with the following steps: +A Spark Checker's retrieval test of `(CID, providerID)` is performed with the following steps: 1. Call Filecoin RPC API method `Filecoin.StateMinerInfo` to map `providerID` to `PeerID`. 2. Call [`https://cid.contact/cid/{CID}`](https://cid.contact/cid/%7BCID%7D) to obtain all retrieval providers for the given CID. @@ -191,15 +204,15 @@ Failed retrieval attempts are also reported with a reason code for the failure r [Spark-Evaluate](https://github.com/filecoin-station/spark-evaluate) is the Spark service that evaluates each measurement to decide whether or not it is valid, and then further processes the valid results for later consumption. It listens for on-chain events that indicate that the Spark Publish logic has posted a commitment on chain. It then takes the CID of the on-chain commitment and fetches the corresponding measurements from Storacha.
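The first two lookup steps of the retrieval test above can be sketched as follows. This is a hypothetical illustration of the request shapes only (not Spark's actual client code); the `null` tipset-key parameter, meaning "current chain head", is an assumption based on the Lotus JSON-RPC conventions:

```jsx
// Hypothetical sketch of the lookup steps in the retrieval test:
// (1) the JSON-RPC request that maps a providerID to a PeerID, and
// (2) the cid.contact URL that lists retrieval providers for a CID.
function stateMinerInfoRequest (providerId) {
  return {
    jsonrpc: '2.0',
    id: 1,
    method: 'Filecoin.StateMinerInfo',
    params: [providerId, null] // null tipset key = current chain head
  }
}

function cidContactUrl (payloadCid) {
  return `https://cid.contact/cid/${payloadCid}`
}
```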
-Once Spark-Evaluate retrieves the measurements, it does “fraud detection” to remove all unwanted [Retrieval Task Measurement](#retrieval-task-measurement)s as summarized below: +Once Spark-Evaluate retrieves the measurements, it does "fraud detection" to remove all unwanted [Retrieval Task Measurement](#retrieval-task-measurement)s as summarized below: | [Retrieval Task Measurement](#retrieval-task-measurement)s which are removed | Why removed? | | --- | --- | -| Those for [Eligible Deal](#eligible-deal)s not in the round | To prevent checkers from checking any [Eligible Deal](#eligible-deal) of their choosing either to inflate or tarnish an SP’s stats. | -| Those which are for [Retrieval Tasks](#retrieval-task) that are not within the $k$-closest for a checker | Same principle as above. It’s not good enough to pick any [Retrieval Task](#retrieval-task) from the [Round Retrieval Task List](#round-retrieval-task-list). | -| Those which are submitted after the first $k$ measurements from a given IPV4 /24 subnet. (i.e, The first $k$ [Retrieval Task Measurement](#retrieval-task-measurement) within a IPv4 /24 subnet are accepted. Others are rejected.) | Prevent a malicious actor from creating tons of station ids that are then used to “stuff the ballot box” from one node. IPV4 /24 subnets are being used here as a scarce resource. | +| Those for [Eligible Deal](#eligible-deal)s not in the round | To prevent checkers from checking any [Eligible Deal](#eligible-deal) of their choosing either to inflate or tarnish an SP's stats. | -| Those which are for [Retrieval Tasks](#retrieval-task) that are not within the $k$-closest for a checker | Same principle as above. It's not good enough to pick any [Retrieval Task](#retrieval-task) from the [Round Retrieval Task List](#round-retrieval-task-list). | +| Those which are submitted after the first $k$ measurements from a given IPv4 /24 subnet.
(i.e., the first $k$ [Retrieval Task Measurement](#retrieval-task-measurement)s within an IPv4 /24 subnet are accepted. Others are rejected.) | Prevent a malicious actor from creating tons of station ids that are then used to "stuff the ballot box" from one node. IPv4 /24 subnets are being used here as a scarce resource. | -With these [Committee Accepted Retrieval Task Measurements](#committee-accepted-retrieval-task-measurements) that passed fraud detection, Spark Evaluate then performs the honest majority consensus. For each task, it calculates the honest majority result from the task’s committee, and it stores this aggregated result in Spark’s DB. These aggregate results are called [Provider Retrieval Result Stats](#provider-retrieval-result-stats). They are packaged into a CAR which will is stored with Storacha, and the CID of this CAR is stored on chain. +With these [Committee Accepted Retrieval Task Measurements](#committee-accepted-retrieval-task-measurements) that passed fraud detection, Spark Evaluate then performs the honest majority consensus. For each task, it calculates the honest majority result from the task's committee, and it stores this aggregated result in Spark's DB. These aggregate results are called [Provider Retrieval Result Stats](#provider-retrieval-result-stats). They are packaged into a CAR which is stored with Storacha, and the CID of this CAR is stored on chain. ## On-chain Evaluation and Rewards @@ -211,7 +224,7 @@ There are two final steps in the Spark protocol that pertain to how Spark checke At this point of the protocol, we have a set of valid measurements for each round stored off chain and committed to on chain for verifiability. From this we can calculate the Spark RSR values.
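As a rough sketch of how an RSR figure can be computed from a set of accepted measurements: the result codes below are invented for illustration only (they are not Spark's actual schema); the real breakdown of which failures count is described under "Retrieval Result Mapping to RSR".

```jsx
// Hypothetical sketch of the top-level RSR calculation. Result codes
// here are illustrative; only "contributing" failures (issues on the
// SP's end or with IPNI) count against the provider.
const CONTRIBUTING_FAILURES = new Set(['HTTP_500', 'TIMEOUT', 'IPNI_ERROR'])

function retrievalSuccessRate (measurements) {
  let successful = 0
  let failures = 0
  for (const m of measurements) {
    if (m.result === 'OK') successful++
    else if (CONTRIBUTING_FAILURES.has(m.result)) failures++
    // other codes (e.g. checker-side errors) do not contribute
  }
  return successful / (successful + failures)
}
```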
-Given a specific time frame, the top level Spark RSR figure is calculated by taking the number of all successful valid retrievals made in that time frame and dividing it by the number of all “contributing” valid retrieval requests made in that time frame. When we say “contributing” retrieval requests, we mean all valid successful retrieval requests as well as all the valid retrieval requests that failed due to some issue on the storage provider’s end or with IPNI. (See [Retrieval Result Mapping to RSR](#retrieval-result-mapping-to-rsr) for a detailed breakdown of failure cases that are “contributing”.) +Given a specific time frame, the top level Spark RSR figure is calculated by taking the number of all successful valid retrievals made in that time frame and dividing it by the number of all "contributing" valid retrieval requests made in that time frame. When we say "contributing" retrieval requests, we mean all valid successful retrieval requests as well as all the valid retrieval requests that failed due to some issue on the storage provider's end or with IPNI. (See [Retrieval Result Mapping to RSR](#retrieval-result-mapping-to-rsr) for a detailed breakdown of failure cases that are "contributing".) $$ RSR = {count(successful) \over count(successful) + count(failure)} @@ -219,7 +232,7 @@ $$ However, as you may notice, we have not used the honest majority consensus results in this calculation. Here we are counting over all valid requests. This is because there is an intricacy when it comes to using the committee consensus results. -To give an example to explain this, let’s say that a Storage provider only serves 70% of retrieval requests. Assuming that all Spark checkers are acting honestly, when the Spark checkers make their checks, 70% of each committee reports that the CID is retrievable from the SP, while 30% report that is is unretrievable. With honest majority consensus, this file is deemed to be retrievable. 
If we use the results of the honest majority consensus rather than the raw measurements, we lose some fidelity in the retrievability of data from this SP. Specifically, instead of reporting a Spark RSR of 70%, we report a spark RSR of 100%, which seems misleading. +To give an example to explain this, let's say that a Storage Provider only serves 70% of retrieval requests. Assuming that all Spark checkers are acting honestly, when the Spark checkers make their checks, 70% of each committee reports that the CID is retrievable from the SP, while 30% report that it is unretrievable. With honest majority consensus, this file is deemed to be retrievable. If we use the results of the honest majority consensus rather than the raw measurements, we lose some fidelity in the retrievability of data from this SP. Specifically, instead of reporting a Spark RSR of 70%, we report a Spark RSR of 100%, which seems misleading. We believe that the 70% value is more accurate, yet we also need committees to prevent fraudulent behaviour. Currently, we are storing both the committee-based score as well as the raw measurement score and we plan to use the committee results as a reputation score by which to weight the measurements from checkers in a committee ([GH tracking item](https://github.com/space-meridian/roadmap/issues/180)). @@ -248,7 +261,7 @@ Nodes that do retrievability checks. In practice, these are primarily Station n [Checker / Checker Node](#checker--checker-node)s that have performed the same [Retrieval Task](#retrieval-task) in a round -- Committees aren’t formed during tasking. Nodes pick tasks based on the hash of their self-generated id. The scheduler doesn’t assign tasks to nodes, because at the beginning of the round it doesn’t yet know who will participate. See [Task Sampling](#task-sampling) for more info. +- Committees aren't formed during tasking. Nodes pick tasks based on the hash of their self-generated id.
The scheduler doesn't assign tasks to nodes, because at the beginning of the round it doesn't yet know who will participate. See [Task Sampling](#task-sampling) for more info. ### Eligible Deal @@ -365,20 +378,20 @@ For full transparency, a list of potential issues or concerns about this SLI are 1. The Spark protocol [is dependent on GLIF RPC nodes](https://github.com/filecoin-station/spark-api/blob/f77aa4269ab8c19ff64b9b9ff22462c29a6b8514/publish/bin/spark-publish.js#L12) to function properly. 1. Potential problems - 1. Single point of failure: If GLIF’s API is down then the Spark protocol is unable to function properly and Spark scores are not updated. This can be mitigated by having multiple RPC providers but there are currently no other options that suffice. + 1. Single point of failure: If GLIF's API is down then the Spark protocol is unable to function properly and Spark scores are not updated. This can be mitigated by having multiple RPC providers but there are currently no other options that suffice. 2. Delegated trust: Even if there are multiple RPC providers that can be used by the Spark protocol, this amounts to a centralised component in the Spark protocol, whose end goal is a completely trustless decentralised protocol. 2. Why is this the case? - 1. In Spark team’s experience there unfortunately is no other reliable endpoint. -2. Spark Publish doesn’t have any AuthN/AuthZ: In an ideal future state, Spark Checkers would sign their measurements so that we can build confidence and reputation around Spark Checker (Station) Ids. Without AuthN/AuthZ, it is easy to impersonate another checker since all `checkerId` are publicly discoverable by processing the publicly retrievable [Retrieval Task Measurement](#retrieval-task-measurement)s. -3. Spark only checks on deals stored with datacap using the f05 storage market actor and the deal must have a “payload CID” label.
This means Spark v1 excludes DDO deals, which when last checked in October 2024, means that Spark is checking about 56% of deals. Spark v2 which will ship in early 2025 will include DDO deals. -4. Spark API is receiving, aggregating, and publishing the checker results which are discoverable on chain. Spark’s code is open-sourced, but there is trust that Spark isn’t doing additional result modifying like adding results for a checker that didn’t actually submit results. An individual Spark checker can verify that their own measurements have been included and committed to on chain. They can also rerun the Spark evaluate logic with all the measurements from the round. -5. An SP gets credit for a retrieval even if they fail [IPFS trustless gateway HTTP retrieval](https://specs.ipfs.tech/http-gateways/trustless-gateway/) but succeed with GraphSync retrieval. The concern with GraphSync only support is that there aren’t active maintainers for the protocol and that it is harder for clients to use than HTTP. (See [Why is GraphSync used?](#why-is-graphsync-used) ) -6. Verified deals are indexed on a weekly basis. As a result, it’s possible that the payload CID of a verified deal will not get checked for a week plus after deal creation. -7. Spark station ids are self-generated. This means a checker can potentially have many station ids and then report results using a station id that passes “[fraud checks](#fraud-checks)”. The limit on duplicate IPv4 /24 subnets helps prevent “stuffing/overflowing the ballot box” but it doesn’t prevent one from “poisoning it” with some untruthful failed results. There is a [backlog item](https://github.com/space-meridian/roadmap/issues/180) to weight results by to-be-determined “checker reputation”. -8. Storage Providers can currently go into “ghost mode” where they don’t get any retrieval results reported, regardless if they actually are or aren’t retrievable. 
They accomplish this by keeping retrieval connections open for longer than the round by making a “byte of progress every 60 seconds”. This is because Spark checkers currently only have a “progress timeout”, not a “max request duration” timeout. There is a [backlog item to fix this](https://github.com/filecoin-station/spark/issues/99). -9. Spark makes retrievability checks assuming public global retrievability. It doesn’t support network partitions or access-controlled data. As a result, SPs in China with storage deals will have a 0 retrieval success rate. + 1. In the Spark team's experience there unfortunately is no other reliable endpoint. +2. Spark Publish doesn't have any AuthN/AuthZ: In an ideal future state, Spark Checkers would sign their measurements so that we can build confidence and reputation around Spark Checker (Station) Ids. Without AuthN/AuthZ, it is easy to impersonate another checker since all `checkerId` are publicly discoverable by processing the publicly retrievable [Retrieval Task Measurement](#retrieval-task-measurement)s. +3. Spark only checks on deals stored with datacap using the f05 storage market actor and the deal must have a "payload CID" label. This means Spark v1 excludes DDO deals, which, when last checked in October 2024, means that Spark is checking about 56% of deals. Spark v2, which will ship in early 2025, will include DDO deals. +4. Spark API is receiving, aggregating, and publishing the checker results which are discoverable on chain. Spark's code is open-sourced, but there is trust that Spark isn't modifying results, such as adding results for a checker that didn't actually submit any. An individual Spark checker can verify that their own measurements have been included and committed to on chain. They can also rerun the Spark evaluate logic with all the measurements from the round. +5.
An SP gets credit for a retrieval even if they fail [IPFS trustless gateway HTTP retrieval](https://specs.ipfs.tech/http-gateways/trustless-gateway/) but succeed with GraphSync retrieval. The concern with GraphSync-only support is that there aren't active maintainers for the protocol and that it is harder for clients to use than HTTP. (See [Why is GraphSync used?](#why-is-graphsync-used).) +6. Verified deals are indexed on a weekly basis. As a result, it's possible that the payload CID of a verified deal will not get checked for a week or more after deal creation. +7. Spark station ids are self-generated. This means a checker can potentially have many station ids and then report results using a station id that passes "[fraud checks](#fraud-checks)". The limit on duplicate IPv4 /24 subnets helps prevent "stuffing/overflowing the ballot box" but it doesn't prevent one from "poisoning it" with some untruthful failed results. There is a [backlog item](https://github.com/space-meridian/roadmap/issues/180) to weight results by a to-be-determined "checker reputation". +8. Storage Providers can currently go into "ghost mode" where they don't get any retrieval results reported, regardless of whether they actually are retrievable. They accomplish this by keeping retrieval connections open for longer than the round by making a "byte of progress every 60 seconds". This is because Spark checkers currently only have a "progress timeout", not a "max request duration" timeout. There is a [backlog item to fix this](https://github.com/filecoin-station/spark/issues/99). +9. Spark makes retrievability checks assuming public global retrievability. It doesn't support network partitions or access-controlled data. As a result, SPs in China with storage deals will have a 0 retrieval success rate. 10. Spark can only check retrievability of data that has an unsealed copy. There currently is no protocol-defined way for requesting an SP unseal a sector and then checking for retrievability later.
-11. Per [Retrieval Result Mapping to RSR](#retrieval-result-mapping-to-rsr), an SP’s RSR can be impacted by areas outside of their control like IPNI. See also [Why do IPNI outages impact SP RSR?](#why-do-ipni-outages-impact-sp-rsr). +11. Per [Retrieval Result Mapping to RSR](#retrieval-result-mapping-to-rsr), an SP's RSR can be impacted by areas outside of their control like IPNI. See also [Why do IPNI outages impact SP RSR?](#why-do-ipni-outages-impact-sp-rsr). 12. Checker traffic is pretty easy to differentiate from real traffic for a Storage Provider given checker traffic only requests the `payloadCid`. An SP can look at the [Round Retrieval Task List](#round-retrieval-task-list), see which tasks they are the listed "minerId" for, and then just make sure to serve the corresponding `payloadCid`s in that 20-minute period. This would enable them to have 100% RSR even if they don't serve any other retrievals. ## Retrieval Result Mapping to RSR @@ -406,7 +419,7 @@ When [Retrieval Task Measurement](#retrieval-task-measurement)s are submitted i
Every request made by a Spark checker feeds into the metric (after doing basic “fraud” detection”), regardless of whether the checker’s result aligns with its committee.
+Every request made by a Spark checker feeds into the metric (after doing basic "fraud detection"), regardless of whether the checker's result aligns with its committee.
For every <round, providerId, payloadCid>
, the number of data points feeding into the SLI should be close to the size of the committee. Committee size p50 is ~80, but it ultimately depends on which nodes participate in the round. The aim is to have most committee sizes in the range of 40 to 100.
Only the committee’s honest majority result for a <round, providerId, payloadCid>
feeds into the metric.
Only the committee's honest majority result for a <round, providerId, payloadCid>
feeds into the metric.
For every <round, providerId, payloadCid>
, there will be a single datapoint feeding into the SLI. (This is ~80x less data points than the non-committee case.)
Con: If a bad actor has 1% of the Spark Checkers and they always report that SPs fail, they can effectively bring every SP’s Spark RSR down by 1 percentage point.
+Con: If a bad actor has 1% of the Spark Checkers and they always report that SPs fail, they can effectively bring every SP's Spark RSR down by 1 percentage point.
That seems too sensitive.
Pro: assuming most checkers are honest, it will differentiate the SPs that respond to requests with valid/correct results 80% of the time from those that do so 99+% of the time.
<round, providerID, payloadCID>
from the same IPv4 /24 subnet are discarded as part of “fraud detection”. This means an attacker can’t simply spin up a plethora of nodes on one machine or their local network, but would need to distribute across multiple IP addresses. (Note that in 202410, the Spark dashboard shows ~7k daily active checkers.)<round, providerID, payloadCID>
from the same IPv4 /24 subnet are discarded as part of "fraud detection". This means an attacker can't simply spin up a plethora of nodes on one machine or their local network, but would need to distribute across multiple IP addresses. (Note that in October 2024, the Spark dashboard shows ~7k daily active checkers.)
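The /24-subnet rule described above can be sketched as follows. This is a hypothetical illustration, not the actual `spark-evaluate` code; the field names (`round`, `providerId`, `payloadCid`, `checkerIp`) are assumptions:

```jsx
// Hypothetical sketch of the /24-subnet fraud check: for each
// <round, providerID, payloadCID>, accept at most k measurements per
// IPv4 /24 subnet; later submissions from that subnet are discarded.
const subnet24 = (ipv4) => ipv4.split('.').slice(0, 3).join('.')

function acceptMeasurements (measurements, k) {
  const seen = new Map() // "<task>/<subnet>" -> count accepted so far
  return measurements.filter((m) => {
    const key = `${m.round}/${m.providerId}/${m.payloadCid}/${subnet24(m.checkerIp)}`
    const count = seen.get(key) ?? 0
    seen.set(key, count + 1)
    return count < k
  })
}
```

Treating the subnet rather than the individual IP address as the scarce resource is what makes renting many addresses in one block ineffective for stuffing the ballot box.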