Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkIntervals will not validate correctly in certain nested txn scenarios #16380

Open
4 tasks done
bsbds opened this issue Aug 7, 2023 · 7 comments · May be fixed by #16395
Open
4 tasks done

checkIntervals will not validate correctly in certain nested txn scenarios #16380

bsbds opened this issue Aug 7, 2023 · 7 comments · May be fixed by #16395

Comments

@bsbds
Copy link

bsbds commented Aug 7, 2023

Bug report criteria

What happened?

In checkIntervals there's seems to be bug in checking nested txns. In the following loop it check sub txn request and append the deletes to dels at the end.

for _, req := range reqs {
tv, ok := req.Request.(*pb.RequestOp_RequestTxn)
if !ok {
continue
}
putsThen, delsThen, err := checkIntervals(tv.RequestTxn.Success)
if err != nil {
return nil, dels, err
}
putsElse, delsElse, err := checkIntervals(tv.RequestTxn.Failure)
if err != nil {
return nil, dels, err
}
for k := range putsThen {
if _, ok := puts[k]; ok {
return nil, dels, rpctypes.ErrGRPCDuplicateKey
}
if dels.Intersects(adt.NewStringAffinePoint(k)) {
return nil, dels, rpctypes.ErrGRPCDuplicateKey
}
puts[k] = struct{}{}
}
for k := range putsElse {
if _, ok := puts[k]; ok {
// if key is from putsThen, overlap is OK since
// either then/else are mutually exclusive
if _, isSafe := putsThen[k]; !isSafe {
return nil, dels, rpctypes.ErrGRPCDuplicateKey
}
}
if dels.Intersects(adt.NewStringAffinePoint(k)) {
return nil, dels, rpctypes.ErrGRPCDuplicateKey
}
puts[k] = struct{}{}
}
dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
}

Consider the following senerio(assume all requests are in the success branch): If a txn S contains two sub txn A and B. A contains a put, B contains a delete, and the put overlaps with the delete. Then S[A(put), B(del)] will return a ok but S[B(del), A(put) will return an error.

The reason this inconsistency exist is that the sub txn requests are checked in a specific order, and append the delete to dels in this order. And the put checks dels for overlapping. So if the put is checked before the delete, it will not know that it overlaps with that delete.

What did you expect to happen?

Both S[A(put), B(del)] and S[B(del), A(put) should return an error. The deletes should be append to dels first before check the intersection of puts.

How can we reproduce it (as minimally and precisely as possible)?

It should be reproduced by the following code

func TestCheckIntervals(t *testing.T) {
  innerPut := pb.TxnRequest{
    Success: []*pb.RequestOp{
      {
        Request: &pb.RequestOp_RequestPut{RequestPut: &pb.PutRequest{
          Key: []byte("k1"),
        }},
      },
    },
  }
  innerDel := pb.TxnRequest{
    Success: []*pb.RequestOp{
      {
        Request: &pb.RequestOp_RequestDeleteRange{RequestDeleteRange: &pb.DeleteRangeRequest{
          Key:      []byte("k0"),
          RangeEnd: []byte("k3"),
        }},
      },
    },
  }
  req1 := pb.TxnRequest{
    Success: []*pb.RequestOp{
      {
        Request: &pb.RequestOp_RequestTxn{RequestTxn: &pb.TxnRequest{
          Success: []*pb.RequestOp{
            {
              Request: &pb.RequestOp_RequestTxn{RequestTxn: &innerPut},
            },
            {
              Request: &pb.RequestOp_RequestTxn{RequestTxn: &innerDel},
            },
          },
        }},
      },
    },
  }
  req2 := pb.TxnRequest{
    Success: []*pb.RequestOp{
      {
        Request: &pb.RequestOp_RequestTxn{RequestTxn: &pb.TxnRequest{
          Success: []*pb.RequestOp{
            {
              Request: &pb.RequestOp_RequestTxn{RequestTxn: &innerDel},
            },
            {
              Request: &pb.RequestOp_RequestTxn{RequestTxn: &innerPut},
            },
          },
        }},
      },
    },
  }
  _, _, err1 := checkIntervals(req1.Success)
  _, _, err2 := checkIntervals(req2.Success)
  isErr1 := err1 != nil
  isErr2 := err2 != nil
  if isErr1 != isErr2 {
    t.Errorf("checkIntervals should return same error for both req1 and req2")
  }
}

Anything else we need to know?

No response

Etcd version (please run commands below)

The latest main branch

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

@bsbds bsbds added the type/bug label Aug 7, 2023
@ahrtr
Copy link
Member

ahrtr commented Aug 7, 2023

It's real issue, thanks for reporting it.

Please raise a PR to fix it, the change should be as simple as below,

$ git diff
diff --git a/server/etcdserver/api/v3rpc/key.go b/server/etcdserver/api/v3rpc/key.go
index 2c1de2a90..9603a06db 100644
--- a/server/etcdserver/api/v3rpc/key.go
+++ b/server/etcdserver/api/v3rpc/key.go
@@ -218,6 +218,8 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
                if err != nil {
                        return nil, dels, err
                }
+               dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
+               dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
                for k := range putsThen {
                        if _, ok := puts[k]; ok {
                                return nil, dels, rpctypes.ErrGRPCDuplicateKey
@@ -240,8 +242,6 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
                        }
                        puts[k] = struct{}{}
                }
-               dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
-               dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
        }
 
        // collect and check this level's puts

@bsbds
Copy link
Author

bsbds commented Aug 8, 2023

It's real issue, thanks for reporting it.

Please raise a PR to fix it, the change should be as simple as below,

$ git diff
diff --git a/server/etcdserver/api/v3rpc/key.go b/server/etcdserver/api/v3rpc/key.go
index 2c1de2a90..9603a06db 100644
--- a/server/etcdserver/api/v3rpc/key.go
+++ b/server/etcdserver/api/v3rpc/key.go
@@ -218,6 +218,8 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
                if err != nil {
                        return nil, dels, err
                }
+               dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
+               dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
                for k := range putsThen {
                        if _, ok := puts[k]; ok {
                                return nil, dels, rpctypes.ErrGRPCDuplicateKey
@@ -240,8 +242,6 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
                        }
                        puts[k] = struct{}{}
                }
-               dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
-               dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
        }
 
        // collect and check this level's puts

I think the puts should only be checked after all deletes have been added to dels?

I suggest the following changes:

diff --git a/server/etcdserver/api/v3rpc/key.go b/server/etcdserver/api/v3rpc/key.go
index 2c1de2a90..677155609 100644
--- a/server/etcdserver/api/v3rpc/key.go
+++ b/server/etcdserver/api/v3rpc/key.go
@@ -222,9 +222,6 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
 			if _, ok := puts[k]; ok {
 				return nil, dels, rpctypes.ErrGRPCDuplicateKey
 			}
-			if dels.Intersects(adt.NewStringAffinePoint(k)) {
-				return nil, dels, rpctypes.ErrGRPCDuplicateKey
-			}
 			puts[k] = struct{}{}
 		}
 		for k := range putsElse {
@@ -235,14 +232,16 @@ func checkIntervals(reqs []*pb.RequestOp) (map[string]struct{}, adt.IntervalTree
 					return nil, dels, rpctypes.ErrGRPCDuplicateKey
 				}
 			}
-			if dels.Intersects(adt.NewStringAffinePoint(k)) {
-				return nil, dels, rpctypes.ErrGRPCDuplicateKey
-			}
 			puts[k] = struct{}{}
 		}
 		dels.Union(delsThen, adt.NewStringAffineInterval("\x00", ""))
 		dels.Union(delsElse, adt.NewStringAffineInterval("\x00", ""))
 	}
+	for k := range puts {
+		if dels.Intersects(adt.NewStringAffinePoint(k)) {
+			return nil, dels, rpctypes.ErrGRPCDuplicateKey
+		}
+	}
 
 	// collect and check this level's puts
 	for _, req := range reqs {

@ahrtr
Copy link
Member

ahrtr commented Aug 8, 2023

I think the puts should only be checked after all deletes have been added to dels?

You are right.

  • We may want to refactor the loops a little bit. We can use just one loop to collect all deletes operations in both current level and all children levels.
	// collect deletes from current level and all children levels,
	// and collect puts from all children levels
	dels := adt.NewIntervalTree()
	puts := make(map[string]struct{})
	for _, req := range reqs {
		reqDel, ok := req.Request.(*pb.RequestOp_RequestDeleteRange)
		if ok {
			//......
			continue
		}
		reqTxn, ok := req.Request.(*pb.RequestOp_RequestTxn)
		if ok {
			//......
			continue
		}
	}
  • Construct enough unit test cases to reproduce different kinds of issues, and make sure the PR can fix all of them.

@bsbds
Copy link
Author

bsbds commented Aug 8, 2023

I think the puts should only be checked after all deletes have been added to dels?

You are right.

  • We may want to refactor the loops a little bit. We can use just one loop to collect all deletes operations in both current level and all children levels.
	// collect deletes from current level and all children levels,
	// and collect puts from all children levels
	dels := adt.NewIntervalTree()
	puts := make(map[string]struct{})
	for _, req := range reqs {
		reqDel, ok := req.Request.(*pb.RequestOp_RequestDeleteRange)
		if ok {
			//......
			continue
		}
		reqTxn, ok := req.Request.(*pb.RequestOp_RequestTxn)
		if ok {
			//......
			continue
		}
	}
  • Construct enough unit test cases to reproduce different kinds of issues, and make sure the PR can fix all of them.

Sure, I'll take on this issue.

@ahrtr
Copy link
Member

ahrtr commented Sep 7, 2023

Anyway, we should clearly define & clarify the expected behaviour firstly.

@bsbds
Copy link
Author

bsbds commented Sep 14, 2023

Anyway, we should clearly define & clarify the expected behaviour firstly.

I'll try to define the expected behaviour:

We divide puts/dels into two class:

  • puts/dels in current level, denoted as Puts_c/Dels_c
  • puts/dels in the subtree of a txn branch, denoted as Puts_t/Dels_t (in the code they are putsThen/delsThen/putsElse/delsElse)

The constraints are:

  1. Puts_c should not intersect with Dels_c or any Dels_t (aka all deletes)
  2. A Puts_t should not intersect with Dels_c
  3. For any two distinct txn_X and txn_Y, a Puts_t in txn_X should not intersect with a Dels_t in txn_Y

For (2, 3) alternatively, together they can be described as: A Puts_t should not intersect with all deletes except for the deletes in the other branch of the same transaction.

  1. For any two distinct put in Puts_c, their keys should not duplicate
  2. A Puts_t should not intersect with Puts_c
  3. For any two distinct txn_X and txn_Y, a Puts_t in txn_X should not intersect with a Puts_t in txn_Y

@bsbds
Copy link
Author

bsbds commented Sep 14, 2023

The issue actually only lies in constraint 3. The most straightforward way to check this is to enumerate for each two transactions, but it's indeed not very efficient. Do you have any alternative suggestions? @ahrtr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants