Skip to content

panic: runtime error: comparing uncomparable type map[string]interface {} #19700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
4 tasks done
cruvie opened this issue Apr 1, 2025 · 18 comments
Open
4 tasks done

Comments

@cruvie
Copy link

cruvie commented Apr 1, 2025

Bug report criteria

What happened?

use grpc client made by etcd resolver with google.golang.org/grpc v1.71.0 cause panic

What did you expect to happen?

call rpc method successfully

How can we reproduce it (as minimally and precisely as possible)?

get a grpc client like this

import (
	"go.etcd.io/etcd/client/v3/naming/resolver"
	"google.golang.org/grpc"
)

// GetGrpcClient get grpc client for serverName
//
// don't forget to close conn conn.Close()
func GetGrpcClient[T any](serverName string,
	clientBuilder func(grpcConn grpc.ClientConnInterface) (client T),
	opts ...grpc.DialOption) (conn *grpc.ClientConn, client T, err error) {

	//refers to https://etcd.io/docs/v3.5/dev-guide/grpc_naming/
	etcdResolver, err := resolver.NewBuilder(GetClient())
	if err != nil {
		return nil, client, err
	}

	options := []grpc.DialOption{
		grpc.WithResolvers(etcdResolver),
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
	}
	options = append(options, opts...)

	conn, err = grpc.NewClient("etcd:///grpc/"+serverName,
		options...,
	)
	client = clientBuilder(conn)
	return conn, client, err
}

and use the client call a method

Anything else we need to know?

go 1.24
google.golang.org/grpc v1.70.0 works fine
create a grpc client manually works fine

func TestPanic(t *testing.T) {
	etcdCloseFunc := global_config.InitEtcdClient(false)
	defer etcdCloseFunc()
	_, client, err := GetClient()
	if err != nil {
		t.Error(err)
	} else {
		status, err := client.UpdateLocationOnlineStatus(context.Background(), &UpdateLocationOnlineStatus_Input{
			UserId: "test",
		})
		if err != nil {
			t.Error(err)
		}
		t.Log(status)
	}
}

func TestWorksFine(t *testing.T) {

	conn, err := grpc.NewClient("127.0.0.1:58759",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("did not connect: %v", err)
	}
	defer conn.Close()

	client := NewRPCDaZiNearbyClient(conn)
	ctx, cancelFunc := context.WithTimeout(context.Background(), time.Second*2)
	defer cancelFunc()
	resp, err := client.UpdateLocationOnlineStatus(ctx, &UpdateLocationOnlineStatus_Input{
		UserId: "test",
	})
	if err != nil {
		log.Fatalf("could not test: %v", err)
	}
	log.Printf("DeleteTest response: %v", resp)
}

Etcd version (please run commands below)

deployed by docker

  ss-etcd:
    image: quay.io/coreos/etcd:v3.5.17-arm64
    # image: quay.io/coreos/etcd:v3.5.17-amd64
    container_name: ss-etcd
    command:
      [
        "/usr/local/bin/etcd",
        "--data-dir=/etcd-data",
        "--name=node1",
        "--initial-advertise-peer-urls=http://ss-etcd:2380",
        "--listen-peer-urls=http://0.0.0.0:2380",
        "--advertise-client-urls=http://ss-etcd:2379",
        "--listen-client-urls=http://0.0.0.0:2379",
        "--initial-cluster=node1=http://ss-etcd:2380"
      ]
    restart: unless-stopped
    ports:
      - "2379:2379"
      - "2380:2380"
    volumes:
      - ./etcd/data:/etcd-data

Etcd configuration (command line flags or environment variables)

default

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

standalone

Relevant log output

=== RUN   TestPanic
panic: runtime error: comparing uncomparable type map[string]interface {}

goroutine 11 [running]:
google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.equalAddressIgnoringBalAttributes(0x1400031c970, 0x1400031ca10)
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:931 +0xe0
google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*addressList).seekTo(0x14000333e20, {{0x14000040468, 0x13}, {0x0, 0x0}, 0x0, 0x140001225a0, {0x104f75a00, 0x14000414030}})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:905 +0xc8
google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).updateSubConnState(0x14000333dd0, 0x1400030d880, {0x2, {0x0, 0x0}, {{0x14000040468, 0x13}, {0x14000152ea0, 0x21}, 0x0, ...}})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:627 +0x81c
google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).newSCData.func1({0x2, {0x0, 0x0}, {{0x14000040468, 0x13}, {0x14000152ea0, 0x21}, 0x0, 0x140001225a0, {0x104f75a00, ...}}})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:197 +0x84
google.golang.org/grpc/internal/balancer/gracefulswitch.(*Balancer).updateSubConnState(0x140000f05a0, {0x105116900, 0x1400003ba40}, {0x2, {0x0, 0x0}, {{0x14000040468, 0x13}, {0x14000152ea0, 0x21}, ...}}, ...)
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/internal/balancer/gracefulswitch/gracefulswitch.go:262 +0x2bc
google.golang.org/grpc/internal/balancer/gracefulswitch.(*balancerWrapper).NewSubConn.func1({0x2, {0x0, 0x0}, {{0x14000040468, 0x13}, {0x14000152ea0, 0x21}, 0x0, 0x140001225a0, {0x104f75a00, ...}}})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/internal/balancer/gracefulswitch/gracefulswitch.go:378 +0x98
google.golang.org/grpc.(*acBalancerWrapper).updateState.func1({0x105112108?, 0x140004f8c30?})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/balancer_wrapper.go:341 +0x274
google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run(0x14000111d80, {0x105112108, 0x140004f8c30})
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/internal/grpcsync/callback_serializer.go:94 +0x10c
created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer in goroutine 25
	/Users/cruvie/go/pkg/mod/google.golang.org/grpc@v1.71.1/internal/grpcsync/callback_serializer.go:52 +0x118
@cruvie cruvie added the type/bug label Apr 1, 2025
@siyuanfoundation
Copy link
Contributor

Thanks @cruvie for raising the issue.
We have tests for the resolver in tests/integration/clientv3/naming/resolver_test.go. Would you be willing to add a unit test to reproduce your issue? This could help testing potential fix and catch future problems.

@cruvie
Copy link
Author

cruvie commented Apr 3, 2025

@siyuanfoundation Hi, I made a pr which includes two unit tests and TestEtcdGrpcResolverRoundRobinWithMetaInfo will fail

@siyuanfoundation
Copy link
Contributor

siyuanfoundation commented Apr 3, 2025

@cruvie
Thanks for the PR. It's really helpful.
The failure is because they added https://github.com/grpc/grpc-go/blob/51d6a43ec59753d42ccd02bb12d2e9e40c164f0f/clientconn.go#L944 in grpc.
If the type of Metadata is not comparable, it would panic.

The issue is on the grpc side, they should probably use reflect.DeepEqual(a.Metadata, b.Metadata) instead of a.Metadata == b.Metadata, even though the Metadata is deprecated.

Meanwhile, we should add Attributes to deprecate Metadata in etcd (#19706)

@serathius @ahrtr Do you think it is necessary to revert back to google.golang.org/grpc v1.70.0?

@dfawley
Copy link

dfawley commented Apr 4, 2025

The Metadata field is in an experimental package. And it has been marked as deprecated since 2019.

See also #15145 from 2023.

If you are using gRPC APIs that are clearly marked as experimental, you need to vendor gRPC, or you need to consider yourself experimental.

If some functionality you need is experimental and there is no way to get it through stable APIs, then I'm always happy to discuss how we can make that happen. gRPC has pending changes we'd like to make to these packages that are mostly just blocked on etcd's lingering usages of these APIs.

@cruvie
Copy link
Author

cruvie commented Apr 4, 2025

they are working on it grpc/grpc-go#8225

for etcd client next version, I think make a break change to remove Metadata any


and add Attributes *attributes.Attributes like
https://github.com/grpc/grpc-go/blob/ce35fd41c56908f4e703e3fd63cf85b2cba9d8f1/resolver/resolver.go#L107
is ok, since Metadata has not been fully tested for a long time?

@dfawley
Copy link

dfawley commented Apr 4, 2025

I'm not sure what the right replacement is, until I understand what it is you need to do. It could be that you don't need a custom resolver at all anymore. With the advent of https://pkg.go.dev/google.golang.org/grpc#ClientConnInterface - added in Jan 2020 - I think most of the reasons for having a custom resolver can be replaced by using multiple gRPC channels and wrapping them, similar to how GCP does it:

https://github.com/googleapis/google-api-go-client/blob/bac81c61de0c586306eb621dcabca1687fb99753/transport/grpc/pool.go#L31

If that wouldn't work: why?

@cruvie
Copy link
Author

cruvie commented Apr 5, 2025

In my use case, I overlooked the relation between metadata and resolver. I just want to store some additional information to Endpoint.

So for my use case add Attributes *attributes.Attributes as a replacement of metadata is a reasonable solution.

cruvie added a commit to cruvie/kk_etcd_go that referenced this issue Apr 5, 2025
@ahrtr
Copy link
Member

ahrtr commented Apr 7, 2025

etcd itself isn't affected by this problem. But I am not sure whether grpcproxy is affected, because the metadata is set as a struct (which I believe will be a map as well after marshaling & Unmarshaling). Given our grpcproxy related e2e / integration tests all pass, so I assume it isn't affected either.

endpoint := endpoints.Endpoint{Addr: addr, Metadata: getMeta()}

func getMeta() string {
hostname, _ := os.Hostname()
bts, _ := json.Marshal(meta{Name: hostname})
return string(bts)
}

So in general, I don't want to patch this local issue. Instead, we should spend more effort on #15145 to resolve such problems once for all. We need to have a summary on existing design/implementation first, and proposals to replace them later, and invite grpc's experts to review. Just updated the #15145 as well. cc. @fuweid @serathius

@ahrtr
Copy link
Member

ahrtr commented Apr 15, 2025

For this specific issue, we still expect grpc can fix it, grpc/grpc-go#8227 (comment)

For long-term solution, we need experienced contributors can help us to drive/work on #15145

If some functionality you need is experimental and there is no way to get it through stable APIs, then I'm always happy to discuss how we can make that happen. gRPC has pending changes we'd like to make to these packages that are mostly just blocked on etcd's lingering usages of these APIs.

@dfawley Thanks for the comment. As mentioned in #15145 (comment), etcd's usage on grpc-go's resolver package is really very simple and basic, do you think any further change on grpc-go side will break etcd? I may draft a simple google doc to summarize etcd's usage on grpc-go's resolver.

@dfawley
Copy link

dfawley commented Apr 16, 2025

do you think any further change on grpc-go side will break etcd

Generally: anything marked as experimental (e.g. the entire resolver and balancer packages) should not be used by any other library unless that library also declares itself as experimental or vendors grpc-go. We reserve the right to modify or delete any or all of it, though we will try our best to make changes responsibly (provide deprecation notices first, and a migration path to enable a smooth transition). The Metadata field in particular would be nice to just delete entirely, as we expressed the intent to do so in 2019.

And as mentioned before, if you believe you need our experimental APIs, I'm happy to discuss other ways you may be able to achieve your goals, or work to find a concrete path toward stabilization.

@ahrtr
Copy link
Member

ahrtr commented Apr 17, 2025

Thanks for the feedback.

Generally: anything marked as experimental (e.g. the entire resolver and balancer packages) should not be used by any other library unless that library also declares itself as experimental or vendors grpc-go.

Theoretically I agree with this. But unfortunately it has been years for etcd to running on grpc-go's experimental features, and maintainers turnover slows down the overall development process. Both etcd and grpc-go are part of the big cloud native ecosystem. We will have to discuss and move forward on this basis.

The Metadata field in particular would be nice to just delete entirely, as we expressed the intent to do so in 2019.

If you are planning to remove Metadata only, and still keep the resolver package, then we are happy to make the change immediately.

And as mentioned before, if you believe you need our experimental APIs, I'm happy to discuss other ways you may be able to achieve your goals, or work to find a concrete path toward stabilization.

As we discussed in 15145, i.e. #15145 (comment), completely migrating from the resolver package to other solution (i.e. maintaining & wrapping multiple gRPC channels/connections) is non-trivial effort. But anyway, we can continue the discussion in #15145

@dfawley
Copy link

dfawley commented Apr 18, 2025

@ahrtr

If you are planning to remove Metadata only, and still keep the resolver package, then we are happy to make the change immediately.

I don't think we can ever remove the exported resolver package, but it will be undergoing future changes (some of which are explained here), and we should figure out whether there are other options that enable you to move off of it.

As referenced above (#19700 (comment)), I wonder if you can use the GCP gRPC connection pool directly even? googleapis/google-api-go-client@bac81c6/transport/grpc/pool.go#L31

@ahrtr
Copy link
Member

ahrtr commented Apr 21, 2025

I don't think we can ever remove the exported resolver package, but it will be undergoing future changes (some of which are explained here), and we should figure out whether there are other options that enable you to move off of it.

thx for the feedback. Then we will plan to resolve #19706 (comment).

We already evaluated the future changes on grpc-go side about 2 years ago (#15145 (comment)), we don't think it will break etcd. Please correct us if we are wrong. thx

As referenced above (#19700 (comment)), I wonder if you can use the GCP gRPC connection pool directly even? googleapis/google-api-go-client@bac81c6/transport/grpc/pool.go#L31

Not only etcd itself uses the resolver package, but also a client utility package (which has already exposed to etcd users for years) use it. refer to https://etcd.io/docs/v3.5/dev-guide/grpc_naming/. Any refactoring might not be a trivial change.

Also honestly, we don't have strong motivation to rafactor it as long as the future changes on grpc-go do not break etcd. But we are happy to evaluate any PoC if present. cc @amosehiguese

BTW, I just raised a related ticket #19772.

@dfawley
Copy link

dfawley commented Apr 21, 2025

We already evaluated the future changes on grpc-go side about 2 years ago (#15145 (comment)), we don't think it will break etcd. Please correct us if we are wrong. thx

As covered in the plans, we intend to delete resolver.Address.Metadata, so that part would definitely affect you until you remove that usage. Also we will be deleting the Addresses field from resolver.State.

Also honestly, we don't have strong motivation to rafactor it as long as the future changes on grpc-go do not break etcd.

The reason I'm raising this issue is because I would like to make breaking changes to these experimental APIs. Your usage is the only reason we have not done so yet. By definition, if you are using an experimental API, you become experimental as well, unless you vendor. You cannot be stable if you have unstable dependencies.

At some point, we'll have to deal with the pain that these changes will cause. Better to do it ASAP than later. And if possible, it's better if we can find a non-experimental way to move forward, than to change to a new experimental API.

@ahrtr
Copy link
Member

ahrtr commented Apr 21, 2025

we intend to delete resolver.Address.Metadata

It's already tracked in #19706. We will fix it.

Also we will be deleting the Addresses field from resolver.State

So we should use/set Endpoint.Addresses?

By definition, if you are using an experimental API, you become experimental as well

Yes, I am aware of that and theoretically agreed it. The problem is that there is a user-facing package/utility go.etcd.io/etcd/client/v3/naming/resolver, which has existed for years. It's a breaking change to users.

#19772 (comment)

experimental APIs, which we do reserve the right to remove or change at any time.

It's concerning. You mentioned above that the future changes are planned in grpc/grpc-go#6472. My understanding is the direction is to stablize the resolver package. But you also said you might delete at any time. Can you please finalize the grpc-go side change?

@dfawley
Copy link

dfawley commented Apr 21, 2025

So we should use/set Endpoint.Addresses?

Yes, that is the replacement. An endpoint is logically a single server, and it may have multiple addresses that reference it (e.g. separate IPv4 and IPv6 addresses). So if migrating, you need to be conscientious of this and use the two nested fields (Endpoints & Addresses) accordingly.

Yes, I am aware of that and theoretically agreed it. The problem is that there is a user-facing package/utility go.etcd.io/etcd/client/v3/naming/resolver, which has existed for years. It's a breaking change to users.

The good news is that any users doing this also have to use WithResolvers in order to use this API, which is, itself, experimental. So they must be willing to tolerate breakages in order to be using it.

It might be worth taking some time to mark this package as experimental and/or deprecated so that:

  1. Your existing users have time to find an alternative that is stable, and
  2. new users will be less likely to start using it.

The problem only becomes worse over time if you don't make this clear.

I think we should also determine if you can provide an similar package that doesn't need experimental APIs to function. Maybe you can provide a grpc.Dial equivalent that returns something that implements grpc.ClientConnInterface (similar to the GCP connection pool referenced above) instead?

Short of any of that, I believe you need to start vendoring gRPC.

It's concerning. You mentioned above that the future changes are planned in grpc/grpc-go#6472. My understanding is the direction is to stablize the resolver package. But you also said you might delete at any time. Can you please finalize the grpc-go side change?

This isn't as simple as you make it sound. The API is going to remain experimental until all gRPC library languages (Go, C++, Java, and Node, at least) can agree on the needs and semantics of the name resolver functionality. Since I have no expected end date for that, it would be great if we can pursue any and all alternatives for now instead.

@ahrtr
Copy link
Member

ahrtr commented Apr 22, 2025

Yes, that is the replacement. An endpoint is logically a single server, and it may have multiple addresses that reference it (e.g. separate IPv4 and IPv6 addresses). So if migrating, you need to be conscientious of this and use the two nested fields (Endpoints & Addresses) accordingly.

thx for the confirmation & info. Just raised #19780 to fix it. Currently it assumes each endpoint has only one address (note it keeps the same behaviour as current implementation). If an endpoint has two addresses, then it may be connected by more clients; we will take care of this minor problem separately.

The good news is that any users doing this also have to use WithResolvers in order to use this API, which is, itself, experimental. So they must be willing to tolerate breakages in order to be using it.

I will bring this to our community meeting. I think we might just want to mark it as experimental (keep consistent with grpc-go), and it may be changed or removed at any time. If the feature gets stable on grpc-go side in future, we may also want to stabilise it by them.

Short of any of that, I believe you need to start vendoring gRPC.

I don't think vendoring is relevant here. Vendoring is for the case of a repo being completely removed. Usually locking on a specific grpc-go version is a workaround for such case.

This isn't as simple as you make it sound. The API is going to remain experimental until all gRPC library languages (Go, C++, Java, and Node, at least) can agree on the needs and semantics of the name resolver functionality. Since I have no expected end date for that, it would be great if we can pursue any and all alternatives for now instead.

thx for the feedback. Currently we will mark the user-facing utility package as experimental (to be discussed), and try to resolve the two issues (Addresses to be removed from resolve.State and Metadata to be removed from resolver.Address) for now.

In term of other alternative (completely remove resolver), it might not happen any time soon. we will keep discussing it in #15145

@amosehiguese
Copy link
Contributor

amosehiguese commented Apr 23, 2025

etcd implements the resolver.Builder interface. The implementation is really simple, just has dozens line of code.\nIt only uses resolver.UpdateState\nThe resolver.Builder is passed to grpc.DialContext using grpc.WithResolvers\nAs referenced above (#19700 (comment)), I wonder if you can use the GCP gRPC connection pool directly even? googleapis/google-api-go-client@bac81c6/transport/grpc/pool.go#L31

You’re right, the current use of the resolver.Builder interface in etcd is quite minimal, and from what I’ve seen, it doesn’t seem to use any of the deeper internals of the experimental API beyond UpdateState. If things remain that way, then yeah, breaking changes might not directly affect etcd right now.

That said, I think the concern is more future-facing, for example, with gRPC planning to deprecate types like Address in favor of Endpoint, there's potential for the surface area of change to grow. Even if the impact isn’t immediate, it might be worth keeping an eye on how these changes evolve.

Not only etcd itself uses the resolver package, but also a client utility package (which has already exposed to etcd users for years) use it. refer to https://etcd.io/docs/v3.5/dev-guide/grpc_naming/. Any refactoring might not be a trivial change.

The fact that the client utility package also relies on the resolver (and has been exposed to users for a while now) definitely adds more weight to the situation. It’s not just etcd’s internal usage anymore, users have likely built tools and workflows around it, so any refactor here is far from a quick fix.

Also honestly, we don't have strong motivation to rafactor it as long as the future changes on grpc-go do not break etcd.

That’s totally understandable. If things aren’t breaking uncontrollably right now, it makes sense not to rush into a refactor just for the sake of it.

However, @dfawley continues to highlight that breaking changes are expected.

But we are happy to evaluate any PoC if present. cc @amosehiguese

Honestly, at this point I’ve kind of run out of PoCs — most of the core ideas and possible directions have already been laid out in this discussion. From adapter patterns (I initially thought of this but requires careful examination still), to vendoring to exploring alternatives like the GCP pool, I think the groundwork is all there now.

That said, whatever direction etcd’s team decide to take; whether it's a refactor, a temporary workaround, or just holding off, I’m totally at your disposal. If you want me to expand on any of the ideas, map out trade-offs, or help shape the next steps, just say the word. I'm readily available.

cc: @ahrtr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants