reliability(consumers): add end to end test (WIP) #7266

volokluev · 2025-06-30T23:32:33Z

Until now we had no way to sanity-check that our consumer implementation wrote every row it needed to into the database. This adds a test that spins up the consumer along with Kafka and ClickHouse.

semgrep-code-getsentry · 2025-06-30T23:36:06Z

test_consumer_e2e/test_items_consumer.py

+from confluent_kafka import Message as KafkaMessage
+from confluent_kafka import Producer
+from confluent_kafka.admin import AdminClient, NewTopic
+from google.protobuf.timestamp_pb2 import Timestamp


High severity and reachable issue identified in your code:
Line 11 has a vulnerable usage of protobuf, introducing a high severity vulnerability.

ℹ️ Why this is reachable

A reachable issue is a real security risk because your project actually executes the vulnerable code. This issue is reachable because your code uses a certain version of protobuf.
Affected versions of protobuf are vulnerable to Uncontrolled Recursion. The pure-Python implementation of Protocol Buffers is vulnerable to a denial-of-service attack when processing untrusted data with deeply nested or recursive groups/messages, potentially causing the Python recursion limit to be exceeded.

References: GHSA, CVE

To resolve this comment:
Upgrade this dependency to at least version 5.29.5 at requirements.txt.

💬 Ignore this finding

To ignore this, reply with:

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

You can view more details on this finding in the Semgrep AppSec Platform.

onkar

Not sure if it's ready to review since the PR title has (WIP) in it, but it was open for review at the same time. LGTM mostly, left a few comments.

onkar · 2025-07-02T16:43:12Z

snuba/cli/rust_consumer.py

-    Experimental alternative to `snuba consumer`
-    """
-
+    breakpoint()


Unintended breakpoint?

onkar · 2025-07-02T16:55:12Z

test_consumer_e2e/test_items_consumer.py

+        storage = get_storage(StorageKey("eap_items"))
+        storage.get_cluster().get_query_connection(
+            ClickhouseClientSettings.QUERY
+        ).execute("TRUNCATE TABLE IF EXISTS eap_items_1_local")


This truncates the table after the consumer has started, so the consumer may see the table emptied midway. Would it be safer to truncate before starting the consumer?

volokluev · 2025-07-10

There's nothing on the consumer topic at the time it starts, so no, it's not an issue.

onkar · 2025-07-02T16:56:26Z

test_consumer_e2e/test_items_consumer.py

+        # Wait for consumer to initialize
+        time.sleep(2)


Will this be fragile? Can we poll until ClickHouse is ready to reduce the fragility?

volokluev · 2025-07-10

I'm not waiting for ClickHouse; I'm waiting for the consumer to start. I can start producing to the topic before it has fully started.
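For illustration, the fixed `time.sleep(2)` could be swapped for a bounded poll. The following is a minimal sketch, assuming a hypothetical `wait_until` helper that is not part of the PR. What counts as "the consumer has started" is left to the caller-supplied predicate, since the thread does not say how that would be detected.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not taken from the PR. Returns True as soon as the
    predicate succeeds and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A helper of this shape trades a fixed delay for an upper bound: the test proceeds as soon as the condition holds and fails with a clear `False` when it never does.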

onkar · 2025-07-02T17:04:50Z

test_consumer_e2e/test_items_consumer.py

+            consumer_process.terminate()
+        except Exception:
+            pass
+        assert res.results[0][0] == num_items


Will this fail if some records aren't yet flushed? How about if we retry until a timeout instead of asserting immediately?

volokluev · 2025-07-10

Calling terminate on the consumer process should flush it.
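One detail worth spelling out: `Popen.terminate()` only sends the signal (SIGTERM on POSIX) and returns immediately, so a flush-on-shutdown is only guaranteed to be finished once the process has actually exited. A minimal sketch, assuming the consumer flushes on SIGTERM as described above, and using a hypothetical `stop_process` helper that is not part of the PR:

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 30.0) -> int:
    """Ask `proc` to exit, wait for it, and return its exit code.

    Hypothetical helper, not taken from the PR. Falls back to kill() if the
    process has not exited within `timeout` seconds.
    """
    if proc.poll() is None:  # still running
        proc.terminate()  # SIGTERM on POSIX; returns immediately
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort: no graceful flush in this case
        return proc.wait()
```

Querying ClickHouse only after `stop_process(consumer_process)` returns would mean the row-count assertion runs against a consumer that has already shut down, rather than one that may still be flushing.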

untitaker · 2025-07-07T11:32:10Z

You can probably write this test more reliably if it is done entirely from within Rust. We already have some tests there. Then no subprocess is necessary; you'd launch the main function directly.

add end to end test

3064b36

volokluev requested a review from a team as a code owner June 30, 2025 23:32

volokluev changed the title ~~reliability(consumers): add end to end test~~ reliability(consumers): add end to end test (WIP) Jun 30, 2025

add workflow

0c198db

semgrep-code-getsentry bot reviewed Jun 30, 2025

onkar approved these changes Jul 2, 2025

phacops force-pushed the master branch from 7fca603 to c8950ad Compare September 7, 2025 04:38
