This repository was archived by the owner on Apr 4, 2019. It is now read-only.

Saving JSON data, partitioned by a specific field (timestamp) #58

@doriwaldman

Description

I have a question. The data in Kafka is in JSON format, and each event has a field called "eventTimestamp", a long that represents the event time. I want to save the data to S3 in hourly buckets based on that timestamp, not on the time the event was added to Kafka.

My settings when I used the Kafka Connect S3 sink were:

connector.class=io.confluent.connect.s3.S3SinkConnector
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=RecordField
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
timestamp.field=eventTimestamp
partition.duration.ms=10
locale=en_IN
timezone=UTC
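For reference, the hourly layout that the path.format above describes can be sketched in plain Python (this is an illustration of the path the partitioner would produce from an epoch-millis eventTimestamp, not the connector's actual code; the function name is made up):

```python
from datetime import datetime, timezone

def hourly_partition_path(event_timestamp_ms: int) -> str:
    """Map an epoch-millis timestamp to the configured
    'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH layout, in UTC."""
    ts = datetime.fromtimestamp(event_timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")

# 2018-06-01T13:45:30Z in epoch millis
print(hourly_partition_path(1527860730000))
# -> year=2018/month=06/day=01/hour=13
```

Note that timezone=UTC in the config is what makes the hour boundary unambiguous here; with a different timezone the same timestamp would land in a different hourly bucket.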

I see that streamx supports TimeBasedPartitioner, but if I understand correctly it can only extract a RecordField from Parquet or Avro records, not from JSON.

Is it possible to do this with JSON?
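To make the question concrete: a field-based extractor working on JSON would first have to parse the raw record value before it can read eventTimestamp, roughly like this (a plain-Python sketch of that step, not streamx or Confluent code; the helper name is hypothetical):

```python
import json

def extract_event_timestamp(raw_value: bytes, field: str = "eventTimestamp") -> int:
    """Parse a JSON record value and return the named field
    (epoch millis) that a time-based partitioner would use."""
    event = json.loads(raw_value)
    ts = event[field]
    if not isinstance(ts, int):
        raise ValueError(f"{field} must be a long, got {type(ts).__name__}")
    return ts

raw = b'{"eventTimestamp": 1527860730000, "user": "alice"}'
print(extract_event_timestamp(raw))  # -> 1527860730000
```

With Avro or Parquet the field is available through the record's schema, which is presumably why RecordField extraction works there out of the box; schemaless JSON needs this extra parsing step.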
