527 commits
d681daf
changed path check to be regex based and partition finalizer to use a…
Apr 28, 2016
a3fa9ff
Spelling fixes
jsoref Apr 28, 2016
740150a
Merge pull request #206 from jsoref/spelling
HenryCaiHaiying Apr 28, 2016
1b84c05
Merge pull request #205 from shawnsnguyen/master
HenryCaiHaiying May 2, 2016
21f492b
Cut version 0.17 and bump up the running snapshot version to 0.18-SNA…
May 2, 2016
ca0cf23
Make Uploader pluggable through secor.properties
lucamilanesio Apr 28, 2016
57317a5
Merge pull request #207 from lucamilanesio/pluggable-uploader-policy
HenryCaiHaiying May 4, 2016
807fac4
Alter path return prefix only, rather than whole path.
May 6, 2016
5adad74
Merge pull request #213 from shawnsnguyen/master
HenryCaiHaiying May 6, 2016
e08abca
Cut 0.18 and Bump up the version to 0.19-SNAPSHOT
May 6, 2016
90431fd
update hadoop uploader to use full s3 path
May 6, 2016
cd3a7ae
Merge pull request #214 from shawnsnguyen/master
HenryCaiHaiying May 6, 2016
3f329c1
Cut 0.19 and bump up the version to 0.20-SNAPSHOT
May 9, 2016
3c495ef
Expose existing Log file writers
lucamilanesio May 10, 2016
b295b61
Merge pull request #215 from lucamilanesio/expose-log-file-writer
HenryCaiHaiying May 10, 2016
ad3f7a2
update finalizer to use full path prefix
May 10, 2016
b1cf223
Merge remote-tracking branch 'upstream/master'
May 10, 2016
9149561
Merge pull request #216 from shawnsnguyen/master
HenryCaiHaiying May 11, 2016
a375781
Cut version 0.20 and bump up running snapshot to 0.21-SNAPSHOT
May 11, 2016
d25222b
update Readme.md
May 13, 2016
37eb462
update README.md
May 13, 2016
d7e8e04
update README.md
May 13, 2016
7b474cc
Merge pull request #217 from lovenishgoyal/master
HenryCaiHaiying May 13, 2016
c1a48fb
Removed absent parameters
live-wire May 18, 2016
237a27d
Merge pull request #218 from live-wire/master
HenryCaiHaiying May 18, 2016
1367a9d
Add Credit Karma as a company who uses Secor
zack-loebel-begelman-ck Jun 8, 2016
9299d81
Merge pull request #219 from zackloebelbegelman-ck/patch-2
HenryCaiHaiying Jun 8, 2016
9efe830
Added support for arbitrary (including nested) timestamp field in Google
mispecto Jun 23, 2016
3f1e8ce
Re-formatted using 4 spaces. Added logging.
mispecto Jun 24, 2016
d828e24
Re-format using 4 spaces, now for real.
mispecto Jun 25, 2016
fc79c10
Merge pull request #223 from spektom/master
HenryCaiHaiying Jun 25, 2016
4918940
Make secor.gs.credentials.path optional
Jun 28, 2016
5f06209
Merge pull request #225 from yunjing/yunjing.gs.credential
HenryCaiHaiying Jun 28, 2016
0afef6b
Added support for writing protobuf messages directly to Parquet
mispecto Jun 29, 2016
cf148be
Added documentation about Protobuf date parser. Added Appsflyer to the
mispecto Jun 29, 2016
620eb56
Added documentation about Protobuf date parser. Added Appsflyer to the
mispecto Jun 29, 2016
7071f1b
Removed unneeded field.
mispecto Jun 29, 2016
89c0c0b
Specify protobuf message class per Kafka topic or globally using '*'
mispecto Jul 3, 2016
cd1e2ef
Merge pull request #226 from spektom/master
HenryCaiHaiying Jul 3, 2016
0075166
Added documentation on Parquet output format
mispecto Jul 4, 2016
4b726b6
Merge pull request #228 from spektom/master
HenryCaiHaiying Jul 4, 2016
01c2563
Adding argument reading to LogVerifier
prasincs Jul 19, 2016
96de43c
Merge pull request #231 from prasincs/master
HenryCaiHaiying Jul 28, 2016
f461ac0
Uploads at minute mark for selected topics
prasincs Jul 29, 2016
ad0044b
File Coalescing: Enhance FileRegistry and LogFilePath to have the cap…
Aug 2, 2016
8597de6
Merge remote-tracking branch 'upstream/master'
Aug 2, 2016
f616ca6
Merge pull request #241 from HenryCaiHaiying/master
HenryCaiHaiying Aug 3, 2016
0008f77
Bump up the version to 0.21
Aug 3, 2016
558a6ab
Get the Iso8601 message parser to create correct log instance #243
agolotin Aug 3, 2016
7dfb9ad
Merge pull request #244 from agolotin/iso_message_parser_fix
HenryCaiHaiying Aug 3, 2016
f7a0c7b
Minor DESIGN.md wording enhancement
Aug 4, 2016
497b343
Merge pull request #245 from glikson/patch-1
jparise Aug 4, 2016
42f1e22
doing topicFilter parsing at Class init
prasincs Aug 8, 2016
f2b1825
Merge pull request #236 from prasincs/master
HenryCaiHaiying Aug 15, 2016
bd62a71
added timeout to qubole calls
Oct 3, 2016
7a8f77d
instead of return, throw io exception, and add skip qubole config
Oct 3, 2016
5d1a478
add config for timeout and add default jvm configs to common properties
Oct 4, 2016
4fbe3b5
merge conflicts in common properties
Oct 4, 2016
0fd9885
remove method default values for secor convention
Oct 4, 2016
6774d88
remove unneeded getLong method with default value
Oct 4, 2016
ae9e639
Merge pull request #259 from shawnsnguyen/qubole_client_fix
HenryCaiHaiying Oct 5, 2016
5d1c136
Bump up version to 0.23-SNAPSHOT and cut the version 0.22
Oct 5, 2016
dbd355e
Support Kafka 0.10 and bump version
dangoldin Oct 8, 2016
dce449b
Support protobuf 3.1.0 and Google Timestamp type
prasincs Oct 9, 2016
944d7c0
remove overloaded toMillis for protobuf
prasincs Oct 10, 2016
5508e4c
Merge pull request #264 from prasincs/protobuf-3-timestamps
HenryCaiHaiying Oct 10, 2016
896bf7f
Allow upload on oldest file per topic per partition
Oct 11, 2016
b6e68a2
Merge pull request #265 from morrifeldman/master
HenryCaiHaiying Oct 13, 2016
0033244
Update secor.prod.properties
ahsandar Oct 14, 2016
889d9af
Merge pull request #268 from ahsandar/master
HenryCaiHaiying Oct 14, 2016
50fe0de
expose an extractTimestampMillis on just bytes
prasincs Oct 17, 2016
3e734f1
Merge pull request #269 from prasincs/extract-timestamp-bytes
HenryCaiHaiying Oct 17, 2016
8c8738f
Update README.md
ahsandar Oct 17, 2016
deeb97c
Merge pull request #1 from ahsandar/ahsandar-patch-1
ahsandar Oct 17, 2016
c1e3410
Update secor.common.properties
ahsandar Oct 17, 2016
e6caaa9
Merge pull request #271 from ahsandar/master
HenryCaiHaiying Oct 17, 2016
aee3713
add folder structure for GCS, daily and hourly
ahsan-wego Oct 17, 2016
62394d4
Merge pull request #2 from ahsandar/ahsandar-patch-1
ahsandar Oct 17, 2016
24619fc
fixed config
ahsan-wego Oct 17, 2016
e26b10e
config updated
ahsan-wego Oct 18, 2016
289837d
flexible timestamp format
ahsan-wego Oct 20, 2016
b098248
Update secor.common.properties
ahsandar Oct 20, 2016
a1d682f
added mtDtFormat
ahsan-wego Oct 20, 2016
0204c8f
Merge branch 'flexible-partion-format' of github.com:ahsandar/secor i…
ahsan-wego Oct 20, 2016
ea9f377
updated tests
ahsan-wego Oct 20, 2016
0c37f9b
DailyOffsetMessagePArser
ahsan-wego Oct 21, 2016
61f4fed
Merge pull request #3 from ahsandar/flexible-partion-format
ahsandar Oct 21, 2016
859c9a5
changed access modifier
ahsandar Oct 22, 2016
507686f
Merge pull request #4 from ahsandar/change-accessor
ahsandar Oct 22, 2016
18a00c5
Merge pull request #275 from ahsandar/master
HenryCaiHaiying Oct 24, 2016
42f440f
Update README.md
ahsandar Oct 24, 2016
fd59512
Merge pull request #277 from ahsandar/master
HenryCaiHaiying Oct 24, 2016
896c958
partition parser
ahsan-wego Oct 27, 2016
53adb7e
removed fetch offset
ahsan-wego Oct 27, 2016
b1d76ca
Merge pull request #5 from ahsandar/partitioned-parser
ahsandar Oct 27, 2016
014095c
Update MessageParser.java
ahsandar Oct 28, 2016
b4d2566
updted tests
ahsan-wego Oct 28, 2016
777f855
removed unnecesary addition
ahsan-wego Oct 28, 2016
64bc862
Merge pull request #279 from ahsandar/master
HenryCaiHaiying Oct 28, 2016
840749e
parquet output for thrift messages support added
Oct 31, 2016
27872a6
documentations for new config added
jaimess Nov 2, 2016
18f02e7
Extend ProgressMonitorMain to run in a loop.
Nov 3, 2016
f984e7b
Make consumer group prefix on statsd metrics optional.
Nov 3, 2016
d732eee
fixup! Extend ProgressMonitorMain to run in a loop.
Nov 3, 2016
0126a60
fixup! Make consumer group prefix on statsd metrics optional.
Nov 3, 2016
b430154
Merge pull request #284 from liamstewart/ls/statsd_consumer_group_prefix
HenryCaiHaiying Nov 3, 2016
440f624
Merge pull request #283 from liamstewart/ls/monitor
HenryCaiHaiying Nov 3, 2016
13e79e1
Support build profiles for Kafka 0.8 and 0.10. Write tests (but comme…
dangoldin Nov 4, 2016
0e23190
remove gpg from Kafka 0.10 profile
dangoldin Nov 4, 2016
6cb0c6d
Allow auto.offset.reset to be specified in configuration.
Nov 3, 2016
c97fdd9
Merge pull request #286 from liamstewart/ls/auto_offset_reset
HenryCaiHaiying Nov 4, 2016
68db8e1
thrift version back to 0.5.0
jaimess Nov 8, 2016
0e13386
Added a new 0.8 dev profile without gpg
dangoldin Nov 9, 2016
6a826b3
Updated dev profiles to remove unused plugins
dangoldin Nov 10, 2016
8fa41be
Merge pull request #262 from dangoldin/support-kafka-0.10
HenryCaiHaiying Nov 10, 2016
50e83e0
Merge remote-tracking branch 'upstream/master'
jaimess Nov 11, 2016
e913278
Merge pull request #281 from jaimess/master
HenryCaiHaiying Nov 17, 2016
33a08dd
README: add newlines for human readability of cmd
Nov 21, 2016
be9ab92
Merge pull request #289 from rud-bookbites/patch-1
HenryCaiHaiying Nov 21, 2016
3ba2756
Change some members visibility for better extension
doron-levi Nov 27, 2016
e37e5ff
flexible delimiter added
ahsan-wego Nov 28, 2016
8d4d597
added flexible delimiter support
ahsan-wego Nov 28, 2016
704f1ca
Merge pull request #290 from doronl/master
HenryCaiHaiying Nov 28, 2016
ce36f7c
added null check
ahsan-wego Nov 29, 2016
b00d317
updated byte conversion logic
ahsan-wego Nov 29, 2016
3e83ae9
Merge pull request #6 from ahsandar/flexible-delimiter
ahsandar Nov 29, 2016
18aa59f
updated byte match logic for value and updated config
ahsan-wego Nov 30, 2016
07b3f54
Merge pull request #7 from ahsandar/flexible-delimiter
ahsandar Nov 30, 2016
994fa71
Fix DelimitedTextFileWriter.getLength
kpavel Nov 30, 2016
6e828a8
Merge pull request #292 from kpavel/master
HenryCaiHaiying Nov 30, 2016
cafc6b1
updated based on review
ahsan-wego Dec 2, 2016
53cd013
Merge pull request #8 from ahsandar/felxible-delimiter
ahsandar Dec 2, 2016
03e5e5b
updated delimiter write logic with a use of boolean
ahsan-wego Dec 5, 2016
3371f36
Merge pull request #9 from ahsandar/felxible-delimiter
ahsandar Dec 5, 2016
05dcc68
updated for lexible delimiter
ahsan-wego Dec 6, 2016
f3ecec4
Merge pull request #10 from ahsandar/felxible-delimiter
ahsandar Dec 6, 2016
deb583f
changed \ to \n
ahsan-wego Dec 7, 2016
cf4acd1
Merge pull request #291 from ahsandar/master
HenryCaiHaiying Dec 7, 2016
dd5d744
adds path style access
robmccoll Jan 20, 2017
252f719
AWS S3 path style access config + comment
robmccoll Jan 20, 2017
bbe8103
Merge pull request #298 from robmccoll/config/expose_aws_pathstyleaccess
HenryCaiHaiying Jan 20, 2017
a88dd58
if no prefix/topic, don't add empty component to path
robmccoll Jan 23, 2017
88e522a
style, null check for extra safety
robmccoll Jan 23, 2017
1fbad26
Merge pull request #299 from robmccoll/fix/logfilepath_extra_slashes
HenryCaiHaiying Jan 24, 2017
06099b4
zookerper should be zookeeper
Geesu Jan 24, 2017
e3579d9
Merge pull request #300 from Geesu/fix-typo-in-prod-properties
HenryCaiHaiying Jan 24, 2017
66e24e8
Added configurable date prefixes
dnguyen0304 Feb 7, 2017
a75117e
Fixed false positive test failures
dnguyen0304 Feb 8, 2017
aee073b
Merge pull request #304 from dnguyen0304/master
HenryCaiHaiying Feb 8, 2017
fda2d2b
add nanoseconds test case iso8601 parser
Feb 11, 2017
c240db7
Merge pull request #306 from gwilym/iso8601-nanosecond-test
HenryCaiHaiying Feb 13, 2017
f147b6b
remove Iso8601ParserTest::mISOFormat
Feb 14, 2017
6997b63
actually utilise DateMessageParserTest::mISOFormat
Feb 14, 2017
4e295e0
add nanoseconds and optional suffix tests to check and cover existing…
Feb 14, 2017
b8f0dd7
let DateMessageParser use partitioner.granularity.date.format
Feb 15, 2017
de5a604
Merge pull request #307 from gwilym/datemessageparsercustomdateformat
HenryCaiHaiying Feb 16, 2017
3221e88
Docker image for secor
Feb 17, 2017
2ab5b69
Merge pull request #308 from Smartling/docker-image
HenryCaiHaiying Feb 20, 2017
73352a6
Add case for transforming micors to millis
serzh Feb 21, 2017
827298d
Merge pull request #309 from KitApps/convert-micros
HenryCaiHaiying Feb 21, 2017
47d7399
Add SplitFieldMessageParser
Feb 22, 2017
34f6657
SplitByFieldMessageParser - Make partition finalization logic throw u…
Feb 24, 2017
2afb0a9
Merge pull request #310 from Smartling/split-by-field-parser
HenryCaiHaiying Feb 24, 2017
962f32f
Added zookeeper path to docker
ryansmithevans Feb 28, 2017
fbe2c4e
Merge pull request #311 from ryansmithevans/master
HenryCaiHaiying Feb 28, 2017
3aaeb92
added message parser option to docker image
ryansmithevans Mar 1, 2017
fb950db
More docker mappings to config
ryansmithevans Mar 1, 2017
6006dd0
Merge pull request #312 from ryansmithevans/master
HenryCaiHaiying Mar 2, 2017
393b7e6
typo on transformation
prasincs Mar 2, 2017
b7c9049
Merge pull request #313 from prasincs/patch-1
HenryCaiHaiying Mar 2, 2017
a98e1bf
fixed config name
ahsan-wego Mar 2, 2017
97c6a47
Merge pull request #11 from ahsandar/fix-config-nmae
ahsandar Mar 2, 2017
826b020
Merge pull request #315 from ahsandar/master
HenryCaiHaiying Mar 3, 2017
7141a4b
Initialize secor_group JVM parameter with value of SECOR_GROUP enviro…
Mar 14, 2017
0ff4c92
Merge pull request #322 from Smartling/docker-secor-group
HenryCaiHaiying Mar 14, 2017
4189570
Bump log4j library to 1.2.17 to support EnhancedPatternLayout
weichuliu Mar 17, 2017
49b2fc3
Merge pull request #323 from liuweichu/master
HenryCaiHaiying Mar 17, 2017
09d553a
MetricCollector component for ingesting secor performance metrics int…
Mar 2, 2017
e50c1e5
Merge pull request #316 from Smartling/performance-monitoring
HenryCaiHaiying Mar 23, 2017
458ab80
Updates the 0.10 kafka client to the latest version.
maxthomas Apr 6, 2017
be39fac
Merge pull request #327 from maxthomas/update-kafka-client
HenryCaiHaiying Apr 6, 2017
ffc69e7
bump parquet version, expose configuration variables
thegeebe Apr 17, 2017
562085f
Merge pull request #329 from mira/master
HenryCaiHaiying Apr 20, 2017
80d05f9
Update ThriftParquetFileReaderWriterFactory.java
jaimess May 9, 2017
6fd65b2
Orc reader/writer added
May 9, 2017
7b34471
Merge pull request #333 from jaimess/patch-1
HenryCaiHaiying May 9, 2017
86467b4
Added support for custom ORC-schema-provider implementation
May 11, 2017
068b84b
Logger added in case of json parsing exception
May 12, 2017
73d0ace
Update README.md
ahsandar May 12, 2017
726aa37
Google Maps dependency removed
May 12, 2017
7dbc356
Java 7 issue fixed
May 12, 2017
d9f3de3
Update README.md
ahsandar May 12, 2017
046712e
Merge pull request #335 from ahsandar/patch-1
HenryCaiHaiying May 12, 2017
1e37af4
Doc added
May 12, 2017
88e5a06
Merge branch 'master' into master
ashubhumca May 12, 2017
5a41fa3
Merge pull request #334 from ashubhumca/master
HenryCaiHaiying May 13, 2017
a9a57c1
Upgraded kafka version with new tag
May 16, 2017
12ac02b
Reverting change of version
May 16, 2017
dd323de
Merge pull request #337 from tygrash/master
HenryCaiHaiying May 16, 2017
33e3029
Bump up the version to 0.23
May 16, 2017
872a1ee
Bump up the version to 0.24-SNAPSHOT
May 16, 2017
b6f11d1
Added timestamp in Message class and populating it while extracting o…
May 17, 2017
8fcd6a7
Added config property to use kafka message timestamp.
May 17, 2017
a8b532f
Added usage of kafka timestamp in partition array generation
May 17, 2017
4e2d9e0
Revert "Added usage of kafka timestamp in partition array generation"
May 18, 2017
cff7763
Adding kafka timestamp extraction logic in base parser class. Added t…
May 18, 2017
fbeb3af
Calling config.useKafkaTimestamp() instead of extracting it again
May 18, 2017
9cf87fc
Merge pull request #338 from tygrash/master
HenryCaiHaiying May 18, 2017
40773fe
Calling check-safe extraction for metrics class
May 19, 2017
f1913cc
Removing unwanted new lines
May 19, 2017
63fcad5
Merge pull request #340 from tygrash/master
HenryCaiHaiying May 19, 2017
f14d109
Stores Kafka message timestamp as part of the encoded information
nachomdo Jun 12, 2017
897e190
Merge pull request #344 from nachomdo/timestamp-message-pack
HenryCaiHaiying Jun 14, 2017
c23ce24
pom.xml: bump aws sdk to 1.11.160
proger Jul 8, 2017
b4e6837
rebuild Iso8601MessageParser on top of TimestampedMessageParser
proger Jul 8, 2017
9c78138
Merge pull request #352 from proger/fix-iso8601
HenryCaiHaiying Jul 9, 2017
3946170
Merge pull request #351 from proger/aws-sdk-1.11.160
HenryCaiHaiying Jul 9, 2017
e88e987
Fix for backward compatibility in case of kafkaTimestamp usage
Jul 11, 2017
3b4f952
Fix for NPE
Jul 11, 2017
b1906d1
Trying fix NPE in setting timestamp
Jul 11, 2017
450e95e
Changed long to Long in KeyValue.java to allow null fields for timestamp
Jul 11, 2017
94270c8
Fixing failing test for MessagePackSequenceFileReaderWriterFactoryTes…
Jul 11, 2017
c2cb770
Moved dynamic class creation to constructor of timestamp factory and …
Jul 14, 2017
1c1ba5e
Long to long in timestamp classes
Jul 14, 2017
45a8f43
Changed timestamp type from Long to long in Message class
Jul 14, 2017
c314684
Convert Long to long in KeyValue constructor
Jul 15, 2017
dbac4da
Merge pull request #353 from tygrash/master
HenryCaiHaiying Jul 16, 2017
a0764ec
Modified README.md for current default active profile
Jul 16, 2017
54b6274
Merge pull request #354 from tygrash/master
HenryCaiHaiying Jul 16, 2017
2dfd2f7
Bump up version to v0.24
Jul 17, 2017
77200c8
Switch to 0.25-SNAPSHOT
Jul 17, 2017
ca092d9
Catch no committed message
Fluxx Jul 28, 2017
2067302
Match to built in errors enum
Fluxx Jul 28, 2017
2dbf16a
Log a warning if getCommittedMessage is unable to find a message
Fluxx Jul 28, 2017
4de6fec
Simply skip reporting stats if there is no latest message
Fluxx Jul 28, 2017
4975c8e
Merge pull request #356 from strava/fix_missing_topic
HenryCaiHaiying Jul 28, 2017
97b5b11
Bump the version to 0.25
Sep 21, 2017
68d700e
Set the version to 0.26-SNAPSHOT
Sep 21, 2017
75fe0cf
Per https://github.com/travis-ci/travis-ci/issues/7884, oraclejdk7 is…
Sep 21, 2017
c4d1811
Document Microsoft Azure Blob Storage support
Oct 11, 2017
18c8130
Merge pull request #366 from optiomas/patch-2
HenryCaiHaiying Oct 11, 2017
c96cee4
Use official Docker image "openjdk" instead of "java"
jethrocarr Oct 12, 2017
8fc1e18
Merge pull request #367 from jethrocarr/master
HenryCaiHaiying Oct 12, 2017
53901e7
#369 fix handling null values for orc file generation
Oct 13, 2017
3b0912e
Merge pull request #370 from attilagenc/master
HenryCaiHaiying Oct 16, 2017
e3827c4
Adding custom topic name implementation
Nov 29, 2017
50758c8
Style fixes
Dec 4, 2017
99223ad
Adding custom topics names on LogFilePath
Dec 4, 2017
b2c5886
Fix set method to custom topic name
Dec 5, 2017
8b9b1a7
Fix bug on custom topic name
Dec 7, 2017
46 changes: 46 additions & 0 deletions .gitignore
@@ -0,0 +1,46 @@
# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.so

# Packages #
############
# it's better to unpack these files and commit the raw source
# git has its own built in compression methods
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
*.tar
*.zip

# Logs and databases #
######################
*.log
*.sql
*.sqlite

# OS generated files #
######################
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

*.iml
.idea
target
bin/
.settings/
.project
.classpath
.m2
19 changes: 19 additions & 0 deletions .travis.yml
@@ -0,0 +1,19 @@
language: java
addons:
  hosts:
    - test-bucket.localhost
env:
  - PATH=$PATH:$HOME/.s3cmd SECOR_LOCAL_S3=true S3CMD=1.0.1
jdk:
  - openjdk7
  - oraclejdk8
before_install:
  - wget https://github.com/s3tools/s3cmd/archive/v$S3CMD.tar.gz -O /tmp/s3cmd.tar.gz
  - tar -xzf /tmp/s3cmd.tar.gz -C $HOME
  - mv $HOME/s3cmd-$S3CMD $HOME/.s3cmd
  - cd $HOME/.s3cmd && python setup.py install --user && cd -
  - gem install fakes3 -v 0.1.7
script:
  - make unit
  - make integration

43 changes: 39 additions & 4 deletions DESIGN.md
@@ -24,14 +24,12 @@ This document assumes familiarity with [Apache Kafka].

* **zero downtime upgrades:** it should be possible to upgrade the system to a new version in a way transparent to the downstream data clients,

* **dependence on public APIs:** the system should reply on public [Kafka] APIs only. Furthermore, it should be compatible with the most recent [Kafka] version (0.8) which offers significant improvements over 0.7, and it comes with Go language bindings (required by other pieces of the Ads infra).
* **dependence on public APIs:** the system should rely on public [Kafka] APIs only. Furthermore, it should be compatible with [Kafka] version 0.8.

No-goals:

* **minimized resource footprint:** this may become an important objective at some point but currently we don’t optimize for machine or storage footprint.

Secor will be initially used to persist Ads impression logs but in the future it may be considered as a replacement of the current logging pipeline.

## Related work

There are a number of open source [Kafka] consumers saving data to [S3]. To the best of our knowledge, none of them is
@@ -160,10 +158,47 @@ uploader.check_policy() {

The output of consumers is stored on local (or EBS) disks first and eventually uploaded to S3. Local and S3 file names follow the same pattern. Directory paths track topic and partition names. The file basename contains the Kafka partition number and the Kafka offset of the first message in that file. Additionally, files are labeled with a generation count. A generation is essentially a version number of the Secor software that is incremented between incompatible releases. Generations allow us to separate the outputs of Secor versions during testing, rolling upgrades, etc. The consumer group is not included explicitly in the output path. We expect that the output of different consumer groups will go to different top-level directories.

Putting this all together, a message with timestamp `<some_date:some_time>` written to topic `<some_topic>`, Kafka partition `<some_kafka_partition>` at offset `<some_offset>` by software with generation `<generation>` will end up in file `s3://logs/<some_topic>/<some_date>/<generation>_<some_kafka_parition>_<first_message_offset>.seq` where `<first_message_offset>` <= `<some_offset>`.
Putting this all together, a message with timestamp `<some_date:some_time>` written to topic `<some_topic>`, Kafka partition `<some_kafka_partition>` at offset `<some_offset>` by software with generation `<generation>` will end up in file `s3://logs/<some_topic>/<some_date>/<generation>_<some_kafka_partition>_<first_message_offset>.seq` where `<first_message_offset>` <= `<some_offset>`.
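
To make the naming scheme concrete, here is a minimal Java sketch that assembles an output path from its components. `OutputPathSketch` and `outputPath` are hypothetical names, and the sketch assumes unpadded decimal numbers in the basename; Secor's actual path-building code may format or pad these fields differently.

```java
// Hypothetical sketch of the output file naming scheme.
// Assumption: numbers are written as plain, unpadded decimals.
final class OutputPathSketch {
    static String outputPath(String root, String topic, String date,
                             int generation, int kafkaPartition, long firstMessageOffset) {
        // Directories: <root>/<topic>/<date>/
        // Basename:    <generation>_<kafka_partition>_<first_message_offset>.seq
        return root + "/" + topic + "/" + date + "/"
                + generation + "_" + kafkaPartition + "_" + firstMessageOffset + ".seq";
    }
}
```

For example, `outputPath("s3://logs", "events", "2016-05-06", 1, 3, 1200L)` yields `s3://logs/events/2016-05-06/1_3_1200.seq`.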

The nice property of the proposed file format is that given a list of output files and a Kafka message, we can tell which file contains the output for that message. In other words, we can track correspondence between the output files of different consumer groups. For instance, assume that a bug in the code resulted in logs for a given date being incorrectly processed. We now need to remove all output files produced by the partition group and regenerate them from the files written by the backup group. The composition of file paths guarantees that we can tell which backup files contain the relevant raw records from the names of the removed partition group output files.
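
The lookup described above can be sketched as follows, assuming every path has the `<root>/<topic>/<date>/<generation>_<partition>_<first_offset>.seq` shape. Within one topic, date, generation, and Kafka partition, successive files start at increasing offsets, so the file holding a message is the one with the largest first offset that does not exceed the message's offset. `FileLookupSketch` and `findFile` are hypothetical names, not part of Secor.

```java
import java.util.List;

// Hypothetical sketch: locate the output file that holds a given Kafka message.
final class FileLookupSketch {
    static String findFile(List<String> files, String topic, String date,
                           int generation, int partition, long offset) {
        String best = null;
        long bestFirstOffset = Long.MIN_VALUE;
        for (String file : files) {
            String[] parts = file.split("/");
            int n = parts.length;
            // Only files under <topic>/<date>/ can contain the message.
            if (n < 3 || !parts[n - 3].equals(topic) || !parts[n - 2].equals(date)) {
                continue;
            }
            String base = parts[n - 1];
            if (!base.endsWith(".seq")) {
                continue;
            }
            // Basename layout: <generation>_<kafka_partition>_<first_message_offset>.seq
            String[] fields = base.substring(0, base.length() - ".seq".length()).split("_");
            if (fields.length != 3) {
                continue;
            }
            try {
                int fileGeneration = Integer.parseInt(fields[0]);
                int filePartition = Integer.parseInt(fields[1]);
                long firstOffset = Long.parseLong(fields[2]);
                // Keep the latest file that starts at or before the message offset.
                if (fileGeneration == generation && filePartition == partition
                        && firstOffset <= offset && firstOffset > bestFirstOffset) {
                    best = file;
                    bestFirstOffset = firstOffset;
                }
            } catch (NumberFormatException e) {
                // Skip names that do not follow the numeric basename layout.
            }
        }
        return best;
    }
}
```

With files for partition 3 starting at offsets 0 and 1200, a message at offset 1250 maps to the file that starts at 1200, and a message at offset 100 maps to the file that starts at 0.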

## Output file formats

Secor supports two different output file formats with different capabilities.

### Text File

The Delimited Text File output format writes individual messages as raw bytes, separated by newline characters. It is
therefore generally appropriate only for non-binary messages that do not contain embedded newlines. No other metadata
about the message is recorded.
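
A minimal sketch of this layout follows: each message's raw bytes are written as-is, followed by a `\n`, and nothing else is recorded. `DelimitedTextSketch` is a hypothetical name, and this is not Secor's actual writer. It also shows why embedded newlines are a problem: a message containing `\n` would be read back as two records.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical sketch of the delimited text layout: raw bytes plus '\n' per message.
final class DelimitedTextSketch {
    static byte[] encode(List<byte[]> messages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] message : messages) {
            out.write(message, 0, message.length); // message body, unmodified
            out.write('\n');                        // record delimiter; no other metadata
        }
        return out.toByteArray();
    }
}
```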

### Hadoop SequenceFile

The [SequenceFile](https://wiki.apache.org/hadoop/SequenceFile) format writes out the message body in the **value**
field of the SequenceFile record. It supports two different modes for storing additional metadata in the **key**
field of the SequenceFile.

#### Legacy

In the default (legacy) mode, the Kafka partition offset is stored in the key field as an 8-byte long value in
big-endian format.
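
The legacy key layout can be illustrated with `java.nio.ByteBuffer`, whose default byte order is big-endian. `LegacyKeySketch` is a hypothetical helper that shows only the 8-byte encoding of the offset; it makes no assumption about which Hadoop `Writable` class Secor uses to wrap the key.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the legacy SequenceFile key layout.
final class LegacyKeySketch {
    private static final int LONG_BYTES = 8;

    // Encode a Kafka offset as 8 bytes, most significant byte first.
    static byte[] encodeOffset(long offset) {
        // A newly allocated ByteBuffer uses big-endian order by default.
        return ByteBuffer.allocate(LONG_BYTES).putLong(offset).array();
    }

    // Decode the offset from an 8-byte big-endian key.
    static long decodeOffset(byte[] key) {
        if (key.length != LONG_BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + key.length);
        }
        return ByteBuffer.wrap(key).getLong();
    }
}
```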

#### MessagePack

In the optional [MessagePack](http://msgpack.org/index.html) mode, the key is a binary structure encoded using the
MessagePack specification. MessagePack is a serialization format for hierarchical data such as maps, like JSON, but
with a more compact binary representation and support for more types.

For compactness, the MessagePack map stored in the SequenceFile key uses integer values as its Secor keys.
The currently defined Secor keys, their meanings, and their associated MessagePack value types are listed below.

| Key          | Meaning                | MessagePack Value Type |
| ------------ | ---------------------- | ---------------------- |
| 1            | Kafka partition offset | 64-bit integer         |
| 2            | Kafka message key      | Raw binary byte array  |

Note that if the Kafka message has no key, key 2 is omitted from the MessagePack map.
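
For illustration, the following sketch hand-encodes the key map using MessagePack's fixmap, positive fixint, int 64, and bin 8/16/32 formats. It is not Secor's encoder. Secor presumably relies on a MessagePack library, which may choose a more compact integer encoding for small offsets; any conforming MessagePack decoder reads either form. `MessagePackKeySketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical hand-rolled encoder for the Secor SequenceFile key map.
final class MessagePackKeySketch {
    static byte[] encodeKey(long offset, byte[] kafkaKey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int entries = (kafkaKey == null) ? 1 : 2; // key 2 is omitted when there is no Kafka key
        out.write(0x80 | entries);                // fixmap header with `entries` pairs
        out.write(0x01);                          // Secor key 1 as a positive fixint
        out.write(0xd3);                          // int 64 marker
        writeBigEndian(out, offset, 8);           // offset, 8 bytes big-endian
        if (kafkaKey != null) {
            out.write(0x02);                      // Secor key 2 as a positive fixint
            int len = kafkaKey.length;
            if (len <= 0xff) {
                out.write(0xc4);                  // bin 8: 1-byte length
                writeBigEndian(out, len, 1);
            } else if (len <= 0xffff) {
                out.write(0xc5);                  // bin 16: 2-byte length
                writeBigEndian(out, len, 2);
            } else {
                out.write(0xc6);                  // bin 32: 4-byte length
                writeBigEndian(out, len, 4);
            }
            out.write(kafkaKey, 0, len);          // raw key bytes
        }
        return out.toByteArray();
    }

    // Write the low `bytes` bytes of value, most significant byte first.
    private static void writeBigEndian(ByteArrayOutputStream out, long value, int bytes) {
        for (int i = bytes - 1; i >= 0; i--) {
            out.write((int) (value >>> (8 * i)) & 0xff);
        }
    }
}
```

An offset of 1 with no Kafka key encodes to 11 bytes: `0x81 0x01 0xd3` followed by the 8-byte big-endian offset.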

## New consumer code rollouts

The upgrade procedure is as simple as killing consumers running the old version of the code and letting them pick up new binaries upon restart. Generation numbers provide output isolation across incompatible releases.
9 changes: 9 additions & 0 deletions Dockerfile
@@ -0,0 +1,9 @@
FROM openjdk:8

RUN mkdir -p /opt/secor
ADD target/secor-*-bin.tar.gz /opt/secor/

COPY src/main/scripts/docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh

ENTRYPOINT ["/docker-entrypoint.sh"]
35 changes: 35 additions & 0 deletions Makefile
@@ -0,0 +1,35 @@
CONFIG=src/main/config
TEST_HOME=/tmp/secor_test
TEST_CONFIG=src/test/config
JAR_FILE=target/secor-*-SNAPSHOT-bin.tar.gz
MVN_OPTS=-DskipTests=true -Dmaven.javadoc.skip=true
CONTAINERS=$(shell ls containers)

build:
	@mvn package $(MVN_OPTS)

unit:
	@mvn test

integration: build
	@rm -rf $(TEST_HOME)
	@mkdir -p $(TEST_HOME)
	@tar -xzf $(JAR_FILE) -C $(TEST_HOME)
	@cp $(TEST_CONFIG)/* $(TEST_HOME)
	@[ ! -e $(CONFIG)/core-site.xml ] && jar uf $(TEST_HOME)/secor-*.jar -C $(TEST_CONFIG) core-site.xml
	@[ ! -e $(CONFIG)/jets3t.properties ] && jar uf $(TEST_HOME)/secor-*.jar -C $(TEST_CONFIG) jets3t.properties
	cd $(TEST_HOME) && ./scripts/run_tests.sh

test: build unit integration

container_%:
	docker build -t secor_$* containers/$*

test_%: container_%
	@mkdir -p .m2
	docker run -v $(CURDIR)/.m2:/root/.m2:rw -v $(CURDIR):/work:rw secor_$* sh -c "echo 127.0.0.1 test-bucket.localhost >> /etc/hosts && make clean test"

docker_test: $(foreach container, $(CONTAINERS), test_$(container))

clean:
	rm -rf target/