Skip to content

Added WARCTap(WARC Scheme and WARCCollector) and a simple testcase.#1

Open
akshaydixi wants to merge 2 commits intoept:masterfrom
akshaydixi:master
Open

Added WARCTap(WARC Scheme and WARCCollector) and a simple testcase.#1
akshaydixi wants to merge 2 commits intoept:masterfrom
akshaydixi:master

Conversation

@akshaydixi
Copy link
Copy Markdown

  1. Modified the gradle file to support dependencies too.
  2. Created a new package cascading.warc and added a Tap and Scheme for supporting WARC file formats based on com.martinkl.warc.mapred.WARC[Input,Output]Format
  3. Created a simple testcase for the new cascading tap.

Please review

…odified the gradle file to support dependencies too.
@ept
Copy link
Copy Markdown
Owner

ept commented Oct 15, 2014

Thank you for your contribution! A few issues/questions:

  • I wasn't able to build this, as Gradle couldn't find the dependencies jvyaml 1.0.0 and Cascading 2.5.5. Did you need to use a particular Maven repo or build them yourself?
  • You've added a dependency on Scalding. Does the code also work if you're using Cascading directly (without using Scalding)? If so, could we change this dependency? I'd like to avoid unnecessarily broad dependencies, because it bloats things and creates potential compatibility problems.
  • It seems like the com.backtype:dfs-datastores dependency is only used in the tests. Could we put it in the testCompile scope, to avoid cluttering the compile scope with unnecessary dependencies?
  • Do the classes have to be in the cascading.warc package? I'd prefer to keep everything under com.martinkl.warc if possible, as it's tidier.
  • More generally, I think it would be good to separate MapReduce support, Cascading support, and potential future support for other frameworks each into its own module/artifact, so that a user of one framework isn't forced to also get a big pile of unrelated dependencies. Do you want to do that split into modules? I'm happy to do it, but it may take me a little while to find time to work on it.

On 11 Oct 2014, at 10:31, Akshay Dixit notifications@github.com wrote:

Modified the gradle file to support dependencies too.
Created a new package cascading.warc and added a Tap and Scheme for supporting WARC file formats based on com.martinkl.warc.mapred.WARC[Input,Output]Format
Created a simple testcase for the new cascading tap.
You can merge this Pull Request by running

git pull https://github.com/akshaydixi/warc-hadoop master
Or view, comment on, or merge it at:

#1

Commit Summary

Added WARCTap(WARC Scheme and WARCCollector) and a simple testcase. Modified the gradle file to support dependencies too.
File Changes

M .gitignore (2)
M build.gradle (3)
A src/main/java/cascading/warc/WARCCollector.java (66)
A src/main/java/cascading/warc/WARCScheme.java (82)
A src/main/java/cascading/warc/WARCTap.java (90)
A src/test/java/cascading/warc/WARCTapTest.java (95)
Patch Links:

https://github.com/ept/warc-hadoop/pull/1.patch
https://github.com/ept/warc-hadoop/pull/1.diff

Reply to this email directly or view it on GitHub.

@akshaydixi
Copy link
Copy Markdown
Author

Sorry for the late response. Had completely forgotten about this PR.
In response to your questions:

  • I had those dependencies installed locally. After removing mavenLocal() and adding the repos, I successfully built it on a new machine. It should successfully test out now.
  • Removed the Scalding dependency and added the necessary Cascading dependencies instead
  • Shifted com.backtype:dfs-datastores to the testCompile scope
  • Shifted the classes to com.martinkl.warc package
  • Not really sure what you mean by this. Do you mean fleshing it out into com.martinkl.warc.hadoop and com.martinkl.warc.cascading ?

@andreimaximov
Copy link
Copy Markdown

@akshaydixi since this was never merged, is your contribution available elsewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants