This repository was archived by the owner on Mar 23, 2020. It is now read-only.


@piercefreeman piercefreeman commented Aug 15, 2018

Fixes a crash when trying to stream from byte-valued files.

Previously, we read from the fileobj and tried to append the result to our string-valued buffer, which caused a type mismatch whenever the file returned bytes. By instantiating the buffer dynamically, with the same type as the data being read, we can retain the existing length logic while still returning a string to end clients.
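As a rough illustration of the idea (not the actual diff in this PR), here is a minimal sketch of a length-limited reader whose buffer takes its type from the data it reads. The class name `FilePart` and its attributes are hypothetical stand-ins chosen for this example, and the sketch returns whatever type the file yields; any decoding to `str` for end clients would be a separate step that is not shown.

```python
import io


class FilePart:
    """Hypothetical stand-in for a payload reader; not the library's actual class."""

    def __init__(self, fileobj, length):
        self.fileobj = fileobj
        self.length = length  # number of characters/bytes still unread
        self.buf = None       # created lazily, once the data type is known

    def read(self, size=-1):
        # Never read past the end of this part.
        if size < 0 or size > self.length:
            size = self.length
        chunk = self.fileobj.read(size)
        if self.buf is None:
            # type(chunk)() is "" for text files and b"" for binary files,
            # so the concatenation below never mixes str and bytes.
            self.buf = type(chunk)()
        content = self.buf + chunk
        self.buf = type(chunk)()
        self.length -= len(content)
        return content


# The same length logic works for both binary and text file objects.
binary_part = FilePart(io.BytesIO(b"hello world"), 5)
text_part = FilePart(io.StringIO("hello world"), 5)
binary_result = binary_part.read()
text_result = text_part.read()
```

Creating the buffer lazily avoids having to know up front whether the file was opened in text or binary mode.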

Adding a PR here per discussion in internetarchive#26

@hungrymonkey

I just want to let you know that the module doesn't work.

Python 3.7

Traceback (most recent call last):
  File "print_all_urls.py", line 13, in <module>
    for record in f:
  File "/Users/psuedofinnish/git_repo/engage-warc-test/venv/src/warc/warc/warc.py", line 406, in __iter__
    record = self.read_record()
  File "/Users/psuedofinnish/git_repo/engage-warc-test/venv/src/warc/warc/warc.py", line 377, in read_record
    self.finish_reading_current_record()
  File "/Users/psuedofinnish/git_repo/engage-warc-test/venv/src/warc/warc/warc.py", line 371, in finish_reading_current_record
    self.current_payload.read()
  File "/Users/psuedofinnish/git_repo/engage-warc-test/venv/src/warc/warc/utils.py", line 69, in read
    return self._read(self.length)
  File "/Users/psuedofinnish/git_repo/engage-warc-test/venv/src/warc/warc/utils.py", line 79, in _read
    content = self.buf + self.fileobj.read(size)
TypeError: can only concatenate str (not "bytes") to str

Using this script.

for f_name in warc_files:
    f = warc.open(COLLECTIONS_DIR + str(f_name))
    for record in f:
        print(record['WARC-Target-URI'])
