cliche crawler problem #93

@miaekim

I tried to run the crawler, but it didn't work.
Here's my command:

$ celery worker -A cliche.services.wikipedia.crawler --config dev.yml

And this is my dev.yml:

database_url: 'postgresql:///cliche_db_'
broker_url: 'redis://localhost/0'
WIKIPEDIA_RETRY_LIMIT: 30
DEBUG: True
SECRET_KEY: 'abcd'
SENTRY_DSN: 'https://1:2@3:4/5'

This is the error message:

[2015-03-05 20:01:11,658: WARNING/Worker-3] /Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py:364: RuntimeWarning: Exception raised outside body: TypeError("report_task_failure() got an unexpected keyword argument 'signal'",):
Traceback (most recent call last):
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/miaekim/cliche/cliche/services/tvtropes/crawler.py", line 181, in crawl_link
    result, tree, namespace, name, url = fetch_link(url, session)
  File "/Users/miaekim/cliche/cliche/services/tvtropes/crawler.py", line 145, in fetch_link
    name = tree.xpath('//div[@class="pagetitle"]/span')[0].text.strip()
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 253, in trace_task
    I, R, state, retval = on_error(task_request, exc, uuid)
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 201, in on_error
    R = I.handle_error_state(task, eager=eager)
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 85, in handle_error_state
    }[self.state](task, store_errors=store_errors)
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/app/trace.py", line 125, in handle_failure
    einfo=einfo)
  File "/Users/miaekim/rdflib/lib/python3.4/site-packages/celery/utils/dispatch/signal.py", line 166, in send
    response = receiver(signal=self, sender=sender, **named)
TypeError: report_task_failure() got an unexpected keyword argument 'signal'

  exc, exc_info.traceback)))
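
Reading the traceback, there seem to be two separate problems. First, the XPath lookup in fetch_link returned an empty list, so indexing `[0]` raised IndexError. Second, the failure handler was called with a `signal` keyword it does not accept; Celery's documentation says signal handlers must accept arbitrary keyword arguments (`**kwargs`). The sketch below reproduces both patterns with the standard library only. `first_text`, `send`, `strict_receiver` and `tolerant_receiver` are hypothetical stand-ins, not the project's or Celery's actual code, and I'm assuming `report_task_failure` is declared without `**kwargs` wherever it is defined.

```python
from types import SimpleNamespace


def first_text(nodes):
    """Return the stripped text of the first node, or None if nothing matched.

    Hypothetical helper: guards the `nodes[0]` access that fails in
    fetch_link when the XPath query matches no element.
    """
    if not nodes:
        return None
    text = nodes[0].text
    return text.strip() if text else None


def send(receiver, sender, **named):
    # Simplified stand-in for the call on the last traceback frame:
    #     receiver(signal=self, sender=sender, **named)
    return receiver(signal="task_failure", sender=sender, **named)


def strict_receiver(task_id, exception):
    # No **kwargs: rejects the extra `signal` and `sender` keywords.
    return task_id


def tolerant_receiver(task_id=None, exception=None, **kwargs):
    # Accepts and ignores any extra keywords the dispatcher passes.
    return task_id


# Stand-in for a matched element; only the `.text` attribute is used.
title = SimpleNamespace(text="  Some Page Title  ")
print(first_text([title]))  # text of the first match, stripped
print(first_text([]))       # no match: None instead of IndexError

try:
    send(strict_receiver, sender=None, task_id="abc", exception=IndexError())
except TypeError as error:
    print("strict receiver failed:", error)

print(send(tolerant_receiver, sender=None, task_id="abc", exception=IndexError()))
```

If this reading is right, the crawler needs to handle pages without a `pagetitle` element, and the failure handler needs `**kwargs` in its signature.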
