Issue python-boilerpipe on docker #57

@lraghib

I am trying to use python-boilerpipe in Docker, but the code blocks at this line:

extractor = Extractor(extractor='ArticleExtractor',url=link, headers=self.headers)

The call never returns anything, even though the same code works fine outside Docker.

My Dockerfile looks like this:

FROM tiangolo/uwsgi-nginx-flask:python3.6

RUN pip3 install --upgrade pip
# copy over our requirements.txt file
COPY requirements.txt /tmp/
WORKDIR /tmp/

# Install OpenJDK-11
# Install "software-properties-common" (for the "add-apt-repository")
RUN apt-get update
RUN apt-get install -y software-properties-common 
RUN add-apt-repository ppa:openjdk/ppa
RUN apt-get install -y openjdk-11-jdk && \
    apt-get install -y ant && \
    apt-get clean;

# Fix certificate issues
RUN apt-get install -y ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f;

# Setup JAVA_HOME -- useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
# ENV persists into later layers and the running container; a RUN export does not
ENV PATH $PATH:/usr/lib/jvm/java-11-openjdk-amd64/bin
#Check java
RUN echo $JAVA_HOME

# boilerpipe
RUN git --version
RUN git config --global http.sslverify false
RUN git clone https://github.com/misja/python-boilerpipe.git
WORKDIR /tmp/python-boilerpipe/
RUN pip3 install -r requirements.txt
RUN python3 setup.py install



RUN pip3 install -r /tmp/requirements.txt


# copy over our app code
WORKDIR /app
COPY ./app /app


EXPOSE 80/tcp

After some debugging, I found that the line causing this is

self.source = BoilerpipeSAXInput(InputSource(reader)).getTextDocument()

Any idea how to solve this problem?
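
For anyone trying to reproduce this, one way to confirm where the call is stuck is to have Python dump its stack traces if the call runs too long. The following is only a minimal sketch using the standard-library `faulthandler` module; the helper name `dump_stack_if_stuck` and the 30-second limit are illustrative assumptions, not part of python-boilerpipe.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stack_if_stuck(seconds, file=None):
    """Dump the tracebacks of all Python threads to `file` (default: stderr)
    if the wrapped block is still running after `seconds`.

    Hypothetical helper for debugging; it only reports, it does not
    interrupt the stuck call.
    """
    target = file if file is not None else sys.stderr
    # Arm a watchdog that writes the tracebacks once the timeout expires.
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the block finishes (or raises).
        faulthandler.cancel_dump_traceback_later()


# Intended use inside the container (assumes `Extractor`, `link` and
# `headers` are defined as in the application code):
#
# with dump_stack_if_stuck(30):
#     extractor = Extractor(extractor='ArticleExtractor', url=link, headers=headers)
```

If the call hangs, the traceback written to the container's stderr should show whether the process is waiting in the URL fetch or inside the `getTextDocument()` call on the Java side.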
