
Quiet Spark Logging #5

@alope107

Description

By default, Spark produces a huge amount of logging that clutters the terminal and confuses new users. Findspark should cut down on this logging. @freeman-lab recommended using the following to change the logging level at runtime:

log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

This could be implemented in Findspark by monkey-patching the SparkContext like so:

import pyspark

# Save the original constructor so the patched version can delegate to it.
old_init = pyspark.SparkContext.__init__

def new_init(self, *args, **kwargs):
    # Run the real initializer first so the JVM gateway exists.
    old_init(self, *args, **kwargs)
    # Then raise the root log4j level to ERROR to suppress INFO/WARN noise.
    log4j = self._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

pyspark.SparkContext.__init__ = new_init
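The wrap-and-delegate pattern above can be sketched with a plain class so it runs without Spark; `Service` and its attributes here are hypothetical names for illustration, not part of pyspark:

```python
# Minimal sketch of the monkey-patching pattern, using a hypothetical
# Service class in place of pyspark.SparkContext.
class Service:
    def __init__(self, name):
        self.name = name
        self.log_level = "INFO"

# Save the original constructor so the patched version can delegate to it.
old_init = Service.__init__

def new_init(self, *args, **kwargs):
    old_init(self, *args, **kwargs)  # run the original initializer first
    self.log_level = "ERROR"         # then quiet logging, as new_init does for Spark

Service.__init__ = new_init

svc = Service("demo")
# svc.name is still set by the original __init__; only log_level changed.
```

The fragility the issue mentions is visible even here: the patch silently breaks if `__init__` is reassigned again elsewhere, or if construction happens through a path that bypasses it.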

This, however, feels like a fragile solution to me. We could instead modify the logger properties file at $SPARK_HOME/conf/log4j.properties, but that changes the logging for all uses of Spark and may be too heavyweight a solution.
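For reference, the heavyweight alternative would be a one-line change in $SPARK_HOME/conf/log4j.properties (a sketch using the log4j 1.x syntax Spark ships with; it affects every Spark application on the machine, not just findspark users):

```properties
# Raise the root logger from the default INFO to ERROR for all Spark apps.
log4j.rootCategory=ERROR, console
```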
