By default, Spark produces a huge amount of logging, which clutters up the terminal and confuses new users. Findspark should cut down on this logging. @freeman-lab recommended using the following to change the logging level at runtime:
# Reach the JVM-side log4j classes through the Py4J gateway on an
# existing SparkContext `sc`, then raise the root logger threshold.
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
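For context, that snippet assumes a live SparkContext bound to sc. A minimal sketch of running it in a fresh findspark session (the appName is arbitrary):

import findspark
findspark.init()

import pyspark

# Creating the context is where Spark's usual INFO-level chatter appears.
sc = pyspark.SparkContext(appName="quiet-demo")

# Silence everything below ERROR for the rest of the session.
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)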
This could be implemented in Findspark by monkey-patching the SparkContext like so:
import pyspark

old_init = pyspark.SparkContext.__init__

def new_init(self, *args, **kwargs):
    # Run the real constructor first so the JVM gateway exists.
    old_init(self, *args, **kwargs)
    # Then quiet this context's root logger.
    log4j = self._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

pyspark.SparkContext.__init__ = new_init
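If Findspark did go this route, the patch could at least be wrapped in a helper and applied only on request. A hedged sketch; patch_spark_logging and the idea of a quiet flag on findspark.init are hypothetical, not part of the current findspark API:

import pyspark

def patch_spark_logging(level_name="ERROR"):
    """Monkey-patch SparkContext.__init__ so each new context sets its
    JVM root logger to `level_name`. Hypothetical helper that something
    like findspark.init(quiet=True) could call."""
    old_init = pyspark.SparkContext.__init__
    # Avoid stacking wrappers if this is called more than once.
    if getattr(old_init, "_findspark_patched", False):
        return

    def new_init(self, *args, **kwargs):
        old_init(self, *args, **kwargs)
        log4j = self._jvm.org.apache.log4j
        log4j.LogManager.getRootLogger().setLevel(
            getattr(log4j.Level, level_name))

    new_init._findspark_patched = True
    pyspark.SparkContext.__init__ = new_init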
This, however, feels like a fragile solution to me. We could instead modify the log4j properties file at $SPARK_HOME/conf/log4j.properties, but that changes the logging for every use of Spark and may be too heavyweight a solution.
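For reference, the heavyweight variant amounts to copying $SPARK_HOME/conf/log4j.properties.template to log4j.properties and lowering the root category, e.g.:

# $SPARK_HOME/conf/log4j.properties
# Change the shipped default of INFO to ERROR for all Spark applications:
log4j.rootCategory=ERROR, console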