Hello,

I've been working on getting the Spark job definition running locally, and although I've come a long way, I'm not quite there yet. That's why I'm reaching out here, in the hope that someone on the development team can help me. My ultimate goal is to develop Spark job definitions locally, run unit tests on the created functions, and deploy the locally developed Spark job definition to Fabric.

I've set up VS Code based on the following YouTube video and documentation:

https://www.youtube.com/watch?v=A9SjAyZ_JSc
https://learn.microsoft.com/en-us/fabric/data-engineering/setup-vs-code-extension
https://learn.microsoft.com/en-us/fabric/data-engineering/author-sjd-with-vs-code
One error I ran into and was able to fix:

```
[ERROR] 2024-07-23 16:13:06.614 [Thread-3] SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:59920
```

I fixed it by adding the following config to sparkconf.py:

```python
conf.set("spark.driver.host", "localhost")
```

I'm wondering: is this a known error, and should this setting be added to the sparkconf.py file by default?
And now I'm getting a new error that I'm not sure how to fix:

```
[ForkJoinPool.commonPool-worker-25] PublicClientApplication: [Correlation ID: ae95b07b-05cf-4474-8657-33986c324f1e] Execution of class com.microsoft.aad.msal4j.AcquireTokenByDeviceCodeFlowSupplier failed.
com.microsoft.aad.msal4j.MsalServiceException: AADSTS70020: The provided value for the input parameter 'device_code' is not valid. This device code has expired. Trace ID: 1785d0b5-7034-40f6-b231-1f4f816b6300 Correlation ID: ae95b07b-05cf-4474-8657-33986c324f1e Timestamp: 2024-07-2
```

Do you know how to fix this error?
Two other observations:

When I run my code locally, the following line leads to an error, while the same code works in Fabric:

```python
print("spark.synapse.pool.name : " + spark_context.getConf().get("spark.synapse.pool.name"))
```

I'm getting the following error:

```
Traceback (most recent call last):
  File "c:\dev\fabric_vscode\28157445-4999-4c43-8d01-1d94f21dba1c\SparkJobDefinition\15e8cdfd-3ccd-45c1-8c61-9367de8b672b\ETL\createTablefromCSV.py", line 24, in <module>
    print("spark.synapse.pool.name : " + spark_context.getConf().get("spark.synapse.pool.name"))
TypeError: can only concatenate str (not "NoneType") to str
```
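One way to make that line work both locally and on Fabric is to fall back to a default when the key is missing: `spark.synapse.pool.name` is set by the Fabric runtime, so a local session returns `None` and the string concatenation fails. The helper below is hypothetical (not part of the Fabric tooling) and only assumes the conf object offers a dict-style `get(key, default)`:

```python
# Sketch: guard against Fabric-only config keys such as spark.synapse.pool.name.
# Locally the key is absent, .get() returns None, and the "+" raises a TypeError.
def get_conf_or_default(conf, key, default="<not set locally>"):
    """Return the config value for key, or a readable default when it is absent."""
    value = conf.get(key, default)
    return value if value is not None else default

# Usage (works both locally and on Fabric):
# print("spark.synapse.pool.name : "
#       + get_conf_or_default(spark_context.getConf(), "spark.synapse.pool.name"))
```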
Another thing that sparked my interest is this warning message:

```
[WARN ] 2024-07-23 17:02:49.508 [main] Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN"
```

Is there a way to fix this warning?
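For what it's worth, on Windows this warning can usually be silenced by pointing `HADOOP_HOME` at a directory containing `bin\winutils.exe` before the SparkContext is created. A sketch, with the caveats that `C:\hadoop` is only an example path and winutils.exe must be downloaded separately (it is not shipped with the Fabric extension):

```python
# Sketch: set HADOOP_HOME so Spark's Hadoop shim can locate winutils.exe.
# These environment variables must be set before the SparkContext is created.
import os

hadoop_home = r"C:\hadoop"  # example path; must contain bin\winutils.exe
os.environ["HADOOP_HOME"] = hadoop_home
# Prepend the bin directory so the native helpers are found on PATH as well.
os.environ["PATH"] = os.path.join(hadoop_home, "bin") + os.pathsep + os.environ.get("PATH", "")
```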
Keep up the good work. I'm hoping to develop a structured and modular Spark job definition, instead of the multitude of notebooks that we're using right now.
This is the file that I'm running: createTablefromCSV.py.txt
And the CSV that I'm referencing in the file: dimension_customer.csv
Kind regards
Martijn