Open
Description
Hi, I'm trying to run the echo-server example, but I can't communicate with the running server. The server starts, but when I run the client (or even just try to retrieve something from the key-value store) I always get the same error: skein.exceptions.ConnectionError: Unable to connect to application. For example:
$ kinit
$ skein driver start
$ APPID=$(skein application submit ./spec.yaml)
$ python
>>> import skein
>>> client = skein.Client(log_level="debug")
21/03/24 16:34:54 DEBUG skein.Driver: Starting Skein version 0.8.1
21/03/24 16:34:54 DEBUG skein.Driver: Logging in using ticket cache
21/03/24 16:34:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:34:55 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:34:56 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:34:56 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:34:56 INFO skein.Driver: Driver started, listening on 45765
21/03/24 16:34:56 DEBUG skein.Driver: Reporting gRPC server port back to the launching process
>>> apps = client.get_applications()
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:35:34 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:35:34 INFO conf.Configuration: resource-types.xml not found
21/03/24 16:35:34 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
>>> app = client.connect(apps[0].id)
>>> app.ui
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 86, in __repr__
    return "WebUI<address=%r>" % self.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 83, in address
    return self._ui_info.address
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/utils.py", line 210, in __get__
    res = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/ui.py", line 59, in _ui_info
    resp = self._client._call('UiInfo', proto.UIInfoRequest())
  File "/home/luciof/miniconda3/envs/echo-server/lib/python3.8/site-packages/skein/core.py", line 279, in _call
    raise ConnectionError("Unable to connect to %s" % self._server_name)
skein.exceptions.ConnectionError: Unable to connect to application
- Relevant logs/tracebacks
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:application.master.log
LogLastModifiedTime:Wed Mar 24 16:32:40 +0100 2021
LogLength:2073
LogContents:
21/03/24 16:32:37 INFO skein.ApplicationMaster: Starting Skein version 0.8.1
21/03/24 16:32:37 INFO skein.ApplicationMaster: Running as user luciof
21/03/24 16:32:37 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/03/24 16:32:37 INFO skein.ApplicationMaster: Application specification successfully loaded
21/03/24 16:32:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/24 16:32:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/03/24 16:32:38 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:38 INFO skein.ApplicationMaster: gRPC server started at epod071.vgt.vito.be:43071
21/03/24 16:32:39 INFO skein.ApplicationMaster: WebUI server started at epod071.vgt.vito.be:34293
21/03/24 16:32:39 INFO skein.ApplicationMaster: Registering application with resource manager
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
21/03/24 16:32:39 INFO client.AHSProxy: Connecting to Application History server at epod-master3.vgt.vito.be/192.168.207.58:10200
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
21/03/24 16:32:39 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
21/03/24 16:32:39 INFO skein.ApplicationMaster: Initializing service 'server'.
21/03/24 16:32:39 INFO skein.ApplicationMaster: REQUESTED: server_0
21/03/24 16:32:40 INFO skein.ApplicationMaster: Starting container_e4897_1611572280718_141323_01_000002...
21/03/24 16:32:40 INFO skein.ApplicationMaster: RUNNING: server_0 on container_e4897_1611572280718_141323_01_000002
End of LogType:application.master.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
***************************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:937
LogContents:
ls -l:
total 16
-rw-------. 1 luciof hadoop 491 Mar 24 16:32 container_tokens
-rwx------. 1 luciof hadoop 5303 Mar 24 16:32 launch_container.sh
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
144966289 4 drwxr-s--- 3 luciof hadoop 4096 Mar 24 16:32 .
144966295 4 -r-x------ 1 luciof luciof 1013 Mar 24 16:32 ./.skein.crt
144966299 4 -rw------- 1 luciof hadoop 491 Mar 24 16:32 ./container_tokens
144966298 8 -rwx------ 1 luciof hadoop 5303 Mar 24 16:32 ./launch_container.sh
168559658 4 -r-x------ 1 luciof luciof 1704 Mar 24 16:32 ./.skein.pem
127926301 4 -r-x------ 1 luciof luciof 1407 Mar 24 16:32 ./.skein.proto
144966297 4 drwxr-s--- 2 luciof hadoop 4096 Mar 24 16:32 ./tmp
144966292 7660 -r-x------ 1 luciof luciof 7842343 Mar 24 16:32 ./.skein.jar
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************
End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:35,545 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:36,471 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
*******************************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000001 on epod071.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Mar 24 16:32:36 +0100 2021
LogLength:5303
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export JAVA_HOME=${JAVA_HOME:-"/usr/java/default"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/3.1.4.0-315/hadoop/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/3.1.4.0-315/hadoop-yarn"}
export HADOOP_HOME=${HADOOP_HOME:-"/usr/hdp/3.1.4.0-315/hadoop"}
export PATH=${PATH:-"/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin"}
export HADOOP_TOKEN_FILE_LOCATION="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/container_tokens"
export CONTAINER_ID="container_e4897_1611572280718_141323_01_000001"
export NM_PORT="45454"
export NM_HOST="epod071.vgt.vito.be"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323,/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323"
export LOCAL_USER_DIRS="/data1/hadoop/yarn/local/usercache/luciof/,/data2/hadoop/yarn/local/usercache/luciof/,/data3/hadoop/yarn/local/usercache/luciof/"
export LOG_DIRS="/data1/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data2/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001,/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export USER="luciof"
export LOGNAME="luciof"
export HOME="/home/"
export PWD="/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001"
export JVM_PID="$$"
export MALLOC_ARENA_MAX="4"
export NM_AUX_SERVICE_spark_shuffle=""
export NM_AUX_SERVICE_timeline_collector=""
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export NM_AUX_SERVICE_spark2_shuffle=""
export SKEIN_APPLICATION_ID="application_1611572280718_141323"
export LANG="en_US.UTF-8"
export APP_SUBMIT_TIME_ENV="1616599954132"
export TIMELINE_FLOW_NAME_TAG="echoserver"
export TIMELINE_FLOW_VERSION_TAG="1"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1611572280718_141323"
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*"
export TIMELINE_FLOW_RUN_ID_TAG="1616599954132"
echo "Setting up job resources"
ln -sf "/data2/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/11/.skein.pem" ".skein.pem"
ln -sf "/data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/13/.skein.proto" ".skein.proto"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/.skein.crt" ".skein.crt"
ln -sf "/data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/skein.jar" ".skein.jar"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
chmod 640 "/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
ls -l 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx128M -Dskein.log.level=INFO -Dskein.log.directory=/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001 com.anaconda.skein.ApplicationMaster hdfs://hacluster/user/luciof/.skein/application_1611572280718_141323 >/data3/hadoop/yarn/log/application_1611572280718_141323/container_e4897_1611572280718_141323_01_000001/application.master.log 2>&1"
End of LogType:launch_container.sh.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000001) and so may not be complete.
************************************************************************************
End of LogType:server.log.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
***************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:882033
LogContents:
ls -l:
total 24
-rw-------. 1 luciof hadoop 427 Mar 24 16:32 container_tokens
lrwxrwxrwx. 1 luciof hadoop 115 Mar 24 16:32 environment -> /data3/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/10/environment.tar.gz
-rwx------. 1 luciof hadoop 4791 Mar 24 16:32 launch_container.sh
lrwxrwxrwx. 1 luciof hadoop 106 Mar 24 16:32 server.py -> /data1/hadoop/yarn/local/usercache/luciof/appcache/application_1611572280718_141323/filecache/12/server.py
drwxr-s---. 2 luciof hadoop 4096 Mar 24 16:32 tmp
find -L . -maxdepth 5 -ls:
161091031 4 drwxr-s--- 3 luciof hadoop 4096 Mar 24 16:32 .
161090987 4 -r-x------ 1 luciof luciof 1334 Mar 24 16:32 ./server.py
161091039 8 -rwx------ 1 luciof hadoop 4791 Mar 24 16:32 ./launch_container.sh
42467355 4 -r-x------ 1 luciof luciof 49 Mar 24 16:32 ./.skein.sh
161091032 4 drwxr-s--- 2 luciof hadoop 4096 Mar 24 16:32 ./tmp
42467353 4 -r-x------ 1 luciof luciof 1704 Mar 24 16:32 ./.skein.pem
41033904 4 -r-x------ 1 luciof luciof 1013 Mar 24 16:32 ./.skein.crt
41025562 4 drwx------ 11 luciof luciof 4096 Mar 24 16:32 ./environment
41025565 4 drwx------ 3 luciof luciof 4096 Mar 24 16:32 ./environment/ssl
[rest of the unpacking]
161091040 4 -rw------- 1 luciof hadoop 427 Mar 24 16:32 ./container_tokens
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************
End of LogType:prelaunch.err.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:container-localizer-syslog
LogLastModifiedTime:Wed Mar 24 16:32:42 +0100 2021
LogLength:506
LogContents:
2021-03-24 16:32:41,690 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-03-24 16:32:42,505 WARN [ContainerLocalizer Downloader] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
End of LogType:container-localizer-syslog.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
*******************************************************************************************
Container: container_e4897_1611572280718_141323_01_000002 on epod076.vgt.vito.be:45454
LogAggregationType: LOCAL
======================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Mar 24 16:32:45 +0100 2021
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e4897_1611572280718_141323_01_000002) and so may not be complete.
******************************************************************************
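One thing that may be worth ruling out is whether the gRPC address reported in application.master.log (epod071.vgt.vito.be:43071) is reachable at all from the machine where the client runs. The following is only a minimal sketch using the Python standard library, not part of skein: the helper names parse_grpc_address and can_connect are hypothetical, and a successful TCP connection would only show that the port is open, not that the TLS/gRPC handshake works. That the failure here is network-related is an assumption, not something the logs confirm.

```python
import re
import socket

# Line copied verbatim from application.master.log above.
AM_LOG_LINE = (
    "21/03/24 16:32:38 INFO skein.ApplicationMaster: "
    "gRPC server started at epod071.vgt.vito.be:43071"
)


def parse_grpc_address(log_line):
    """Return (host, port) from a 'gRPC server started at host:port' line, or None.

    Hypothetical helper for illustration; not part of the skein API.
    """
    match = re.search(r"gRPC server started at (\S+):(\d+)", log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: it only tests TCP reachability, nothing gRPC-specific.
    DNS failures, refused connections and timeouts all count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


address = parse_grpc_address(AM_LOG_LINE)
print(address)  # ('epod071.vgt.vito.be', 43071)
```

Running can_connect(*address) on the machine that starts the skein driver would then indicate whether the application master's port can be reached from there. A False result would point towards name resolution or a firewall between that machine and the worker nodes, while True would suggest looking elsewhere (for example at the certificates or the Kerberos setup).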
- Version information
- Python version: 3.8.8
- Hadoop version: 3.1.1.3.1.4.0-315
- Skein version: 0.8.1