Batch Accounting Visualisation
Hadoop and Spark
Get started
Monitoring and Debugging
Kerberos
- When using Hadoop/YARN you need a Kerberos TGT. If the job is to be launched from a script, you can authenticate non-interactively with a keytab file, like this:
- Create a keytab for your account:
cern-get-keytab --keytab sparktest.keytab --login malandes --user
- Then get the Kerberos TGT with:
kinit -kt /path/to/sparktest.keytab malandes
- Then include the following lines in the script:
export KRB5CCNAME=FILE:$XDG_RUNTIME_DIR/krb5cc
kinit -kt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes
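The steps above can be combined into a small helper that aborts early if authentication failed. This is a sketch using the keytab path and username from the example above; the `klist -s` verification step is an addition, not part of the original recipe:

```shell
#!/bin/bash
# Use a private, per-session credential cache so that concurrent jobs
# do not clobber each other's tickets.
export KRB5CCNAME=FILE:$XDG_RUNTIME_DIR/krb5cc

# Obtain the TGT non-interactively from the keytab created with
# cern-get-keytab (path and user as in the example above).
kinit -kt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes

# Verify that a valid (non-expired) ticket is present before launching
# any Hadoop/Spark command; klist -s prints nothing and only sets the
# exit code.
if ! klist -s; then
    echo "No valid Kerberos ticket, aborting" >&2
    exit 1
fi
```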
Running on the Hadoop cluster
- From the client node, launch the Spark job with a script like the following:
#!/bin/bash

# Set up the Spark/Python environment from the LCG release on CVMFS
source /cvmfs/sft.cern.ch/lcg/views/LCG_105a_swan/x86_64-centos7-gcc11-opt/setup.sh
# Point the Hadoop client configuration at the analytix cluster
source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh analytix 3.3 spark3

# Obtain a Kerberos TGT non-interactively from the keytab
export KRB5CCNAME=FILE:$XDG_RUNTIME_DIR/krb5cc
kinit -kt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes

spark_cmd="spark-submit \
--master yarn \
--deploy-mode client \
--keytab /afs/cern.ch/user/m/malandes/spark/sparktest.keytab \
--principal malandes@CERN.CH \
--packages org.apache.hadoop:hadoop-aws:3.3.2 \
--conf spark.yarn.appMasterEnv.LD_LIBRARY_PATH=$LD_LIBRARY_PATH \
--conf spark.yarn.appMasterEnv.PYTHONPATH=$PYTHONPATH \
--conf spark.executor.memory=16g \
--conf spark.executor.instances=24 \
--conf spark.executor.cores=4 \
--conf spark.driver.memory=32g \
--conf spark.ui.showConsoleProgress=false \
--conf spark.hadoop.fs.s3a.path.style.access=true \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
script-name.py"
eval "$spark_cmd"
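In client deploy mode the driver log is printed on the console, while executor logs stay on the cluster. A sketch of how they can be retrieved with the standard YARN commands (the application ID below is a placeholder; use the one printed by spark-submit):

```shell
# List applications currently running on the cluster
yarn application -list -appStates RUNNING

# Fetch the aggregated logs of a finished application
# (replace the ID with the one reported by spark-submit)
yarn logs -applicationId application_1234567890123_0001 | less
```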
Connecting to HDFS in Power BI
- Example of a WebHDFS URL to load a file stored in HDFS:
https://ithdp6013.cern.ch:50070/webhdfs/v1/user/malandes/hpc-2023-test.csv/part-00000-812eef77-4d73-46c6-8993-ec82c70584b9-c000.csv
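Before configuring the source in Power BI, the same WebHDFS endpoint can be checked from the command line with a Kerberos-authenticated curl. This is a sketch that assumes curl was built with SPNEGO/GSS-API support and that a valid TGT is already in the credential cache:

```shell
# ?op=OPEN reads the file contents via the WebHDFS REST API.
# --negotiate -u : enables SPNEGO (Kerberos) authentication and
# -L follows the redirect to the datanode that serves the data.
curl -L --negotiate -u : \
  "https://ithdp6013.cern.ch:50070/webhdfs/v1/user/malandes/hpc-2023-test.csv/part-00000-812eef77-4d73-46c6-8993-ec82c70584b9-c000.csv?op=OPEN"
```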
--
MariaALANDESPRADILLO - 2024-02-15