Batch Accounting Visualisation

Hadoop and Spark

Get started

Monitoring and Debugging

Kerberos

  • Using Hadoop/YARN requires a valid Kerberos ticket-granting ticket (TGT). To obtain one non-interactively from a script, use a keytab file:
    • Generate the keytab: cern-get-keytab --keytab sparktest.keytab --login malandes --user
    • Then obtain the Kerberos TGT with: kinit -kt /path/to/sparktest.keytab malandes
  • The script itself must include the following lines, which set the credential cache location and request the ticket from the keytab:
export KRB5CCNAME=FILE:$XDG_RUNTIME_DIR/krb5cc
kinit -kt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes
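
A script that runs repeatedly does not have to call kinit on every run. As a minimal sketch, the helper below requests a new ticket only when the credential cache holds no valid one. The function name ensure_tgt is hypothetical and not part of the original setup; it assumes the MIT Kerberos klist command, whose -s option reports cache validity through its exit status.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the original setup):
# obtain a TGT from a keytab only if the current cache has no valid ticket.
# Usage: ensure_tgt <keytab-path> <principal>
ensure_tgt() {
    local keytab="$1" principal="$2"
    # 'klist -s' prints nothing and exits non-zero when the credential
    # cache cannot be read or the ticket has expired.
    if klist -s; then
        return 0
    fi
    kinit -kt "$keytab" "$principal"
}

# Example call, using the keytab and login from this page:
# export KRB5CCNAME="FILE:$XDG_RUNTIME_DIR/krb5cc"
# ensure_tgt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes
```

KRB5CCNAME still has to be exported before the call, so that klist and kinit look at the same credential cache.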

Running on the Hadoop cluster

  • From the client node, launch the Spark job with a script such as the following:
#!/bin/bash
# Load the LCG software stack and the Hadoop/Spark client configuration
# for the analytix cluster.
source /cvmfs/sft.cern.ch/lcg/views/LCG_105a_swan/x86_64-centos7-gcc11-opt/setup.sh
source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh analytix 3.3 spark3

# Obtain a Kerberos TGT from the keytab so the job can authenticate
# without an interactive password prompt.
export KRB5CCNAME="FILE:$XDG_RUNTIME_DIR/krb5cc"
kinit -kt /afs/cern.ch/user/m/malandes/spark/sparktest.keytab malandes

# Submit the job to YARN in client mode. The command is invoked directly
# (instead of being stored in a string and run through eval) so that the
# expanded environment variables are passed as single, safely quoted arguments.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --keytab /afs/cern.ch/user/m/malandes/spark/sparktest.keytab \
  --principal malandes@CERN.CH \
  --packages org.apache.hadoop:hadoop-aws:3.3.2 \
  --conf "spark.yarn.appMasterEnv.LD_LIBRARY_PATH=$LD_LIBRARY_PATH" \
  --conf "spark.yarn.appMasterEnv.PYTHONPATH=$PYTHONPATH" \
  --conf spark.executor.memory=16g \
  --conf spark.executor.instances=24 \
  --conf spark.executor.cores=4 \
  --conf spark.driver.memory=32g \
  --conf spark.ui.showConsoleProgress=false \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.fast.upload=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  script-name.py
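
The executor and driver settings above determine how much of the cluster the job asks for. As a rough sketch, the hypothetical helper below (spark_request_summary is an illustrative name, not an existing tool) multiplies the values out. It deliberately ignores the per-container memory overhead that Spark adds on top of spark.executor.memory, so the real YARN allocation will be somewhat larger.

```shell
#!/bin/bash
# Hypothetical helper: summarise the resources requested by a spark-submit call.
# Memory overhead added by Spark per container is NOT included.
# Usage: spark_request_summary <executors> <cores-per-executor> <executor-mem-GB> <driver-mem-GB>
spark_request_summary() {
    local executors="$1" cores="$2" exec_mem_gb="$3" driver_mem_gb="$4"
    local total_cores=$(( executors * cores ))
    local total_exec_mem=$(( executors * exec_mem_gb ))
    echo "executor cores: ${total_cores}"
    echo "executor memory: ${total_exec_mem}g"
    echo "driver memory: ${driver_mem_gb}g"
}

# Values taken from the spark-submit call on this page:
# spark_request_summary 24 4 16 32
```

With the values used on this page (24 executors, 4 cores and 16g each, 32g driver) the job requests 96 executor cores and 384g of executor memory, plus the driver.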

Connecting to HDFS from Power BI

  • Example URL for loading a file stored in HDFS through the WebHDFS endpoint: https://ithdp6013.cern.ch:50070/webhdfs/v1/user/malandes/hpc-2023-test.csv/part-00000-812eef77-4d73-46c6-8993-ec82c70584b9-c000.csv
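
Before pointing Power BI at such a URL, it can help to check from the command line that the file is reachable. The sketch below is an illustration under assumptions: webhdfs_open_url is a hypothetical helper, it assumes the standard WebHDFS REST prefix /webhdfs/v1 with the OPEN operation, and the curl check assumes that the endpoint accepts Kerberos (SPNEGO) authentication.

```shell
#!/bin/bash
# Hypothetical helper: build a WebHDFS URL that opens (reads) an HDFS file.
# Assumes the standard WebHDFS REST prefix /webhdfs/v1 and the OPEN operation.
# Usage: webhdfs_open_url <host> <port> <absolute-hdfs-path>
webhdfs_open_url() {
    local host="$1" port="$2" path="$3"
    echo "https://${host}:${port}/webhdfs/v1${path}?op=OPEN"
}

# Example check with curl, using the host and file from this page.
# --negotiate -u : authenticates with the current Kerberos ticket, and
# -L follows the redirect to the datanode that serves the data.
# file=/user/malandes/hpc-2023-test.csv/part-00000-812eef77-4d73-46c6-8993-ec82c70584b9-c000.csv
# curl --negotiate -u : -L "$(webhdfs_open_url ithdp6013.cern.ch 50070 "$file")" | head
```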

-- MariaALANDESPRADILLO - 2024-02-15

Topic revision: r6 - 2024-05-14 - MariaALANDESPRADILLO
 