ADCoSSeniorWIP

Responsibilities

ADCoS senior shifter is responsible for:

Requirements

  • An ADCoS Shifter has to be an ATLAS member.
  • All current ADCoS Shifters are registered in https://e-groups.cern.ch/e-groups/Egroup.do?egroupName=atlas-project-adc-operations-shifts
  • Grid Certificate requirements
    • Every ADCoS Shifter on duty (Trainee, Senior, or Expert) is required to have a valid grid certificate registered to VO atlas at the time of the shift, see WorkBookStartingGrid. If an ADCoS Shifter starts a shift without a valid grid certificate registered to VO atlas, the shift booking will be cancelled and the shifter will not get any OTP credit for the shift. Recurring certificate issues may result in losing the possibility to sign up for ADCoS Shifts.
    • The shifter's valid grid certificate must be in the /atlas/team VOMS group; ADCoS Coordinators add it at Step 2 of the Trainee Shifter setup procedure. If for some reason your certificate is not yet in the /atlas/team VOMS group, ask the ADCoS Coordinators in advance to add it. In particular, don't forget to do that when you get a completely new certificate.
    • Check whether you are able to submit an ATLAS TEAM GGUS ticket at https://ggus.eu/?mode=ticket_team (you should see the TEAM option on your screen). Opening a TEAM ticket requires a valid grid certificate (1) that is loaded in your browser (2), is in the /atlas/team VOMS group (3), and is known to GGUS (4).
  • OTP requirements
    • It is strictly forbidden to book more than 1 shift within 24 hours. Breaking this rule may result in losing the possibility to sign up for ADCoS Shifts.
    • An ADCoS Shifter books shifts in her/his own name and takes the shift as the person who booked it. It is forbidden to book a shift in one person's name on behalf of a different person. The ADCoS Coordinator can book shifts on behalf of other persons; such a shift is booked in the name of the Shifter.
    • In an emergency that makes it impossible to take a shift, the ADCoS Shifter immediately notifies the ADCoS Coordinator and the ADCoS Shift Captain of the shift timezone. The ADCoS Coordinator then cancels the shift booking in OTP. The ADCoS Coordinator or ADCoS Shift Captain may announce a call for shifters to find a replacement Shifter. Contacts can be found on the Team page.
    • Booked shifts can only be cancelled at least 45 days in advance; after that, the shifter has to find a replacement.
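
The two OTP timing rules above can be sketched as simple date checks. This is only an illustration, not part of OTP: the function names are hypothetical, "within 24 hours" is assumed to mean that two shift start times are less than 24 hours apart, and the 45-day limit is assumed to be measured from the shift start.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=24)       # at most one shift within 24 hours
CANCEL_NOTICE = timedelta(days=45)  # cancellation cut-off before the shift


def violates_24h_rule(new_start, booked_starts):
    """True if new_start is less than 24 hours from any booked shift start.

    Assumption: the rule is applied to shift start times.
    """
    return any(abs(new_start - start) < MIN_GAP for start in booked_starts)


def can_cancel(shift_start, now):
    """True if the shift is still at least 45 days away.

    Assumption: after the cut-off the shifter must find a replacement.
    """
    return shift_start - now >= CANCEL_NOTICE
```

For example, with a shift booked at 08:00 on a given day, `violates_24h_rule` flags a second booking at 16:00 the same day, but not one at 08:00 the next day.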

CHECKLIST

  • The "For ADCOS Shifters" page collects useful links in one place.
  • In case of a shift-related problem, contact the current ADCoS expert shifter on duty (via VCR; if the expert shifter is not present, then via email). If the expert shifter is not responding, or if there is no ADCoS expert on duty, contact ADCoS Coordination. For organization-related problems, contact ADCoS Coordination.
  • Before coming to your shift
  • At the beginning of your shift
  • During your shift
    • Check DDM failures: open the DDM Dashboard at http://dashb-atlas-ddm.cern.ch/ddm2 (please check DDMDashboardHowTo if you're new to the Team). Spot the most problematic clouds in the dashboard (begin with those in RED, then YELLOW, then BLUE). Click on the Tier-1 name to get a breakdown by site. Chase the site(s) causing the low efficiency in the cloud. Correlate those sites with existing reports (tickets and eLogs), downtimes, and blacklistings. If the errors persist and the site has not been reported, is not in downtime, and is not blacklisted, the shifter should report the issue.
    • Check deletion: open http://bourricot.cern.ch/dq2/deletion/#period=4 and, in the CLOUDS table, click on the name of each cloud. If a site has more than 100 errors over the last 4 hours, check whether the error rate is constant over these 4 hours (click on the name of the cloud and site in the top-left part of the page and look at the error rate plot). Report to the ADCoS expert, who will check whether it is worthwhile to contact the site and file a GGUS ticket if necessary (for more details see https://twiki.cern.ch/twiki/bin/view/Main/ADCoSWIP#Checking_the_deletion_error_rate). Also check the deletion backlog.
    • Check for jobs that have been in the transferring state for a long time. If a high number of jobs is transferring to a site, the shifter submits an eLog with all the information (s)he has found.
    • Check production failures: open the BigPanda error page and the job distribution in regions. (Region in this context means the home cloud of the sites. As many sites participate in multicloud production (see http://bigpanda.cern.ch/sites/), jobs from several clouds can run on a given site. This view shows all jobs from all clouds that are running on the given site. If it is necessary to check the jobs running on a given site for a given cloud, there is the cloud view.) First look for sites with a lot of failing jobs. Correlate those sites with existing reports (tickets and eLogs), downtimes, and blacklistings. If the errors are caused by the site and persist, and the site has not been reported, is not in downtime, and is not blacklisted, the shifter should report the issue.
    • Once per shift, please check the Frontier status. If there is a degradation, the site hosting Frontier is not in downtime, and it shows no activity in MRTG, the shifter should submit an urgent GGUS ticket to the site and cc atlas-frontier-support@cern.ch. The shifter should also check Squids. If there is a degradation, check the Frontier section for more details.
    • Remember to report every action in eLog. (For a 'new' entry, click on an existing entry first. If you solve an issue, put [SOLVED] in the eLog subject.)
  • At the end of your shift
    • The senior shifter is obliged to submit a daily shift summary report. Trainee shifters usually do not submit a daily report (if the senior shift is not covered and the trainee is already experienced, (s)he can submit the shift summary).
    • The senior shifter is obliged to submit a trainee evaluation report if a trainee was present.
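
The reporting rule used in the DDM and production checks, and the deletion threshold, can be written down as two small predicates. This is a minimal sketch for illustration only: the class and function names are hypothetical, the inputs are whatever the shifter has read off the dashboards, and it assumes that a deletion problem is passed to the expert only when the error rate stays constant over the 4 hours.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """What the shifter found for one site (hypothetical structure)."""
    errors_persist: bool
    already_reported: bool  # an open ticket or eLog entry already exists
    in_downtime: bool
    blacklisted: bool


def should_report(site: SiteStatus) -> bool:
    """Report only persistent errors at a site that is not already
    reported, not in downtime, and not blacklisted."""
    return (site.errors_persist
            and not site.already_reported
            and not site.in_downtime
            and not site.blacklisted)


def deletion_needs_expert(errors_last_4h: int, rate_is_constant: bool) -> bool:
    """More than 100 deletion errors in the last 4 hours.

    Assumption: the expert is contacted only if the rate is constant.
    """
    return errors_last_4h > 100 and rate_is_constant
```

A site with persistent errors that is currently in downtime gives `should_report(...) == False`, which matches the advice to check the downtime calendar before opening a ticket.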

Most Common Mistakes by Shifters

  • Opening a regular GGUS ticket rather than a GGUS Team ticket.
  • Opening a duplicate GGUS ticket. Please don't forget to check the list of open tickets before submitting a new GGUS ticket.
  • Reopening a closed GGUS ticket or updating an existing one, rather than opening a new GGUS ticket, when the problem is different from the one in the existing ticket. Please check with the expert shifter if you are not sure whether it is a new problem or a different manifestation of one that has already been reported.
  • Submitting a GGUS ticket for a site in downtime. Please always check the AGIS Downtime Calendar before opening a new ticket.
  • Forgetting to write the site name in the subject of the GGUS ticket. It is strongly advised to start the subject with the site name. That makes browsing by ticket subject (GGUS/ELOG) much easier. If the site name is at the end of a lengthy subject line, it may not show in the summary list of GGUS tickets.
  • Forgetting to add the cloud support address atlas-adc-cloud-[CLOUD]@cern.ch to the CC field of the GGUS ticket, or adding the address atlas-support-cloud-[CLOUD]@cern.ch, which GGUS can't process. Please always add the cloud support address atlas-adc-cloud-[CLOUD]@cern.ch in the GGUS CC field.
  • Coming to a shift with an expired grid certificate, or with a new certificate that is not in the /atlas/team VOMS group, and therefore having problems opening a GGUS Team ticket. Before coming to your shift, please verify that you are able to open a GGUS Team ticket.
  • Forgetting to add an ELOG entry after opening a new GGUS/Jira ticket. Major status updates, as well as closing the ticket, need an ELOG entry as well.
  • Opening a new ELOG thread for an evolving issue that already has one or more entries in ELOG. Please continue the existing thread instead.
  • Forgetting to submit an evaluation report for the participating trainee shifter.
  • Using an email address from TWiki without editing it to exclude SPAMNOT. Remove the word SPAMNOT, otherwise the email will bounce.
  • Submitting a ticket about mcXX_valid tasks to the validation jira. As this twiki clearly states, only tasks starting with valid should be reported in the validation jira. mcXX_valid tasks should be reported in the ADCSUPPORT jira.
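
Three of the mistakes above follow simple string rules, so they can be illustrated with a short sketch. None of this is an official tool: the function names are hypothetical, the returned strings are plain labels rather than real Jira project keys, and "mcXX" is assumed to mean "mc" followed by digits.

```python
import re
from typing import Optional

# Assumption: "mcXX_valid" means "mc" + digits + "_valid", e.g. "mc12_valid".
MC_VALID = re.compile(r"^mc\d+_valid")


def jira_for_task(task_name: str) -> Optional[str]:
    """Pick the jira for a task name, following the rule in the list above."""
    if task_name.startswith("valid"):
        return "validation"
    if MC_VALID.match(task_name):
        return "ADCSUPPORT"
    return None  # the text above gives no rule for other task names


def cloud_cc(cloud: str) -> str:
    """Cloud support address for the GGUS CC field."""
    return f"atlas-adc-cloud-{cloud}@cern.ch"


def subject_starts_with_site(subject: str, site: str) -> bool:
    """True if the ticket subject begins with the site name."""
    return subject.startswith(site)
```

With a made-up site name, `subject_starts_with_site("SITE-A: transfer failures", "SITE-A")` is true, while a subject that only mentions the site at the end is not.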

TEAM MEMBERS


Major updates:
-- MichalSvatos - 2014

%RESPONSIBLE% MichalSvatos
%REVIEW% Never reviewed

-- MichalSvatos - 30 Sep 2014

Topic revision: r5 - 2014-12-04 - MichalSvatos