HPC Linux Systems Administrator 

Organization

IC-Information Technology

Description

 

Berkeley Lab’s Information Technology (IT) Division is looking for a versatile HPC Linux Systems Administrator to provide computing support to the Berkeley Lab research community. The Scientific Computing Group (SCG) within the IT Division manages the Lab’s High Performance Computing infrastructure and provides state-of-the-art Linux solutions in support of the science at Berkeley Lab. We help enable some of the most advanced fundamental research in the world by providing the computing tools, networks, and expertise that pioneering science requires.

In this role, you will participate in building, integrating, and supporting Linux-based resources and end-users to meet the computing needs of various scientific disciplines. Depending on your experience, aptitude, and skill set, you may also support large high performance computing cluster systems. You will work under the supervision of the Group Lead or senior team members. Prospective applicants should exhibit a passion for learning, the ability to integrate new computing technologies, and a deep desire to support scientific research. This position will be filled at Level 2 or Level 3, depending on experience.

Specific Responsibilities:

  • Provide Linux systems administration and user support for LBNL scientific research groups.  

  • Provide Linux system and HPC cluster maintenance and installations, operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware and peripheral management, security configuration, customization of user group working environment, troubleshooting, network monitoring, and crash recovery.

  • Assist users with program compilation, commercial and public domain software installation, and use of Linux tools.

  • Configure, administer, and troubleshoot desktop, server and storage infrastructures as well as racking, installing, and maintaining systems in a data center.

  • Plan, organize, prioritize and complete assigned tasks and projects in a timely manner.

  • Frequently and clearly communicate task or project status to customers to either set or negotiate expectations.

  • Market IT Division services to the scientific community by providing excellent customer service coupled with competent technical support skills.

  • Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.

In addition to the above, Level 3 - Specific Responsibilities:

  • Participate in building, integrating and supporting Linux-based resources and end-users to meet the computing needs for various scientific disciplines.  

  • Provide cluster support to LBNL and UC researchers at remote sites, including initial installation, integration, and ongoing maintenance of Linux High Performance Computing cluster systems, with travel to those sites if necessary.

  • Lead SCG technical efforts in one or more areas of HPC technologies such as job schedulers, high performance interconnects, parallel filesystems, cybersecurity, cluster management, VM infrastructure, networking, performance tuning, support of scientific applications, or data center planning.

  • Lead group projects, of small to medium size and complexity, to implement and deploy new computing technologies and associated services to the research community.

Required Qualifications:

  • Bachelor’s degree or equivalent experience and a minimum of 5 years of full-time professional Linux system administration experience in a large distributed computing environment. Experience providing systems and end-user support for multiple scientific or computational research groups.

  • Expert-level experience with Red Hat Enterprise Linux (including derivatives such as CentOS and Scientific Linux), Debian, and Ubuntu, and with large-scale system administration tools such as Kickstart, CFEngine, Puppet, or in-house developed systems management tools.

  • Ability to support common services such as NFS, LDAP, NIS, CIFS, MySQL, and Apache.

  • Moderate knowledge of Linux internals, TCP/IP networking, software programming, and cybersecurity concepts.

  • Must demonstrate technical understanding of Linux internals including the boot process, kernel versions, and the differences between major Linux distributions.

  • Experience with building, patching, and modifying Linux RPMs is required.

  • Ability to quickly troubleshoot computer and storage hardware problems, such as those involving RAID devices, and familiarity with procedures to expedite or coordinate vendor service and resolve outstanding problems.

  • Must be able to demonstrate programming proficiency in a procedural language such as C, C++, Java, and/or Fortran, and in a scripting language such as Perl or Python.

  • Must have experience with popular compilers (e.g. GCC, Intel), program debugging tools, Makefiles, and software repositories such as GitHub or Subversion.

  • Experience with implementing solutions based on virtual machine (VM) technologies such as KVM, VMware, and OpenStack, as well as container technologies such as Docker and Singularity.

  • Excellent interpersonal, communication, and customer service skills, exercised with tact and good judgment.

  • Must be able to work with multiple end-user groups where each group may have different needs and requirements.

  • Ability to plan, organize, prioritize, and complete assigned tasks and projects with general supervision while providing timely updates on work progress to end-users and co-workers.

  • Ability to climb stairs, ladders, and scaffolds; work at heights on above-rack cabling; work in confined spaces and under fluorescent lights; bend, stoop, kneel, and crawl; use manual dexterity in both hands; lift 60 lbs. to chest height; and distinguish colors.

In addition to the above, Level 3 - Required Qualifications:

  • Bachelor’s degree or equivalent experience and a minimum of 8 years of full-time professional Linux system administration experience in a large distributed computing environment, including 2 years of experience providing support for Linux HPC clusters used for scientific research.

  • In-depth expertise in two or more areas of HPC technologies such as Linux operating systems, job schedulers, high performance interconnects, parallel filesystems, cybersecurity, cluster management, VM infrastructure, networking, performance tuning, support of scientific applications, or data center planning.

  • Ability to plan, organize and successfully implement group projects for deploying new technologies and services.

Additional Desired Qualifications (Both Levels):

  • Experience supporting HPC systems and end-users.

  • Expertise with HPC Linux clustering technology (job schedulers, MPI, InfiniBand, parallel filesystems, parallel programming).

  • Experience with software engineering or development.

  • Previous experience supporting research at a National Lab or academic institution.

The posting shall remain open until the position is filled; however, for full consideration, please apply by close of business on October 12, 2017.

Notes:

  • This is a full-time career appointment.

  • Classification will depend upon the applicant's level of skills, knowledge, and abilities.

  • Full-time, M-F, exempt (monthly paid) from overtime pay.

  • Salary is commensurate with experience.

  • This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

  • Work will be primarily performed at: Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA.

Learn About Us:

Berkeley Lab (LBNL) addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the U.S. Department of Energy’s Office of Science.


Working at Berkeley Lab has many rewards, including a competitive compensation program, excellent health and welfare programs, a retirement program that is second to none, and outstanding development opportunities. To view information about the many rewards that are offered at Berkeley Lab, click here.


Equal Employment Opportunity: Berkeley Lab is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status. Berkeley Lab is in compliance with the Pay Transparency Nondiscrimination Provision under 41 CFR 60-1.4.  Click here to view the poster and supplement: "Equal Employment Opportunity is the Law."