Condominium Cluster Computing
(Available for LBNL researchers)

Overview

In recognition of the increasing importance of research computing across many scientific disciplines, LBNL has made a significant investment in developing the LBNL Condo Cluster program as a way to grow and sustain midrange computing for Berkeley Lab. The Condo program provides Berkeley Lab researchers with state-of-the-art, professionally administered computing systems and ancillary infrastructure, with the aims of improving competitiveness on grants and achieving economies of scale through centralized computing systems and data center facilities.

The model for sustaining the Condo program is premised on faculty and principal investigators using equipment purchase funds from their grants or other available funds to purchase compute nodes (individual servers), which are then added to the Lab's Lawrencium compute cluster. This allows PI-owned nodes to take advantage of the high-speed InfiniBand interconnect and the high-performance Lustre parallel filesystem storage associated with Lawrencium. Operating costs for managing and housing PI-owned compute nodes are waived in exchange for letting other users make use of any idle compute cycles on those nodes. PI owners have priority access to computing resources equivalent to those purchased with their funds, and can access additional nodes for their research when needed. This gives the PI much greater flexibility than owning a standalone cluster.

This program is intended for PIs who would otherwise purchase a small (4-node) to medium-scale (72-node) standalone Linux cluster. Projects with larger compute needs, or with many users or groups, should consider setting up a dedicated cluster so that they can better prioritize shared access among their users.

Program Details

Compute node equipment is purchased and maintained on a 4-year lifecycle, at which point the PI owning the nodes will be notified that the nodes must be upgraded during year 5. If the hardware is not upgraded by the end of year 5, the PI may donate the equipment to the Condo program or take possession of it (removal of the equipment from LC3 and transfer to another location is at the PI's expense); nodes left in the cluster after five years may be removed and disposed of at the discretion of the HPCS program manager.

All Lawrencium and Condo users have a 10GB home directory on the Lab's shared HPC infrastructure and are charged $25/month for account maintenance, which includes backups of their home directory. Users or projects needing more space for persistent data can purchase storage shelves hosted on the Lab's HPC infrastructure. Storage shelves are purchased and maintained on a 5-year lifecycle, after which the PI must renew the storage purchase at the then-prevailing price or remove the data within 3 months.

Once a PI has decided to participate, the PI or their designee works with the HPC Services manager and operations team to procure the desired number of compute nodes and storage; procurement generally takes about three months from start to finish. In the interim, a test condo queue with a small allocation will be set up for the PI's users in anticipation of the new equipment. Users may also submit jobs to the general Lawrencium queues on the cluster, but such use incurs the CPU usage fee of $0.01 per Service Unit. Jobs are subject to the general queue limitations, and guaranteed access to contributed cores is not provided until the purchased nodes are provisioned.
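
As a point of reference, Lawrencium uses the Slurm scheduler for job submission. The following is a minimal batch-script sketch for a condo project; the partition, account, and QoS names shown are placeholders only (the actual values are assigned by HPC Services when the condo or test queue is set up), and jobs routed to the general Lawrencium queues instead accrue the per-Service-Unit charge noted above.

    #!/bin/bash
    #SBATCH --job-name=condo_test
    #SBATCH --partition=lr6            # placeholder partition; use the one assigned to your project
    #SBATCH --account=lr_mygroup       # hypothetical condo project account name
    #SBATCH --qos=condo_mygroup        # hypothetical condo QoS; a general QoS incurs SU charges
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=40       # 40 cores per node on the current Cascade Lake condo nodes
    #SBATCH --time=01:00:00            # one-hour wall-clock limit
    srun ./my_application              # hypothetical user executable

The script would be submitted with sbatch (e.g., sbatch condo_test.sh) and job status checked with squeue -u $USER.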

Recommended Equipment

Compute nodes are available with the following specifications:

General Computing Node (Current Condo node configuration)
Processors: Dual-socket, 20-core, 2.1GHz Intel Cascade Lake Xeon 6230 processors (40 cores/node)
Memory: 192GB (12 x 16GB) 2933MHz DDR4 RDIMMs
Interconnect: 100Gb/s Mellanox ConnectX-5 EDR InfiniBand
Hard Drive: 3 x 4TB 7.2K RPM SATA HDD (local swap and log files)
Warranty: 5 years

General Computing Node - Coming Soon in Fall 2021
Processors: Dual-socket, 28-core, 2.0GHz Intel Ice Lake Xeon 6330 processors (56 cores/node)
Memory: 256GB (16 x 16GB) 3200MHz DDR4 RDIMMs
Interconnect: 100Gb/s Mellanox ConnectX-6 HDR-100 InfiniBand
Hard Drive: 960GB SSD (local swap and log files)
Warranty: 5 years

GPU Computing Node
Processors: Single-socket, 16-core, 3.0GHz AMD EPYC 7302P processor (16 cores/node)
Memory: 512GB (8 x 64GB) 3200MHz DDR4 RDIMMs
Interconnect: 100Gb/s Mellanox ConnectX-6 HDR InfiniBand
GPU: 4 x Nvidia A40 GPU accelerator boards
Hard Drive: 240GB SSD (local swap and log files) and 7.6TB NVMe
Warranty: 3 years


Condo Storage: PIs can purchase an increment of disks consisting of 8 x 14TB 7200RPM Nearline SAS disks (112TB raw) to be added to our Hitachi HNAS G370 storage subsystem. Each increment provides 84TB of usable persistent storage.

Prospective condo owners should contact HPC Services Manager Gary Jung prior to purchasing any equipment to ensure compatibility.

Lawrencium Condo Owners
David Prendergast, Molecular Foundry
Jeff Neaton, Molecular Foundry
Quanlin Zhou, Earth and Environmental Sciences Area
Jens Birkholzer, Energy Geosciences Division
Curtis Oldenburg, Earth and Environmental Sciences Area
Barry Freifeld, Earth and Environmental Sciences Area
Peter Lau, Earth and Environmental Sciences Area
David Romps, Earth and Environmental Sciences Area
Kristin Persson, Energy Technologies Area
Anubhav Jain, Energy Technologies Area
Jeff Porter, Nuclear Science Division
Gerbrand Ceder, Materials Sciences Division
Martin Head-Gordon, Chemical Sciences Division
Joel Moore, Materials Sciences Division
Mark Asta, Materials Sciences Division
Phillip Geissler, Chemical Sciences Division
Teresa Head-Gordon, Chemical Sciences and Physical Biosciences Divisions
Erik Neuscamman, Chemical Sciences Division
Kranthi Mandadapu, Chemical Sciences Division
Robert Lucchese, Chemical Sciences Division
William McCurdy, Chemical Sciences Division
Kjiersten Fagnan, Joint Genome Institute