Load Sharing Facility - IBM
Position: Load Sharing Facility (IBM)
Location: San Jose, CA - Remote
Contract: W2
Key Job Responsibilities
- Cluster Management: Install, configure, and maintain IBM Spectrum LSF clusters to optimize resource utilization.
- Workload Optimization: Manage job queues, policy-driven scheduling, and workload balancing across server hosts.
- Troubleshooting: Monitor system performance and the LSF daemons (LIM, MBD, SBD), and resolve issues related to job submission, execution, and host availability.
- Automation & Scripting: Develop tools (Python, shell scripts) to streamline cluster management and improve efficiency.
- License Management: Optimize software license configuration to ensure efficient EDA tool utilization.
- Collaboration: Work with engineering, DevOps, and data science teams to align HPC infrastructure with business needs.
Required Skills and Qualifications
- Experience: Generally 4–12+ years in IT architecture, system engineering, or HPC environments.
- Technical Knowledge: Deep understanding of IBM Spectrum LSF, job scheduling, and workload management.
- OS Proficiency: Strong Linux/Unix systems administration skills.
- Automation Tools: Experience with scripting (Python, shell) and automation tools like Ansible or Terraform.
- Education: Bachelor’s or Master’s degree in Computer Science or Engineering.