Hadoop Support Admin Lead
The Hadoop Support Admin Lead is responsible for the Hadoop and Sentience ecosystems. We monitor and improve the system and recommend improvements for implementation by others. We are involved in incident and change management, and we act as consultants for engineers and product managers as new products and services approach launch. You will work directly with our Engineering and DevOps teams to support our next-generation 'always available' Sentience Platform.
The qualified candidate will be comfortable working in an environment that is both matrixed and involves some direct oversight of onshore and offshore teams in other technical domains. The position reports to the Operations Manager and is based in Atlanta (United States).
RESPONSIBILITIES
- Strong IT background with hands-on experience administering Hadoop HDFS clusters.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This includes creating Linux/AD accounts, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Cluster maintenance, including the addition and removal of nodes, using tools such as Ganglia, Nagios, and Cloudera Manager Enterprise.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Rolling patch upgrades of Hadoop clusters without causing downtime.
- Screening Hadoop cluster job performance and handling capacity planning.
- Monitoring Hadoop cluster connectivity and security.
- Managing and reviewing Hadoop log files.
- File system management and monitoring.
- HDFS support and maintenance.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Point of contact for vendor escalations.
- Able to manage Docker Swarm and Kubernetes clusters.
- Able to manage and deploy API services in Docker Swarm and Kubernetes clusters as required.
- Responsible for backup, recovery, and maintenance.
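As a flavor of the log-review work above, here is a minimal sketch (the sample lines and the exact log4j layout are assumptions for illustration) that tallies log levels in Hadoop daemon log lines so spikes in WARN/ERROR stand out during a review:

```python
import re
from collections import Counter

# Hadoop daemon logs typically use log4j lines of the form:
# 2024-01-15 10:32:01,123 LEVEL org.apache.hadoop....: message
LEVEL_RE = re.compile(r"^\S+ \S+ (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ")

def count_levels(lines):
    """Tally log levels across the given log lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines for illustration only.
sample = [
    "2024-01-15 10:32:01,123 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: edit log roll",
    "2024-01-15 10:32:02,456 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: slow disk detected",
    "2024-01-15 10:32:03,789 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: IO error on volume",
]
print(count_levels(sample))
```

In practice the same tally would be run over files pulled from the daemons' log directories or aggregated by the monitoring stack; the point is only to show the shape of routine log triage.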
YOU MUST HAVE
- Bachelor's degree in Engineering or Computer Science
- Practical expertise in managing and leading application reliability practices for Industrial products
- Ability to work across teams to continuously analyze system performance in production, troubleshoot consumer and engineering reported issues, and proactively identify areas in need of optimization
- Previous experience with automation and driving real-time monitoring solutions that provide visibility into cluster health and key performance indicators
- Technical understanding of core Hadoop architecture, cloud services, platforms, and microservices.
- Working understanding of IT service management (Incident, Problem, Change and Knowledge management)
- Ability to lead a technical team of support engineers through day to day operations and critical incidents
- Prior experience with agile methodologies, performance engineering and automation tools
- Clear communication skills.
- Working knowledge of the Hadoop ecosystem, Spark, the Red Hat OpenShift platform, PowerShell, Python, Pig, Hive, and Sqoop.
NICE TO HAVE
- Development experience in Spark, Python, and Hive
- Deep understanding of the business landscape and how site reliability influences our products and customers
- Hands-on experience installing, configuring, and administering Hadoop clusters