NOC Engineer

Job Description

We are looking for highly motivated System Administrator/DevOps engineers to design, develop, and implement a global, dynamic, state-of-the-art Service Reliability Operations Center (SROC) that provides extraordinary levels of support for our Cloud products and services. As a key member of the SROC team, you will partner with other groups in our organization, including Site Reliability Engineering, the Security Operations Center, DevOps teams, and other datacenter operations partners, to help make our services capable of near 100% availability. On the rare occasion that an incident occurs, you will be our front line in reducing the frequency and duration of any issue. Working in partnership with the development community, the SROC team will determine where monitors, alarms, and alerts should be placed to make the service more reliable and improve our customer experience. You will also be closely involved in selecting the technologies used in the SROC to monitor, run, and measure the effectiveness of the environment.

What you will be doing:

* The team will provide its services 24/7 in a follow-the-sun model that spans continents.

* The heart of the SROC service will be monitoring and running a growing production compute and storage environment.

* Every SROC team member will use alerts and alarms to help prevent issues and incidents when possible. You may also work with the developer community to develop and execute predictive support or diagnostic routines.

* You will use systems administration, network administration, and security incident monitoring to drive your actions.

* SROC team members will work with developers to learn how the service works, then translate that understanding into runbooks that the entire team will use. As new features and functionality are added, you will also update and evolve the runbooks as needed.

* You will help discover incidents and issues, including initiating the incident management procedure.

* You will bring in subject matter authorities or service owners as needed to resolve issues. Feedback will help us continually improve our service.

* Your interpersonal skills will help keep the team engaged through resolution and ensure our clients believe we value their time and effort.

* You may perform other tasks that will help us provide extraordinary service levels for our customers.

What we need to see:

* Minimum of 3 years' experience administering open system servers in a production environment.

* At least 2 years' experience working in demanding Internet, Cloud, or Telecommunications environments in a Systems Administration, DevOps, SRE, or NOC role.

* Expertise using monitoring tools and problem ticketing systems.

* Strong problem-solving, analytical, and troubleshooting abilities.

* Strong server administration experience: shell scripting, automation, DNS, DHCP, storage concepts, basic networking, iptables, etc. RHCE or equivalent level of knowledge.

* Experience scripting in Python preferred, but not required.

* Experience running virtual machines under open source or commercial hypervisors.

* Experience operating services running on public or private clouds.

* Knowledge and understanding of application containers and container orchestration systems.

* Basic understanding of Git.

* Experience analyzing system and network performance using monitoring alerts, data, and graphs.

* Demonstrated ability to master and maintain complicated environments.

We offer highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.

Job Snapshot

Location US-TX-Austin, TX
Employment Type Full-Time
Pay Type Year
Pay Rate N/A
Store Type IT & Technical

Company Overview

AIC (part of ACS Group)

Analysts International Corporation (AIC) is an IT services firm fully dedicated to the success and satisfaction of its customers. From IT staffing to project-based solutions, AIC provides a broad range of services designed to help businesses and government agencies drive value, control costs and deliver on the promise of a more efficient and productive enterprise.

Contact Information

US-TX-Austin, TX
AIC