
Senior Big Data & Databricks Engineer - PySpark, Hadoop & Real-Time Analytics

Salary undisclosed

Job Description

Who We Are

Artmac Soft is a technology consulting and service-oriented IT company dedicated to providing innovative technology solutions and services to customers.

Job Description

Job Title: Senior Big Data & Databricks Engineer - PySpark, Hadoop & Real-Time Analytics

Job Type: C2C

Experience: 10-12 years

Location: Atlanta, New York

Requirements:

  • Proven experience as a Big Data Engineer with expertise in the Hadoop ecosystem and real-time analytics.
  • Strong proficiency in Spark (PySpark/Scala), Hive, and related big data technologies (see the PySpark/Hive sketch after this list).
  • Experience in building and deploying data pipelines and stream processing applications.
  • Familiarity with job scheduling and optimization techniques within the Hadoop ecosystem.
  • Solid understanding of Unix/Linux environments, including Shell scripting.
  • Experience with cloud technologies, specifically Azure, is preferred.
  • Excellent problem-solving skills and the ability to work independently and collaboratively in a fast-paced environment.
  • Experience with MongoDB or similar NoSQL databases.
  • Knowledge of data warehousing concepts and methodologies.
  • Familiarity with machine learning frameworks and applications.
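
As an illustration of the Spark and Hive proficiency listed above, a minimal PySpark batch sketch follows. It assumes a cluster with a configured Hive metastore; the table name, column names, and output path are hypothetical placeholders rather than details from this posting.

    # Minimal PySpark batch sketch: read a Hive table, aggregate, write Parquet.
    # Table name, columns, and output path are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-batch-sketch")
             .enableHiveSupport()   # needed to read/write Hive tables
             .getOrCreate())

    # Read a (hypothetical) Hive table of raw transactions.
    raw = spark.table("analytics.raw_transactions")

    # Example transformation: daily totals per customer.
    daily = (raw
             .withColumn("tx_date", F.to_date("tx_timestamp"))
             .groupBy("customer_id", "tx_date")
             .agg(F.sum("amount").alias("daily_amount")))

    # Write partitioned Parquet for downstream consumers.
    (daily.write
          .mode("overwrite")
          .partitionBy("tx_date")
          .parquet("/data/curated/daily_totals"))

    spark.stop()

In practice a job like this would typically be packaged and launched with spark-submit on the Hadoop/YARN cluster, with executor memory and core settings tuned to the data volume.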

Responsibilities:

  • Collaborate with business stakeholders to gather and understand requirements, translating them into technical specifications for data solutions.
  • Design, develop, and implement complex data pipelines utilizing technologies such as PySpark, Scala Spark, Hive, Hadoop CLI, MapReduce, Storm, Kafka, and Lambda Architecture.
  • Create and submit Spark jobs while ensuring high-performance tuning and scalability of data processes.
  • Work with real-time stream processing technologies, including Spark Structured Streaming and Kafka, to deliver timely insights (see the streaming sketch after this list).
  • Leverage expertise in Python/Spark and their related libraries and frameworks to optimize data processing tasks.
  • Manage job scheduling challenges in Hadoop, ensuring reliability and efficiency in data workflows.
  • Develop and execute unit and integration testing for data pipelines, handling large data volumes to derive actionable insights.
  • Optimize code for efficiency to meet stipulated SLAs, focusing on performance and resource management.
  • Demonstrate strong Unix/Linux expertise, including comfort with the Linux operating system and shell scripting.
  • Utilize Azure Cache to enhance the performance and speed of data processing tasks.
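
To illustrate the real-time stream processing responsibilities above, a minimal Spark Structured Streaming sketch follows. The Kafka broker address, topic name, event schema, and output paths are hypothetical placeholders, and running it requires the Spark Kafka connector package on the classpath.

    # Minimal Spark Structured Streaming sketch: consume a Kafka topic, parse JSON
    # events, and append them to Parquet with checkpointing for fault tolerance.
    # Broker address, topic, schema, and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = (SparkSession.builder
             .appName("kafka-streaming-sketch")
             .getOrCreate())

    # Example event schema (hypothetical).
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the Kafka topic as an unbounded streaming DataFrame.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Append parsed events to Parquet; the checkpoint location allows the query
    # to recover its progress after a restart.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/streams/events")
             .option("checkpointLocation", "/data/checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()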

Qualification:

  • Bachelor's degree in Computer Science, Information Security, or a related field.