Spark Developer / Engineer

Salary undisclosed

Job Title: Spark Developer / Engineer (2 positions)

Location: US Remote; working hours aligned with the Pacific (PST) time zone

Duration: 6-12 Months

Workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To improve scalability and performance, the team is migrating these jobs from Scalding to Apache Spark.

Key Responsibilities:

Understanding the Existing Scalding Codebase

o Analyze the current Scalding-based data pipelines.

o Document existing business logic and transformations.

Migrating the Logic to Spark

o Convert existing Scalding jobs to Spark (PySpark/Scala) while ensuring they perform well.

o Refactor data transformations and aggregations in Spark.

o Optimize Spark jobs for efficiency and scalability.

Ensuring Data Parity & Validation

o Develop data parity tests to compare outputs between Scalding and Spark implementations.

o Identify and resolve any discrepancies between the two versions.

o Work with stakeholders to validate correctness.

Writing Unit Tests & Improving Code Quality

o Implement robust unit and integration tests for Spark jobs.

o Ensure code meets engineering best practices (modular, reusable, and well-documented).

Required Qualifications:

  • Experience in big data processing with Apache Spark (PySpark or Scala).
  • Strong experience with data migration from legacy systems to Spark.
  • Proficiency in Scalding and MapReduce frameworks.
  • Experience with Hadoop, Hive, and distributed data processing.
  • Hands-on experience in writing unit tests for Spark pipelines.
  • Strong SQL and data validation experience.
  • Proficiency in Python and Scala.
  • Knowledge of CI/CD pipelines for data jobs.
  • Familiarity with the Apache Airflow orchestration tool.
Employers have access to artificial intelligence language tools ("AI") that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.