Senior Python Data Engineer

Salary undisclosed


Hi,

We have an urgent requirement for our direct client; please go through the job description below. If you are interested, please send your updated Word-format resume to and reach me @ .

Job Title: Senior Python Data Engineer

Location: Remote

Duration: Full Time

Job Description

We are seeking a highly skilled Senior Python Data Engineer to join our dynamic team. The ideal candidate will possess a strong programming background in advanced Python, with a focus on data engineering frameworks and libraries. You will be responsible for designing, building, and maintaining robust data ingestion pipelines, ensuring seamless integration of data from various sources.

Key Responsibilities

  • Data Pipeline Development: Design, implement, and optimize data ingestion pipelines using advanced Python (NumPy, Pandas, Dask) to ensure efficient data flow and processing; a minimal sketch follows this list.
  • Data Storage Management: Work extensively with Parquet files for efficient data storage and retrieval, including partitioned Parquet files, ensuring optimal compression and schema evolution.
  • Collaboration: Work closely with geographically distributed teams and clients to gather requirements, provide technical solutions, and ensure data quality.
  • Team Leadership: Lead a team of data engineers by assigning tasks, reviewing code, and mentoring junior team members.
  • Design Participation: Engage in architectural discussions and design sessions, contributing to the overall data pipeline architecture.
  • REST API Development: Build and maintain REST APIs, ensuring API security through key validation, authorization, and authentication mechanisms; see the second sketch after this list.
  • Data Manipulation: Work fluently with core Python data structures such as lists, strings, dictionaries, and tuples, applying strong Pandas and NumPy expertise for data manipulation.
  • Data Exploration & Visualization: Conduct data exploration, visualization, and comparison of metrics for large CSV and Parquet files.
  • Debugging and Optimization: Troubleshoot complex data pipeline issues, utilizing logging and monitoring tools (like ELK Stack, Grafana) to optimize performance for scalability and efficiency.
  • Data Storage Solutions: Design and implement data storage solutions using SQL (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra).
  • Data Transformation: Use advanced techniques such as joins, merges, pivot tables, grouping, and window functions in Python or SQL.
  • Documentation: Maintain thorough documentation of data pipelines, architectures, and processes for future reference and onboarding.
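
For reference on the pipeline and Parquet responsibilities above, here is a minimal sketch of one way such an ingestion step can look in pandas with the pyarrow engine; the file paths, column names, and partition keys are hypothetical placeholders, not project specifics.

```python
# Minimal ingestion sketch: CSV in, partitioned Parquet out.
# "event_date", "region", and the paths are hypothetical names.
import pandas as pd

def ingest(csv_path: str, parquet_root: str) -> None:
    df = pd.read_csv(csv_path, parse_dates=["event_date"])
    df = df.dropna(subset=["region"])            # basic data-quality gate
    df["event_day"] = df["event_date"].dt.date   # derive the partition key
    # Partitioned Parquet: one directory per region/day enables pruned reads
    df.to_parquet(
        parquet_root,
        engine="pyarrow",
        partition_cols=["region", "event_day"],
        compression="snappy",
    )

def load_region(parquet_root: str, region: str) -> pd.DataFrame:
    # Filter pushdown: only the matching partitions are read from disk
    return pd.read_parquet(
        parquet_root,
        engine="pyarrow",
        filters=[("region", "==", region)],
    )
```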

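Similarly, a minimal sketch of the API-key validation mentioned in the REST API responsibility, using Flask; the header name, key store, and route are illustrative assumptions only.

```python
# API-key validation sketch; the header name and key store are assumptions.
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
VALID_KEYS = {"demo-key-123"}  # in practice, load from a secrets manager

@app.before_request
def require_api_key():
    # Reject any request that does not carry a recognized key
    if request.headers.get("X-API-Key") not in VALID_KEYS:
        abort(401)

@app.get("/health")
def health():
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=8000)
```
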
Required Qualifications (Must-Have)

  • Programming Skills: Advanced proficiency in Python, particularly with libraries such as NumPy and Pandas for data manipulation and analysis.
  • Parquet Experience: Strong experience with Parquet files, including reading, writing, and optimizing for performance and storage efficiency.
  • Data Structure Manipulation: Ability to set up and manipulate Python data structures such as lists, strings, dictionaries, and tuples.
  • Data Exploration: Familiarity with data exploration, visualization, and comparing metrics of large CSV and Parquet files, including partitioned Parquet files.
  • Advanced Data Techniques: Strong skills in joins, merges, pivot tables, grouping, and window functions in Python or SQL; see the sketch after this list.
  • Version Control: Strong understanding of Git, including everyday commands such as git push and git clone, for collaborative development.
  • Linux Proficiency: Experience with Linux commands and shell scripting for data operations.
  • Data Pipeline Experience: Proven experience in building and managing data ingestion pipeline scripts, including batch and real-time processing.
  • REST API Knowledge: Familiarity with building REST APIs and securing them through API key validation and authentication mechanisms.
  • Debugging Skills: Demonstrated ability to handle complex data pipeline architecture with excellent debugging skills.
  • Leadership Experience: Prior experience leading a technical team and mentoring junior engineers.
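
To illustrate the advanced data techniques bullet above, here is a small pandas sketch covering a merge, a pivot table, grouping, and a window-style running total; the DataFrames and column names are invented for illustration.

```python
# Transformation sketch on tiny in-memory DataFrames; all names are made up.
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "units": [10, 12, 7, 9],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Austin", "Boston"]})

joined = sales.merge(stores, on="store", how="left")       # join / merge
pivot = joined.pivot_table(index="city", columns="month",
                           values="units", aggfunc="sum")  # pivot table
totals = joined.groupby("city")["units"].sum()             # grouping
# Window-style operation: running total of units per store, ordered by month
joined["running_units"] = (
    joined.sort_values("month").groupby("store")["units"].cumsum()
)
print(pivot, totals, joined, sep="\n\n")
```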

Preferred Qualifications (Good-to-Have)

  • Cloud Platform Knowledge: Experience with cloud platforms, preferably AWS (S3, Lambda, Redshift), for data storage and processing.
  • Workflow Orchestration: Familiarity with Apache Airflow or similar workflow orchestration tools for scheduling and monitoring workflows.
  • Containerization: Knowledge of containerization technologies (Docker, Kubernetes) for deploying data pipelines in a scalable manner.
  • Object-Oriented Programming: Good experience with object-oriented programming patterns, multithreading, and multiprocessing.
  • Spark Applications: Experience developing Spark applications using Python, including familiarity with Apache Spark (Spark SQL, Spark Streaming, DataFrames, RDD, PySpark); a minimal sketch follows this list.
  • Communication Skills: Excellent verbal and written communication skills, with the ability to convey technical concepts to non-technical stakeholders.
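
And for the Spark bullet above, a minimal PySpark sketch of a daily rollup job; it assumes a local Spark installation, and the input path, column names, and output path are hypothetical.

```python
# PySpark sketch; assumes a local Spark install and hypothetical paths/columns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-rollup").getOrCreate()

events = spark.read.parquet("/data/events")        # DataFrame API
daily = (
    events.groupBy("region", F.to_date("event_ts").alias("day"))
          .agg(F.count("*").alias("events"))
)
daily.write.mode("overwrite").partitionBy("region").parquet("/data/daily")
spark.stop()
```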