Senior Data Architect & Engineering Lead (Remote from anywhere in the US)
We are seeking a highly skilled and experienced Senior Data Architect & Engineering Lead to work remotely from anywhere in the US on our W2, leading the design, implementation, and management of end-to-end, enterprise-grade data solutions for our Huntsville, Texas client. This role requires expertise in building and optimizing data warehouses, data lakes, and Lakehouse platforms, with a strong emphasis on data engineering, data science, and machine learning. You will work closely with cross-functional teams to create scalable, robust architectures that support advanced analytics and machine learning use cases while adhering to industry standards and best practices.
Responsibilities Include:
- Architect, design, and manage the entire data lifecycle, from ingestion, transformation, storage, and processing through advanced analytics, machine learning databases, and large-scale processing systems.
- Implement robust data governance frameworks, including metadata management, lineage tracking, security, compliance, and business glossary development.
- Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
- Ensure high data quality and reliability through automated data validation and testing, delivering clean, usable data from data sets in varying states of disorder.
- Develop and enforce architecture standards, patterns, and reference models for large-scale data platforms.
- Architect and implement Lambda and Kappa architectures for real-time and batch data processing workflows, applying strong data modeling practices.
- Identify and implement the most appropriate data management systems, and enable integration with external tools for ingestion, compilation, analytics, and visualization.
Job Qualifications
- Ability to define and manage data flows and data pipelines, with related scripting knowledge for data import.
- Strong data modeling capabilities.
- Ability to identify and implement the most appropriate data management system.
- Ability to deliver high-quality, clean, usable data from data sets in varying states of disorder.
- Proficiency in data and query validation.
- A demonstrated range of technical competencies, including tools such as SAP, Oracle, Cassandra, MySQL, Redis, PostgreSQL, MongoDB, and Hive.
Requirements
- Proficient in SQL, Python, and big data processing frameworks (e.g., Spark, Flink).
- Strong experience with cloud platforms (AWS, Azure, Google Cloud Platform) and related data services.
- Hands-on experience with data warehousing tools (e.g., Snowflake, Redshift, BigQuery), Databricks running on multiple cloud platforms (AWS, Azure, and Google Cloud Platform), and data lake technologies (e.g., S3, ADLS, HDFS).
- Expertise in containerization and orchestration tools like Docker and Kubernetes.
- Knowledge of MLOps frameworks and tools (e.g., MLflow, Kubeflow, Airflow).
- Experience with real-time streaming architectures (e.g., Kafka, Kinesis).
- Familiarity with Lambda and Kappa architectures for data processing.
- Ability to enable integration with external tools for ingestion, compilation, analytics, and visualization.
Preferred
- Certifications in cloud platforms or data-related technologies.
- Familiarity with graph databases, NoSQL, or time-series databases.
- Knowledge of data privacy regulations (e.g., GDPR, CCPA) and compliance requirements.
- Experience in implementing and managing business glossaries, data governance rules, metadata lineage, and ensuring data quality.
- Highly experienced with AWS cloud platform and Databricks Lakehouse.
- Education: Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.