Role: Sr Data Engineer
Location: Quincy, MA - hybrid
Duration: Long Term
Job Description:
Key Responsibilities:
Data Integration & API Development: Integrate cybersecurity data sources
and build data APIs for real-time insights and streamlined access.
Pipeline Engineering: Design and optimize large-scale ETL/ELT pipelines on
Databricks using Python and PySpark.
Data Quality & Governance: Implement automated data checks, manage data
lineage, and ensure compliance with cybersecurity standards.
Analytics & Visualization: Create data models, visualizations, and
dashboards for threat and vulnerability insights using Databricks and React.js.
Data Architecture: Develop secure, scalable environments on AWS (S3, ELB,
Lambda) for data storage and processing.
CI/CD for Data & Security: Use CI/CD pipelines to automate testing and
deployment in alignment with Agile practices.
ML Integration: Deploy ML models for threat detection, anomaly detection, and
risk scoring.
Mentorship: Guide junior engineers, establish best practices, and foster a
high-performance, innovative culture.
Role Responsibilities:
Data Integration & API Development: Integrate diverse cybersecurity data
sources using a variety of API mechanisms to standardize and streamline data
across the data and user planes.
Build and maintain data APIs for seamless access to data pipelines, enabling
real-time insights for applications, machine learning models, and analytical layers.
Data Pipeline Engineering & Optimization: Design, develop, and optimize
large-scale ETL/ELT pipelines on Databricks to efficiently process and
transform cybersecurity data. Use Python, PySpark, and Databricks to automate
and standardize data workflows across stages (raw, cleaned, curated), ensuring
scalability and high performance.
Data Quality & Governance: Implement automated data quality checks,
leveraging Databricks DQM tools and CI/CD pipelines to uphold data integrity
and governance standards. Ensure data lineage, metadata management, and
compliance with cybersecurity and privacy regulations, applying rigorous
quality standards across data ingestion and processing workflows.
Data Analytics & Visualization: Design centralized data models and perform
in-depth data analysis to support cybersecurity and risk management
objectives. Develop visualizations and dashboards using tools like Databricks,
and package data for the React.js application layer to provide stakeholders
with actionable insights into threat landscapes, vulnerability trends, and
performance metrics across the platform.
Scalable & Secure Data Architecture: Architect and manage secure,
high-performance data environments on Databricks, utilizing AWS services such
as S3, ELB, and Lambda. Ensure data availability, consistency, and security,
aligning with AWS best practices and data encryption standards to safeguard
sensitive cybersecurity data.
Agile Product & Engineering Continuous Delivery: Collaborate with
cross-functional Agile Product & Engineering teams to deliver data-driven
insights through analytics tools and custom visualizations that inform
strategy and decision-making. Empower stakeholders with timely, actionable
intelligence from complex data analyses, enhancing their ability to respond to
evolving cybersecurity risks.
Data Science & ML Integration: Deploy machine learning models, including
predictive analytics, anomaly detection, and risk scoring algorithms, into the
CASM platform. Leverage Python and PySpark to enable real-time and batch
processing of model outputs, enhancing the CASM platform's proactive threat
detection and response capabilities.
Mentorship & Best Practices Promotion: Mentor junior engineers, establishing
best practices in data engineering, DevOps, data science, and analytics.
Encourage high standards in model deployment, data security, performance
optimization, and visualization practices, fostering a culture of innovation
and excellence.
Education & Qualifications:
Education: B.S., M.S., or Ph.D. in Computer Science,
Data Science, Information Systems, or a related field, or equivalent
professional experience.
Technical Expertise: 8+ years in data engineering
with strong skills in Python, PySpark, SQL, and extensive, hands-on
experience with Databricks and big data frameworks. Expertise in integrating
data science workflows and deploying ML models for real-time and batch
processing within a cybersecurity context.
Cloud Proficiency: Advanced proficiency in AWS,
including EC2, S3, Lambda, ELB, and container orchestration (Docker,
Kubernetes). Experience in managing large-scale data environments on AWS,
optimizing for performance, security, and compliance.
Security Integration: Proven experience implementing
SCAS, SAST, DAST/WAS, and secure DevOps practices within an SDLC framework to
ensure data security and compliance in a high-stakes cybersecurity environment.
Data Architecture: Demonstrated ability to design
and implement complex data architectures, including data lakes, data warehouses,
and lakehouse solutions. Emphasis on secure, scalable, and highly available
data structures that support ML-driven insights and real-time analytics.
Data Quality & Governance: Hands-on experience with
automated data quality checks, data lineage, and governance standards.
Proficiency in Databricks DQM or similar tools to enforce data integrity and
compliance across pipelines.
Data Analytics & Visualization: Proficiency with
analytics and visualization tools such as Databricks, Power BI, and Tableau to
generate actionable insights for cybersecurity risks, threat patterns, and
vulnerability trends. Skilled in translating complex data into accessible
visuals and reports for cross-functional teams.
CI/CD and Automation: Experience building CI/CD
pipelines that automate testing, security scans, and deployment processes.
Proficiency in deploying ML models and data processing workflows using CI/CD,
ensuring consistent quality and streamlined delivery.
Agile Experience: Deep experience in Agile/Scrum
environments, with a thorough understanding of Agile core values and
principles, effectively delivering complex projects with agility and
cross-functional collaboration.
Preferred Experience:
Advanced Data Modeling & Governance: Expertise in
designing data models for cybersecurity data analytics, emphasizing data
lineage, federation, governance, and compliance. Experience ensuring security
and privacy within data architectures.
Machine Learning & Predictive Analytics: Experience
deploying ML algorithms, predictive models, and anomaly detection frameworks to
bolster the CASM platform's cybersecurity capabilities.
High-Performance Engineering Culture: Background in
mentoring engineers in data engineering best practices, promoting data science,
ML, and analytics integration, and fostering a culture of collaboration and
continuous improvement.