A Data Engineer is needed to perform the following duties:
· Designing and architecting enterprise-grade data warehouses that support applications.
· Applying strong architecture skills and knowledge of design best practices and software architectures.
· Modeling how data is ingested into the analytics platform, which includes extracting, transforming, and preparing data.
· Analyzing client datasets, comparing them with existing data sets, generating reports, and analyzing behaviors represented in the datasets.
· Providing technical solutions for business problems.
· Performing data upgrades and schema changes in database tables as needed.
· Applying good working knowledge of ETL tools.
· Gathering data and business requirements from the customer and designing entity-relationship models.
· Translating requirements from product managers into technical implementations.
· Creating innovative solutions to respond to data content/data formatting challenges.
· Ensuring that the data is used as a strategic enabler and that the models that are built respond to the needs of the business.
· Actively participating in each stage of the data engineering lifecycle, namely data identification, data acquisition & filtering, data extraction, data aggregation & representation, and utilization of analysis results.
· Utilizing high-level programming languages and shell-scripting.
· Working within UNIX/Linux OS and Big Data environments, including Hadoop, HDFS, and Pig/Hive.
· Improving and maintaining existing architecture/systems.
· Designing and executing Hive and Spark scripts to run and analyze inbound data transformation.
· Running ad-hoc queries for data analysis, and building and running scripts in the cloud (AWS).
· Constructing visualizations and dashboards to drive key management decisions.
· Creating explainable and auditable machine-learning models to develop understandable CPG consumer-behavior audiences that meet GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) requirements.
· Creating and analyzing tables in Athena on top of large S3 datasets.
· Working on the cloud big data platform EMR to process vast amounts of data.
· Recommending data integration points, architecture, and workflow design changes.
· Providing data analysis focusing on the business needs of our customers, both internal and external.
· Creating predictive models in Spark ML to drive business decisions.
· Performing data mapping and extraction from the customer’s databases to the regional data hubs.
· Cleansing, pre-processing, and ingesting data, split into several layers, into the Hadoop data lake using HDFS and Hive.
· Coding Spark scripts to merge stored data sets from disparate sources.
· Creating taxonomy mappings for various external and internal clients.
· Ensuring version control of all Test, UAT and Production ETL code and config files.
A Bachelor's Degree in Computer Science, Computer Engineering, or Information Systems is required.