A Data Engineer is needed to perform the following duties:
· Designing and architecting enterprise-grade data warehouses that support applications.
· Applying strong architecture skills and knowledge of design best practices and software architectures.
· Modeling how data is ingested into the analytics platform, which includes extracting, transforming, and preparing data.
· Analyzing client datasets, comparing them with existing data sets, generating reports, and analyzing behaviors represented in the datasets.
· Providing technical solutions for business problems.
· Performing data upgrades and schema changes in database tables as needed.
· Applying good working knowledge of ETL tools.
· Gathering data and business requirements from the customer and designing entity-relationship models.
· Translating requirements from product managers into technical implementations.
· Creating innovative solutions to respond to data content/data formatting challenges.
· Ensuring that the data is used as a strategic enabler and that the models that are built respond to the needs of the business.
· Actively participating in each stage of the data engineering lifecycle, namely data identification, data acquisition & filtering, data extraction, data aggregation & representation, and utilization of analysis results.
· Utilizing high-level programming languages and shell-scripting.
· Working within UNIX/Linux OS and Big Data environments, including Hadoop, HDFS, and Pig/Hive.
· Improving and maintaining existing architecture/systems.
· Designing and executing Hive and Spark scripts to run and analyze inbound data transformation.
· Running ad-hoc queries for data analysis, and building and running scripts in the cloud (AWS).
· Constructing visualizations and dashboards to drive key management decisions.
· Creating explainable and auditable machine-learning models to develop understandable CPG consumer-behavior audiences that meet GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) requirements.
· Creating and analyzing tables in Athena on top of large S3 datasets.
· Working on the cloud big data platform EMR to process vast amounts of data.
· Recommending data integration points, architecture, and workflow design changes.
· Providing data analysis focusing on the business needs of our customers, both internal and external.
· Creating predictive models in Spark ML to drive business decisions.
· Performing data mapping and extraction from the customer’s databases to the regional data hubs.
· Cleansing, pre-processing, and ingesting data, split into several layers, into the Hadoop data lake using HDFS and Hive.
· Coding Spark scripts to merge stored data sets from disparate sources.
· Creating taxonomy mappings for various external and internal clients.
· Ensuring version control of all Test, UAT and Production ETL code and config files.
A Bachelor's Degree in Computer Science, Computer Engineering, or Information Systems is required.