The Artificial Intelligence Enterprise Solutions (AIES) Technology Data Engineering Team is looking for a highly motivated and experienced Senior Data Engineer. The right candidate will have expert-level experience supporting Artificial Intelligence / Machine Learning (AI/ML) platforms and products, as well as data ingestion and provisioning from a variety of data sources to and from the Enterprise Data Lake. As a Senior Data Engineer, you will work with business and data science teams to gather business data requirements and perform the data engineering and provisioning activities needed to build, explore, train, and run business models. You will use ETL tools such as Informatica and Ab Initio, along with data warehouse tools, to deliver critical Model Operationalization services to the Enterprise.
In this role, the Senior Data Engineer will be responsible for:
Data modeling, coding, analytical modeling, root cause analysis, investigation, debugging, and testing, in collaboration with business partners, product managers, architects, and other engineering teams.
Adopting and enforcing best practices related to data ingestion and extraction of data from the big data platform.
Extracting business data from multiple data sources and storing it in MapR HDFS.
Working with Data Scientists to build scripts that meet their data needs
Working with the Enterprise Data Lake team to maintain data and information security across all use cases
Building automation scripts using AUTOSYS to schedule and automate data loads
Designing and developing scripts and configurations to load data using Data Ingestion Frameworks or Ab Initio
Coordinating user access requests for data loaded into the Data Lake
Providing post-production support for the AIES Open Source Data Science (OSDS) Platform
Supporting end-to-end platform application delivery, including infrastructure provisioning and automation and integration with Continuous Integration/Continuous Delivery (CI/CD) platforms, using existing and emerging technologies
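For illustration only, the ingestion responsibilities above (extracting records, validating them, and landing them in partitioned HDFS-style directories) can be sketched in plain Python. Everything here is hypothetical: the record schema, the `load_date` partition key, and the local-filesystem layout stand in for whatever ingestion framework and HDFS conventions the team actually uses.

```python
import json
from pathlib import Path

def ingest_records(raw_lines, out_dir, partition_key="load_date"):
    """Validate JSON-lines records and append them to partitioned output
    directories (a local stand-in for HDFS partitions like
    /data/table/load_date=2024-01-01/). Returns (loaded, rejected) counts."""
    out_dir = Path(out_dir)
    loaded, rejected = 0, 0
    for line in raw_lines:
        try:
            record = json.loads(line)
            partition = record[partition_key]  # reject records missing the partition key
        except (json.JSONDecodeError, KeyError):
            rejected += 1
            continue
        part_dir = out_dir / f"{partition_key}={partition}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Append to a single part file per partition; a real framework
        # would manage part files, compression, and schema enforcement.
        with (part_dir / "part-0000.jsonl").open("a") as f:
            f.write(json.dumps(record) + "\n")
        loaded += 1
    return loaded, rejected
```

In practice a job like this would run under a scheduler such as AUTOSYS, with the rejected count feeding data-quality monitoring rather than being silently dropped.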
Qualifications:
Excellent analytical and problem-solving skills with high attention to detail and accuracy
Demonstrated ability to transform business requirements into code, metadata specifications, and specific analytical reports and tools
Good verbal, written, and interpersonal communication skills
Experience with SDLC (System Development Life Cycle) including understanding of project management methodologies used in Waterfall or Agile development projects
Strong Hadoop scripting skills to process petabytes of data
5+ years of ETL (Extract, Transform, Load) Programming with tools including Informatica
2+ years of experience with Unix or Linux systems, including scripting in Shell, Perl, or Python
Experience with Advanced SQL (preferably Teradata)
Experience working with large data sets and with distributed computing frameworks (MapReduce, Hadoop, Hive, HBase, Pig, Apache Spark, etc.)
Experience with Java and Scala
Experience with Ab Initio
Experience with analytic databases, including Hive, Presto, and Impala
Experience with multiple data modeling concepts, including XML and JSON
Experience with loading and managing data using technologies such as Spark, Scala, NoSQL (MongoDB, Cassandra) and columnar MPP SQL stores (Redshift, Vertica)
Experience with Change and Release Management Processes
Experience with streaming frameworks including Kafka, Spark Streaming, Storm, or RabbitMQ
Experience working with one or more of the following AWS Cloud services: EC2, EMR, ECS, S3, SNS, SQS, CloudFormation, CloudWatch