Principal Big Data Engineer

Comcast Philadelphia, PA

About the Job

Job Description

Software engineering and data science skills, combined with the demands of a highly visible enterprise metadata repository, make this an exciting challenge for the right candidate.

Are you passionate about digital media, entertainment, and software services? Do you like big challenges and working within a highly motivated team environment?

As a Metadata Engineer on the dx (Data Experience) team, you will design, develop, and support metadata repositories using real-time distributed computing architectures. Our mission is to enable many diverse users with the tools and information to gather, organize, and make sense of Comcast data, and to make it universally accessible to empower, enable, and transform Comcast into an insight-driven organization. The dx big data organization is a fast-moving team of world-class experts who are innovating in end-to-end data delivery. We are a team that thrives on big challenges, results, quality, and agility.

Who does the Metadata Engineer work with?

DX Metadata Engineering is a diverse collection of professionals who work with a variety of teams: other software engineering teams whose metadata repositories integrate with the Centralized Metadata Store; Portal engineers who develop a UI to support data discovery; software engineers on other DX platforms that ingest, transform, and retrieve data whose metadata is stored in the Centralized Metadata Store; data stewards and data architects who collect and disseminate metadata; and users who rely on metadata for data discovery and lineage.

What are some interesting problems you'll be working on?

You will design and develop the metadata and business glossary collection and enrichment system that allows real-time updates of the enterprise and satellite metadata repositories using best-of-breed, industry-leading technologies. These repositories contain metadata and lineage for a widely diverse and ever-growing complement of datasets (e.g., Hortonworks, AWS S3, streaming data (Kafka/Kinesis), streaming data transformations, ML pipelines, Teradata, and RDBMSs). You will design and develop cross-domain, cross-platform lineage tooling using advanced statistical methods and machine intelligence algorithms. You will also develop tools to discover data across disparate metadata repositories, build tools for data governance, and implement processes to rationalize data across the repositories.
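Cross-platform lineage tooling ultimately rests on a graph of data flows between datasets. As a minimal, purely illustrative sketch in Java (the class, method names, and dataset identifiers below are hypothetical, not Comcast's actual design), a lineage graph with downstream-impact traversal might look like:

```java
import java.util.*;

// Hypothetical sketch: a minimal cross-platform lineage graph.
// Dataset identifiers and the API shape are illustrative only.
public class LineageGraph {
    private final Map<String, Set<String>> downstream = new HashMap<>();

    // Record that data flows from source to target
    // (e.g. a Kafka topic feeding an S3 landing zone).
    public void addEdge(String source, String target) {
        downstream.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(target);
    }

    // All datasets reachable downstream of the given one
    // (breadth-first traversal, useful for impact analysis).
    public Set<String> downstreamOf(String dataset) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(dataset);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            for (String next : downstream.getOrDefault(cur, Collections.emptySet())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        LineageGraph g = new LineageGraph();
        g.addEdge("kafka:clickstream", "s3:raw/clicks");  // hypothetical dataset names
        g.addEdge("s3:raw/clicks", "hive:clicks_daily");
        System.out.println(g.downstreamOf("kafka:clickstream"));
        // prints [s3:raw/clicks, hive:clicks_daily]
    }
}
```

A production system would persist such a graph in a metadata store (for example Apache Atlas, listed below) and attach provenance attributes to each edge; this sketch shows only the core traversal idea.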

Where can you make an impact?

The dx Team is building the enterprise metadata repository needed to drive the next generation of data platforms and data processing capabilities. Building data applications, identifying trouble spots, and optimizing the overall user experience is a challenge that can only be met with a robust metadata repository capable of providing insights into the data and its lineage.

Success in this role is best enabled by a broad mix of skills and interests ranging from traditional distributed systems software engineering prowess to the multidisciplinary field of data science.

Responsibilities:

-Lead design and development of metadata assets, primarily using Java 8. Oversee the research, writing, and editing of documentation and technical requirements including software designs, evaluation plans, test results, technical manuals, and formal recommendations and reports.

-Employ rigorous continuous delivery practices managed under an agile software development approach.

-Assist with technical leadership throughout the design process and provide guidance on practices, procedures, and techniques.

-Prototype ideas for new tools and services. Build capabilities that capture, organize, and disseminate metadata in both real-time and batch processing

-Present and defend design and technical choices to internal and external audiences.

-Track and evaluate performance metrics.

-Ensure team delivers software on time, to specification, and within budget.

-Work with the team to determine whether applications meet specifications and technical requirements. Test and evaluate systems, subsystems, and components.

-Develop necessary artifacts for day-to-day operation of the metadata repositories and ensure a smooth transition to production

-Enhance our DevOps practices to deploy and operate our systems

-Automate and streamline our operations and processes

-Build and maintain tools for deployment, monitoring, and operations

-Act as technical contact and liaison for outside vendors and customers.

-Exercise consistent independent judgment and discretion in matters of significance.

-Keep abreast of technological developments within the industry.

-Monitor and evaluate competitive applications and products. Review literature, patents, and current practices relevant to assigned projects.

-Troubleshoot and resolve issues in our development, test, and production environments

-Train junior Software Development Engineers on internally developed software applications, and promote and enforce best practices in software design, development, and debugging in an Object-Oriented environment.

Here are some of the specific technologies we use:

-Metadata repositories: Apache Atlas and Informatica MDM

-Spark (AWS EMR, Databricks)

-Kafka, AWS Kinesis

-AWS Glue, AWS Lambda

-Cassandra, RDBMS, Teradata, AWS DynamoDB

-Elasticsearch, Solr, Logstash, Kibana

-Java, Scala, Go, Python, R

-Git, Maven, Gradle, Jenkins

-Puppet, Docker, Terraform, Ansible, AWS CloudFormation

-Linux

-Kubernetes

-Manta

-Hadoop (HDFS, YARN, ZooKeeper, Hive), Presto

-Jira

Skills & Requirements:

-8+ years of experience in designing and building metadata solutions

-Bachelor's or Master's in Computer Science, Statistics, or a related discipline

-Experience in software development of large-scale distributed systems including a proven track record of delivering backend systems that participate in a complex ecosystem.

-Experience in metadata-related technologies and the Apache Atlas framework preferred

-Experience in using and contributing to Open Source software preferred.

-Proficient in Unix/Linux environments

-Knowledge of network engineering and security

-Test-driven development/test automation, continuous integration, and deployment automation

-Enjoy working with data analysis, data quality, and reporting

-Excellent communicator, able to analyze and articulate complex issues and technologies understandably and engagingly

-Great design and problem-solving skills

-Adaptable, proactive and willing to take ownership

-Keen attention to detail and high level of commitment

-Thrives in a fast-paced agile environment; requirements change quickly, and our team constantly adapts to moving targets

About Comcast dx (Data Experience):

dx (Data Experience) is a results-driven, data platform research and engineering team responsible for the delivery of multi-tenant data infrastructure and platforms necessary to support our data-driven culture and organization. We have an overarching objective to gather, organize, and make sense of Comcast data with the intention to reveal business and operational insight, discover actionable intelligence, enable experimentation, empower users, and delight our stakeholders. Members of the dx team define and leverage industry best practices, work on extremely large-scale data problems, design and develop resilient and highly robust distributed data organizing and processing systems and pipelines as well as research, engineer, and apply data science and machine intelligence disciplines.

Comcast is an EOE/Veterans/Disabled/LGBT employer