Data Engineer

Posted 16 Days Ago
Mountain View, CA
Hybrid
Senior level
Artificial Intelligence • Cloud • Machine Learning • Software • Database
The Role
The Data Engineer will be responsible for building end-to-end production-grade data solutions and scalable ETL pipelines, and for managing effective data storage and security. They will work closely with the Machine Learning and Data Platform team to ensure quality data ingestion and transformation while engaging with various stakeholders.
Summary Generated by Built In

Come and change the world of AI with the Kumo team!


Companies spend millions of dollars to store terabytes of data in data lakehouses, but leverage only a fraction of it for predictive tasks. This is because traditional machine learning is slow and time-consuming: it can take months to perform feature engineering, build training pipelines, and reach acceptable performance.


At Kumo, we are building a machine learning platform for data lakehouses, enabling data scientists to train powerful Graph Neural Network models directly on their relational data with only a few lines of declarative syntax known as Predictive Query Language. The Kumo platform enables users to build models a dozen times faster and achieve better model accuracy than traditional approaches.


We're seeking intellectually curious and highly motivated Data Engineers to become foundational members of our Machine Learning and Data Platform team.

Your Foundation:

  • 1+ years of professional experience in SaaS/Enterprise companies 
  • Strong experience with data ingestion and connectors
  • Experience in building end-to-end production-grade data solutions on AWS or GCP
  • Experience in building scalable ETL pipelines.
  • Ability to plan effective data storage, security, sharing, and publishing within an organization.
  • Experience in developing batch ingestion and data transformation routines using ETL tools.
  • Familiarity with AWS services such as S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, RDS.
  • Proficiency in several programming languages (Python, Scala, Java).
  • Familiarity with orchestration tools such as Temporal, Airflow, or Luigi.
  • Self-starter, motivated, with the ability to structure complex problems and develop solutions.
  • Excellent communication skills and ability to explain data and analytics strengths and weaknesses to both technical and senior business stakeholders.

Your Extra Special Sauce:

  • Deep familiarity with Spark and/or Hive
  • Understanding of different storage formats like Parquet, Avro, Arrow, and JSON and when to use each
  • Understanding of schema designs like normalization vs. denormalization.
  • Proficiency in Kubernetes and Terraform
  • Azure, Azure Data Factory (ADF), and/or Databricks skills
  • Experience with integrating, transforming, and consolidating data from various data systems into analytics solutions
  • Good understanding of databases, SQL, ETL tools/techniques, data profiling and modeling
  • Strong communication skills and client engagement

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Top Skills

Java
Python
Scala
The Company
HQ: Mountain View, CA
38 Employees
On-site Workplace
Year Founded: 2021

What We Do

Democratizing AI on the Modern Data Stack!

The team behind PyG (PyG.org) is working on a turnkey solution for AI over large-scale data warehouses. We believe the future of ML is a seamless integration between modern cloud data warehouses and AI algorithms. Our ML infrastructure massively simplifies the training and deployment of ML models on complex data.

With over 40,000 monthly downloads and nearly 13,000 GitHub stars, PyG is the ultimate platform for training and development of Graph Neural Network (GNN) architectures. GNNs, one of the hottest areas of machine learning right now, are a class of deep learning models that generalize Transformer and CNN architectures and enable us to apply the power of deep learning to complex data. GNNs are unique in that they can be applied to data of different shapes and modalities.

Similar Jobs

BlackLine

Technical Lead, Data Engineer (Snowflake)

Cloud • Fintech • Information Technology • Machine Learning • Software • App development • Generative AI
Hybrid
Pleasanton, CA, USA
1810 Employees
201K-269K Annually

UL Solutions

Data Engineer

Automotive • Professional Services • Software • Consulting • Energy • Chemical • Renewable Energy
Hybrid
2 Locations
15000 Employees
90K-120K Annually

Atlassian

Principal Data Engineer

Cloud • Information Technology • Productivity • Security • Software • App development • Automation
Remote
San Francisco, CA, USA
11000 Employees
169K-271K Annually

PwC

Google Cloud Data Engineer [No-Code / Low-Code]-Manager

Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI
Remote
Hybrid
67 Locations
364000 Employees
100K-232K Annually

Similar Companies Hiring

Hedra
Software • News + Entertainment • Marketing Tech • Generative AI • Enterprise Web • Digital Media • Consumer Web
San Francisco, CA
14 Employees
HERE
Software • Logistics • Internet of Things • Information Technology • Computer Vision • Automotive • Artificial Intelligence
Amsterdam, NL
6000 Employees
True Anomaly
Software • Machine Learning • Hardware • Defense • Artificial Intelligence • Aerospace
Colorado Springs, CO
131 Employees