Data Engineer
Job Summary:
At AnthologyAI, we believe that behind every data point is a story of hundreds of decisions made by real people. These stories are the pulse of every sector in the global economy, shaping our present and future. Our mission is to democratize access to these stories in the data economy. We are the most efficient, accurate, and actionable source of consumer intelligence. We ethically capture and analyze consumer behaviors 24/7 via our app, Caden, without compromising privacy or security.
We have set out to pioneer a breakthrough platform that combines billions of unbiased first-party consumer data points with advanced predictive AI models. It will empower businesses across industries, from retail to banking, regardless of their internal data capabilities, to predict market dynamics and consumer behaviors with unparalleled precision.
We’re led by industry veterans, backed by powerhouse investors (almost $30M total investment), and powered by an extremely talented, experienced and diverse team.
This position will report to the Data Science Manager and will be an integral part of the Data & AI organization. Your expertise and daily contributions will drive the development and enhancement of products and services that generate value for our business, our clients, and our users.
What You'll Do
- Work within the Data Science team to implement various data pipelines for our end-to-end solutions.
- Provide active input into the design of our data product offerings portfolio and our data dissemination framework.
- Integrate the right tools and methods for data enhancement, data quality, data obfuscation, privacy measurement and enhancement.
- Implement data security and access controls throughout the data pipeline.
- Develop actionable tools for monitoring the health of implemented pipelines, and identify and fix issues in real time.
- Define, manage, and contribute to the architecture of the AnthologyAI data and machine learning deployment pipelines.
What You’ve Done
- 2-3 years of industry experience developing ETL (data processing) pipelines that integrate large volumes of data from various sources with a variety of database technologies.
- Advanced SQL knowledge and experience with NoSQL, GraphQL, etc.
- Experience delivering production-ready code (Python, Java, etc.) to retrieve, cleanse, and transform data for analytical and modeling purposes
- Experience working in Databricks
- Experience using modern big data pipelines (AWS, GCP, Databricks)
- Experience with big data frameworks (Hadoop, Hive, Spark, Kafka, Airflow, etc.)
- Ability to think outside the box and evaluate results based on customer value
- Experience setting up and managing large data pipelines
Required Experience:
- Degree within Computer Science, Data Engineering, or a related field.
- Proven experience in designing, developing, and deploying pipelines in a real-world setting.
- Strong programming skills in Python or similar language
- Knowledge of AI/ML models, natural language processing (NLP), and data mining.
- Proficiency in SQL and working with large and complex datasets.
Nice To Have:
- Experience with Triple stores / ontology databases (RDF, OWL, SPARQL, Jena, etc.) and knowledge graphs
- Experience developing new metrics
- Data/business analysis experience
- Sales engineering experience
- Experience working with regulated data (healthcare, finance, etc.)
Why AnthologyAI?
- Join a high-growth startup that is at the forefront of innovation
- Opportunity to make a significant impact on the company's strategic and growth trajectory
- Collaborative and inclusive work environment that encourages innovation and growth
- Competitive compensation package that includes equity
- Health & Commuter Benefits
- Flexible PTO
- Hybrid work arrangements
This is a hybrid role, with 3 days a week onsite at our SoHo office.
The salary range for this position is $115,000 to $160,000 per year, based on candidate qualifications.
** There is currently no relocation and/or visa (immigration) assistance provided for this position.
What We Do
At AnthologyAI, we've rearchitected efficient consumer data acquisition by putting the user in the middle of the process. Via our app Caden, we allow users to be in control of their personal data, and if they explicitly choose to, they can monetize it by sharing their anonymous data with us. It's a simple value proposition on top of immensely complex technology—the only way for us to ensure security, privacy and accuracy.
The result is the single most efficient, accurate and actionable source of consumer intelligence ever built. We have our finger on the pulse of global consumerism, 24 hours per day, without PII, identity resolution or any other privacy-invading techniques.
The AI models we've trained on our tens of billions of ethically sourced data points have a complete understanding of consumer behavior. We can predict inflation as easily as we can predict how many people will walk across a certain street corner in Manhattan in six months. We can model the likelihood of churn as easily as we can the probability that a certain film or TV show will do well.
We believe explicit consent is now the name of the game, with terms happening above board and dark patterns being snuffed out. Ethically sourced first-party data has become the most valuable asset on the internet, and lacking it hampers personalization, addressability, measurement, activation, and analysis.
We've changed the consumer intelligence game by leading with trust, ethics and intelligence, wrapped in bleeding-edge technology.
Why Work With Us
AnthologyAI is on a mission to solve a 25-year-old problem. With all the changes happening in the digital world, from privacy laws to the demise of third-party cookies and the end of tracking on iPhones, we're in the midst of a digital revolution. Working here is more than just a job: it's an opportunity to be part of something bigger.