Software Engineer, Data Acquisition - Paris/London

Posted 11 Days Ago
Paris, Île-de-France
Entry level
Artificial Intelligence
The Role
As a Web Crawling and Data Indexing Engineer, you will develop web crawlers in Python and use various techniques to collect and process large-scale data. You'll work with cross-functional teams to integrate data from APIs, ensure data quality, and optimize existing infrastructure for efficiency.
Summary Generated by Built In

About Mistral 

- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.

- Our mission is to make AI ubiquitous and open. 

- We are creative, low-ego, team-spirited, and have been passionate about AI for years.

- We hire people who thrive in competitive environments, because they find them more fun to work in.

- We hire passionate women and men from all over the world.

- Our teams are distributed across France, the UK and the USA.


Role Summary 

- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.

- The ideal candidate will have a strong background in web scraping, data extraction and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.

- The role is based in Paris or London.


Key Responsibilities 

- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.

- Utilize headless browsing techniques, such as driving headless Chrome through the Chrome DevTools Protocol, to automate and optimize data collection processes.

- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.

- Create and implement efficient parsing patterns using regular expressions, XPath expressions, and CSS selectors to ensure accurate data extraction.

- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.

- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.

- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.


Qualifications & profile 

- Bachelor’s or master’s degree in computer science, information systems, or information technology.

- Strong understanding of web technologies, data structures, and algorithms.

- Knowledge of database management systems and data warehousing.

- Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential. 

- Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial for navigating websites and scraping data from them.

- Knowledge of the HTTP and HTTPS protocols.

- A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary.

- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.

- Understanding of distributed systems and technologies like Hadoop or Spark.

- Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.

- Understanding of how search engines work and how to optimize web crawling.

- Experience applying machine learning to improve the efficiency and accuracy of web crawling.

- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.


Benefits 

- Daily lunch vouchers 

- Contribution to a Gympass subscription 

- Monthly contribution to a mobility pass 

- Full health insurance for you and your family 

- Generous parental leave policy 

Top Skills

C++
Java
Python
The Company
HQ: Paris
92 Employees
On-site Workplace
Year Founded: 2023

What We Do

Fast, open-source and secure language models. Facilitated specialisation of models on business use-cases, leveraging private data and usage feedback.

Built by a world-class team in Europe, targeting the global market. Join the team! https://jobs.lever.co/mistral/
