Data Engineer
The worldwide data management software market is massive (IDC forecasts it to be $137.6 billion by 2026!). At MongoDB we are transforming industries and empowering developers to build amazing apps that people use every day.
We are the leading modern data platform and the first database provider to IPO in over 20 years. Join our team and be at the forefront of innovation and creativity.
MongoDB is growing rapidly and seeking a Data Engineer to be a key contributor to the company’s Internal Data Platform. You will build ETL pipelines that pull data into our Data Lake / Warehouse and that will be used to drive forward our growth as a product and as a company.
You will take on complex data-related problems using very diverse data sets, and will work with stakeholder groups throughout the company to help them make better data-informed decisions.
This role can be based out of our New York City office.
Our ideal candidate has experience with
- Building ETL pipelines at scale that can grow without sacrificing performance
- Data Lake / Warehouse design patterns and concepts, including Delta Lakes
- Several programming languages (Python, Scala, Java, etc.)
- Data processing frameworks such as Spark and Pandas
- Orchestration tools such as Airflow, Luigi, Azkaban, Cask, etc.
- AWS services such as S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, RDS, etc.
- Different storage formats such as Parquet, JSON, Avro, and Arrow
- Streaming data processing frameworks like Kafka, KSQL, and Spark Streaming
- A diverse set of databases (MongoDB, Redshift, etc.)
You might be an especially great fit if you
- Enjoy wrangling huge amounts of data and exploring new data sets
- Value code simplicity and performance
- Obsess over data: everything needs to be accounted for and thoroughly tested
- Plan effective data storage, security, sharing, and publishing within an organization
- Constantly think of ways to squeeze better performance out of data pipelines
Nice to haves
- You are deeply familiar with Spark and / or Hive
- You have expert experience with Airflow
- You understand the differences between storage formats like Parquet, Avro, Arrow, and JSON, and when to use each
- You understand the tradeoffs between schema designs, such as normalization vs. denormalization
- In addition to data pipelines, you’re also quite good with Kubernetes, Drone, and Terraform
- You’ve built an end-to-end production-grade data solution that runs on AWS or GCP
- You have experience building machine learning pipelines using tools such as SparkML, TensorFlow, scikit-learn, etc.
Responsibilities
As a Data Engineer, you will
- Help drive best practices in continuous integration and delivery
- Help drive optimization, testing, and tooling to improve data quality
- Collaborate with other software engineers, machine learning experts, and stakeholders, taking learning and leadership opportunities that will arise every single day
To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone.
From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys.
Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process.
To request an accommodation due to a disability, please inform your recruiter.
MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.