Internship, Software Engineer, AI Inference (Fall 2024)

What to Expect

Consider before submitting an application:

This position is expected to start around September 2024 and continue through the entire Fall 2024 term or into Winter 2025 if available. We ask for a minimum of 12 weeks, full-time and on-site, for most internships.

International Students: If your work authorization is through CPT, please consult your school on your ability to work 40 hours per week before applying. You must be able to work 40 hours per week on-site. Many students will be limited to part-time during the academic year.

In this role, you will be responsible for the internal workings of the AI inference stack and compiler running neural networks in millions of Tesla vehicles and Optimus. You will collaborate closely with AI Engineers and Hardware Engineers to understand the full inference stack and design the compiler to extract the maximum performance from our hardware.

The inference stack development is purpose-driven: deployment and analysis of production models inform the team's direction, and the team's work immediately impacts performance and the ability to deploy increasingly complex models. With a cutting-edge co-designed MLIR compiler and runtime architecture, and full control of the hardware, the compiler has access to traditionally unavailable features that can be leveraged via novel compilation approaches to generate higher-performance models.

What You’ll Do

  • Take ownership of parts of the AI Inference stack (Export/Compiler/Runtime), flexible based on skills, interests, and needs
  • Collaborate closely with the AI team to guide the design and development of neural networks into production
  • Collaborate with the HW team to understand the current HW architecture and propose future improvements
  • Develop algorithms to improve performance and reduce compiler overhead
  • Debug functional and performance issues on massively parallel systems
  • Work on architecture-specific neural network optimization algorithms for high performance computing

What You’ll Bring

  • Currently pursuing a degree in Computer Science & Engineering, or a related field
  • Comfortable with C++ and Python
  • Capable of delivering results with minimal oversight