AI Kernel Engineer

Description

The company has developed an innovative General-Purpose Neural Processing Unit (GPNPU) architecture. This co-optimized software and hardware platform is designed to execute neural network (NN) inference workloads across a diverse range of edge and endpoint devices, from battery-powered smart sensors to high-performance automotive and autonomous vehicle systems. Unlike traditional NPUs or accelerators that handle only specific portions of a machine learning graph, this GPNPU architecture can execute both NN graph code and standard C++ DSP and control code.

Role

The AI Kernel Engineer plays a vital role in enabling a wide array of AI kernels and operators to run efficiently on the proprietary platform. Key objectives include developing a high-efficiency kernel library for various AI and Large Language Model (LLM) architectures and performing deep performance analysis to optimize kernels for diverse hardware configurations. This senior technical position requires extensive knowledge of hardware architecture, compiler toolchains, and advanced optimization techniques.

Responsibilities

Develop high-efficiency AI and LLM kernels and operators for optimized inference on the GPNPU platform.
Optimize kernel performance across various hardware configurations and specific workloads.
Profile and analyze kernel performance in terms of computation, data movement, and parallelism.
Identify micro-architecture and software bottlenecks and implement effective optimization solutions.
Refine C/C++ code to maximize hardware utilization.
Collaborate across the AI inference stack to support broader team and business priorities.
Help improve the internal toolchain, compiler, and runtime environment.
Provide technical documentation and support to customers and the developer community.

Requirements

Education: Bachelor's or master's degree in Computer Science, Electrical Engineering, or a related field.
Experience: 5+ years of professional experience in AI kernel development and optimization.
Performance Profiling: Proven experience with model and kernel inference performance profiling.
Compute Development: Proficiency in at least one of the following: CUDA, DSP, NEON, or Triton-lang.
Programming Skills: Expert proficiency in C/C++ and Python; experience with assembly language is a significant plus.
Professional Attributes: Strong problem-solving, debugging, and communication skills.