ML Platform Engineer

Bri*** · McKinney, TX

McKinney, TX · Remote

Remuneration

Not specified

Location

McKinney, TX

Visa sponsorship

Sponsors visa

Job summary

Bri*** is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.

Benefits

Employment Terms & Visa Policy: This is a 100% remote, full-time, direct W2 position with Bright Vision

Qualifications

Bachelor’s or Master’s degree in Computer Science or a related field
Six or more years of experience in distributed systems, infrastructure, or ML platform engineering
Strong proficiency in Python and a systems language such as Go, Rust, or C++
Deep experience operating high-throughput, low-latency services in production
Hands-on experience with LLM or large model inference frameworks such as vLLM or TensorRT-LLM
Strong understanding of GPU architecture, memory hierarchies, and accelerator utilization
Familiarity with Kubernetes, autoscaling, and modern cloud platforms
Experience with observability stacks including metrics, tracing, and structured logging
Solid grounding in performance engineering and capacity planning
Strong communication and incident response skills
Open-source contributions to model serving infrastructure
Experience with multi-region or globally distributed AI serving

Responsibilities

Design and operate model serving platforms supporting diverse workloads including LLMs, vision models, and recommendation systems
Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing
Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints
Build autoscaling and capacity management systems that balance latency, throughput, and cost
Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads
Integrate model serving with API gateways, identity systems, and observability platforms
Implement caching, prompt deduplication, and response reuse strategies where appropriate
Drive end-to-end observability including latency histograms, queue dynamics, GPU utilization, and error tracking
Develop deployment workflows including canary releases, shadow testing, and automated rollback
Operate incident response for high-availability AI services and drive durable reliability improvements
Collaborate with ML and product teams to support new model releases and capability rollouts
Implement security controls including request signing, content filtering, and abuse detection at the serving layer

Skills

Communication

Degrees

Associate Degree

Industry

Automotive, Energy, Media, Public sector

Company size

SMB