AI Data Engineer
BV *** · United States · Remote
Remuneration
100,000-150,000 USD per year
Location
United States · Remote
Eastern Daylight Time (UTC-4)
Visa sponsorship
Sponsors visa
Job summary
Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.
Benefits
Employment Terms & Visa Policy
This is a 100% remote, full-time, direct W2 position with Bright Vision
Qualifications
- Bachelor’s or Master’s degree in Computer Science or a related field.
- Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
- Strong proficiency in Python and at least one JVM or systems language.
- Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
- Hands-on experience operating petabyte-scale storage and pipeline systems.
- Strong understanding of distributed systems, data modeling, and storage formats.
- Experience with dataset versioning, lineage, and reproducibility for ML workflows.
- Familiarity with high-throughput data loading for accelerator-based training.
- Strong software engineering practices including testing, CI/CD, and code review.
- Excellent communication and cross-functional collaboration skills.
- Experience with multimodal datasets at large scale.
- Familiarity with data quality tooling and dataset evaluation methodology.
Responsibilities
- Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
- Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
- Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
- Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
- Build high-throughput data loading systems that maximize GPU utilization during training.
- Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
- Design storage architectures balancing cost, throughput, and latency across data tiers.
- Build evaluation dataset construction pipelines with strict integrity and contamination controls.
- Implement data privacy, redaction, and consent enforcement throughout the pipeline.
- Collaborate with ML researchers and engineers to align data systems with model development needs.
- Drive observability of data quality, drift, and pipeline health across the AI data estate.
- Optimize cost and performance through compression, format selection, and caching strategies.
Skills
Communication
Degrees
Associate Degree
Industry
Automotive, Energy, Media, Public sector
Company size
SMB