Extended Research

Reliability for Machine Learning

Ensuring the dependability and predictable performance of machine learning systems in the presence of hardware faults and adversarial conditions.

As machine learning models are deployed in mission-critical applications, their reliability becomes paramount. Our extended research focuses on:

Fault Tolerance: Developing techniques to detect and recover from transient and permanent hardware faults during neural network execution.
Software-Level Solutions: Creating robust algorithms that can tolerate underlying hardware inaccuracies without significant loss in model accuracy.
Adversarial Robustness: Understanding and mitigating vulnerabilities of AI systems to intentional perturbations or unexpected inputs.

We strive to build machine learning systems that are not only intelligent but also dependable.

Recent Publications