Reliability for Machine Learning
As machine learning models are deployed in mission-critical applications, their reliability becomes paramount. Our extended research focuses on:
- Fault Tolerance: Developing techniques to detect and recover from transient and permanent hardware faults during neural network execution.
- Software-level Solutions: Creating robust algorithms that can tolerate underlying hardware inaccuracies without significant loss in model accuracy.
- Adversarial Robustness: Understanding and mitigating vulnerabilities of AI systems to intentional perturbations or unexpected inputs.
We strive to build machine learning systems that are not just intelligent, but also thoroughly dependable.