Performance Optimization of RISC-V Based Embedded Systems for Real-Time AI Applications

Authors

  • Ethan J Walker, Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, USA
  • Hiroshi Tanaka, Department of Computer Science and Engineering, San Jose State University, San Jose, CA, USA

DOI:

https://doi.org/10.66280/cis.v1i1.113

Keywords:

RISC-V, embedded AI, real-time systems, hardware–software co-design, inference optimization, edge intelligence.

Abstract

Real-time artificial intelligence at the edge increasingly depends on low-power embedded processors that can deliver bounded latency under strict energy and memory constraints. RISC-V is particularly attractive in this context because its open instruction set architecture permits domain-specific extensions, fine-grained microarchitectural tuning, and portable software stacks across vendors. This paper presents a complete optimization study for a RISC-V based embedded AI platform targeting low-latency visual and sensor inference. We propose a hardware–software co-design pipeline that combines quantization-aware deployment, scratchpad-aware memory scheduling, vectorized kernel fusion, interrupt-aware real-time scheduling, and a lightweight custom instruction extension for convolution accumulation. The target system integrates a 64-bit dual-core RISC-V processor with RVV 1.0 support, a configurable tensor accelerator, DMA-backed scratchpad memory, and a deterministic inference runtime. Experiments are conducted on three representative workloads: keyword spotting, human activity recognition, and compact object detection. We compare the proposed design against an unoptimized RISC-V baseline, an ARM Cortex-A55 embedded reference, and software-only optimization variants. Across workloads, the proposed system reduces end-to-end latency by 38.7% relative to the optimized software-only RISC-V baseline and by 56.4% relative to the unoptimized deployment, while improving energy efficiency by up to 1.84× and preserving task accuracy within 0.6 percentage points. An ablation study demonstrates that memory scheduling and vector-kernel fusion are the dominant contributors to performance, while the custom instruction path is most beneficial for convolution-heavy models. The results indicate that carefully co-designed RISC-V platforms can satisfy practical real-time AI constraints without sacrificing flexibility or openness.
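The two latency reductions quoted in the abstract can be related to each other with a little arithmetic. The sketch below is illustrative only: it assumes both percentages describe the same aggregate latency figure, which the abstract does not state explicitly, and the variable and function names are hypothetical. Under that assumption, it derives the speedup factors implied by each reduction and the share of the gain that software-only optimization would account for on its own.

```python
# Reported in the abstract (fractions of baseline latency removed).
RED_VS_SW_ONLY = 0.387  # vs. the optimized software-only RISC-V baseline
RED_VS_UNOPT = 0.564    # vs. the unoptimized RISC-V deployment


def speedup(reduction: float) -> float:
    """Convert a fractional latency reduction into a speedup factor."""
    return 1.0 / (1.0 - reduction)


def implied_sw_only_reduction(red_vs_sw: float, red_vs_unopt: float) -> float:
    """Latency reduction of the software-only variant vs. the unoptimized one.

    Assumption (not stated in the source): both reductions refer to the same
    proposed-system latency, so
        proposed = (1 - red_vs_unopt) * unopt = (1 - red_vs_sw) * sw_only.
    """
    sw_over_unopt = (1.0 - red_vs_unopt) / (1.0 - red_vs_sw)
    return 1.0 - sw_over_unopt


speedup_vs_sw_only = speedup(RED_VS_SW_ONLY)
speedup_vs_unopt = speedup(RED_VS_UNOPT)
sw_only_reduction = implied_sw_only_reduction(RED_VS_SW_ONLY, RED_VS_UNOPT)

print(f"speedup vs software-only baseline: {speedup_vs_sw_only:.2f}x")
print(f"speedup vs unoptimized baseline:   {speedup_vs_unopt:.2f}x")
print(f"implied software-only reduction:   {sw_only_reduction:.1%}")
```

Read this way, the reported figures would correspond to roughly a 1.6× speedup over the software-only baseline and a 2.3× speedup over the unoptimized deployment, with software-only optimization accounting for a latency reduction of about 29% by itself. Because the abstract aggregates across three workloads, per-workload ratios may differ from these derived values.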

Published

2026-05-16

How to Cite

Ethan J Walker, & Hiroshi Tanaka. (2026). Performance Optimization of RISC-V Based Embedded Systems for Real-Time AI Applications. Computational Intelligence Systems, 4(1). https://doi.org/10.66280/cis.v1i1.113