Fin-LLM-Inference: A High-Throughput Distributed System for Real-Time Financial Time Series Forecasting via Heterogeneous LLM-Augmented Reasoning Pipelines

Brandon Prescott; Christopher Sinclair

Authors

Brandon Prescott, Department of Electrical Engineering and Computer Science, University of New Mexico
Christopher Sinclair, Department of Management Information Systems, University of Delaware

Keywords

Distributed Systems, Financial Time Series, Large Language Models, Heterogeneous Computing, High-Throughput Inference, Socio-Technical Infrastructure, Algorithmic Governance

Abstract

Integrating Large Language Models (LLMs) into financial time series forecasting marks a shift from purely frequentist econometric models to context-aware reasoning systems. However, the throughput requirements of modern capital markets conflict with the inference latency inherent in transformer-based architectures. This paper introduces Fin-LLM-Inference, a high-throughput distributed system for real-time financial forecasting using heterogeneous LLM-augmented reasoning pipelines. We propose a multi-tiered architecture that partitions reasoning tasks between distilled models optimized for edge deployment and cloud-based reasoning engines. By aligning hardware-aware optimizations with the non-stationarity of financial data, the system balances predictive depth against execution speed. Our analysis focuses on system-level trade-offs among inference latency, model consistency, and architectural robustness. We also examine the socio-technical implications of deploying such systems, including algorithmic governance, the environmental sustainability of large-scale AI clusters, and the policy challenges of automated financial decision-making. We argue that the future of financial intelligence lies in coordinating heterogeneous compute resources that can interpret both microstructure signals and macroeconomic narratives. The paper offers a blueprint for resilient, scalable, and fair financial AI infrastructure and concludes with a forward-looking discussion of the regulatory landscape for autonomous financial agents.

