A Distributed Computing Framework for Efficient Resource Scheduling in Large-Scale Systems

Elias Thorne; Sarah J. Vance; Marcus L. Halloway

doi:10.66280/cset.v1i1.89

Authors

Elias Thorne, Department of Electrical Engineering and Computer Science, South Dakota School of Mines and Technology
Sarah J. Vance, School of Engineering, University of North Florida
Marcus L. Halloway, Department of Computer Science, Wichita State University

Keywords:

Distributed Computing, Resource Scheduling, Large-Scale Systems, Systems Architecture, Algorithmic Governance, Infrastructure Sustainability, Socio-Technical Systems.

Abstract

The rapid expansion of hyperscale data centers and the proliferation of edge-to-cloud continuums call for a fundamental change in how resources are scheduled. Traditional centralized scheduling architectures are increasingly prone to bottlenecks, single points of failure, and excessive latency when managing millions of concurrent tasks across geographically dispersed nodes. This paper proposes and analyzes a decentralized, distributed computing framework engineered for efficient resource allocation in large-scale systems. We move beyond a narrow focus on algorithmic complexity to provide a socio-technical evaluation of scheduling infrastructures. The discussion emphasizes the structural trade-offs between global optimality and local responsiveness, the role of hierarchical governance in multi-tenant environments, and the physical requirements of deploying resilient scheduling agents. We also examine the environmental sustainability of large-scale compute orchestration, the ethical imperative of fairness in distributing resources among heterogeneous workloads, and the broader policy implications of autonomous scheduling in critical national infrastructures. By synthesizing perspectives from systems engineering, distributed systems theory, and institutional policy, this work provides a conceptual roadmap for the next generation of scalable scheduling frameworks. We conclude that efficient resource management is fundamentally a problem of balancing systemic robustness with adaptive decentralization, and that it requires integrating technical precision with governance accountability to ensure the long-term viability of global computing systems.
