The proposed bio-inspired MAS is designed for distributed power allocation and interference management in massive MIMO–OFDM networks. TCO-MAS addresses the scalability, adaptability, and energy efficiency challenges of such networks through bio-inspired, distributed optimization. The detailed process flow ensures distributed decision-making while maintaining coordination across BSs to optimize resource allocation.
Key assumptions
- The LSTM model assumes sufficient historical channel state information (CSI) data to effectively learn patterns and provide accurate predictions. This requires a robust and uninterrupted data acquisition mechanism to avoid inaccuracies caused by sudden environmental changes or incomplete datasets.
- While the proposed distributed framework reduces dependency on centralized control, scalability is contingent on the efficient distribution of computational tasks across BSs. The pheromone update mechanism and probabilistic decision-making are designed to minimize communication overhead, but the scalability might be constrained by network size and heterogeneity in BS capabilities.
- The implementation assumes the availability of adequate computational resources at each BS to handle LSTM predictions in real time. For large-scale networks with ultra-dense user deployments, the computational overhead associated with frequent LSTM training and prediction updates could challenge real-time performance.
The inclusion of LSTM enhances the adaptive capabilities of TCO by predicting upcoming CSI, allowing BSs to pre-emptively adjust pheromone levels. This integration aims to improve convergence speed and allocation effectiveness under dynamic network conditions. Each BS operates as an autonomous agent, leveraging local interactions to optimize power allocation strategies. The LSTM model updates CSI predictions iteratively, enabling real-time responsiveness to changes in channel conditions.
The major reason for incorporating TCO in the proposed work is to exploit its distinctive characteristics for distributed optimization in dynamic environments. The pheromone trails used in TCO are well suited to modeling resource allocation in a MIMO–OFDM system. Unlike algorithms such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO), TCO supports localized decision-making, which makes it more suitable for scenarios where BSs must operate semi-independently while considering the resource demands of nearby BSs. Moreover, TCO's decentralized design supports scalability: each BS updates its pheromone trails based on local interactions, which reduces the need for centralized coordination and allows the system and each BS to adapt to network changes. Additionally, TCO's pheromone-based probabilistic model provides the flexibility to explore diverse resource allocation strategies, effectively balancing the trade-offs between throughput, power efficiency, and latency. The novelty of the proposed model is its multi-objective optimization approach, which incorporates multiple pheromone types to attain objectives such as throughput maximization, power minimization, and latency reduction. The proposed multi-objective design, integrated with an adaptive pheromone feedback mechanism enhanced by a deep learning model, allows the MAS to predict upcoming network states and adjust pheromone levels accordingly.
The complete overview of the proposed model is presented in Fig. 1. The process flow starts with each BS initializing its pheromone trails for each objective. At each iteration, BSs calculate utility values based on current channel states, interference levels, and power consumption, updating their pheromone trails based on both past and predicted network states. The deep learning LSTM network in each BS predicts the channel state information, which allows BSs to anticipate changes and update pheromone trails in advance. The adaptive pheromone feedback obtained from the prediction model enhances decision-making accuracy and reduces response time. It also enables each BS to maintain optimal power allocation even under dynamic conditions. The integration of a predictive deep learning model within the pheromone feedback loop is a novel approach that enhances the MAS's ability to adapt to future network states. Combined with the distributed nature of TCO and multi-objective optimization, the complete system forms an effective framework for large-scale, dynamic wireless networks.

System model
The system model used in the proposed work is designed as a distributed network of BSs equipped with multiple antennas to serve multiple users. The objective of the system is to provide effective power allocation, better interference management, and overall resource efficiency in a massive MIMO–OFDM network. Accordingly, a massive MIMO–OFDM network is considered. The network includes \(N\) BSs, each with \(M\) antennas, and \(K\) users distributed across the network area. Each user receives signals from its nearest BS. Each BS acts as an agent to optimize power allocation, manage interference, and adapt based on neighboring agents' states. Each BS thus operates both independently and with limited, local communication with neighboring BSs.
The channel model used in the proposed network is Rayleigh fading, in which each BS-to-user link is represented by a complex channel vector that captures the propagation effects between each BS antenna and each user. Let \(h_{i,k} \in C^{M \times 1}\) denote the channel vector between BS \(i\) and user \(k\). Each entry in \(h_{i,k}\) represents the channel gain from an antenna of BS \(i\) to user \(k\). This channel vector is assumed to follow an independent, identically distributed Rayleigh fading model, which is mathematically formulated as follows.
$$h_{{i,k}} = \left[ {h_{{i,k}}^{{\left( 1 \right)}} ,h_{{i,k}}^{{\left( 2 \right)}} , \ldots ,h_{{i,k}}^{{\left( M \right)}} } \right]^{T}$$
(1)
where \(h_{i,k}^{\left( m \right)}\) indicates the complex channel gain from antenna \(m\) at BS \(i\) to user \(k\). Each BS \(i\) has a power allocation matrix \(P_{i} \in C^{M \times M}\) which defines the transmit power distributed across its \(M\) antennas. This matrix ensures that the BS efficiently allocates power to meet user demands while minimizing interference to users served by neighboring BSs. Mathematically, the power allocation matrix is described as
$$P_{i} = {\text{diag}}\left( {p_{i}^{\left( 1 \right)} ,p_{i}^{\left( 2 \right)} , \ldots ,p_{i}^{\left( M \right)} } \right)$$
(2)
where \(p_{i}^{\left( m \right)}\) indicates the power allocated by BS \(i\) to its \(m^{{{\text{th}}}}\) antenna. The diagonal structure of \(P_{i}\) simplifies power distribution by excluding inter-antenna interference. The signal received by user \(k\) from BS \(i\) is subject to noise and interference from other BSs. The signal-to-interference-plus-noise ratio (SINR) for user \(k\) connected to BS \(i\) is defined as follows
$${\text{SINR}}_{i,k} = \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{{\upsigma }^{2} + I_{i,k} }}$$
(3)
where \(h_{i,k}^{H} P_{i} h_{i,k}\) indicates the effective signal power received by user \(k\) from BS \(i\), and \({\upsigma }^{2}\) indicates the noise power level, modeled in the proposed work as additive white Gaussian noise (AWGN). \(I_{i,k}\) indicates the interference power from all other BSs transmitting to user \(k\), which is formulated as
$$I_{i,k} = \mathop \sum \limits_{j \ne i} h_{j,k}^{H} P_{j} h_{j,k}$$
(4)
where \({\text{SINR}}_{i,k}\) indicates the signal-to-interference-plus-noise ratio for user \(k\) served by BS \(i\), \({\upsigma }^{2}\) indicates the noise power, and \(I_{i,k}\) indicates the interference power from all BSs \(\left( {j \ne i} \right)\) to user \(k\). The sum rate represents the total achievable data rate in the network. The data rate for user \(k\) served by BS \(i\) is given by the Shannon capacity formula as follows.
$$R_{i,k} = \log_{2} \left( {1 + {\text{SINR}}_{i,k} } \right)$$
(5)
Thus, the total sum rate for BS \(i\), covering all users \(k \in \left\{ {1, \ldots ,K_{i} } \right\}\), is formulated as
$$R_{i} = \mathop \sum \limits_{k = 1}^{{K_{i} }} \log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{{\upsigma }^{2} + I_{i,k} }}} \right)$$
(6)
where \(K_{i}\) indicates the number of users served by BS \(i\), \(R_{i,k}\) indicates the data rate for user \(k\) connected to BS \(i\), and \(R_{i}\) indicates the total data rate for all users served by BS \(i\). In addition to sum-rate maximization, each BS must minimize its total transmit power to improve energy efficiency. The total transmit power is mathematically formulated as
$$P_{i} = {\text{Tr}}\left( {P_{i} } \right) = \mathop \sum \limits_{m = 1}^{M} p_{i}^{\left( m \right)}$$
(7)
where \(P_{i}\) indicates the total transmit power for BS \(i\), \(p_{i}^{\left( m \right)}\) indicates the power allocated to the \(m^{{{\text{th}}}}\) antenna of BS \(i\).
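For concreteness, the following minimal NumPy sketch instantiates Eqs. (1)–(7) for randomly drawn Rayleigh channels. The network dimensions, noise level, and function names are illustrative assumptions, not values from the paper, and user indices are shared across BSs purely to keep the interference term short.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 3, 8, 4     # BSs, antennas per BS, users per BS (illustrative)
sigma2 = 1e-3         # AWGN noise power (assumption)

# Eq. (1): i.i.d. Rayleigh fading -> complex Gaussian channel vectors h[i, k] in C^M
h = (rng.standard_normal((N, K, M)) + 1j * rng.standard_normal((N, K, M))) / np.sqrt(2)

# Eq. (2): diagonal power allocation matrix P_i = diag(p_i^(1), ..., p_i^(M))
p = rng.uniform(0.1, 1.0, size=(N, M))
P = np.stack([np.diag(row) for row in p])

def sinr(i, k):
    """Eqs. (3)-(4): SINR of user k served by BS i (user index reused across BSs)."""
    signal = np.real(h[i, k].conj() @ P[i] @ h[i, k])
    interference = sum(np.real(h[j, k].conj() @ P[j] @ h[j, k])
                       for j in range(N) if j != i)
    return signal / (sigma2 + interference)

def sum_rate(i):
    """Eqs. (5)-(6): total achievable rate of BS i over its users."""
    return sum(np.log2(1.0 + sinr(i, k)) for k in range(K))

total_power = p.sum(axis=1)   # Eq. (7): Tr(P_i) per BS
print(sum_rate(0), total_power[0])
```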
Objective function
To optimize the trade-off between maximizing the sum rate and minimizing the total power for each BS, the objective function is defined as:
$$\mathop {\max }\limits_{{P_{i} }} \left( {\mathop \sum \limits_{k = 1}^{{K_{i} }} \log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{\sigma^{2} + I_{i,k} }}} \right) - \alpha \,P_{i} } \right)$$
(8)
where \({\upalpha }\) indicates the weighting factor used to balance data rate maximization against power minimization. The first term is the sum rate of all users served by the BS from Eq. (6), and \(P_{i} = {\text{Tr}}\left( {P_{i} } \right)\) is the total transmit power from Eq. (7).
The final form of the objective function for each BS \(i\) is formulated as
$$\mathop {\max }\limits_{{P_{i} }} \left( {R_{i} - \alpha \,P_{i} } \right) = \mathop {\max }\limits_{{P_{i} }} \left( {\mathop \sum \limits_{k = 1}^{{K_{i} }} \log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{\sigma^{2} + I_{i,k} }}} \right) - \alpha \,{\text{Tr}}\left( {P_{i} } \right)} \right)$$
(9)
where \(R_{i}\) indicates the sum rate for all users served by BS \(i\), and \({\upalpha }\,P_{i}\) indicates the power minimization term with weight factor \({\upalpha }\) and total power \(P_{i}\). Each BS in the system optimizes its power allocation \(P_{i}\) across antennas to maximize the data rate for its users while minimizing total power usage. The system can shift the optimization priority by adjusting the weight factor \({\upalpha }\). Local interactions between neighboring BSs provide better adaptability and interference management for the system.
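As a small illustration of Eq. (9), the sketch below evaluates the trade-off objective for one BS given precomputed per-user rates; the helper name and the value of \(\alpha\) are assumptions.

```python
import numpy as np

def bs_objective(rates_k, per_antenna_powers, alpha=0.1):
    """Eq. (9): sum rate minus alpha-weighted total transmit power Tr(P_i)."""
    return np.sum(rates_k) - alpha * np.sum(per_antenna_powers)

# Example with illustrative numbers: 4 user rates (bit/s/Hz), 8 antenna powers
print(bs_objective(np.array([2.1, 1.4, 3.0, 0.9]), np.full(8, 0.5)))
```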
Termite colony optimization (TCO)
In the proposed work, TCO is used to optimize the power allocation. TCO was chosen for its simplicity, scalability, and decentralized decision-making, which reduce communication overhead and computational complexity. Unlike deep reinforcement learning (DRL), TCO uses pheromone-based feedback for probabilistic decision-making, eliminating the need for extensive training or predefined reward functions. TCO’s lightweight mechanisms make it ideal for resource-constrained environments, whereas DRL often requires significant computational resources and centralized coordination. Additionally, TCO’s decentralized nature enhances scalability, making it more adaptable to ultra-dense MIMO–OFDM networks where DRL-based approaches may face bottlenecks.
While both TCO and Particle Swarm Optimization (PSO) employ swarm intelligence principles, their mechanisms differ fundamentally. TCO is inspired by termite behavior and relies on pheromone trails as feedback for guiding decisions, making it a probabilistic process. In contrast, PSO mimics bird flocking or fish schooling, where particles deterministically adjust their positions and velocities based on individual and social experiences. In TCO, decisions are influenced by pheromone intensity and evaporation rates, while in PSO, they are driven by fixed mathematical rules combining local best, global best, and inertia factors.
TCO’s decision-making is governed by pheromone updates, defined as:
$$\tau_{ij} \left( {t + 1} \right) = \left( {1 - \rho } \right)\tau_{ij} \left( t \right) + \Delta \tau_{ij} \left( t \right)$$
(10)
In (10), \(\rho\) is the evaporation rate, controlling the decay of past information, and \(\Delta \tau_{ij} \left( t \right)\) is the reinforcement term based on allocation utility. In contrast, PSO employs velocity and position updates:
$$v_{i} \left( {t + 1} \right) = \omega v_{i} \left( t \right) + c_{1} r_{1} \left( {p_{i} – x_{i} \left( t \right)} \right) + c_{2} r_{2} \left( {g – x_{i} \left( t \right)} \right)$$
(11)
and
$$x_{i} \left( {t + 1} \right) = x_{i} \left( t \right) + v_{i} \left( {t + 1} \right)$$
(12)
where \(\omega\) is inertia, \(c_{1}\) and \(c_{2}\) are cognitive and social coefficients, and \(r_{1}\) and \(r_{2}\) are random factors. These equations highlight that TCO adapts through probabilistic pheromone reinforcement, whereas PSO deterministically balances exploration and exploitation via velocity adjustments.
TCO’s pheromone levels encode the quality of previous resource allocation strategies, influencing decision probabilities. In PSO, velocity represents the particle’s movement in the search space, directly determining its position updates. The pheromone evaporation rate in TCO controls the decay of outdated information, ensuring adaptability to dynamic conditions. In PSO, inertia weight plays a similar role, balancing exploration and exploitation by controlling the influence of past velocities. However, TCO relies on local pheromone interactions for decentralized optimization, whereas PSO requires a global best solution to guide particles.
Both TCO and PSO aim to balance exploration and exploitation but achieve this through distinct mechanisms. PSO relies on deterministic rules with defined parameters (e.g., \(\omega , c_{1}\) and \(c_{2}\)), while TCO uses dynamic, adaptive pheromone mechanisms. Additionally, TCO is inherently decentralized and scalable due to its reliance on local pheromone trails, making it suitable for distributed systems like MIMO–OFDM. PSO’s reliance on a global best introduces scalability challenges in large, distributed environments, but it typically converges faster in centralized scenarios. These distinctions position TCO as more flexible for dynamic, distributed optimization tasks.
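The contrast between the two update rules can be seen in the following side-by-side sketch of Eqs. (10)–(12); parameter values are illustrative defaults, not tuned settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def tco_pheromone_update(tau, delta_tau, rho=0.1):
    """Eq. (10): evaporate past pheromone, then add utility-based reinforcement."""
    return (1.0 - rho) * tau + delta_tau

def pso_update(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """Eqs. (11)-(12): velocity/position update with random factors r1, r2."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_next = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    return x + v_next, v_next

# TCO adapts a scalar trail; PSO moves a particle toward personal/global bests
print(tco_pheromone_update(tau=1.0, delta_tau=0.3))
x, v = np.zeros(2), np.zeros(2)
print(pso_update(x, v, p_best=np.array([1.0, 0.0]), g_best=np.array([0.5, 0.5])))
```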
Each base station in the network is treated as an independent agent in the optimization model. The agent optimizes its power allocations considering feedback from its environment and utilizes pheromone trails to guide resource allocation. This swarm-intelligence approach is used in the proposed system to handle power allocation and interference. Each BS maintains a pheromone level \({\uptau }_{i,k}\) associated with each user \(k\) it serves, which reflects the effectiveness of past power allocations in meeting the desired Quality of Service (QoS) targets for that user. Each BS adjusts its power allocation dynamically by strengthening or weakening these pheromone trails to adapt to dynamic network conditions. This optimization thus provides a better balance between exploring new allocation strategies and exploiting successful past allocations.
Pheromone update mechanism
The pheromone update process in the optimization model has two major components: evaporation and reinforcement. Evaporation prevents the pheromone trails from becoming overly reliant on past allocations and allows the agents to adapt to new conditions, while reinforcement strengthens pheromone trails for power allocations that maximize the data rate and minimize interference. For each BS \(i\) and user \(k\), the pheromone level \({\uptau }_{i,k}\) is iteratively updated. The pheromone update rule is mathematically formulated as
$$\tau_{i,k} \left( {t + 1} \right) = \left( {1 - \rho } \right)\tau_{i,k} \left( t \right) + \Delta \tau_{i,k}$$
(13)
where \(\rho\) indicates the pheromone evaporation rate with range [0,1]; it defines how quickly the influence of past pheromone levels diminishes. \(\Delta \tau_{i,k}\) indicates the incremental pheromone reinforcement, which is calculated based on the utility of the power allocation strategy. The evaporation term \(\left( {1 - \rho } \right)\tau_{i,k} \left( t \right)\) prevents pheromone saturation and allows new strategies to be explored over time. The reinforcement term \(\Delta \tau_{i,k}\) rewards allocations that improve network performance and guides future allocation decisions; it is based on the utility of the current power allocation strategy for user \(k\) at BS \(i\). The utility function considers both the data rate achieved for the user and the power efficiency of the allocation, and is mathematically formulated as
$$U_{i,k} = \log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{\sigma^{2} + I_{i,k} }}} \right) - \alpha \,{\text{Tr}}\left( {P_{i} } \right)$$
(14)
where \(U_{i,k}\) indicates the utility of the power allocation strategy for user \(k\) at BS \(i\), balancing the data rate and power usage. \({\upalpha }\) indicates the weight factor that controls the trade-off between data rate and power efficiency. \(\log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{{\upsigma }^{2} + I_{i,k} }}} \right)\) indicates the data rate achieved for user \(k\), and \({\text{Tr}}\left( {P_{i} } \right)\) indicates the total power allocated by BS \(i\). The utility \(U_{i,k}\) reflects the success of the current allocation strategy, and based on this utility the pheromone reinforcement \(\Delta \tau_{i,k}\) is formulated as follows.
$$\Delta \tau _{{i,k}} = \gamma \cdot U_{{i,k}}$$
(15)
where \({\upgamma }\) indicates the scaling factor that determines the sensitivity of pheromone reinforcement to the utility value \(U_{i,k}\) of the power allocation strategy.
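A compact sketch of the update cycle of Eqs. (13)–(15) follows; the values of \(\rho\), \(\gamma\), \(\alpha\), and the random channel draw are illustrative assumptions.

```python
import numpy as np

def utility(h_ik, P_i, sigma2, I_ik, alpha=0.1):
    """Eq. (14): achieved data rate minus alpha-weighted total power Tr(P_i)."""
    rate = np.log2(1.0 + np.real(h_ik.conj() @ P_i @ h_ik) / (sigma2 + I_ik))
    return rate - alpha * np.real(np.trace(P_i))

def pheromone_step(tau_ik, U_ik, rho=0.1, gamma=0.5):
    """Eqs. (13) and (15): evaporation plus utility-scaled reinforcement."""
    delta = gamma * U_ik                  # Eq. (15)
    return (1.0 - rho) * tau_ik + delta   # Eq. (13)

rng = np.random.default_rng(4)
h_ik = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2)
P_i = np.diag(rng.uniform(0.1, 1.0, 4))
U = utility(h_ik, P_i, sigma2=1e-3, I_ik=0.05)
print(pheromone_step(tau_ik=1.0, U_ik=U))
```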
In the proposed Termite Colony Optimization-based Multi-Agent System (TCO-MAS), pheromone-based feedback mechanisms emulate the natural behavior of termites for efficient decision-making in distributed environments. These mechanisms enable agents to leave virtual pheromone trails that represent the quality of explored solutions, such as optimal power levels or interference-free channels. Subsequent agents are guided by the intensity of these pheromone trails, facilitating adaptive learning and dynamic optimization. This feedback loop ensures that the system converges toward optimal resource allocation while effectively responding to changes in network conditions.
Probability-based decision-making for power allocation
Once the pheromone levels \(\tau_{i,k}\) are updated, each BS uses these levels to probabilistically decide on power allocation strategies. A higher pheromone level increases the likelihood of selecting that particular allocation, allowing BSs to balance between exploiting high-performing allocations and exploring alternative strategies. The probability \(P\left( {A_{i} = a|\tau_{i,k} } \right)\) of choosing a particular power allocation \(A_{i} = a\) for user \(k\) at BS \(i\) is mathematically formulated based on25 as
$$P\left( {A_{i} = a{|}\tau_{i,k} } \right) = \frac{{\tau_{i,k}^{a} }}{{\mathop \sum \nolimits_{b} \tau_{i,k}^{b} }}$$
(16)
where \(P\left( {A_{i} = a{|}\tau_{i,k} } \right)\) indicates the probability of selecting power allocation \(a\) for user \(k\) by BS \(i\). The specific power allocation strategy is indicated as \(a\), and the index for summing over all possible allocations is indicated as \(b\). \(\tau_{i,k}^{a}\) indicates the pheromone level associated with allocation \(a\) for user \(k\), and \(\mathop \sum \limits_{b} \tau_{i,k}^{b}\) indicates the normalization factor ensuring that the probabilities across all allocations sum to 1. The probabilistic approach in TCO-MAS assigns a selection probability to each power allocation option based on its pheromone level, encouraging the selection of successful allocations while maintaining flexibility to explore less-utilized options. More generally, the probability \(P_{ij}\) of agent \(i\) selecting option \(j\) is calculated using the normalized pheromone levels:
$$P_{ij} = \frac{{\tau_{ij}^{\alpha } }}{{\mathop \sum \nolimits_{k \in N} \tau_{ik}^{\alpha } }}$$
(17)
where \(\tau_{ij}\) is the pheromone level for option \(j\), \(\alpha\) controls the influence of pheromone intensity (a tuning exponent, distinct from the weighting factor in Eq. (8)), and \(N\) represents the set of all available options. Higher \(\tau_{ij}\) values increase \(P_{ij}\), ensuring that successful strategies are favored in subsequent iterations. Simultaneously, the probabilistic scheme maintains exploration by allowing all options, even those with lower pheromone levels, to have a non-zero probability of selection.
This method provides value by balancing exploration and exploitation, ensuring that the optimization process does not prematurely converge on suboptimal solutions. By dynamically updating pheromone levels based on reinforcement from fitness evaluations, TCO-MAS adapts to changing network conditions, improving resource allocation efficiency and fairness.
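The selection rule of Eqs. (16)–(17) reduces to normalizing pheromone levels, as the following sketch shows; the pheromone values are illustrative, and with \(\alpha = 1\) the rule coincides with Eq. (16).

```python
import numpy as np

def allocation_probabilities(tau_options, alpha=1.0):
    """Eq. (17): selection probabilities from normalized pheromone levels.

    With alpha = 1 this reduces exactly to Eq. (16).
    """
    weights = np.asarray(tau_options, dtype=float) ** alpha
    return weights / weights.sum()

rng = np.random.default_rng(2)
tau = np.array([0.8, 2.5, 1.1])            # pheromone per candidate allocation
probs = allocation_probabilities(tau)
choice = rng.choice(len(tau), p=probs)     # favored yet still exploratory choice
print(probs, choice)
```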
Adaptive pheromone feedback mechanism with deep learning
The adaptive pheromone feedback mechanism with deep learning in the TCO system utilizes predictive capabilities to further enhance the adaptability of each agent, i.e., each BS, in a massive MIMO–OFDM network. The proposed model incorporates an LSTM deep learning model so that each BS can predict upcoming changes in the network and adjust its pheromone levels accordingly. This combined approach enhances the convergence speed and effectiveness of resource allocation under dynamic conditions. The adaptive pheromone feedback procedure in the distributed optimization allows the system to utilize the pheromone trails as probabilistic guides that influence the resource allocation decisions of BSs. This feedback, based on the LSTM predictions, enables the system to reduce latency in adjustments and enhances the overall QoS.
To adjust the pheromone levels, each BS incorporates an LSTM network to predict the channel state information for each user. The prediction model allows the BS to anticipate changes in channel state information, which directly affect the signal quality and interference patterns for each user. Let \(CSI_{i,k} \left( t \right)\) represent the channel state for user \(k\) connected to BS \(i\) at time \(t\). The LSTM network predicts the upcoming CSI state \(\widehat{CSI}_{i,k} \left( {t + 1} \right)\) by observing the history of channel state information. Mathematically, it is formulated as follows.
$$\widehat{CSI}_{i,k} \left( {t + 1} \right) = {\text{LSTM}}\left( {CSI_{i,k} \left( t \right),CSI_{i,k} \left( {t - 1} \right), \ldots ,CSI_{i,k} \left( {t - n + 1} \right)} \right)$$
(18)
where \(\widehat{CSI}_{i,k} \left( {t + 1} \right)\) indicates the predicted channel state information for user \(k\) at time \(t + 1\), which is used to make the necessary adjustments, and \(CSI_{i,k} \left( t \right),CSI_{i,k} \left( {t - 1} \right), \ldots ,CSI_{i,k} \left( {t - n + 1} \right)\) indicates the sequence of past channel state observations for user \(k\) fed as input to the LSTM network. The number of past observations is indicated as \(n\). This prediction allows each BS to handle variations in channel quality for each user and adjust resource allocation accordingly.
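A minimal PyTorch sketch of the one-step predictor in Eq. (18) is given below; the hidden size, window length \(n = 10\), and the choice of a single real-valued CSI feature are assumptions, since the paper does not fix these details here.

```python
import torch
import torch.nn as nn

class CSIPredictor(nn.Module):
    """One-step-ahead CSI predictor in the spirit of Eq. (18).

    Hidden size, depth, and a single real-valued CSI feature are assumptions.
    """
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, csi_window):
        # csi_window has shape (batch, n, n_features): the n past CSI observations
        out, _ = self.lstm(csi_window)
        return self.head(out[:, -1, :])   # predicted CSI at t + 1

model = CSIPredictor()
window = torch.randn(1, 10, 1)            # n = 10 past observations (illustrative)
csi_hat = model(window)                   # \hat{CSI}_{i,k}(t + 1)
print(csi_hat.shape)
```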
Based on the predicted network state \(\widehat{CSI}_{i,k} \left( {t + 1} \right)\) each BS adjusts its pheromone levels for power allocation to enhance decision making. The adaptive pheromone update procedure incorporating the feedback from the predicted CSI is mathematically described as follows.
$$\tau_{i,k} \left( {t + 1} \right) = \left( {1 - \rho } \right)\tau_{i,k} \left( t \right) + \beta \cdot \Delta \tau_{i,k} \cdot f\left( {\widehat{CSI}_{i,k} \left( {t + 1} \right)} \right)$$
(19)
where \(\tau_{i,k} \left( {t + 1} \right)\) indicates the updated pheromone level for BS \(i\) serving user \(k\) at time \(t + 1\), and \({\uprho }\) indicates the pheromone evaporation rate, which reduces the effect of past pheromone levels to allow new conditions to guide future allocations. \(\Delta \tau_{i,k}\) indicates the standard pheromone reinforcement term based on the utility of recent power allocations. \(\beta\) indicates the sensitivity parameter that adjusts the influence of the predicted CSI on the pheromone update. The function \(f\left( {\widehat{CSI}_{i,k} \left( {t + 1} \right)} \right)\) maps the predicted CSI into a scaling factor that modulates the reinforcement based on anticipated network conditions, associating better or worse predicted network states with higher or lower reinforcement. Based on this, each BS adapts to future CSI and selects suitable power allocations under predicted conditions.
The adaptive function \(f\left( {\widehat{CSI}_{i,k} \left( {t + 1} \right)} \right)\) is designed to interpret the predicted CSI in terms of reinforcement needs for the pheromone update. If \(\widehat{CSI}_{i,k} \left( {t + 1} \right)\) predicts a high-quality channel, the function can increase reinforcement to support higher power allocations, improving the data rate. If the prediction indicates a fading channel or increased interference, the function can reduce reinforcement and guide the BS toward power minimization strategies. The scaling function is simplified and expressed as an exponential function as follows
$$f\left( {\widehat{CSI}_{i,k} \left( {t + 1} \right)} \right) = \exp \left( { - \frac{1}{{\widehat{CSI}_{i,k} \left( {t + 1} \right)}}} \right)$$
(20)
The scaling function in this form decreases the influence of pheromone reinforcement as CSI degrades, enabling adaptive responses to poorer channel conditions by lowering the likelihood of high-power allocations for users with expected low channel quality. The adaptive pheromone feedback in the proposed work reduces the response time and enhances the overall QoS by anticipating future demands. The predictive feedback improves the convergence speed of TCO, and the adaptive scaling allows the BS to strike a better balance between data rate and power efficiency based on network requirements.
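Putting Eqs. (19) and (20) together, the prediction-aware update can be sketched as follows; the small epsilon guard against division by zero and the default \(\rho\) and \(\beta\) values are added assumptions.

```python
import numpy as np

def csi_scaling(csi_hat, eps=1e-9):
    """Eq. (20): reinforcement shrinks as the predicted channel degrades."""
    return np.exp(-1.0 / max(csi_hat, eps))   # eps guards division by zero (assumption)

def adaptive_update(tau, delta_tau, csi_hat, rho=0.1, beta=1.0):
    """Eq. (19): pheromone update modulated by the predicted channel state."""
    return (1.0 - rho) * tau + beta * delta_tau * csi_scaling(csi_hat)

print(adaptive_update(tau=1.0, delta_tau=0.4, csi_hat=2.0))   # good predicted channel
print(adaptive_update(tau=1.0, delta_tau=0.4, csi_hat=0.2))   # fading predicted channel
```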
Multi-objective optimization with multiple pheromone
In the proposed work, the traditional termite colony optimization model is extended with multiple pheromone types, moving from single-objective to multi-objective optimization. This modification addresses the multiple network requirements of a MIMO–OFDM network: each BS must balance objectives such as data rate maximization, power consumption minimization, and latency reduction. To attain this, each BS \(i\) maintains separate pheromone trails that enable it to dynamically allocate resources based on the objectives. For data rate maximization, the pheromone trail is represented as \(\tau_{i,k}^{{\left( {rate} \right)}}\), which tracks successful power allocations that increase the data rate for user \(k\). For power minimization, the pheromone trail is indicated as \(\tau_{i,k}^{{\left( {power} \right)}}\), which tracks allocations that effectively reduce power consumption. For latency reduction, the pheromone trail is indicated as \(\tau_{i,k}^{{\left( {latency} \right)}}\), which supports allocations that minimize response times for user \(k\). Thus, each BS \(i\) has multiple pheromone types per user, allowing distinct probabilistic guidance across the different objectives.
Each pheromone trail \(\tau_{i,k}^{\left( o \right)}\) is updated separately to reflect the utility of the allocation in meeting the specific goal associated with that trail. For each objective \(o \in \left\{ {rate,power,latency} \right\}\), the pheromone update rule is formulated as
$$\tau_{i,k}^{\left( o \right)} \left( {t + 1} \right) = \left( {1 - \rho } \right)\tau_{i,k}^{\left( o \right)} \left( t \right) + \Delta \tau_{i,k}^{\left( o \right)}$$
(21)
where \(\tau_{i,k}^{\left( o \right)} \left( {t + 1} \right)\) indicates the updated pheromone level for objective \(o\) at the next time step, \(\rho\) indicates the common evaporation rate across all objectives that maintains a balance between recent and older pheromone information, and \(\Delta \tau_{i,k}^{\left( o \right)}\) indicates the reinforcement term specific to the utility achieved for objective \(o\), calculated separately for each pheromone type. This update allows each pheromone type to evolve independently based on the success of the resource allocation strategy in achieving its specific objective. For each objective \(o\), a utility function \(U_{i,k}^{\left( o \right)}\) is defined to measure the effectiveness of the resource allocation in achieving that goal, and the reinforcement \(\Delta \tau_{i,k}^{\left( o \right)}\) is determined by this utility. For data rate maximization, the utility \(U_{i,k}^{{\left( {rate} \right)}}\) is defined based on the data rate achieved for user \(k\) by BS \(i\):
$$U_{i,k}^{{\left( {{\text{rate}}} \right)}} = \log_{2} \left( {1 + \frac{{h_{i,k}^{H} P_{i} h_{i,k} }}{{{\upsigma }^{2} + I_{i,k} }}} \right)$$
(22)
For power minimization, the utility \(U_{i,k}^{{\left( {{\text{power}}} \right)}}\) is defined as the negative of the total power allocated by BS \(i\), so that lower power consumption yields higher utility:
$$U_{i,k}^{{\left( {{\text{power}}} \right)}} = – {\text{Tr}}\left( {P_{i} } \right)$$
(23)
For latency reduction, the utility \(U_{i,k}^{{\left( {{\text{latency}}} \right)}}\) is defined to be inversely proportional to the current latency experienced by user \(k\), with the objective of minimizing response times. Each utility is used to calculate the reinforcement \(\Delta \tau_{i,k}^{\left( o \right)}\) for the corresponding pheromone trail, mathematically formulated as
$$\Delta \tau_{i,k}^{\left( o \right)} = \gamma^{\left( o \right)} \cdot U_{i,k}^{\left( o \right)}$$
(24)
where \({\upgamma }^{\left( o \right)}\) is a scaling factor specific to objective \(o\), determining how strongly the utility for that objective influences pheromone reinforcement.
With multiple pheromone trails guiding decision-making, each BS must integrate these pheromone levels to decide on power allocations that address all objectives simultaneously. The weighted probability of selecting a particular power allocation \(A_{i} = a\) for user \(k\) is determined by combining the pheromone levels of each objective, weighted by the relative importance \({\upgamma }^{\left( o \right)}\) of each objective. Mathematically, it is formulated as
$$P\left( {A_{i} = a|\tau_{i,k} } \right) = \frac{{\mathop \sum \nolimits_{o} \gamma^{\left( o \right)} \left( {\tau_{i,k}^{\left( o \right)} } \right)^{a} }}{{\mathop \sum \nolimits_{o} \mathop \sum \nolimits_{b} \gamma^{\left( o \right)} \left( {\tau_{i,k}^{\left( o \right)} } \right)^{b} }}$$
(25)
where \(P\left( {A_{i} = a|\tau_{i,k} } \right)\) indicates the probability of selecting power allocation \(a\) for user \(k\) based on the combined pheromone information, \(\left( {\tau_{i,k}^{\left( o \right)} } \right)^{a}\) indicates the pheromone level associated with objective \(o\) and allocation \(a\) for user \(k\), \({\upgamma }^{\left( o \right)}\) indicates the weight assigned to objective \(o\), representing its priority in decision-making, and \(b\) indicates the index for summing across all possible power allocations. This weighted decision process allows each BS to prioritize different objectives according to network conditions. By maintaining separate pheromone trails for the objectives and adjusting each through dynamic weights, the proposed model effectively adapts to the demands of a massive MIMO–OFDM environment. The summarized pseudocode for the proposed model is presented below.
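The pseudocode block referenced above does not appear in this version of the text; the following runnable Python sketch reconstructs the per-iteration flow from Eqs. (18)–(25) as described in this section. All constants, the toy exponential-smoothing predictor standing in for the LSTM, the simplified latency utility, and the positivity clamp on pheromone levels are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

N, K, A = 3, 4, 5                 # BSs, users per BS, candidate power levels
OBJ = ("rate", "power", "latency")
rho, beta = 0.1, 1.0
gamma = {"rate": 1.0, "power": 0.5, "latency": 0.5}   # objective weights gamma^(o)
levels = np.linspace(0.2, 1.0, A)                     # candidate power allocations

tau = {o: np.ones((N, K, A)) for o in OBJ}            # tau^(o)_{i,k,a}
history = rng.uniform(0.5, 2.0, size=(N, K, 10))      # past CSI observations

def predict_csi(past):
    # Stand-in for the LSTM of Eq. (18): exponentially weighted smoothing.
    w = 0.6 ** np.arange(len(past))[::-1]
    return float(np.dot(w, past) / w.sum())

def f(csi_hat):                                       # Eq. (20)
    return np.exp(-1.0 / max(csi_hat, 1e-9))

def utility(o, csi, p):                               # Eqs. (22)-(23); toy latency term
    if o == "rate":
        return np.log2(1.0 + csi * p)
    if o == "power":
        return -p
    return -1.0 / (csi * p + 1e-9)

for t in range(50):                                   # main TCO-MAS loop
    for i in range(N):                                # each BS acts as an autonomous agent
        for k in range(K):
            csi_hat = predict_csi(history[i, k])
            for o in OBJ:                             # per-objective update, Eqs. (19), (21), (24)
                U = np.array([utility(o, csi_hat, p) for p in levels])
                delta = gamma[o] * U
                tau[o][i, k] = np.maximum(            # clamp keeps trails positive (assumption)
                    (1 - rho) * tau[o][i, k] + beta * delta * f(csi_hat), 1e-6)
            combined = sum(gamma[o] * tau[o][i, k] for o in OBJ)   # Eq. (25) weighting
            a = rng.choice(A, p=combined / combined.sum())         # probabilistic allocation
```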

Role of weights in the multi-objective formulation
In the proposed multi-objective framework, weights play a pivotal role in balancing conflicting objectives such as maximizing the sum rate, minimizing power consumption, and reducing latency. These weights determine the priority or relative importance of each objective, allowing the optimization algorithm to align resource allocation decisions with the specific requirements of the network scenario.
The multi-objective function used in the proposed Termite Colony Optimization-based Multi-Agent System (TCO-MAS) can be expressed as:
$$f\left( x \right) = \mathop \sum \limits_{i = 1}^{N} w_{i} f_{i} \left( x \right)$$
(26)
In Eq. (26), \(f_{i} \left( x \right)\) represents the \(i\)-th objective (e.g., sum rate maximization, power minimization, or latency reduction), \(w_{i}\) is the weight assigned to the \(i\)-th objective, \(x\) represents the resource allocation decision variables, and \(N\) is the total number of objectives. The assignment of these weights depends on various factors, including network demands, energy constraints, and latency sensitivity. For instance, in ultra-dense urban environments where throughput is critical, a higher weight can be assigned to the sum rate objective. Similarly, scenarios requiring stringent energy efficiency may prioritize power minimization, while latency-sensitive applications like augmented reality or autonomous driving would emphasize latency reduction.
In the TCO-MAS methodology, pheromone trails corresponding to the individual objectives, \(\tau_{rate}\), \(\tau_{power}\), and \(\tau_{latency}\), are weighted and combined during the decision-making process. The combined pheromone level is calculated as \(\tau_{combined} = w_{rate} \cdot \tau_{rate} + w_{power} \cdot \tau_{power} + w_{latency} \cdot \tau_{latency}\), where \(w_{rate}\), \(w_{power}\), and \(w_{latency}\) are the weights for the sum rate, power minimization, and latency objectives, respectively. This weighted combination influences the probabilistic decision-making process, ensuring that the weights affect the likelihood of selecting resource allocation strategies aligned with the prioritized objectives. Furthermore, an adaptive mechanism is employed to dynamically tune these weights based on real-time feedback from the network. For example, if deteriorating latency metrics are observed, the weight for latency (\(w_{latency}\)) can be increased, prompting the model to prioritize strategies that improve latency.
In practice, the weights allow the framework to adapt to diverse network scenarios. For instance, in a balanced setup, equal weights (\(w_{rate} = w_{power} = w_{latency} = 1/3\)) would ensure a uniform emphasis on throughput, energy efficiency, and latency. However, in a bandwidth-constrained environment with a higher priority for data rates, the weights could be adjusted to \(w_{rate} = 0.6\), \(w_{power} = 0.2\), and \(w_{latency} = 0.2\), leading to decisions favoring throughput optimization. This adaptive and flexible approach to weight assignment ensures that the proposed TCO-MAS framework can dynamically respond to varying network conditions and objectives, making it suitable for high-demand, next-generation wireless networks.
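The weighted combination described above can be sketched as follows; the pheromone values are illustrative, and the weight settings mirror the balanced and throughput-heavy examples in the text.

```python
import numpy as np

def combined_pheromone(tau_rate, tau_power, tau_latency,
                       w_rate=1/3, w_power=1/3, w_latency=1/3):
    """Weighted combination tau_combined used in decision-making."""
    return w_rate * tau_rate + w_power * tau_power + w_latency * tau_latency

tau_r = np.array([1.2, 0.4])   # pheromone per candidate allocation (illustrative)
tau_p = np.array([0.6, 0.9])
tau_l = np.array([0.5, 0.5])
print(combined_pheromone(tau_r, tau_p, tau_l))                  # balanced weights
print(combined_pheromone(tau_r, tau_p, tau_l, 0.6, 0.2, 0.2))   # throughput-heavy
```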
Limitations in practical scenarios
Real-time LSTM predictions may impose significant computational demands, especially in ultra-dense networks or scenarios requiring frequent updates. This could delay optimization and impact system responsiveness. While the TCO model is inherently adaptive, its performance depends on finely tuned pheromone adjustment parameters. These may require manual or automated optimization for specific network conditions, which adds complexity. The effectiveness of distributed decision-making could be impacted by differences in computational capabilities and resource availability among BSs, particularly in heterogeneous networks.
To address these challenges, future iterations of the proposed model could incorporate lightweight versions of LSTM or alternative predictive models with lower computational requirements. Additionally, hybrid approaches combining TCO with centralized optimization for specific high-demand scenarios could balance real-time responsiveness with computational efficiency.
The TCO-MAS approach minimizes communication overhead by leveraging decentralized agent-based interactions and localized decision-making. Instead of requiring extensive global information exchange, each agent relies on locally available pheromone information to guide its resource allocation decisions. This reduces the need for frequent, network-wide communication, thereby conserving bandwidth and lowering latency. The distributed nature of TCO-MAS ensures scalability and efficiency, especially in dense MIMO–OFDM networks where centralized solutions would be communication-intensive.