978-1-4799-5001-0/14/$31.00 ©2014 IEEE 6C2-1

NETWORK SURVIVABILITY ORIENTED MARKOV GAMES (NSOMG) IN WIDEBAND SATELLITE COMMUNICATIONS

Dan Shen, Genshe Chen, Gang Wang, Intelligent Fusion Technology, Inc., Germantown, MD Khanh Pham, Air Force Research Laboratory, Kirtland AFB, NM

Erik Blasch, Air Force Research Laboratory, Rome, NY Zhi Tian, National Science Foundation, Arlington, VA

Abstract

Future satellite communications (SATCOM) challenges will require cognitive spectrum management and agile waveform adaptation solutions that are not only context-aware and capable of learning and probing for subscriber distributions, quality of service (QoS), mission priorities and traffic patterns, but also agile in waveform adaptation in order to provide active countermeasures against persistent and adaptive radio frequency (RF) interference. In this paper, we develop game-theory-enabled high-level anti-interference strategies. A game-theoretic model of strategic conflicts in SATCOM is designed to guide these strategies. Space communication conflicts (interference in a congested space environment) are investigated as a two-player, zero-sum game in discrete time, where each player discovers the opponent’s exploits according to an independent random process. The optimal exercise strategies are derived and the value of engaging in space communication conflict is quantified.

Problem Statement

In the future military satellite communications (MILSATCOM) infrastructure, it is envisioned that satellite communication systems and hybrid space-terrestrial systems will be the essential components for improved connection capabilities and enhanced defensive control over complex collaborative missions [1]. These wideband space communication networks entail unprecedented complexity and unpredictability of the operating environments as well as extremely high stakes of interference [2]. A game solution [3, 4] and the associated analysis provide high-level insights on SATCOM interference [5].

From our previous work [6], we extend the results to wideband SATCOM. Figure 1 shows an illustrative sketch of the scenario of interest. There are 4 geostationary (GEO) satellites (GEO 1, GEO 2, GEO 3, and GEO 4) for the Blue side (friendly force) and 3 inadvertent interfering satellites (Red side). These interferers may interfere with the space communication among the GEO satellites and Low Earth Orbit (LEO) satellites. Figures 2 and 3 show the interference of GEO 1, LEO 1 and Interferer 1.

Figure 1. A Research Scenario with GEO SATCOM and Inadvertent Interferers

[Figure 1 labels: GEO 1, GEO 2, GEO 3, GEO 4; Interferer 1, Interferer 2, Interferer 3]


Figure 2. The Orbital View of a Scenario with GEO 1, SAT 1, and Interferer 1

Figure 3. The Ground Track View of a Scenario with GEO 1, SAT 1, and Interferer 1

Wideband GEO SATCOM can provide flexible, high-capacity communications for the support equipment through procurement and operation of the satellite constellation and the associated control systems. It provides worldwide flexible, high data rate and long haul communications for various ground elements, support agencies, and interested users. It can leverage cost-effective methods and technological advances in the communications satellite industry. Each GEO satellite provides service in both the X and Ka frequency bands, with the unprecedented ability to cross-band between the two frequencies onboard the satellite. It features an electrically steerable and phased array X-band, a mechanically steered Ka-band, and a fixed earth-coverage X-band.

Challenges and Solution Framework

The existing Wideband GEO SATCOM waveforms include quadrature phase-shift keying (QPSK), offset QPSK (OQPSK) and quadrature amplitude modulation (QAM). In this paper, we design and implement an adaptive waveform mechanism to transport existing GEO SATCOM waveforms in RF-congested environments, where potential interference sources include both ground/ship-based interferers and space-based satellites.

The solution framework is shown in Figure 4. The adaptive anti-interference approach has two levels. The first level performs frequency-hopping spread spectrum (FHSS) using multiple frequency-shift keying (MFSK) waveforms to transmit the Wideband GEO SATCOM waveform [7].

Figure 4. System Diagram of the NSOMG

For example, in Figure 5 (where T is the bit time, Ts is the duration of a signal element, and Tc is the frequency shift time), M = 4 different frequencies are used to encode log2(M) = 2 data bits per signal element. Each channel has a total bandwidth of Wd = M × fd, where fd is the difference frequency between adjacent tones. FHSS uses 2^k different channels (k = 2), so the total bandwidth is Ws = 2^k × Wd.
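The bandwidth arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch: the function name `fhss_mfsk_bandwidth` and the 25 kHz value used for fd are illustrative assumptions, not values taken from the testbed.

```python
import math

def fhss_mfsk_bandwidth(M, k, fd_hz):
    """Return (bits per signal element, Wd, Ws) for MFSK carried over FHSS.

    M     : number of MFSK tones (assumed to be a power of two)
    k     : FHSS uses 2**k hopping channels
    fd_hz : difference frequency between adjacent tones, in Hz
    """
    bits_per_element = int(math.log2(M))  # log2(M) data bits per signal element
    wd_hz = M * fd_hz                     # bandwidth of one MFSK channel, Wd
    ws_hz = (2 ** k) * wd_hz              # total spread bandwidth, Ws
    return bits_per_element, wd_hz, ws_hz

# M = 4 and k = 2 follow the example; fd = 25 kHz is an assumed figure.
bits, wd, ws = fhss_mfsk_bandwidth(M=4, k=2, fd_hz=25_000)
print(bits, wd, ws)
```

With these inputs the total bandwidth Ws is four times the per-channel bandwidth Wd, which makes the trade-off discussed next (interference resistance versus spectrum efficiency) easy to explore for other values of k.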

Figure 5. An Example of FHSS using MFSK.

[Figures 2 and 3 legend: GEO 1; SAT 1 (LEO); Interferer 1 (MEO)]

[Figure 4 block labels: Data Source; RF Transmitter; GEO SATCOM Waveforms; FHSS/MFSK; SATCOM Channels; Interferer; RF Receiver; FHSS/MFSK Demodulator; Data Sink; Spectrum sensing; Interfering; Spectrum sensing (weak signal detection); Space object ATR; Space object Propagator; SATCOM Performance Evaluation; Markov game for anti-interfering]

The advantage of FHSS/MFSK is that it is quite resistant to interference. However, the frequency usage is not efficient when Ws gets large. On the other hand, if Ws is set too small, the interferer can easily saturate the band and terminate the link. Therefore, in the second level, a strategic frequency hopping was added based on the Markov game theory, whose solutions can Nash-optimally determine when to shift the communication link to the next available Ws. The main idea is illustrated in Figure 6.

In the worst case for the Blue side, a conflict exists between the Blue side and the Red side: Red wants to interfere with the communications, while Blue tries to maintain the links and mitigate the interference effects using wideband cognitive radio techniques [8]. Because of maneuvers [9, 10] and frequency hopping, these conflicts are dynamic and evolving. Our NSOMG framework fits this setting well.

Figure 6. Markov Game Enabled Two-Level Frequency Hopping for Anti-Interference Waveforms

A Game Theoretic Model of Strategic Conflicts in Space Communication

In this section, we investigate the worst case of the SATCOM network conflicts (interference and anti-interference) as a two-player, zero-sum game in discrete time, where each player discovers the opponent’s exploits according to an independent random process. Here the Blue side’s exploit is the normal SATCOM link between GEO and LEO satellites (or air/surface vehicles and stations). The Red side’s exploit is the interference. If the Blue side detects an exploit, it knows that the current communication link is being interfered with. Conversely, when the Red side discovers an exploit, the interferer knows in which band and between which objects a Blue SATCOM link exists (it is assumed that this information can be obtained from the spectrum sensing and signal detection modules). Upon discovery, a player must decide if and when to act on the exploit. The Red side may wait a while to let the SATCOM link become established; in that way, its payoff can be greater than the payoff from interfering immediately. For the Blue side, upon detecting interference, the SATCOM system may keep the link for a while before hopping to another frequency, because relinking has a cost and the interference may be random. During the waiting period (while interference is present), the Blue side suffers lower throughput and longer transmission delays. We derive optimal exercise strategies and quantify the value of engaging in space communication conflict. Our analysis also leads to high-level insights on space interference and anti-interference, i.e., when the FHSS/MFSK signal should jump to the next available band.

Markov Game Formulation

The NSOMG model focuses on a strategic space communication conflict between two players. Let i ∈ {1, 2} index the players, where player 1 is the Red side (interferer) and player 2 is the Blue side (space communication system). Let T be the system clock. The game starts at T = 0. We create a discrete-time model, where T increases over the nonnegative integers. Let di be the moment that player i discovers the opponent’s exploit. We define the aware time τi = max(0, T – di) to be the relative time that player i has known about the exploit. By definition, if player i is not aware of the exploit, then τi = 0. Then a state of the game S is defined as:

S = ⟨T, τ1, τ2⟩ (1)

where the elements of this three-tuple represent how long the game has existed, how long player 1 has known player 2’s exploit, and how long player 2 has known player 1’s exploit, respectively.

The success of player 1 (interferer) depends on both his ability to detect the exploit and the timing of the interference. The launch time includes the radio-frequency (RF) propagation time delay. Similarly, a successful player 2 (Wideband GEO SATCOM) can quickly detect the interference source and choose a good time to execute the anti-interference methods (including the applied adaptive waveform). Let pi(T) denote the probability that player i discovers the opponent’s exploit as the system clock progresses from period T to period T+1. Let ai(T, τi) be the value of interference (for player 1) or anti-interference (for player 2) with an aware time of τi at system clock T. Two constraints are imposed on ai(T, τi). First, ai(T, 0) = 0, meaning that if an exploit is not detected, interference or anti-interference has no value. Second, 0 ≤ ai(T, τi) ≤ Bi, where Bi is an arbitrary upper bound, which disallows interference or anti-interference with either a negative value or an infinite value.

[Figure 6 labels: FHSS/MFSK in Frequency band 1; FHSS/MFSK in Frequency band 2; System Level Frequency Hopping]

Once a player detects an exploit, he may choose to use it (interference for player 1 or anti-interference for player 2). Let θi(T) denote the action set of player i at time T. We define θi(T) ⊆ {W, A} where

• W: Wait. A waiting player is either still trying to discover the opponent’s exploit (τi = 0) or already knows the exploit (τi > 0) but waits until interference is more effective (for player 1) or until anti-interference is more worthwhile (for player 2).

• A: Action (interference for player 1 or anti-interference for player 2). When a player acts, he receives the value at that time.

A player who does not know the exploit has a singleton action set {W}, and a player that does know the exploit has the full action set, {W, A}.

Our Markov game begins in the state ⟨T, τ1, τ2⟩ = ⟨0, 0, 0⟩ and proceeds in discrete rounds. In each round, the clock time T increases deterministically. For each player i, the aware time τi = 0 until the player discovers the exploit. Exploit discovery happens with probability pi(T) for player i in round T. Once a player has discovered an exploit, that player’s aware time increases deterministically. The resulting transitions of the Markov game state are summarized in Table 1. A visual depiction of the states of the game is presented in Figure 7 (where we assume the system is time invariant, so that we can drop T).
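The round-by-round dynamics described above (with both players waiting) can be sketched as a small sampler. This is an illustrative sketch only: the function name `step`, the stationary probabilities p1 and p2, and the numeric values are assumptions, and the sketch does not model termination when a player acts.

```python
import random

def step(state, p1, p2, rng=random):
    """Advance the wait/wait dynamics by one round.

    state  : (T, tau1, tau2); tau_i = 0 means player i is still unaware
    p1, p2 : per-round discovery probabilities (assumed stationary here)
    rng    : any object with a random() method returning a float in [0, 1)
    """
    T, tau1, tau2 = state
    # An aware player's clock ticks deterministically; an unaware player
    # discovers the exploit with probability p_i and then has tau_i = 1.
    new_tau1 = tau1 + 1 if tau1 > 0 else int(rng.random() < p1)
    new_tau2 = tau2 + 1 if tau2 > 0 else int(rng.random() < p2)
    return (T + 1, new_tau1, new_tau2)

# Sample a short trajectory from <0, 0, 0> with assumed probabilities.
rng = random.Random(0)
state = (0, 0, 0)
for _ in range(5):
    state = step(state, p1=0.3, p2=0.2, rng=rng)
    print(state)
```

Because each unaware player draws independently, the four joint outcomes from ⟨T, 0, 0⟩ occur with the product probabilities (1−p1)(1−p2), p1(1−p2), (1−p1)p2 and p1p2.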

Figure 7. Diagram of States in the Markov Game

In Figure 7, the arrows show the possible transitions from one state to another, as described in Table 1. Red denotes the states in which player 1 knows the exploit (the interferer has detected the spectrum used in space communication), while blue denotes the states in which player 2 (the Blue side’s SATCOM system) has detected the interferer. Green denotes the states in which both players know the exploits. Let V(⟨T, τ1, τ2⟩) be the value of the game in state ⟨T, τ1, τ2⟩. This value represents the expected payoff if the players play the game starting from that state.

For the wideband GEO SATCOM system, the game value includes the performance evaluation of the communication links. Because the game is zero sum, payoffs for both players can be described by a single value. We seek to characterize V(⟨T, τ1, τ2⟩) for every state of the Markov game. We proceed in our analysis by considering three cases on τ1, τ2. In the following we focus on time-invariant systems.

[Figure 7 labels: states ⟨τ1, τ2⟩ = ⟨0,0⟩, ⟨1,0⟩, ⟨2,0⟩, ⟨3,0⟩, …; ⟨0,1⟩, ⟨0,2⟩, ⟨0,3⟩, …; ⟨1,1⟩, ⟨2,1⟩, ⟨1,2⟩, ⟨2,2⟩; transition probabilities (1−p1)(1−p2), p1(1−p2), p2(1−p1), p1p2, 1−p1, p1, 1−p2, p2, 1]

Table 1. Markov Game State Transitions as a Function of ⟨T, τ1, τ2⟩ and Actions

τ1 = 0, τ2 = 0: θ1(T) = {W}, θ2(T) = {W}
  ⟨T, 0, 0⟩ → ⟨T+1, 0, 0⟩ with probability (1−p1(T))(1−p2(T))
  ⟨T, 0, 0⟩ → ⟨T+1, 1, 0⟩ with probability p1(T)(1−p2(T))
  ⟨T, 0, 0⟩ → ⟨T+1, 0, 1⟩ with probability (1−p1(T))p2(T)
  ⟨T, 0, 0⟩ → ⟨T+1, 1, 1⟩ with probability p1(T)p2(T)

τ1 = 0, τ2 > 0: θ1(T) = {W}, θ2(T) = {A, W}
  ⟨T, 0, τ2⟩ → ⟨T+1, 0, τ2+1⟩ with probability 1−p1(T)
  ⟨T, 0, τ2⟩ → ⟨T+1, 1, τ2+1⟩ with probability p1(T)

τ1 > 0, τ2 = 0: θ1(T) = {A, W}, θ2(T) = {W}
  ⟨T, τ1, 0⟩ → ⟨T+1, τ1+1, 0⟩ with probability 1−p2(T)
  ⟨T, τ1, 0⟩ → ⟨T+1, τ1+1, 1⟩ with probability p2(T)

τ1 > 0, τ2 > 0: θ1(T) = {A, W}, θ2(T) = {A, W}
  ⟨T, τ1, τ2⟩ → ⟨T+1, τ1+1, τ2+1⟩ with probability 1

Case 1: Both Players Know the Exploits

In this case, τ1 > 0 and τ2 > 0. Table 2 represents the payoffs of the Markov game in such a state in matrix form.

Table 2. Payoff Matrix for the Markov Game When Both Players Know the Exploit

| | Player 2 plays: W | Player 2 plays: A |
|---|---|---|
| Player 1 plays: W | V(⟨T+1, τ1+1, τ2+1⟩) | –a2(T, τ2) |
| Player 1 plays: A | a1(T, τ1) | a1(T, τ1) – a2(T, τ2) |

Each entry in the matrix is a single real number because the game is zero-sum: player 1 wants to maximize the value and player 2 wants to minimize it. If both players wait, the value is determined by future play. If one player acts and the other waits, the acting player receives the value of the action. If both players act, the difference a1(T, τ1) – a2(T, τ2) gives the result of the game.

This leads to the following observation.

Theorem 1: For any game state ⟨T, τ1, τ2⟩ such that τ1 > 0 and τ2 > 0, “Action, Action” is an iterated elimination of dominated strategies equilibrium with a value of a1(T, τ1) – a2(T, τ2).

Proof: If V(⟨T+1, τ1+1, τ2+1⟩) ≥ 0, then we have

V(⟨T+1, τ1+1, τ2+1⟩) ≥ –a2(T, τ2)

In addition,

a1(T, τ1) ≥ a1(T, τ1) – a2(T, τ2)

Therefore, “Action” is a dominant strategy for player 2 (whose objective is to minimize the game value). Given that player 2 acts, player 1 must also play “Action” because a1(T, τ1) – a2(T, τ2) ≥ –a2(T, τ2). So, “Action, Action” is a Nash equilibrium.

Otherwise, V(⟨T+1, τ1+1, τ2+1⟩) < 0, and we have

V(⟨T+1, τ1+1, τ2+1⟩) < a1(T, τ1)

In addition,

–a2(T, τ2) ≤ a1(T, τ1) – a2(T, τ2)

Therefore, “Action” is a dominant strategy for player 1 (whose objective is to maximize the game value). Given that player 1 acts, player 2 will also choose “Action” because a1(T, τ1) ≥ a1(T, τ1) – a2(T, τ2). So, “Action, Action” is a Nash equilibrium.

This completes the proof of Theorem 1.
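As an illustrative check of Theorem 1 (not part of the original analysis), the payoff matrix of Case 1 can be built for sample numbers and searched for pure-strategy saddle points. The function names and the values a1 = 3, a2 = 2 and V = ±5 are assumptions chosen only to exercise both branches of the proof.

```python
def payoff_matrix(v_next, a1, a2):
    """Case 1 payoffs; rows are player 1 (W, A), columns are player 2 (W, A)."""
    return [[v_next, -a2],
            [a1, a1 - a2]]

def pure_saddle_points(m):
    """Return the action pairs that are a column maximum and a row minimum.

    Player 1 picks the row to maximize the value; player 2 picks the column
    to minimize it. A cell satisfying both conditions is a pure equilibrium.
    """
    actions = "WA"
    points = []
    for r in range(2):
        for c in range(2):
            best_for_p1 = m[r][c] >= max(m[0][c], m[1][c])
            best_for_p2 = m[r][c] <= min(m[r][0], m[r][1])
            if best_for_p1 and best_for_p2:
                points.append((actions[r], actions[c]))
    return points

# Try a positive and a negative continuation value V (assumed numbers).
for v_next in (5.0, -5.0):
    print(v_next, pure_saddle_points(payoff_matrix(v_next, a1=3.0, a2=2.0)))
```

In both cases the only saddle point found is the pair of actions, with value a1 – a2, in agreement with the theorem for these sample inputs.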

Remark 1: From Theorem 1, if the game starts in a state ⟨T, τ1, τ2⟩ with τ1 > 0 and τ2 > 0, the game terminates immediately with value V(⟨T, τ1, τ2⟩) = a1(T, τ1) – a2(T, τ2). Therefore, a game starting in ⟨T, 0, 0⟩ ends, under optimal play, no later than when a state of the form ⟨T, τ1, 1⟩ or ⟨T, 1, τ2⟩ is reached.


Case 2: Only One Player Knows the Exploit

Without loss of generality, we investigate the case from a state where player 1 has the exploit and player 2 does not. The analysis follows identical lines in the opposing situation. In this case, player 1 has a full action set and player 2 may only wait to discover the exploit; therefore θ1 = {A, W} and θ2 = {W}. Suppose the state of the game is ⟨T, τ1, 0⟩. We define

Y = (1-p2(T))V(⟨T+1, τ1+1, 0⟩) + p2(T) V(⟨T+1, τ1+1, 1⟩) (2)

to be the expected utility if both players choose to wait at time T. Table 3 displays the payoffs in matrix form.

Table 3. Payoff Matrix for the Markov Game When Only Player 1 Knows the Exploit

| | Player 2 plays: W |
|---|---|
| Player 1 plays: W | Y |
| Player 1 plays: A | a1(τ1) |

It is clear that player 1 (whose objective is to maximize the game value) prefers to act if Y ≤ a1(τ1). The fundamental analytic question is “from which state does Player 1 prefer to act?” If Player 2 discovers the exploit, the game transitions to Case 1, in which both players know the exploits. We characterize the states ⟨T, τ, 0⟩ from which Player 1 prefers to act as follows. We define vτ(h) as the expected utility to Player 1 if he waits h time steps before acting, starting in state ⟨T, τ, 0⟩. In particular, writing qi(T) = 1 – pi(T) for convenience, we have

vτ(0) = a1(τ)

vτ(1) = q2(T) a1(τ+1) + p2(T)(a1(τ+1) – a2(1))

vτ(2) = q2(T+1) q2(T) a1(τ+2) + p2(T+1) q2(T)(a1(τ+2) – a2(1)) + p2(T)(a1(τ+1) – a2(1))

and, in general,

$$v_\tau(h) = a_1(\tau+h)\prod_{k=0}^{h-1} q_2(T+k) + \sum_{k=0}^{h-1}\left[\bigl(a_1(\tau+k+1)-a_2(1)\bigr)\,p_2(T+k)\prod_{j=0}^{k-1} q_2(T+j)\right]$$

The definition of vτ(h) allows us to evaluate the states from which Player 1 prefers to act. Player 1 prefers to act rather than wait in state ⟨T, τ, 0⟩ if and only if

a1(τ) = vτ(0) ≥ vτ(h) for all h ≥1 (3)

The Eq. (3) statement mirrors our intuition that a player should act only if an immediate action results in a higher utility than waiting for any number of turns before taking an action.
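The quantity vτ(h) and the test in Eq. (3) can be evaluated numerically. The following is a minimal sketch under stated assumptions: the function names, the linear a1, the constant p2 and the finite horizon used to truncate "for all h ≥ 1" are all illustrative choices, not part of the original derivation.

```python
def v(tau, h, T, a1, a2_1, p2):
    """Expected utility to Player 1 of waiting h steps from state <T, tau, 0>.

    a1   : callable; a1(tau) is the value of interfering at aware time tau
    a2_1 : the constant a2(1)
    p2   : callable; p2(T) is Player 2's discovery probability in round T
    """
    survive = 1.0  # running product of q2(T + j): Player 2 is still unaware
    total = 0.0
    for k in range(h):
        # Player 2 discovers the exploit in round T + k; both players then act.
        total += (a1(tau + k + 1) - a2_1) * p2(T + k) * survive
        survive *= 1.0 - p2(T + k)
    # Player 2 never discovered the exploit; Player 1 acts after h steps.
    return total + a1(tau + h) * survive

def prefers_to_act(tau, T, a1, a2_1, p2, horizon):
    """Check Eq. (3) for h = 1..horizon (a finite truncation, by assumption)."""
    v0 = v(tau, 0, T, a1, a2_1, p2)
    return all(v0 >= v(tau, h, T, a1, a2_1, p2) for h in range(1, horizon + 1))

# Assumed inputs: linear a1, stationary p2 = 0.5, and two values of a2(1).
linear = lambda t: float(t)
half = lambda T: 0.5
print(prefers_to_act(1, 0, linear, 4.0, half, horizon=10))
print(prefers_to_act(1, 0, linear, 1.0, half, horizon=10))
```

The first call uses a large a2(1), so the risk of being discovered outweighs the gain from waiting; the second uses a small a2(1), so waiting one step already beats acting immediately.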

Theorem 2. If a1(τ) is concave and nondecreasing, and p2(T) is nondecreasing, then vτ(0) ≥ vτ(1) implies that Player 1 should act in state ⟨T, τ, 0⟩ (i.e., Player 1 can never do better by waiting).

Proof. We proceed by showing

vτ(0) ≥ vτ(1) → vτ(0) ≥ vτ(h) for all h ≥ 2 (4)

Consider the quantity

$$\begin{aligned}
v_\tau(h+1) - v_\tau(h) &= a_1(\tau+h+1)\prod_{k=0}^{h} q_2(T+k) - a_1(\tau+h)\prod_{k=0}^{h-1} q_2(T+k) \\
&\quad + \bigl(a_1(\tau+h+1)-a_2(1)\bigr)\,p_2(T+h)\prod_{j=0}^{h-1} q_2(T+j) \\
&= \prod_{k=0}^{h-1} q_2(T+k)\,\bigl[a_1(\tau+h+1)-a_1(\tau+h)-a_2(1)\,p_2(T+h)\bigr]
\end{aligned} \tag{5}$$

We know that vτ(0) ≥ vτ(1), which implies that

0 ≥ vτ(1) – vτ(0) = a1(τ+1) – a1(τ) – p2(T) a2(1) ≥ a1(τ+h+1) – a1(τ+h) – p2(T) a2(1) (6)

where the last inequality comes from the fact that a1(τ) is concave and nondecreasing, so its increments do not grow with τ.

From (6), we have

0 ≥ a1(τ+h+1) – a1(τ+h) – p2(T) a2(1) ≥ a1(τ+h+1) – a1(τ+h) – p2(T+h) a2(1) (7)

where the last inequality comes from the fact that p2(T) is nondecreasing and a2(1) is nonnegative.

Finally, multiplying both sides of (7) by the nonnegative number $\prod_{k=0}^{h-1} q_2(T+k)$ and using (5), we obtain

0 ≥ vτ(h+1) – vτ(h) (8)

Then,

vτ(h) – vτ(0) = [vτ(h) – vτ(h–1)] + [vτ(h–1) – vτ(h–2)] + … + [vτ(1) – vτ(0)] ≤ 0 (9)

Eq. (9) completes the proof.

Remark 2: If we assume stationary probabilities pi(T) = pi, then Theorem 2 shows that vτ(0) ≥ vτ(1) is a sufficient condition to prefer action at a holding time of τ. Conversely, Eq. (3) with h = 1 shows that vτ(0) ≥ vτ(1) is necessary to prefer action at a holding time of τ. Therefore, from state ⟨T, τ, 0⟩, Player 1 waits for

$$k^* = \min\{k : v_k(0) \ge v_k(1)\} = \min\{k : a_1(k+1) - a_1(k) \le p_2\,a_2(1)\} \tag{10}$$

time steps before action. Therefore, the game value is

V(⟨T, 1, 0⟩) = v0(k*) (11)
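The threshold rule of Eq. (10) amounts to a linear search over k. The sketch below is illustrative only: the function name, the search cap `k_max`, the choice to start the search at k = 0, and the sample a1 are assumptions.

```python
def optimal_wait(a1, a2_1, p2, k_max=1000):
    """Smallest k with a1(k+1) - a1(k) <= p2 * a2(1), following Eq. (10).

    a1    : callable; value of interference as a function of aware time
    a2_1  : the constant a2(1)
    p2    : stationary discovery probability of Player 2
    k_max : assumed cap on the search; None is returned if no k qualifies
    """
    for k in range(k_max + 1):
        if a1(k + 1) - a1(k) <= p2 * a2_1:
            return k
    return None

# Assumed concave, nondecreasing value: a1(t) = 10 * (1 - 0.5**t).
saturating = lambda t: 10.0 * (1.0 - 0.5 ** t)
print(optimal_wait(saturating, a2_1=3.0, p2=0.5))
```

For this sample a1 the increments halve at every step (5, 2.5, 1.25, ...), so the search stops at the first increment that falls below p2 a2(1) = 1.5. The game value then follows by evaluating vτ(h) as stated in Eq. (11).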

Case 3: Neither Player Knows the Exploit

In this case, τ1 = τ2 = 0; therefore, both players have singleton action sets, θ1 = {W} and θ2 = {W}. The game value, given that Player 1 discovers the exploit first, is V(⟨T, 1, 0⟩). Similarly, if Player 2 detects the exploit first, the game value is V(⟨T, 0, 1⟩). If both players simultaneously discover the exploits, then the game value is V(⟨T, 1, 1⟩) = a1(1) – a2(1). Because the state ⟨T, 0, 0⟩ transitions into previously analyzed states, only the first transition out of it needs to be considered. For stationary discovery probabilities, the probabilities of the next state after leaving S = ⟨T, 0, 0⟩ are:

$$\gamma_{1,0} = \Pr\{\text{next state is } \langle T, 1, 0\rangle\} = \frac{p_1(1-p_2)}{p_1(1-p_2)+p_2(1-p_1)+p_1p_2}$$

$$\gamma_{0,1} = \Pr\{\text{next state is } \langle T, 0, 1\rangle\} = \frac{p_2(1-p_1)}{p_1(1-p_2)+p_2(1-p_1)+p_1p_2}$$

$$\gamma_{1,1} = \Pr\{\text{next state is } \langle T, 1, 1\rangle\} = \frac{p_1p_2}{p_1(1-p_2)+p_2(1-p_1)+p_1p_2}$$

where the γ notation has been introduced for brevity.

The game value is

$$\begin{aligned}
V(\langle T, 0, 0\rangle) &= \gamma_{1,0}\,V(\langle T, 1, 0\rangle) - \gamma_{0,1}\,V(\langle T, 0, 1\rangle) + \gamma_{1,1}\,V(\langle T, 1, 1\rangle) \\
&= \gamma_{1,0}\,v_1(k_1^*) - \gamma_{0,1}\,v_2(k_2^*) + \gamma_{1,1}\,\bigl(a_1(1) - a_2(1)\bigr)
\end{aligned} \tag{12}$$

where the negative sign comes from the fact that Player 1 is a maximizing player and Player 2 is a minimizing player. v1(k1*) denotes the result of (10)-(11) if Player 1 is the first to discover the exploit, while v2(k2*) denotes the result of (10)-(11) if Player 2 is the first to discover the exploit.
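The γ weights and the combination in Eq. (12) are straightforward to compute once the two single-discovery values are known. In the sketch below, the function names are hypothetical and v1_star and v2_star are treated as given inputs (they stand for v1(k1*) and v2(k2*)); the numbers are assumed for illustration.

```python
def first_discovery_probs(p1, p2):
    """Conditional probabilities of who discovers first, given a discovery.

    Returns (gamma_10, gamma_01, gamma_11) for stationary p1 and p2.
    """
    denom = p1 * (1 - p2) + p2 * (1 - p1) + p1 * p2
    if denom == 0:
        raise ValueError("at least one discovery probability must be positive")
    return p1 * (1 - p2) / denom, p2 * (1 - p1) / denom, p1 * p2 / denom

def game_value_00(p1, p2, v1_star, v2_star, a1_1, a2_1):
    """Combine the three branches with the signs used in Eq. (12)."""
    g10, g01, g11 = first_discovery_probs(p1, p2)
    return g10 * v1_star - g01 * v2_star + g11 * (a1_1 - a2_1)

# Assumed numbers: equal discovery rates and symmetric values.
print(first_discovery_probs(0.5, 0.5))
print(game_value_00(0.5, 0.5, v1_star=3.0, v2_star=3.0, a1_1=2.0, a2_1=2.0))
```

The three γ values always sum to one, and in a fully symmetric setting the Red and Blue branches cancel, so neither side gains from the conflict on average.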

Experiments and Results of the Hardware Demonstration

A Universal Software Radio Peripheral (USRP) and Gnu's not Unix (GNU) Radio based hardware testbed has been implemented to demonstrate the integrated game theory enabled spectrum management and waveform adaptation. We emulated the interference and anti-interference conflicts in the frequency band 1.3 GHz to 1.6 GHz.

Hardware Components

In the hardware implementation, the USRP designed by Ettus Research is utilized. The radio is a USRP1 (Figure 8), the original Universal Software Radio Peripheral™ (USRP) hardware, which provides entry-level RF processing capability. It is intended to provide software defined radio development capability for cost-sensitive users and applications. The architecture includes an Altera Cyclone Field-programmable Gate Array (FPGA), 64 mega-samples per second (MS/s) dual analog-to-digital converter (ADC), 128 MS/s dual digital-to-analog converter (DAC) and Universal Serial Bus (USB) 2.0 connectivity to provide data to host processors. A modular design allows the USRP1 to operate from direct current (DC) to 6 GHz. The USRP1 platform can support two complete RF daughter-boards. This feature makes the USRP ideal for applications requiring high isolation between transmit and receive chains, or dual-band dual transmit/receive operation. The USRP1 can stream up to 8 MS/s to and from host applications, and users can implement custom functions in the FPGA fabric.


Figure 8. USRP 1

Software Components

In addition to the NSOMG algorithms, GNU Radio [11] is used to link the hardware to the algorithms.

GNU Radio is a popular software-defined radio (SDR) application framework commonly used with USRP devices. It can be used for signal processing from the physical layer with readily-available low-cost external RF hardware, or without hardware in a simulation-like environment. It's widely used for learning about, building, and deploying software radios, both in business and academic studies.

The bridge between GNU Radio and the USRP device is the USRP Hardware Driver software (UHD), which is also the hardware driver for all USRP devices. It works on all major platforms (Linux, Windows, and Mac) and can be built with GNU compiler collection (GCC), Clang, and Microsoft Visual Studio (MSVC) compilers. The UHD provides a set of blocks in the gr-uhd (Gnu Radio - UHD) component, which includes:

• USRP source block - provides receiver (RX) data to downstream processing blocks

• USRP sink block - accepts transmitter (TX) data from upstream processing blocks

The blocks can be used in C++, Python, or in the graphical GNU Radio Companion environment.

Implementation

The demo setup is shown in Figure 9. The interferer tries to interrupt the data transmission from the transmitter to the receiver. We assumed that the wideband GEO SATCOM transponder is perfect, i.e., the satellite transponder causes no signal distortion.

Figure 9. Hardware Demo Setup

We built a set of digital video broadcasting (DVB) transmitters and receivers, where digital modulation Gaussian Minimum Shift Keying (GMSK) was used. The input is an mp4 video file. In the GMSK Modulation, the samples/symbol is 2 and the bits/symbol is 1. Thus for each byte (8 bits) of data, 16 samples exist. Therefore the sample rate of USRP sink is 16 times that of the sample rate of Throttle, which specifies how fast to read the input file.
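The factor of 16 between the two rates follows directly from the modulation parameters. The sketch below reproduces that arithmetic; the function name and the 62,500 bytes/s throttle rate are assumptions for illustration, and the Throttle block is assumed to pace byte items.

```python
def usrp_sink_sample_rate(throttle_rate, samples_per_symbol=2, bits_per_symbol=1):
    """Sample rate needed at the USRP sink for a byte stream.

    throttle_rate      : bytes per second read from the input file
    samples_per_symbol : modulator oversampling (2 for the GMSK setup above)
    bits_per_symbol    : bits carried per symbol (1 for GMSK)
    """
    if 8 % bits_per_symbol != 0:
        raise ValueError("bits_per_symbol must divide 8 in this simple sketch")
    symbols_per_byte = 8 // bits_per_symbol
    samples_per_byte = symbols_per_byte * samples_per_symbol
    return throttle_rate * samples_per_byte

# Assumed throttle rate of 62,500 bytes/s.
print(usrp_sink_sample_rate(62_500))
```

With the assumed throttle rate the sink runs at 1 MS/s, comfortably below the 8 MS/s streaming limit quoted for the USRP1.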

Figure 10 shows the received video stream and the Fast Fourier Transform (FFT) plot. We performed interference and anti-interference experiments. The frequency hopping guided by game strategies can maintain video streaming in the congested environment with inadvertent interferers.

Figure 10. DVB Experiment Result

[Figure 9 labels: Transmitter; Receiver; Interferer]

Conclusions

In this paper, we developed a practical Network Survivability Oriented Markov Game (NSOMG) framework for advanced wideband satellite communication (SATCOM). NSOMG offers a holistic approach to dynamic resource management, spectrum sensing and waveform adaptation in the presence of interferers. System integration and performance evaluation were demonstrated in a practical wideband GEO SATCOM scenario. The preliminary results show that our game-theoretic solution can drastically improve the survivability, effectiveness (in terms of satellite throughput and waveform adaptation), robustness and autonomy of space and airborne tactical communication networks [12]. Moreover, a Universal Software Radio Peripheral (USRP) and GNU Radio based hardware testbed has been implemented to demonstrate software defined radio (SDR) applications that integrate our spectrum management and waveform adaptation.

References

[1] K. L. B. Cook, “Current Wideband MILSATCOM Infrastructure and the Future of Bandwidth Availability,” IEEE Aerospace and Electronics Systems Magazine, pp. 23-28, Dec. 2008.

[2] Defense Information Systems Agency, (n.d.) “Joint Spectrum Center,” Retrieved July 15, 2012.

[3] G. Owen, Game Theory, Emerald Group Publishing Limited; 3rd edition, 1995.

[4] G. Chen, D. Shen, C. Kwan, J. B. Cruz, M. Kruger, E. Blasch, “Game Theoretic Approach to Threat Prediction and Situation Awareness,” Journal of Advances in Information Fusion, Vol. 2, 1–14, 2007.

[5] Donald H. Martin, Communication Satellites, Aerospace Corp; 4 Sub edition (May 2000).

[6] X. Tian, Z. Tian, K. Pham, E. Blasch, and G. Chen, “QoS-aware Dynamic Spectrum Access for Cognitive Radio Networks,” Proc. SPIE, Vol. 8739, 2013.

[7] W. Yu, X. Fu, E. Blasch, K. Pham, D. Shen, and G. Chen, “On Effectiveness of Hopping-Based Techniques for Network Forensic Traceback,” Int’l J. of Networked and Distributed Computing, Vol. 1, No. 3, 2013.

[8] Z. Tian, E. Blasch, W. Li, G. Chen, and X. Li, “Performance Evaluation of Distributed Compressed Wideband Sensing for Cognitive Radio Networks,” Int’l Conf. on Info Fusion, 2008.

[9] E. Blasch, K. Pham, D. Shen, and G. Chen, “Orbital Satellite Pursuit-Evasion Game-Theoretical Control,” IEEE Int’l. Conf. on Info. Sci., Sig. Processing and App. (ISSPA), 2012.

[10] W. Yu, S. Wei, G. Xu, G. Chen, K. Pham, E. P. Blasch, and C. Lu, “On Effectiveness of Routing Algorithms for Satellite Communication Networks,” Proc. SPIE, Vol. 8739, 2013.

[11] http://gnuradio.org/redmine/, accessed March 23, 2014.

[12] E. Blasch, T. Busch, S. Kumar, and K. Pham, “Trends in Survivable/Secure Cognitive Networks,” IEEE Int’l Conf. on Computing, Networking, and Communications, 2013.

Acknowledgements

This work was sponsored by the Air Force Research Laboratory under contract FA9453-13-M-0154. The views expressed in this study are those of the authors, and do not reflect the views of the Air Force Research Laboratory (AFRL).

33rd Digital Avionics Systems Conference October 5-9, 2014