1 Introduction
The performance of online learning algorithms such as online gradient descent in adversarial, adaptive settings is a classic staple of optimization and game theory, e.g,
[10, 17, 37]. Arguably, the most well known results in this space are the following:
Sublinear regret of is achievable in adversarial settings but only after employing a carefully chosen sequence of shrinking stepsizes or if the time horizon is finite and known in advance and the fixed learning rate is selected accordingly.

Sublinear regret algorithms “converge” to Nash equilibria in zerosum games.
Despite the well established nature of these results recent work has revealed some surprising insights that come to challenge the traditional ways of thinking in this area. Specifically, in the case of zerosum games what is referred to as “convergence” to equilibrium, is the fact that when both agent apply regretminimizing algorithms, both the timeaverage of the mixed strategy profiles as well as the utilities of the agents converge approximately to their Nash equilibrium values, where the approximation error can become arbitrarily close to zero by choosing a sufficiently small stepsize. Naturally, this statement does not imply that the daytoday behavior converges to equilibria. In fact, the actual realized behavior is antithetical to convergence to equilibrium. [1] showed that Nash equilibria are repelling in zerosum games for all followtheregularizedleader dynamics. As seen in Figure 1 the dynamics spiral outwards away from the equilibrium.
These novel insights about the geometry of learning dynamics in zerosum games suggest a much richer and not well understood landscape of coupled strategic behaviors. They also raise the tantalizing possibility that we may be able to leverage this knowledge to prove tighter regret bounds in games. In fact, a series of recent papers has focused on beating the “blackbox” regret bounds using a combination of tailored dynamics and adaptive stepsizes, e.g, [12, 30, 33, 16] but so far no new bounds have been proven for the classic setting of fixed learning rates. Interestingly, [16] explicitly examine the case of fixed learning rates to show that learning achieves sublinear “approximate regret” where the algorithm compares itself against times the performance of the best action with hindsight. In contrast, our aim is to show sublinear regret for fixed using the standard notion of regret.
Intuitively, nonequilibration and more generally this emergent behavioral complexity seem like harbingers of bad news in terms of system performance as well as of significant analytical obstacles. This pessimism seems especially justified given recent results about the behavior of online dynamics with fixed stepsizes in other small games (e.g. twobytwo coordination/congestion games), where their behavior can be shown to become provably chaotic ([25, 11]). Nevertheless, we show that we can leverage this geometric information to provide the first to our knowledge sublinear regret guarantees for online gradient descent with fixed stepsize in games. Instability of Nash equilibria is not an obstacle, but in fact may be leveraged as a tool, for proving low regret.
Our results. We study the dynamics of gradient descent with fixed step size in twostrategy, twoplayer games. We leverage a deep understanding of the geometry of its orbits to prove the first sublinear regret bounds despite the constant learning rate. We show that the player strategies are repelled away from the Nash equilibrium. More specifically, regardless of the choice of the initial condition there are only a finite number of iterations where both players select mixed strategies (Theorem 1). We prove a worstcase regret bound of for arbitrarily learning without prior knowledge of (Theorem 3) matching the well known optimal bound for adaptive learning rates. An immediate corollary of our results is that timeaverage of the mixed strategy profiles as well as the utilities of the agents converge to their exact Nash equilibrium values (and not to approximations thereof) (Corollary 4). Finally, we present a matching lower bound of (Theorem 5) establishing that our regret analysis is tight.
To obtain the upper bound, we establish a tight understanding of the geometry of the trajectories in the dual space, i.e., the trajectories of the payoff vectors. We show there exists a linear transformation of the payoff vectors that rotate around the Nash equilibrium. Moreover, the distance between the Nash equilibrium and these transformed utility vectors increases by a constant in each rotation (Lemma
8). In addition, the time to complete a rotation is proportional to the distance between the Nash equilibrium and the transformed payoff vectors (Lemma 9). Together, these results imply a quadratic relationship between the number of iterations and the number of rotations completed establishing the regret bound. We establish the lower bound by exactly tracking the strategies and regret for a single game.2 Preliminaries
A twoplayer game consists of two players where each player has strategies to select from. Player can either select a pure strategy or a mixed strategy . A strategy is fully mixed if .
The most commonly studied class of games is zerosum games. In a zerosum game, there is a payoff matrix where player 1 receives utility and player receives utility resulting in the following optimization problem:
(TwoPlayer ZeroSum Game) 
The solution to this saddle problem is the Nash equilibrium . If player 1 selects her Nash equilibria , then she guarantees her utility is independent of what strategy player selects. is referred to as the value of the game.
2.1 Online Learning in Continuous Time
In many applications of game theory, players know neither the payoff matrix nor the Nash equilibria. In such settings, players select their strategies adaptively. The most common way to do this in continuous time is by using a followtheregularizedleader (FTRL) algorithm. Given a strongly convex regularizer, a learning rate , and an initial payoff vector , players select their strategies at time according to
(Player 1 Payoff Vector)  
(Player 2 Payoff Vector)  
(Continuous FTRL) 
In this paper, we are primarily interested in the regularizer resulting in the Gradient Descent algorithm:
(Continuous Gradient Descent) 
Continuous time FTRL learning in games has an interesting number of properties including timeaverage converge to the set of coarse correlated equilibria at a rate of in general games [21] and thus to Nash equilibria in zerosum games. These systems can also exhibit interesting recurrent behavior e.g. periodicity [28, 23], Poincaré recurrence [21, 29, 27] and limit cycles [19]. These systems have formal connections to Hamiltonian dynamics (i.e. energy perserving systems) [2]. All of these types of recurrent behavior are special cases of chain recurrence [26, 24].
2.2 Online Learning in Discrete Time
In most settings, players update their strategies iteratively in discrete time steps. The most common class of online learning algorithms is again the family of followtheregularizedleader algorithms.
(Player 1 Payoff Vector)  
(Player 2 Payoff Vector)  
(FTRL)  
(Gradient Descent) 
where corresponds to the learning rate. In Lemma 6 of Appendix A, we show (FTRL) is the first order approximation of (Continuous FTRL).
These algorithms again have interesting properties in zerosum games. The timeaverage strategy converges to a approximate Nash equilibrium [10]. On the contrary, Bailey and Piliouras show that the daytoday behavior diverges away from interior Nash equilibria [1]. For notational simplicity we do not introduce different learning rates but all of our proofs immediately carry over to this setting.
2.3 Regret in Online Learning
The most common way of analyzing an online learning algorithm is by examining its regret. The regret at time/iteration is the difference between the accumulated utility gained by the algorithm and the total utility of the best fixed action with hindsight. Formally for player 1,
(1)  
(2) 
for continuous and discrete time respectively.
In the case of (Continuous FTRL) it is possible to show rather strong regret guarantees. Specifically, [21] establishes that even for nonzerosum games. In contrast, (FTRL) only guarantees for a fixed learning rate. In this paper, we utilize the geometry of (Gradient Descent) to show in 2x2 zerosum games ().
3 The Geometry of Gradient Descent
Theorem 1.
Let be a 2x2 game that has a unique fully mixed Nash equilibrium where strategies are updated according to (Gradient Descent). For any nonequilibrium initial strategies, there exists a such that is on the boundary for all .
Theorem 1 strengthens the result for (Gradient Descent) in 2x2 games from [1].
Specifically, [1] show that strategies come arbitrarily close to the boundary infinitely often when updated with any version of (FTRL).
This is accomplished by closely studying the geometry of the player strategies.
We strengthen this result for (Gradient Descent) in 2x2 games by focusing on the geometry of the payoff vectors.
The proof of Theorem 1 relies on many of the tools developed in Section 4 for Theorem 3 and is deferred to Appendix F.
The first step to understanding the trajectories of the dynamics of (Gradient Descent), is characterizing the solution to (Gradient Descent).
To streamline the discussion and presentation of results, we defer the proof of Lemma 2 to Appendix B.
Lemma 2.
3.1 Convex Conjugate of the Regularizer
Our analysis primarily takes place in the space of payoff vectors. The payoff vector is a formal dual of the strategy obtained via
(4) 
which is known as the convex conjugate or Fenchel Coupling of and is closely related to the Bregman Divergence. In [21] it is shown that the “energy” is conserved in (Continuous FTRL). By Lemma 6, (FTRL) is the first order approximation of (Continuous FTRL). The energy is convex, and therefore the energy will be nondecreasing in (FTRL). [1] capitalized on this nondecreasing energy to show that strategies come arbitrarily close to the boundary infinitely often in (FTRL).
In a similar fashion, we precisely compute to better understand the dynamics of (Gradient Descent). We deviate slightly from traditional analysis of (FTRL) and embed the learning rate into the regularizer . Formally, define . Through the maximizing argument [18], we have
(5) 
From Lemma 2,
(6)  
(7) 
and
(8)  
(9) 
Therefore,
(10)  
(11) 
3.2 Selecting the Right Dual Space in 2x2 Games
Since is a strongly smooth function in the simplex, we expect for to be strongly convex [18] – at least when it’s corresponding dual variable is positive. However, (11) is not strongly convex for all . This is because cannot appear anywhere in . Rather, is contained to a space dual to the domain .
There are many nonintersecting dual spaces for the payoff vectors that yield strategies . [21] informally define a dual space when they focus the analysis on the vector . Similarly, we define a dual space that will be convenient for showing our results in 2x2 zerosum games. Consider the payoff matrix
(12) 
Without loss of generality, we may assume , , and is singular, i.e., (see Appendix C for details). Denote as
(13)  
(14)  
(15) 
Therefore
(16) 
since is singular. When increases by , increases by . Thus, the vector describes the span of the dual space . Moreover, (FTRL) is invariant to constant shifts in the payoff vector and therefore we may assume . By induction,
(17)  
(18) 
This conveniently allows us to express in terms of ,
(19) 
Symmetrically,
(20) 
Combining these relationships with Lemma 2 yields
(21)  
(22) 
The selection of this dual space also allows us to employ a convenient variable substitution to plot and on the same graph.
(23)  
(24) 
The strategy can now be expressed as
(25) 
Moreover, (11) can be rewritten as
(26)  
(27) 
where and . Both of these expressions are obviously strongly convex when the corresponding player strategy is in . The full details of these reduction can be found in Appendix D. With this notation, is simply the projection of onto the unit square as shown in Figure 2.
(a) Iterations 195  (b) Iterations 95140 
4 Regret in 2x2 ZeroSum Games
Theorem 3.
Let be a 2x2 game that has a unique fully mixed Nash equilibrium. When is updated according to (Gradient Descent), .
It is well known that if an algorithm admits sublinear regret in zerosum games, then the timeaverage play converges to a Nash equilibirum. Thus, Theorem 3 immediately results in the following corollary.
Corollary 4.
Let be a 2x2 game that has a unique fully mixed Nash equilibrium. When is updated according to (Gradient Descent), average strategy converges to as .
Proof of Theorem 3.
The result is simple if . Neither player strategy will ever change. Since player 1’s opponent is playing the fully mixed , player 1’s utility is constant independent of what strategy is selected and therefore the regret is always . Now consider .
The main details of the proof are captured in Figure 3. Specifically in Section E.1, we establish break points and analyze the impact strategies have on the regret. The strategies are contained in adjacent red and green sections as shown in Figure 3.
Next in Section E.2, we show that there exists iterations where for each partitioning, . Specifically, we show that consecutive payoff vectors appear in a red section of Figure 3. The remaining points all appear in a green section and the corresponding player strategies are equivalent. This implies
(28)  
(29)  
(30) 
Denote as the total energy of the system in iteration . In Section E.3, we show this energy increases linearly in each partition, i.e., . In Section E.4, we also show that the size of each partition is proportional to the energy in the system at the beginning of that partition, i.e., . Combining these two, . Therefore and where is the total number of partitions. Finally, it is well known ([10]) that the regret of player in zerosum games through iterations is bounded by
(31)  
(32)  
(33)  
(34) 
completing the proof of the theorem. ∎
Next, we provide a game and initial conditions that has regret establishing that the bound in Theorem 3 is tight.
Theorem 5.
Consider the game Matching Pennies with learning rate and initial conditions . Then player 1’s regret is when strategies are updated with (Gradient Descent).
The proof follows similarly to the proof of Theorem 3 by exactly computing the regret in every iteration of (Gradient Descent). The full details appear in Section G.
5 Related Work
The study of learning dynamics in game theory has a long history dating back work of [7] and [31] on fictitious play in zerosum games, which followed shortly after von Neumann’s seminal work on zerosum games ([34, 35]). Some good reference books are the following: [17, 10, 37]. The classic results about timeaverage convergence of noregret dynamics have been successfully generalized to include multiplayer extensions of network constantsum games by [15, 9, 8].
Nonequilibrating dynamics in algorithmic game theory. In recent years the algorithmic game theory community has produced several interesting nonequilibrium results. These proofs are typically based on adhoc techniques, and results in this area typically revolve around specific examples of games with a handful of agents and strategies. [13] show that multiplicative weights update (MWU) does not converge even in a timeaverage sense in the case of a specific 3x3 game. [19] establish nonconvergence for a continuoustime variant of MWU, known as the replicator dynamic, for a 2x2x2 game and show that as a result the system social welfare converges to states that dominate all Nash equilibria. [25, 11] prove the existence of LiYorke chaos in MWU dynamics of 2x2 potential games. Our result add a new chapter in this area with new detailed understanding of the nonequilibrium trajectories of gradient descent in twobytwo zerosum games and their implications to regret.
Connections to continuous time dynamics in game theory. From the perspective of evolutionary game theory, which typically studies continuous time dynamics, numerous nonconvergence results are known but again typically for small games, e.g., [32]. [29] and [27] show that replicator dynamics, the continuous time version of MWU exhibit a specific type of near periodic behavior, which is known as Poincaré recurrence. Recently, [21] show how to generalize this recurrent behavior for replicator to more general continuous time variants of FTRL dynamics. [20] show that these arguments can also be adapted in the case of dynamically evolving games. Cycles arise also in team competition ([28]) as well as in network competition ([23]). The papers in this category combine delicate arguments such as volume preservation and the existence of constants of motions (“energy preservation”) for the dynamics to establish cyclic behavior. In the case of discrete time dynamics, such as the multiplicative weights or gradient descent, the system trajectories are first order approximations of the above motion and these conservation arguments are no longer valid. Instead as we have seen in this paper the “energy” is not preserved but increases over time at a predictable rate that allows us to prove tight bounds on the regret. Finally, [26] have put forward a program for linking game theory and topology of dynamical systems.
Fast regret minimization in games. It is widely known that the “blackbox” average regret rate of it is achieved by MWU with suitably shrinking step size without making any assumptions about its environment. Recently, several authors have focused instead on obtaining stronger regret guarantees for systems of learning algorithms in games. [12] and [30] develop noregret dynamics with a regret minimization rate when played against each other in twoplayer zerosum games. [33] further analyze a recency biased variant of FTRL in more general games and showed a regret minimization rate. The social welfare converges at a rate of , a result which was extended to standard versions of FRTL dynamics by [16].
Learning in zerosum games and applications to Artificial Intelligence.
A stream of recent papers proves positive results about convergence to equilibria in (mostly bilinear) zerosum games for suitable adapted variants of firstorder methods and then apply these techniques to Generative Adversarial Networks (GANs), showing improved performance (e.g. [14]) [4] exploit conservation laws of learning dynamics in zerosum games (e.g. [29, 21]) to develop new algorithms for training GANs that add a new component to the dynamic that aims at minimizing this energy function. Different energy shrinking techniques for convergence in GANs (nonconvex saddle point problems) exploit connections to variational inequalities and employ mirror descent techniques with an extra gradient step ([22]). Game theoretic inspired techniques such as timeaveraging seem to work well in practice for a wide range of architectures ([36]).Finally, the emergence of cycles in zerosum competition lies at the core of some of the most exciting problems in creating artificial agents for complex environments such as Starcraft, where even evaluating the strength of an individual agent is a nontrivial task ([5]). Recent approaches are inspired by the emergence of cyclic behavior to introduce algorithms that aim at gametheoretic niching ([3]).
6 Conclusion
We present the first, to our knowledge, proof of sublinear regret for the most classic FTRL dynamic, online gradient descent, in twobytwo zerosum games. Our proof techniques leverage geometric information and hinge upon the fact that FTRL dynamics, although are typically referred to as “converging” to Nash equilibria in zerosum games, diverge away from them. We strongly believe that these techniques, which we are just introducing, are far from being fully mined. Although several novel ideas will be required, we are fairly confident that these sublinear regret bounds carry over to much more general classes of FTRL dynamics as well as to large (zerosum) games.
Acknowledgements
James P. Bailey and Georgios Piliouras acknowledge SUTD grant SRG ESD 2015 097, MOE AcRF Tier 2 Grant 2016T21170, grant PIESGPAI201801 and NRF 2018 Fellowship NRFNRFF201807.
References
 [1] Bailey, J. P., and Piliouras, G. Multiplicative weights update in zerosum games. In ACM Conference on Economics and Computation (2018).
 [2] Bailey, J. P., and Piliouras, G. MultiAgent Learning in Network ZeroSum Games is a Hamiltonian System. arXiv eprints (Mar 2019), arXiv:1903.01720.
 [3] Balduzzi, D., Garnelo, M., Bachrach, Y., Czarnecki, W. M., Perolat, J., Jaderberg, M., and Graepel, T. Openended learning in symmetric zerosum games. arXiv preprint arXiv:1901.08106 (2019).
 [4] Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. The Mechanics of nPlayer Differentiable Games. In ICML (2018).
 [5] Balduzzi, D., Tuyls, K., Pérolat, J., and Graepel, T. Reevaluating evaluation. In NIPS (2018).
 [6] Bertsekas, D. P. Nonlinear programming. Athena scientific Belmont, 1999.
 [7] Brown, G. Iterative solutions of games by fictitious play. In Activity Analysis of Production and Allocation, T.C. Koopmans (Ed.), New York: Wiley. (1951).
 [8] Cai, Y., Candogan, O., Daskalakis, C., and Papadimitriou, C. Zerosum polymatrix games: A generalization of minmax. Mathematics of Operations Research 41, 2 (2016), 648–655.
 [9] Cai, Y., and Daskalakis, C. On minmax theorems for multiplayer games. In ACMSIAM Symposium on Discrete Algorithms (2011), SODA, pp. 217–234.
 [10] CesaBianchi, N., and Lugoisi, G. Prediction, Learning, and Games. Cambridge University Press, 2006.
 [11] Chotibut, T., Falniowski, F., Misiurewicz, M., and Piliouras, G. Family of chaotic maps from game theory. arXiv preprint arXiv:1807.06831 (2018).
 [12] Daskalakis, C., Deckelbaum, A., and Kim, A. Nearoptimal noregret algorithms for zerosum games. In Proceedings of the Twentysecond Annual ACMSIAM Symposium on Discrete Algorithms (Philadelphia, PA, USA, 2011), SODA ’11, Society for Industrial and Applied Mathematics, pp. 235–254.
 [13] Daskalakis, C., Frongillo, R., Papadimitriou, C., Pierrakos, G., and Valiant, G. On learning algorithms for Nash equilibria. Symposium on Algorithmic Game Theory (SAGT) (2010), 114–125.
 [14] Daskalakis, C., Ilyas, A., Syrgkanis, V., and Zeng, H. Training gans with optimism. In ICLR (2018).
 [15] Daskalakis, C., and Papadimitriou, C. On a network generalization of the minmax theorem. In ICALP (2009), pp. 423–434.
 [16] Foster, D. J., Lykouris, T., Sridharan, K., and Tardos, E. Learning in games: Robustness of fast convergence. In Advances in Neural Information Processing Systems (2016), pp. 4727–4735.
 [17] Fudenberg, D., and Levine, D. K. The Theory of Learning in Games. MIT Press Books. The MIT Press, 1998.
 [18] Kakade, S. M., Shalevshwartz, S., and Tewari, A. On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization, 2009.
 [19] Kleinberg, R., Ligett, K., Piliouras, G., and Tardos, É. Beyond the Nash equilibrium barrier. In Symposium on Innovations in Computer Science (ICS) (2011).
 [20] Mai, T., Panageas, I., Ratcliff, W., Vazirani, V. V., and Yunker, P. Cycles in Zero Sum Differential Games and Biological Diversity. In ACM EC (2018).
 [21] Mertikopoulos, P., Papadimitriou, C., and Piliouras, G. Cycles in adversarial regularized learning. In ACMSIAM Symposium on Discrete Algorithms (2018).
 [22] Mertikopoulos, P., Zenati, H., Lecouat, B., Foo, C.S., Chandrasekhar, V., and Piliouras, G. Mirror descent in saddlepoint problems: Going the extra (gradient) mile. ArXiv eprints (July 2018).
 [23] Nagarajan, S. G., Mohamed, S., and Piliouras, G. Three body problems in evolutionary game dynamics: Convergence, periodicity and limit cycles. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (2018), International Foundation for Autonomous Agents and Multiagent Systems, pp. 685–693.
 [24] Omidshafiei, S., Papadimitriou, C., Piliouras, G., Tuyls, K., Rowland, M., Lespiau, J.B., Czarnecki, W. M., Lanctot, M., Perolat, J., and Munos, R. rank: Multiagent evaluation by evolution. arXiv preprint arXiv:1903.01373 (2019).
 [25] Palaiopanos, G., Panageas, I., and Piliouras, G. Multiplicative weights update with constant stepsize in congestion games: Convergence, limit cycles and chaos. In Advances in Neural Information Processing Systems (2017), pp. 5872–5882.
 [26] Papadimitriou, C., and Piliouras, G. From nash equilibria to chain recurrent sets: An algorithmic solution concept for game theory. Entropy 20, 10 (2018).
 [27] Piliouras, G., NietoGranda, C., Christensen, H. I., and Shamma, J. S. Persistent patterns: Multiagent learning beyond equilibrium and utility. In AAMAS (2014), pp. 181–188.
 [28] Piliouras, G., and Schulman, L. J. Learning dynamics and the coevolution of competing sexual species. In ITCS (2018).
 [29] Piliouras, G., and Shamma, J. S. Optimization despite chaos: Convex relaxations to complex limit sets via poincaré recurrence. In Proceedings of the twentyfifth annual ACMSIAM symposium on Discrete algorithms (2014), SIAM, pp. 861–873.
 [30] Rakhlin, S., and Sridharan, K. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems (2013), pp. 3066–3074.
 [31] Robinson, J. An iterative method of solving a game. Annals of Mathematics 54 (1951), 296–301.
 [32] Sandholm, W. H. Population Games and Evolutionary Dynamics. MIT Press, 2010.
 [33] Syrgkanis, V., Agarwal, A., Luo, H., and Schapire, R. E. Fast convergence of regularized learning in games. In Proceedings of the 28th International Conference on Neural Information Processing Systems (Cambridge, MA, USA, 2015), NIPS’15, MIT Press, pp. 2989–2997.
 [34] von Neumann, J. Zur theorie der gesellschaftsspiele. Mathematische Annalen 100 (1928), 295–300.
 [35] von Neumann, J., and Morgenstern, O. Theory of Games and Economic Behavior. Princeton University Press, 1944.
 [36] Yazıcı, Y., Foo, C.S., Winkler, S., Yap, K.H., Piliouras, G., and Chandrasekhar, V. The Unusual Effectiveness of Averaging in GAN Training. ArXiv eprints (June 2018).
 [37] Young, H. P. Strategic learning and its limits. Oxford Univ. Press, 2004.
Appendix A First Order Approximation of (Continuous FTRL)
Lemma 6.
(FTRL) is the first order approximation of (Continuous FTRL).
Proof.
The first order approximation of is
(35)  
(36) 
and
(37) 
Inductively, and as defined in (FTRL) completing the proof of the lemma. ∎
Appendix B Optimal Solution to (Gradient Descent)
The KKT optimality conditions (see [6]) for (Gradient Descent) are given by
(Critical Point)  
(Nonnegativity)  
(Primal Feasibility)  
(Dual Feasibility)  
(Complimentary Slackness) 
where and .
Let be the set of where . By (Complimentary Slackness), for all . Therefore, (Critical Point) becomes
(38) 
Substituting (38) into (Primal Feasibility) yields
(39)  
(40) 
and . Therefore
(41) 
The variable represents that the constraint is unenforced. Enforcing constraints never improves the objective value of an optimization problem and therefore is a maximal set where (41) is feasible. Moreover, it is straightforward to show that if then . Thus, greedily removing the lowest valued from until (41) is feasible yields the optimal solution to (Gradient Descent).
Appendix C Payoff Matrix Assumptions
The payoff matrix is in the form
(42) 
In this paper, we make three assumptions about : , and . In order, we show that we may make these assumption without loss of generality.
In 2x2 games, if there is a unique fully mixed Nash equilibrium, then it is straight forward to show that player 2’s equilibrium is
(43) 
and therefore , and when there is a unique fully mixed Nash equilibrium. Similarly, by analyzing player 1’s Nash equilibrium, and . Now consider the payoff matrix
(44) 
The determinant of payoff matrix is zero. Moreover, (FTRL) is invariant to shifts in the payoff matrix, so for the purpose of the dynamics , and are equivalent matrices. Thus, without loss of generality we may assume the payoff matrix is singular by shifting the matrix by a specific constant.
Next, we argue that we may assume . Players 1 and 2 separately try to solve
(45)  
Comments
There are no comments yet.