
Journal of Statistical Planning and Inference 101 (2002) 211–227

www.elsevier.com/locate/jspi

Some statistics on Dyck paths

Donatella Merlini*, Renzo Sprugnoli, M. Cecilia Verri
Dipartimento di Sistemi e Informatica, Università di Firenze, via Lombroso 6/17, 50134 Firenze, Italy

Received 17 March 1999; received in revised form 24 September 1999

Abstract

We study some statistics related to Dyck paths, whose explicit formulas are obtained by means of the Lagrange Inversion Theorem. There are five such statistics; one of them is well known and due to Narayana. The most interesting of the other four statistics is related to Euler's trinomial coefficients and to Motzkin numbers. We study that statistic in detail and prove a number of its properties. © 2002 Elsevier Science B.V. All rights reserved.

MSC: 05A15; 05A16; 60C05

Keywords: Dyck paths; Lagrange inversion theorem; Trinomial coefficients

1. Introduction

Dyck paths are very well-known combinatorial objects that have been widely studied from various points of view (see, e.g., Goulden and Jackson, 1983; Labelle, 1993; Labelle and Yeh, 1990; Merlini et al., 1994, 1996; Viennot, 1983). Together with many other objects, they are counted by Catalan numbers. A surprisingly large number of one-to-one correspondences relate these paths to other classes of objects, such as binary trees, planar trees, parenthesized expressions and polygon triangulations. Many references can be found in Gould (1977) and in Stanley (1999).

In this paper, we consider Dyck paths as underdiagonal paths in the $\mathbb{Z}^2$ lattice. They start at the origin, never go above the main diagonal, and are made up of east $=(1,0)$ and north $=(0,1)$ steps. The convention of taking paths with steps $(1,1)$ and $(1,-1)$ is at least as frequently used. Other properties of Dyck paths, related to Catalan numbers, have also been studied.

For example, the so-called Catalan triangle in Table 1(a) is defined by the fact that its generic element $c_{n,k}$ counts the partial Dyck paths arriving at the point $(n, n-k)$. Due to the chameleon-like nature of Catalan numbers, $c_{n,k}$ also counts many other things. For instance, it counts the Dyck paths of semilength $n+1$ that touch the main diagonal exactly $k+1$ times (the origin excluded). By an obvious correspondence, it also counts the parenthesized expressions with $n+1$ open parentheses and exactly $k+1$ primitive components.

* Corresponding author. Tel.: +39-55-479-6771; fax: +39-55-479-6730. E-mail address: [email protected] (D. Merlini).

0378-3758/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-3758(01)00180-X

Table 1
Catalan and Motzkin triangles

(a) Catalan triangle

n\k   0   1   2   3   4   5
0     1
1     1   1
2     2   2   1
3     5   5   3   1
4    14  14   9   4   1
5    42  42  28  14   5   1

(b) Motzkin triangle

n\k   0   1   2   3   4   5
0     1
1     1   1
2     2   2   1
3     4   5   3   1
4     9  12   9   4   1
5    21  30  25  14   5   1

Dyck paths are composed of east and north steps. Underdiagonal paths that also use diagonal steps are called Motzkin paths. The Motzkin triangle (see Donaghey and Shapiro, 1977), illustrated in Table 1(b), is very similar to the Catalan triangle (see also Barcucci et al., 1991). Its generic element $M_{n,k}$ counts the Motzkin paths composed of $n$ steps and arriving at a distance $k$ from the main diagonal. In particular, the numbers in column 0 are called Motzkin numbers. Barcucci et al. (1991) study many properties connecting Catalan and Motzkin numbers, binomial and trinomial coefficients; for these we refer to Section 3.

A common way of studying all these kinds of numbers is through context-free grammars defining the sets of paths in the $\mathbb{Z}^2$ lattice that they count. For example, denote by 0 an east step, by 1 a north step and by $x$ a diagonal step. Dyck and Motzkin paths are then defined by the following sets of productions (written in Backus–Naur form):

$$D ::= \epsilon \mid 0D1D \qquad\qquad M ::= \epsilon \mid xM \mid 0M1M,$$

where $\epsilon$ denotes the empty word. In this way, a path corresponds to a word in the language generated by the grammar. From these grammars, the Schützenberger methodology (Schützenberger, 1963) lets us find the corresponding generating functions. Usually, Dyck paths are counted by their semilength, which gives the Catalan numbers. Suppose we also wish to count Dyck paths by the number $k$ of their valleys (i.e., two consecutive steps, east and then north). It is then sufficient to count Dyck words by their semilength (as words) and by the number of occurrences of the string 01. The resulting statistic, shown in Table 2, is called the Narayana distribution. Sulanke (1999) describes many properties of Dyck paths sharing the same distribution. If $N_{n,k}$ is the generic element in the Narayana triangle, it is known that

$$N_{n,k} = \frac{1}{n}\binom{n}{k}\binom{n}{k-1}. \qquad (1)$$

Table 2
Narayana distribution

n\k   0   1   2   3   4   5
0     1
1     0   1
2     0   1   1
3     0   1   3   1
4     0   1   6   6   1
5     0   1  10  20  10   1

This formula is easily proved by means of the Lagrange inversion formula (LIF) (see Goulden and Jackson, 1983), as we will show in the next section. The same method can be used to find formulas for other statistics related to Dyck paths. For this reason, we call them statistics related to the Lagrange inversion, and in this paper we investigate one of them in particular. It will be called the trinomial statistics, since it is related to Euler's trinomial coefficients. Not surprisingly, it is also related to Motzkin numbers, which can be expressed in terms of central trinomial coefficients, as shown in Barcucci et al. (1991). Finally, we prove a number of properties of the trinomial statistics, ranging from an explicit formula to generating functions and unimodality.

The paper is organized as follows. In Section 2 we classify and describe the Lagrange statistics. Section 3 illustrates the trinomial statistics and proves some general properties. Finally, in Section 4, we analyse the corresponding triangle more deeply, trying to characterize it by means of recurrences and generating functions.
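As a quick sanity check of formula (1) and Table 2, the Python sketch below counts valleys by brute force. It is our own illustration, not the authors' method. It generates all Dyck words from the grammar $D ::= \epsilon \mid 0D1D$, counts the occurrences of the string 01 in each word, and tabulates the counts by semilength. The helper names `dyck_words`, `count_factor`, `valley_row` and `narayana` are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def dyck_words(n):
    """All Dyck words of semilength n over {0, 1}, generated by D ::= eps | 0 D 1 D."""
    if n == 0:
        return ("",)
    return tuple(
        "0" + inner + "1" + rest
        for i in range(n)                  # the inner D has semilength i
        for inner in dyck_words(i)
        for rest in dyck_words(n - 1 - i)  # the trailing D has semilength n - 1 - i
    )


def count_factor(word, pattern):
    """Number of (possibly overlapping) occurrences of pattern as a factor of word."""
    return sum(word.startswith(pattern, i) for i in range(len(word)))


def valley_row(n):
    """Row n of the Narayana triangle, obtained by directly counting the string 01."""
    row = [0] * (n + 1)
    for word in dyck_words(n):
        row[count_factor(word, "01")] += 1
    return row


def narayana(n, k):
    """Formula (1), with the convention N_{0,0} = 1 for the empty word."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return comb(n, k) * comb(n, k - 1) // n


print(valley_row(5))                       # [0, 1, 10, 20, 10, 1], as in Table 2
print([narayana(5, k) for k in range(6)])  # the same row from formula (1)
```

Enumeration grows like the Catalan numbers, so this is only practical for small $n$. Its purpose is to confirm the tables, not to compute them.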

2. Statistics related to Lagrange Inversion

We now use the Narayana distribution as an introductory example to show some statistics, related to the LIF, that involve Dyck paths. Suppose we wish to count Dyck paths of semilength $n$ (i.e., arriving at the point $(n,n)$) having exactly $k$ valleys. Equivalently, we count Dyck words of length $2n$ having exactly $k$ occurrences of the string 01. We can proceed from the $\epsilon$-free context-free grammar generating Dyck words. Following standard methods (i.e., substituting $\epsilon$ in all possible ways for the occurrences of the non-terminal symbol $D$), we obtain

$$D ::= 01 \mid 0D1 \mid 01D \mid 0D1D. \qquad (2)$$

Since, by this definition, $D$ does not contain the empty word, occurrences of 01 can only be found in $D$, in the word 01 and in all the words generated by $01D$. An occurrence of 01 cannot follow the 0 in $0D1$, because $D$ always begins with 0. For the same reason, $0D1D$ contains no occurrences of 01 other than those already present in its two copies of $D$. We use the Schützenberger methodology to pass from the grammar to the corresponding generating function. Let the indeterminate $t$ count pairs 0/1, and let the indeterminate $w$ count the occurrences of the string 01. The bivariate generating function $D(t,w)$ is then given by the solution of

$$D(t,w) = tw + tD(t,w) + twD(t,w) + tD(t,w)^2. \qquad (3)$$

An explicit form for $D(t,w)$ is easily obtained:

$$D(t,w) = \frac{1 - t - tw - \sqrt{1 - 2t - 2tw + t^2 - 2t^2w + t^2w^2}}{2t}.$$

This formula gives no hint of what the form of $[t^nw^k]D(t,w) = N_{n,k}$ should be. A classical application of the LIF, however, gives the appropriate result. Formula (3) can be written as

$$D = t(w + wD + D + D^2) = t(1+D)(w+D);$$

therefore, if we set $\phi(w,D) = (1+D)(w+D)$, we have

$$[t^nw^k]D(t,w) = \frac{1}{n}[D^{n-1}w^k](1+D)^n(w+D)^n = \frac{1}{n}\binom{n}{k}[D^{n-1}]D^{n-k}(1+D)^n$$
$$= \frac{1}{n}\binom{n}{k}[D^{k-1}](1+D)^n = \frac{1}{n}\binom{n}{k}\binom{n}{k-1}.$$

This is the celebrated formula of Narayana for the present and many other statistics on Dyck paths. Looking at how this result was obtained, we observe that

• each production for $D$ contains a single pair 0/1, so every term of the functional equation defining $D(t,w)$ carries a single factor $t$. This $t$ can be collected from the various terms, which is what allows us to apply the Lagrange inversion formula, since it requires a relation $D = t\phi(w,D)$;

• the indeterminate $w$ seems to play a secondary role, but its appearance in two of the terms of the functional equation determines the final form of the Narayana formula.
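The Lagrange inversion step can also be carried out mechanically. The Python sketch below is our own illustration; `poly_mul`, `poly_pow` and `lif_coefficient` are hypothetical helpers. It stores polynomials in $D$ and $w$ as dictionaries and computes $\frac{1}{n}[D^{n-1}w^k]\phi(w,D)^n$ for $\phi(w,D) = (1+D)(w+D)$, so the result can be compared with the Narayana formula.

```python
from fractions import Fraction
from math import comb


def poly_mul(p, q):
    """Product of polynomials stored as {(power of D, power of w): coefficient}."""
    result = {}
    for (d1, w1), c1 in p.items():
        for (d2, w2), c2 in q.items():
            key = (d1 + d2, w1 + w2)
            result[key] = result.get(key, 0) + c1 * c2
    return result


def poly_pow(p, n):
    """p ** n by repeated multiplication (adequate for small n)."""
    result = {(0, 0): 1}
    for _ in range(n):
        result = poly_mul(result, p)
    return result


def lif_coefficient(phi, n, k):
    """[t^n w^k] D for D = t * phi(w, D) and n >= 1, using [t^n] D = (1/n) [D^(n-1)] phi^n."""
    return Fraction(poly_pow(phi, n).get((n - 1, k), 0), n)


# phi(w, D) = (1 + D)(w + D) = w + D + wD + D^2, i.e. the Narayana case (a, c)
PHI_NARAYANA = {(0, 1): 1, (1, 0): 1, (1, 1): 1, (2, 0): 1}

print([int(lif_coefficient(PHI_NARAYANA, 5, k)) for k in range(6)])  # [0, 1, 10, 20, 10, 1]
```

Other placements of $w$ only change the dictionary. For example, `{(0, 1): 1, (1, 0): 2, (2, 0): 1}` encodes $\phi = w + D(2+D)$.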

As a consequence, varying the position of the indeterminate $w$ in the functional equation for $D(t,w)$ should yield other statistics obtainable with the LIF. Let us name the four productions in (2):

• (a) is the first one, with 01 as its right-hand side;
• (b) is the second one, with $0D1$;
• (c) is the third one, with $01D$;
• (d) is the fourth one, with $0D1D$.

In this way, we can identify a distribution by the combination of letters corresponding to the terms in which $w$ appears. For example, case (a, c) corresponds to the Narayana statistic, and we can investigate all the different possible situations. From an algebraic point of view, $w$ in position (b) is equivalent to $w$ in position (c). Therefore, only the following ten cases are possible:

(a); (b) = (c); (d); (a, b) = (a, c); (a, d); (b, c); (b, d) = (c, d);
(a, b, c); (a, b, d) = (a, c, d); (b, c, d).

The two extreme cases, $w$ nowhere and $w$ in (a, b, c, d), are obviously not interesting and will be ignored. There is also a symmetry: for example, statistic (a) is symmetric to (b, c, d), where $w$ appears in the complementary positions. This reduces the number of possible statistics to five, and in this section we find an explicit formula for each of them.

Let us begin with case (a). Suppose we change production (a), $D \to 01$, into $D \to XX$, where $X$ is a new terminal symbol. We then count the new words by their semilength and by the number of pairs $XX$. The Schützenberger methodology gives the following relation:

$$D = t(w + D + D + D^2) = t(w + D(2+D)).$$

In this case $\phi(w,D) = w + D(2+D)$ and therefore the LIF gives

$$[t^nw^k]D = \frac{1}{n}[D^{n-1}w^k](w + D(2+D))^n = \frac{1}{n}\binom{n}{k}[D^{n-1}]D^{n-k}(2+D)^{n-k}$$
$$= \frac{1}{n}\binom{n}{k}[D^{k-1}](2+D)^{n-k} = \frac{1}{n}\binom{n}{k}\binom{n-k}{k-1}2^{n-2k+1}.$$
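The closed form for statistic (a) can also be checked without the LIF, by solving the functional equation $D = t(w + 2D + D^2)$ directly. The Python sketch below does this by fixed-point iteration on truncated bivariate power series and compares the coefficients with $\frac{1}{n}\binom{n}{k}\binom{n-k}{k-1}2^{n-2k+1}$. It is our own illustration, and the function names are hypothetical.

```python
from math import comb


def mul_trunc(p, q, max_deg):
    """Product of series {(power of t, power of w): coeff}, dropping powers of t above max_deg."""
    result = {}
    for (n1, k1), c1 in p.items():
        for (n2, k2), c2 in q.items():
            if n1 + n2 <= max_deg:
                key = (n1 + n2, k1 + k2)
                result[key] = result.get(key, 0) + c1 * c2
    return result


def solve_case_a(N):
    """Coefficients of D(t, w) up to t^N, where D = t (w + 2D + D^2).

    Each iteration fixes one more power of t, so N iterations starting from D = 0 suffice.
    """
    D = {}
    for _ in range(N):
        rhs = {(0, 1): 1}                              # w
        for key, c in D.items():                       # + 2D
            rhs[key] = rhs.get(key, 0) + 2 * c
        for key, c in mul_trunc(D, D, N - 1).items():  # + D^2
            rhs[key] = rhs.get(key, 0) + c
        D = {(n + 1, k): c for (n, k), c in rhs.items() if n + 1 <= N}  # multiply by t
    return D


def omega(n, k):
    """Closed form for statistic (a), n >= 1."""
    if k < 1 or n - 2 * k + 1 < 0:
        return 0
    return comb(n, k) * comb(n - k, k - 1) * 2 ** (n - 2 * k + 1) // n


D = solve_case_a(8)
print([D.get((5, k), 0) for k in range(6)])  # [0, 16, 24, 2, 0, 0], row 5 of Table 3(i)
```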

Table 3(i) shows the initial part of the infinite triangle corresponding to this statistic.

The second case is (b) = (c). Suppose we wish to count Dyck words by semilength and by the number of occurrences of the string 010. These occurrences either lie inside a $D$ or are generated by the production $D \to 01D$, since $D$ always begins with 0. No other production can generate new occurrences of 010. The Schützenberger methodology gives

$$D = t(1 + wD + D + D^2) = t(wD + (1 + D + D^2)).$$

Here we have $\phi(w,D) = wD + (1 + D + D^2)$ and therefore, by the LIF,

$$[t^nw^k]D = \frac{1}{n}[D^{n-1}w^k](wD + (1+D+D^2))^n = \frac{1}{n}\binom{n}{k}[D^{n-1}]D^k(1+D+D^2)^{n-k}$$
$$= \frac{1}{n}\binom{n}{k}[D^{n-k-1}](1+D+D^2)^{n-k} = \frac{1}{n}\binom{n}{k}\bar{T}_{n-k} = \tau_{n,k}. \qquad (4)$$
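Formula (4) can be confirmed by brute force for small $n$. The Python sketch below counts the occurrences of the string 010 in every Dyck word of semilength $n$. It then compares the resulting row with $\frac{1}{n}\binom{n}{k}\bar{T}_{n-k}$, where $\bar{T}_m$ is computed directly as the coefficient of $t^{m-1}$ in $(1+t+t^2)^m$. It is an illustrative check only, and the helper names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def dyck_words(n):
    """Dyck words of semilength n (0 = east, 1 = north), from D ::= eps | 0 D 1 D."""
    if n == 0:
        return ("",)
    return tuple("0" + inner + "1" + rest
                 for i in range(n)
                 for inner in dyck_words(i)
                 for rest in dyck_words(n - 1 - i))


def sub_trinomial(m):
    """Coefficient of t^(m-1) in (1 + t + t^2)^m; zero for m = 0."""
    poly = [1]
    for _ in range(m):  # multiply the coefficient list by 1 + t + t^2
        poly = [sum(poly[i - j] for j in range(3) if 0 <= i - j < len(poly))
                for i in range(len(poly) + 2)]
    return poly[m - 1] if m >= 1 else 0


def tau(n, k):
    """Formula (4), with the convention tau_{0,0} = 1 for the empty word."""
    if n == 0:
        return 1 if k == 0 else 0
    return comb(n, k) * sub_trinomial(n - k) // n


def row_010(n):
    """Distribution of the number of (overlapping) occurrences of 010 over Dyck words."""
    row = [0] * (n + 1)
    for word in dyck_words(n):
        row[sum(word.startswith("010", i) for i in range(len(word)))] += 1
    return row


print(row_010(5))                     # [9, 16, 12, 4, 1, 0], row 5 of Table 3(ii)
print([tau(5, k) for k in range(6)])  # the same row from formula (4)
```

Occurrences are counted with overlaps (0101 contains 010 once, 010101 twice), which is the reading consistent with Table 3(ii).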

It is immediate that $[D^{n-k-1}](1+D+D^2)^{n-k}$ is related to Euler's trinomial coefficients, defined as $T_n = [t^n](1+t+t^2)^n$. We investigate this connection more deeply in the next section. For the moment, let us denote by $\bar{T}_n$ the coefficient of $t^{n-1}$ in $(1+t+t^2)^n$, which gives a formula for this statistic. It will therefore be called the trinomial distribution for Dyck paths, and it is the main object of our study in the next sections. Table 3(ii) shows the upper part of the corresponding infinite triangle. Its elements will be denoted by $\tau_{n,k}$; column 0 contains the so-called Motzkin numbers.

Table 3
Statistics (a) and (b) = (c)

(i) Statistic (a)

n\k   0   1   2   3   4   5
0     1
1     0   1
2     0   2   0
3     0   4   1   0
4     0   8   6   0   0
5     0  16  24   2   0   0

(ii) Statistic (b) = (c)

n\k   0   1   2   3   4   5
0     1
1     1   0
2     1   1   0
3     2   2   1   0
4     4   6   3   1   0
5     9  16  12   4   1   0

The third case is (d). Suppose we wish to count Dyck words by semilength and by the number of applications of the production $D \to 0D1D$ in the generation of the word (i.e., in its syntax tree). We have

$$D = t(1 + D + D + wD^2) = t((1+2D) + wD^2).$$

Here $\phi(w,D) = (1+2D) + wD^2$ and therefore the LIF gives

$$[t^nw^k]D = \frac{1}{n}[D^{n-1}w^k]((1+2D) + wD^2)^n = \frac{1}{n}\binom{n}{k}[D^{n-1}]D^{2k}(1+2D)^{n-k}$$
$$= \frac{1}{n}\binom{n}{k}[D^{n-2k-1}](1+2D)^{n-k} = \frac{1}{n}\binom{n}{k}\binom{n-k}{n-2k-1}2^{n-2k-1} = \frac{1}{n}\binom{n}{k}\binom{n-k}{k+1}2^{n-2k-1}.$$
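Statistics (a) and (d) count how many times productions (a) and (d) of grammar (2) are used, so they can be checked by parsing each Dyck word. The Python sketch below is our own illustration, and its helper names are hypothetical. It splits every non-empty Dyck word uniquely as $0A1B$, with $A$ and $B$ Dyck words, at the first return to the diagonal; this identifies which production of grammar (2) is applied. The resulting counts are compared with the closed forms for (a) and (d), and with the translation $k \to k+1$ between them.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def dyck_words(n):
    """Dyck words of semilength n, from D ::= eps | 0 D 1 D."""
    if n == 0:
        return ("",)
    return tuple("0" + a + "1" + b for i in range(n)
                 for a in dyck_words(i) for b in dyck_words(n - 1 - i))


def first_return(word):
    """Split a non-empty Dyck word uniquely as 0 A 1 B with A, B Dyck words."""
    level = 0
    for i, ch in enumerate(word):
        level += 1 if ch == "0" else -1
        if level == 0:
            return word[1:i], word[i + 1:]
    raise ValueError("not a Dyck word")


def productions_used(word):
    """How often (a) 01, (b) 0D1, (c) 01D and (d) 0D1D occur in the derivation of word."""
    used = {"a": 0, "b": 0, "c": 0, "d": 0}
    pending = [word] if word else []
    while pending:
        inner, rest = first_return(pending.pop())
        used["abcd"[bool(inner) + 2 * bool(rest)]] += 1  # emptiness of A and B picks the rule
        pending.extend(part for part in (inner, rest) if part)
    return used


def stat_a(n, k):
    if k < 1 or n - 2 * k + 1 < 0:
        return 0
    return comb(n, k) * comb(n - k, k - 1) * 2 ** (n - 2 * k + 1) // n


def stat_d(n, k):
    if n - 2 * k - 1 < 0:
        return 0
    return comb(n, k) * comb(n - k, k + 1) * 2 ** (n - 2 * k - 1) // n


def production_rows(n, name):
    """Distribution over Dyck words of semilength n of the number of uses of one production."""
    row = [0] * (n + 1)
    for word in dyck_words(n):
        row[productions_used(word)[name]] += 1
    return row


print(production_rows(5, "d"))  # compare with [stat_d(5, k) for k in range(6)]
```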

This distribution is strongly related to (a). In fact, if we perform the transformation $k \to k+1$ in the formula for (a), we have

$$\frac{1}{n}\binom{n}{k+1}\binom{n-k-1}{k}2^{n-2k-1} = \frac{1}{n}\binom{n}{n-k-1}\binom{n-k-1}{k}2^{n-2k-1} = \frac{1}{n}\binom{n}{k}\binom{n-k}{n-2k-1}2^{n-2k-1},$$

which is distribution (d). This latter statistic is therefore a simple translation of the former.

The fourth case is (a, b) = (a, c), which, as we have already seen, corresponds to the Narayana statistic.

Finally, the fifth case is (a, d). Here we have

$$D = t(w + D + D + wD^2) = t(w(1+D^2) + 2D)$$

and the LIF gives

$$[t^nw^k]D = \frac{1}{n}[D^{n-1}w^k](w(1+D^2) + 2D)^n = \frac{1}{n}\binom{n}{k}[D^{n-1}](1+D^2)^k2^{n-k}D^{n-k}$$
$$= \frac{1}{n}\binom{n}{k}2^{n-k}[D^{k-1}](1+D^2)^k = \frac{1}{n}\binom{n}{k}\binom{k}{(k-1)/2}2^{n-k}.$$
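The claims made below about distribution (a, d) can be tested numerically. The Python sketch below solves $D = t(w(1+D^2) + 2D)$ by fixed-point iteration on truncated series and compares the coefficients with $\frac{1}{n}\binom{n}{k}\binom{k}{(k-1)/2}2^{n-k}$. It also checks that the even columns vanish while column $2k-1$ reproduces column $k$ of (a). It is illustrative only, and the names are hypothetical.

```python
from math import comb


def mul_trunc(p, q, max_deg):
    """Product of series {(power of t, power of w): coeff}, truncated above t^max_deg."""
    result = {}
    for (n1, k1), c1 in p.items():
        for (n2, k2), c2 in q.items():
            if n1 + n2 <= max_deg:
                key = (n1 + n2, k1 + k2)
                result[key] = result.get(key, 0) + c1 * c2
    return result


def solve_case_ad(N):
    """Coefficients of D(t, w) up to t^N, where D = t (w + 2D + w D^2)."""
    D = {}
    for _ in range(N):
        rhs = {(0, 1): 1}                                 # w
        for key, c in D.items():                          # + 2D
            rhs[key] = rhs.get(key, 0) + 2 * c
        for (n, k), c in mul_trunc(D, D, N - 1).items():  # + w D^2
            rhs[(n, k + 1)] = rhs.get((n, k + 1), 0) + c
        D = {(n + 1, k): c for (n, k), c in rhs.items() if n + 1 <= N}  # multiply by t
    return D


def stat_ad(n, k):
    """(1/n) C(n,k) C(k,(k-1)/2) 2^(n-k); zero when (k-1)/2 is not an integer."""
    if k % 2 == 0:
        return 0
    return comb(n, k) * comb(k, (k - 1) // 2) * 2 ** (n - k) // n


def stat_a(n, k):
    """Closed form for statistic (a)."""
    if k < 1 or n - 2 * k + 1 < 0:
        return 0
    return comb(n, k) * comb(n - k, k - 1) * 2 ** (n - 2 * k + 1) // n


D = solve_case_ad(9)
print([D.get((5, k), 0) for k in range(6)])  # [0, 16, 0, 24, 0, 2]
```

Combinatorially, $w$ here marks both leaves and binary nodes of the syntax tree. Since a binary tree has one more leaf than binary nodes, the exponent of $w$ is always odd.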

As usual, a non-integer lower index in a binomial coefficient gives a zero result. This distribution is also related to (a). Substituting $2k-1$ for $k$ in the last formula, we obtain

$$\frac{1}{n}\binom{n}{2k-1}\binom{2k-1}{k-1}2^{n-2k+1} = \frac{1}{n}\binom{n}{k}\binom{n-k}{n-2k+1}2^{n-2k+1},$$

which is exactly the formula for distribution (a). The reader is invited to write down the upper part of the corresponding triangle. As the formula shows, columns with even indices are all zero (for $n \ge 1$), while the column with odd index $2k-1$ coincides with column $k$ of (a).

These are all the possible Lagrange statistics for Dyck paths (or words). They reduce to three main cases: the Narayana distribution, the trinomial distribution and distribution (a). They obviously do not exhaust the possible statistics on Dyck paths, but they are relatively easy to study, since we have explicit formulas for them. In the next section we study the trinomial distribution; a similar study could also be done for distribution (a).

3. The trinomial statistics

Euler was the first to study the central trinomial coefficients $\{T_n\}_{n\in\mathbb{N}} = \{1, 1, 3, 7, 19, 51, \ldots\}$, i.e., the coefficients $[t^n](1+t+t^2)^n$, $n \in \mathbb{N}$. They are related to binomial coefficients. In fact, taking $T_{n,k} = [t^{n-k}](1+t+t^2)^n$ gives an infinite triangle $T = \{T_{n,k}\}_{n,k\in\mathbb{N}}$ that is very similar to Pascal's triangle (see Table 4). Each of its elements outside column 0 is the sum of the three elements above it. By an obvious symmetry argument, each element of column 0 is the previous element in the same column plus twice the element in the previous row and column 1. In the literature, the elements of this triangle are known as the trinomial coefficients. From here on, however, we use this name for the central trinomial coefficients $T_n$.

Table 4
The trinomial coefficients

n\k  -5  -4  -3  -2  -1   0   1   2   3   4   5
0                         1
1                     1   1   1
2                 1   2   3   2   1
3             1   3   6   7   6   3   1
4         1   4  10  16  19  16  10   4   1
5     1   5  15  30  45  51  45  30  15   5   1
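Table 4 and the two construction rules described above can be reproduced in a few lines of Python. In this illustrative sketch, `trinomial_rows` and `T` are hypothetical names. Each row is the coefficient list of $(1+t+t^2)^n$, obtained from the previous one by adding the three entries above. The entry at index $n - k$ of row $n$ is then $T_{n,k}$, and the column-0 rule $T_{n+1,0} = T_{n,0} + 2T_{n,1}$ can be checked explicitly.

```python
def trinomial_rows(N):
    """Rows 0..N of the full trinomial triangle (k = -n..n): coefficients of (1 + t + t^2)^n."""
    rows = [[1]]
    for n in range(1, N + 1):
        padded = [0, 0] + rows[-1] + [0, 0]
        # each new entry is the sum of the three entries above it
        rows.append([padded[i] + padded[i + 1] + padded[i + 2] for i in range(2 * n + 1)])
    return rows


def T(rows, n, k):
    """T_{n,k} = [t^(n-k)] (1 + t + t^2)^n, zero outside -n <= k <= n."""
    return rows[n][n - k] if -n <= k <= n else 0


rows = trinomial_rows(10)
print(rows[5])                            # [1, 5, 15, 30, 45, 51, 45, 30, 15, 5, 1]
print([T(rows, n, 0) for n in range(8)])  # [1, 1, 3, 7, 19, 51, 141, 393]
```

The explicit bounds check in `T` matters, because a negative list index would otherwise silently wrap around in Python.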

Let us call this array $T = \{T_{n,k}\}_{n,k\in\mathbb{N}}$. In the theory of Riordan arrays (see Merlini et al., 1997; Sprugnoli, 1994), this means that $T$ is a Riordan array with A-sequence $1 + t + t^2$ and Z-sequence $1 + 2t$. The form of the Riordan array is easily deduced from these sequences:

$$T = \left(\frac{1}{\sqrt{1-2t-3t^2}},\ \frac{1 - t - \sqrt{1-2t-3t^2}}{2t}\right).$$

In other words, the generating function of column $k$ is

$$T_k(t) = \frac{1}{\sqrt{1-2t-3t^2}}\left(\frac{1 - t - \sqrt{1-2t-3t^2}}{2t}\right)^k$$

and, in particular, the generating functions of columns 0 and 1 are

$$T_0(t) = \frac{1}{\sqrt{1-2t-3t^2}}, \qquad T_1(t) = \bar{T}(t) = \frac{1 - t - \sqrt{1-2t-3t^2}}{2t\sqrt{1-2t-3t^2}}.$$

Before continuing this summary of the properties of trinomial coefficients, we observe that the triangle $T$ also arises from a lattice path problem. Consider underdiagonal lattice paths in $\mathbb{Z}^2$ built from coloured steps. The allowed steps are $(1,0,\text{black})$, $(1,1,\text{black})$ and $(1,2,\text{black})$; for steps ending on the main diagonal $x - y = 0$, they are $(1,1,\text{black})$, $(1,2,\text{black})$ and $(1,2,\text{red})$. Then $T_{n,k}$ is the number of (coloured) paths starting at the origin and ending at $(n, n-k)$. In this interpretation, trinomial coefficients count the coloured paths ending on the main diagonal. Returning to the original definition of the trinomial triangle, the elements of column 1 are $[t^{n-1}](1+t+t^2)^n$. They are therefore exactly the coefficients appearing in our formula for the trinomial distribution of Dyck paths, which justifies the notation $\bar{T}(t)$ for $T_1(t)$. Trinomial coefficients are better known than the coefficients $\bar{T}_n$, which we call the sub-trinomial coefficients. We therefore begin by expressing $\bar{T}_n$ in terms of the $T_n$'s.

Lemma 3.1. The sub-trinomial coefficients $\bar{T}_n = [t^{n-1}](1+t+t^2)^n$ can be expressed in terms of the trinomial coefficients by the formula

$$\bar{T}_n = \frac{T_{n+1} - T_n}{2}.$$

Proof. As already observed, and as an immediate consequence of the definitions, $T_{n+1}$ is the previous element $T_n$ plus twice the element of column 1 in the previous row, i.e., $\bar{T}_n$. This gives $T_{n+1} = T_n + 2\bar{T}_n$, which is exactly what we were looking for.

As a consequence of this lemma, formula (4) for the trinomial distribution can be written in the form

$$\tau_{n,k} = \frac{1}{n}\binom{n}{k}\frac{T_{n-k+1} - T_{n-k}}{2}, \qquad n > 0. \qquad (5)$$

We can now give a complete characterization of the trinomial statistics. We find a bivariate generating function for it, together with some relations that let us build the whole triangle one element at a time.

Theorem 3.2. The bivariate generating function $\tau(t,w)$ for the trinomial distribution (5) is

$$\tau(t,w) = \frac{1 + t - tw - \sqrt{1 - 2t - 3t^2 - 2tw + 2t^2w + t^2w^2}}{2t}$$

and therefore the generating function of column 0 is

$$\tau(t) = \frac{1 + t - \sqrt{1-2t-3t^2}}{2t}.$$

Proof. Section 2 gives the functional equation for $D = \tau - 1$ (the constant term 1 corresponds to the empty path): $\tau - 1 = t(1 + w(\tau-1) + (\tau-1) + (\tau-1)^2)$. Equivalently, $t\tau^2 - (1 + t - tw)\tau + (1 + t - tw) = 0$. The function $\tau(t,w)$ is therefore the solution of this second-degree equation in $\tau$ with $\tau(0,w) = 1$. A simple application of de l'Hôpital's rule shows that the right solution is the one with a minus sign in front of the square root, which proves the formula for $\tau(t,w)$. The generating function of column 0 is then obtained¹ as $\tau(t,0) = \tau(t)$, which proves the second formula.
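The closed form in Theorem 3.2 and the first rows of Table 3(ii) can be cross-checked numerically. The Python sketch below iterates the functional equation $\tau - 1 = t(1 + w(\tau-1) + (\tau-1) + (\tau-1)^2)$ on truncated series. It then verifies that the result satisfies the quadratic $t\tau^2 - (1+t-tw)\tau + (1+t-tw) = 0$, whose root with $\tau(0,w) = 1$ is the stated $\tau(t,w)$. It is our own illustration, and the helper names are hypothetical.

```python
def mul(p, q, N):
    """Product of series {(power of t, power of w): coeff}, truncated above t^N."""
    result = {}
    for (n1, k1), c1 in p.items():
        for (n2, k2), c2 in q.items():
            if n1 + n2 <= N:
                key = (n1 + n2, k1 + k2)
                result[key] = result.get(key, 0) + c1 * c2
    return result


def combine(*terms):
    """Linear combination sum(scale * series), with zero coefficients removed."""
    result = {}
    for scale, series in terms:
        for key, c in series.items():
            result[key] = result.get(key, 0) + scale * c
    return {key: c for key, c in result.items() if c != 0}


def tau_series(N):
    """tau(t, w) = 1 + D up to t^N, where D = t (1 + w D + D + D^2) counts non-empty words."""
    one, w = {(0, 0): 1}, {(0, 1): 1}
    D = {}
    for _ in range(N):
        rhs = combine((1, one), (1, mul(w, D, N)), (1, D), (1, mul(D, D, N)))
        D = {(n + 1, k): c for (n, k), c in rhs.items() if n + 1 <= N}  # multiply by t
    return combine((1, one), (1, D))


N = 8
tau = tau_series(N)
t = {(1, 0): 1}
b = {(0, 0): 1, (1, 0): 1, (1, 1): -1}  # 1 + t - t w
residual = combine((1, mul(t, mul(tau, tau, N), N)), (-1, mul(b, tau, N)), (1, b))
print(residual)                                # {}: the quadratic holds up to t^N
print([tau.get((4, k), 0) for k in range(5)])  # [4, 6, 3, 1, 0], row 4 of Table 3(ii)
```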

The bivariate generating function characterizes an infinite triangle like our trinomial statistics mathematically. However, it is difficult to extract useful information about the distribution from it. A computer algebra system readily expands it into a Maclaurin series:

$$\tau(t,w) = 1 + t + t^2(1+w) + t^3(2 + 2w + w^2) + t^4(4 + 6w + 3w^2 + w^3) + O(t^5),$$

but a human would prefer a simpler characterization. For example, we can look for recurrence relations that build the elements $\tau_{n,k}$ one after the other, as the recurrence

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$$

does for binomial coefficients in Pascal's triangle.

¹ If $F(t,w)$ is the generating function of a lower triangular array, we consistently use $F(t) = F(t,0)$ for the generating function of column 0.

We now find a recurrence for the elements of column 0; later we determine a relation between $\tau_{n+1,k+1}$ and $\tau_{n,k}$. We can then build the whole triangle by first expanding column 0 up to the index of interest and then extending the triangle to all the other columns. Let us denote by $\{\tau_0, \tau_1, \tau_2, \ldots\}$ the coefficients of column 0 in the triangle of the trinomial distribution. We call them the co-trinomial coefficients.

Before continuing our characterization of co-trinomial coefficients, we observe that they are just a modified version of the Motzkin numbers. If $M_n$ denotes the $n$th Motzkin number, the generating function of the Motzkin numbers is obtained from the definition given in Section 1:

$$M(t) = \frac{1 - t - \sqrt{1-2t-3t^2}}{2t^2};$$

thus we have

$$\tau(t) = 1 + tM(t) \qquad\text{or}\qquad \tau_{n+1} = M_n \quad \forall n \ge 0.$$

The connection between Motzkin numbers and central trinomial coefficients clearly comes from the radicand $\sqrt{1-2t-3t^2}$ shared by their generating functions. Barcucci et al. (1991) prove explicit relations, such as

$$M_n = \frac{1}{2}(3T_n + 2T_{n+1} - T_{n+2}) = \frac{T_{n+1} + 3T_n}{2(n+2)}$$

or, for all the elements in the Motzkin and trinomial coefficient triangles,

$$M_{n,k} = T_{n,k} - T_{n,k+2}.$$

The recurrence relation defining Motzkin numbers ($M_0 = M_1 = 1$) is

$$M_n = \frac{1}{n+2}\big((2n+1)M_{n-1} + 3(n-1)M_{n-2}\big).$$

The recurrence for co-trinomial coefficients is therefore very similar and could be obtained by simply setting $n \to n-1$. Instead, we derive it by a classical method:

Theorem 3.3. The co-trinomial coefficients $\{\tau_n\}_{n\in\mathbb{N}}$ are defined by the second-order recurrence

$$\tau_n = \frac{1}{n+1}\big((2n-1)\tau_{n-1} + 3(n-2)\tau_{n-2}\big)$$

and the initial conditions $\tau_0 = 1$, $\tau_1 = 1$.

Proof. The generating function $\tau(t)$ is algebraic, so its coefficients satisfy a recurrence relation with polynomial coefficients. To find this relation we follow a standard method. We look for a differential equation $p_1(t)\tau'(t) + p_2(t)\tau(t) = 1$ that has $\tau(t)$ as a solution, where $p_1(t)$ and $p_2(t)$ are rational functions of $t$. The constant term does not affect the recurrence for $n \ge 2$, so it is convenient to work with $\tau(t) - 1 = tM(t)$. With a slight abuse of notation, we still denote it by $\tau(t)$ for the rest of this proof. Denoting by $Q$ the radical $\sqrt{1-2t-3t^2}$, we easily obtain

$$\tau(t) = \frac{1 - t - Q}{2t}, \qquad \tau'(t) = \frac{(1-t)Q - (1-2t-3t^2)}{2t^2(1-2t-3t^2)}.$$

Substituting these expressions into the differential equation, we can separate the rational part (not depending on $Q$) from the coefficients of $Q$ (the irrational part). This gives a system of equations:

$$(1-t)p_1(t) - t(1-2t-3t^2)p_2(t) = 0,$$
$$(1-2t-3t^2)p_1(t) - t(1-3t-t^2+3t^3)p_2(t) = -2t^2(1-2t-3t^2)$$

and, by solving it, we have

$$p_1(t) = \frac{1-2t-3t^2}{2}, \qquad p_2(t) = \frac{1-t}{2t}.$$

The differential equation is therefore $(1-2t-3t^2)t\tau'(t) + (1-t)\tau(t) = 2t$, and we can extract the coefficients of $t^n$ from both sides:

$$[t^{n-1}]\tau'(t) - 2[t^{n-2}]\tau'(t) - 3[t^{n-3}]\tau'(t) + [t^n]\tau(t) - [t^{n-1}]\tau(t) = 2\delta_{n,1},$$
$$n\tau_n - 2(n-1)\tau_{n-1} - 3(n-2)\tau_{n-2} + \tau_n - \tau_{n-1} = 2\delta_{n,1}.$$

The Kronecker $\delta$ on the right-hand side has no effect, because the relation is only needed for $n > 1$. We can therefore replace it by 0, which immediately gives the desired recurrence. The initial conditions follow from direct considerations.
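The recurrence of Theorem 3.3, the relation $\tau_{n+1} = M_n$ and the differential equation used in the proof can all be checked numerically. In the Python sketch below (our own illustration, with hypothetical names), the Motzkin numbers are computed independently. They come from the standard first-step decomposition $M(t) = 1 + tM(t) + t^2M(t)^2$, which the generating function $M(t)$ above satisfies.

```python
def co_trinomial(N):
    """tau_0..tau_N from (n+1) tau_n = (2n-1) tau_{n-1} + 3(n-2) tau_{n-2}, tau_0 = tau_1 = 1."""
    tau = [1, 1]
    for n in range(2, N + 1):
        numerator = (2 * n - 1) * tau[n - 1] + 3 * (n - 2) * tau[n - 2]
        tau.append(numerator // (n + 1))  # exact: the recurrence yields integers
    return tau


def motzkin(N):
    """M_0..M_N from the standard decomposition M(t) = 1 + t M(t) + t^2 M(t)^2."""
    M = [1]
    for n in range(1, N + 1):
        M.append(M[n - 1] + sum(M[i] * M[n - 2 - i] for i in range(n - 1)))
    return M


def coeff(seq, i):
    """seq[i], or 0 for negative indices (coefficients of t^i with i < 0)."""
    return seq[i] if i >= 0 else 0


N = 20
tau, M = co_trinomial(N), motzkin(N)
print(tau[:10])  # [1, 1, 1, 2, 4, 9, 21, 51, 127, 323]

# Coefficients of t(1 - 2t - 3t^2) f'(t) + (1 - t) f(t) for f = tau - 1, as in the proof
f = [0] + tau[1:]
lhs = [n * f[n] - 2 * (n - 1) * coeff(f, n - 1) - 3 * (n - 2) * coeff(f, n - 2)
       + f[n] - coeff(f, n - 1) for n in range(N + 1)]
print(lhs[:5])  # [0, 2, 0, 0, 0], i.e. the right-hand side 2t
```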

We must now find a relation connecting an element to the elements in the previous column. This is particularly easy for our trinomial statistics, for which we have:

Theorem 3.4. The coefficients $\tau_{n,k}$ in the triangle of the trinomial distribution are related to one another by the formula

$$\tau_{n+1,k+1} = \frac{n}{k+1}\tau_{n,k},$$

which allows us to build the whole triangle starting from column 0. As a consequence, the generating functions of the columns are related by

$$\tau_{k+1}(t) = \frac{t^2}{k+1}\tau_k'(t), \qquad k \ge 0.$$

Proof. From formula (4) we immediately have

$$\tau_{n+1,k+1} = \frac{1}{n+1}\binom{n+1}{k+1}\bar{T}_{n+1-k-1} = \frac{1}{k+1}\binom{n}{k}\bar{T}_{n-k} = \frac{n}{k+1}\cdot\frac{1}{n}\binom{n}{k}\bar{T}_{n-k},$$

which is the desired relation. We can now read this formula as a relation between the coefficients of the generating functions $\tau_k(t)$ and $\tau_{k+1}(t)$ of columns $k$ and $k+1$. From the theory of formal power series, if $f(t)$ is the generating function of a sequence $\{f_n\}_{n\in\mathbb{N}}$, then

$$\mathcal{G}\{f_{n+1}\} = \frac{f(t) - f_0}{t} \qquad\text{and}\qquad \mathcal{G}\{nf_n\} = tf'(t),$$

where $\mathcal{G}$ denotes the "generating function operator". For $k \ge 0$, the coefficient in position 0 of column $k+1$ is zero, so the relation above translates into

$$\frac{\tau_{k+1}(t)}{t} = \frac{1}{k+1}t\tau_k'(t),$$

which is the desired formula. In particular, for $k = 0$ we have $\tau_1(t) = t\bar{T}(t) = t^2\tau'(t)$.

Note the analogous formulas for the Narayana and (a) distributions (denoted by $N_{n,k}$ and $\omega_{n,k}$, respectively):

$$N_{n+1,k+1} = \frac{n(n+1)}{k(k+1)}N_{n,k}, \qquad \omega_{n+1,k+1} = \frac{n(n-2k+1)}{2k(k+1)}\omega_{n,k}.$$
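Theorem 3.4 suggests building the triangle column by column. The Python sketch below is our own illustration, and its names are hypothetical. It fills column 0 with the recurrence of Theorem 3.3 and propagates it with $\tau_{n+1,k+1} = \frac{n}{k+1}\tau_{n,k}$ using exact rational arithmetic. It then compares the result with formula (4) and also checks the two analogous ratios for $N_{n,k}$ and $\omega_{n,k}$.

```python
from fractions import Fraction
from math import comb


def sub_trinomial(m):
    """Coefficient of t^(m-1) in (1 + t + t^2)^m; zero for m = 0."""
    poly = [1]
    for _ in range(m):
        poly = [sum(poly[i - j] for j in range(3) if 0 <= i - j < len(poly))
                for i in range(len(poly) + 2)]
    return poly[m - 1] if m >= 1 else 0


def tau_formula(n, k):
    """Formula (4), with tau_{0,0} = 1."""
    if n == 0:
        return 1 if k == 0 else 0
    return Fraction(comb(n, k) * sub_trinomial(n - k), n)


def triangle_from_column0(N):
    """Rows 0..N built from column 0 (Theorem 3.3) and tau_{n+1,k+1} = n/(k+1) tau_{n,k}."""
    col0 = [1, 1]
    for n in range(2, N + 1):
        col0.append(Fraction((2 * n - 1) * col0[n - 1] + 3 * (n - 2) * col0[n - 2], n + 1))
    tri = [[Fraction(0)] * (n + 1) for n in range(N + 1)]
    for n in range(N + 1):
        tri[n][0] = Fraction(col0[n])
    for n in range(N):
        for k in range(n + 1):
            tri[n + 1][k + 1] = Fraction(n, k + 1) * tri[n][k]
    return tri


def narayana(n, k):
    """N_{n,k} for n, k >= 1."""
    return Fraction(comb(n, k) * comb(n, k - 1), n)


def omega(n, k):
    """omega_{n,k}, statistic (a), for 1 <= k <= n."""
    if n - 2 * k + 1 < 0:
        return Fraction(0)
    return Fraction(comb(n, k) * comb(n - k, k - 1) * 2 ** (n - 2 * k + 1), n)


tri = triangle_from_column0(10)
print([int(x) for x in tri[5]])  # [9, 16, 12, 4, 1, 0]
```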

We conclude our general considerations on the trinomial distribution by proving that it is unimodal. More specifically, we prove:

Theorem 3.5. The trinomial distribution is unimodal and in row $n$ it attains its maximum for $k \approx n/4$.

Proof. Let us study the ratio between two consecutive coefficients of the trinomial statistics in row $n$. We have

$$\frac{\tau_{n,k+1}}{\tau_{n,k}} = \frac{\frac{1}{n}\binom{n}{k+1}\bar{T}_{n-k-1}}{\frac{1}{n}\binom{n}{k}\bar{T}_{n-k}} = \frac{k!\,(n-k)!}{(k+1)!\,(n-k-1)!}\cdot\frac{\bar{T}_{n-k-1}}{\bar{T}_{n-k}} = \frac{n-k}{k+1}\cdot\frac{\bar{T}_{n-k-1}}{\bar{T}_{n-k}}.$$

The generating function $\bar{T}(t)$ has two algebraic singularities, at $t = -1$ and $t = 1/3$. The latter is dominant, so the radius of convergence of $\bar{T}(t)$ is $1/3$. Hence the ratio of two consecutive coefficients of $\bar{T}(t)$ approaches $1/3$ as their indices tend to infinity. For large $n$, we can therefore substitute $1/3$ for the ratio $\bar{T}_{n-k-1}/\bar{T}_{n-k}$ and find when the ratio $\tau_{n,k+1}/\tau_{n,k}$ is (approximately) equal to 1:

$$\frac{n-k}{k+1}\cdot\frac{1}{3} \approx 1 \quad\text{or}\quad n-k \approx 3k+3 \quad\text{or}\quad k \approx \frac{n-3}{4}.$$

Let us call this special value of $k$ $\bar{k}$. When $k < \bar{k}$, the ratio $\tau_{n,k+1}/\tau_{n,k}$ is greater than 1, so the values in row $n$ are increasing. When $k > \bar{k}$, the ratio is smaller than 1 and the values are decreasing. This shows that the distribution is unimodal and attains its maximum for $k = \bar{k} \approx n/4$, as desired.
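The argument above is asymptotic, so a direct check on moderate rows is reassuring. The Python sketch below computes rows of the trinomial distribution exactly from formula (4). It tests whether each row is unimodal and reports where the maximum falls, for comparison with $(n-3)/4$. It is illustrative only, and the helper names are hypothetical.

```python
from math import comb


def sub_trinomials(N):
    """[Tbar_0, ..., Tbar_N], with Tbar_m = [t^(m-1)] (1 + t + t^2)^m."""
    values, poly = [0], [1]
    for m in range(1, N + 1):
        poly = [sum(poly[i - j] for j in range(3) if 0 <= i - j < len(poly))
                for i in range(len(poly) + 2)]
        values.append(poly[m - 1])
    return values


def tau_row(n, tbar):
    """Row n >= 1 of the trinomial distribution, from formula (4)."""
    return [comb(n, k) * tbar[n - k] // n for k in range(n + 1)]


def is_unimodal(row):
    """True if the row weakly increases up to its first maximum and weakly decreases after it."""
    peak = row.index(max(row))
    rising = all(row[i] <= row[i + 1] for i in range(peak))
    falling = all(row[i] >= row[i + 1] for i in range(peak, len(row) - 1))
    return rising and falling


tbar = sub_trinomials(60)
for n in (10, 20, 40, 60):
    row = tau_row(n, tbar)
    print(n, is_unimodal(row), row.index(max(row)), (n - 3) / 4)
```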

This concludes our study of the basic properties of the trinomial distribution. In the next section, we prove some more specific properties and consequences of the considerations above.

4. Other properties of the trinomial distribution

In the previous sections, we considered three main sequences: the trinomial coefficients $\{T_n\}_{n\in\mathbb{N}}$, the sub-trinomial coefficients $\{\bar{T}_n\}_{n\in\mathbb{N}}$ and the co-trinomial coefficients $\{\tau_n\}_{n\in\mathbb{N}}$. To complete our study, we now give some general information on these quantities. We present recurrence relations for computing them efficiently, and we give their asymptotic values, which show how they grow and provide an easy way to approximate them. These formulas are well known for trinomial coefficients; we include them only to show their connection and similarity with the other quantities.

Theorem 4.1. Trinomial, sub-trinomial and co-trinomial coefficients satisfy the following recurrence relations with initial conditions:

$$nT_n = (2n-1)T_{n-1} + 3(n-1)T_{n-2}, \qquad T_0 = 1,\ T_1 = 1;$$
$$(n^2-1)\bar{T}_n = n(2n-1)\bar{T}_{n-1} + 3n(n-1)\bar{T}_{n-2}, \qquad \bar{T}_0 = 0,\ \bar{T}_1 = 1;$$
$$(n+1)\tau_n = (2n-1)\tau_{n-1} + 3(n-2)\tau_{n-2}, \qquad \tau_0 = 1,\ \tau_1 = 1.$$

Proof. The recurrence relation for co-trinomial coefficients was found in Theorem 3.3 and is repeated here for completeness. The recurrence relation for trinomial coefficients is well known (see, e.g., Barcucci et al., 1991) and is easily obtained with the method used in the proof of Theorem 3.3. For the sub-trinomial coefficients, Theorem 3.4 with $k = 0$ gives $\tau_{n+1,1} = n\tau_{n,0}$. By definition $\tau_{n,0} = \tau_n$ and, as already observed, $\tau_{n+1,1} = \bar{T}_n$. Therefore $\tau_n = \bar{T}_n/n$, and substituting this into the recurrence relation for $\tau_n$ immediately gives the recurrence relation for sub-trinomial coefficients.
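The three recurrences can be verified against the definitions. In the Python sketch below (our own illustration; `coefficient_rows` is a hypothetical name), $T_n$, $\bar{T}_n$ and $\tau_n = \bar{T}_n/n$ are read off the coefficient lists of $(1+t+t^2)^n$ and substituted into each recurrence.

```python
def coefficient_rows(N):
    """Coefficient lists of (1 + t + t^2)^n for n = 0..N."""
    rows, poly = [], [1]
    for _ in range(N + 1):
        rows.append(poly)
        poly = [sum(poly[i - j] for j in range(3) if 0 <= i - j < len(poly))
                for i in range(len(poly) + 2)]
    return rows


N = 30
rows = coefficient_rows(N)
T = [rows[n][n] for n in range(N + 1)]                          # central trinomial coefficients
Tbar = [rows[n][n - 1] if n >= 1 else 0 for n in range(N + 1)]  # sub-trinomial coefficients
tau = [1] + [Tbar[n] // n for n in range(1, N + 1)]             # co-trinomial: tau_n = Tbar_n / n

checks = all(
    n * T[n] == (2 * n - 1) * T[n - 1] + 3 * (n - 1) * T[n - 2]
    and (n * n - 1) * Tbar[n] == n * (2 * n - 1) * Tbar[n - 1] + 3 * n * (n - 1) * Tbar[n - 2]
    and (n + 1) * tau[n] == (2 * n - 1) * tau[n - 1] + 3 * (n - 2) * tau[n - 2]
    for n in range(2, N + 1)
)
print(checks)  # True
```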

For the asymptotic values, we use the standard method of singularities of generating functions. In all three cases, the generating functions $T(t)$, $\bar{T}(t)$ and $\tau(t)$ have a dominating singularity at $t = 1/3$:

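Theorem 4.2. Trinomial, sub-trinomial and co-trinomial coefficients have the following asymptotic expressions:

$$T_n = \frac{\sqrt{3}}{2}\binom{2n}{n}\left(\frac{3}{4}\right)^n\left(1 - \frac{1}{8(2n-1)} + \frac{9}{128(2n-1)(2n-3)} + O\!\left(\frac{1}{n^3}\right)\right),$$

$$\bar{T}_n = \frac{\sqrt{3}}{6}\binom{2n+2}{n+1}\left(\frac{3}{4}\right)^{n+1}\left(1 - \frac{5}{8(2n+1)} + \frac{33}{128(2n+1)(2n-1)} + O\!\left(\frac{1}{n^3}\right)\right),$$

$$\tau_n = \frac{\sqrt{3}}{2n-1}\binom{2n}{n}\left(\frac{3}{4}\right)^n\left(1 - \frac{21}{8(2n-3)} + \frac{1665}{128(2n-3)(2n-5)} + O\!\left(\frac{1}{n^3}\right)\right).$$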

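Proof. The three generating functions are to be expanded around $t = 1/3$. This is easily done by hand, and even more easily with a computer algebra system:

$$T(t) = \frac{1}{\sqrt{1-2t-3t^2}} = \frac{\sqrt{3}}{2}\,\frac{1}{\sqrt{1-3t}} + \frac{\sqrt{3}}{16}\sqrt{1-3t} + \frac{3\sqrt{3}}{256}(1-3t)^{3/2} + \cdots,$$

$$t\bar{T}(t) = \frac{1 - t - \sqrt{1-2t-3t^2}}{2\sqrt{1-2t-3t^2}} = \frac{\sqrt{3}}{6}\,\frac{1}{\sqrt{1-3t}} - \frac{1}{2} + \frac{5\sqrt{3}}{48}\sqrt{1-3t} + \frac{11\sqrt{3}}{768}(1-3t)^{3/2} + \cdots,$$

$$\tau(t) = 2 - \sqrt{3}\sqrt{1-3t} + \frac{3}{2}(1-3t) - \frac{7\sqrt{3}}{8}(1-3t)^{3/2} + \frac{3}{2}(1-3t)^2 - \frac{111\sqrt{3}}{128}(1-3t)^{5/2} + \cdots.$$

Note that $\bar{T}_n = [t^{n+1}]\,t\bar{T}(t)$, which accounts for the shifted indices in the formula for $\bar{T}_n$.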

We can now extract the coefficient of $t^n$ using the classical identities connecting binomial coefficients with half-integer upper index to central binomial coefficients:

$$\binom{-1/2}{n} = \frac{(-1)^n}{4^n}\binom{2n}{n}, \qquad \binom{1/2}{n} = \frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n},$$
$$\binom{3/2}{n} = \frac{(-1)^n\,3}{4^n(2n-1)(2n-3)}\binom{2n}{n}, \qquad \binom{5/2}{n} = \frac{(-1)^{n-1}\,15}{4^n(2n-1)(2n-3)(2n-5)}\binom{2n}{n}.$$

The coefficients of the corrections inside the parentheses show that the approximation is good for trinomial coefficients, less good for sub-trinomial coefficients and relatively poor for co-trinomial coefficients.
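The quality of the three approximations can be inspected numerically. The Python sketch below evaluates the expansions of Theorem 4.2, including both correction terms. It prints their relative errors against the exact values obtained from the coefficients of $(1+t+t^2)^n$. It is an illustration only, and the function names are hypothetical.

```python
from math import comb, sqrt


def exact(n):
    """(T_n, Tbar_n, tau_n) for n >= 1, read off the coefficient list of (1 + t + t^2)^n."""
    poly = [1]
    for _ in range(n):
        poly = [sum(poly[i - j] for j in range(3) if 0 <= i - j < len(poly))
                for i in range(len(poly) + 2)]
    T, Tbar = poly[n], poly[n - 1]
    return T, Tbar, Tbar // n


def asymptotic(n):
    """The three expansions of Theorem 4.2, truncated after the second correction term."""
    c = comb(2 * n, n) * 0.75 ** n
    T = sqrt(3) / 2 * c * (1 - 1 / (8 * (2 * n - 1)) + 9 / (128 * (2 * n - 1) * (2 * n - 3)))
    Tbar = (sqrt(3) / 6 * comb(2 * n + 2, n + 1) * 0.75 ** (n + 1)
            * (1 - 5 / (8 * (2 * n + 1)) + 33 / (128 * (2 * n + 1) * (2 * n - 1))))
    tau = (sqrt(3) / (2 * n - 1) * c
           * (1 - 21 / (8 * (2 * n - 3)) + 1665 / (128 * (2 * n - 3) * (2 * n - 5))))
    return T, Tbar, tau


for n in (10, 20, 40):
    errors = [abs(a / e - 1) for a, e in zip(asymptotic(n), exact(n))]
    print(n, ["%.1e" % x for x in errors])
```

For the same $n$, the error for $\tau_n$ should be much larger than for $T_n$, in line with the remark above.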

Next, we find the average number of occurrences of the string 010 in the Dyck words of semilength $n$. We begin with the following:

Lemma 4.3. The total number of occurrences of 010 in all the Dyck words of semilength $n$ (i.e., of length $2n$) is given by

$$P_n = \frac{1}{2}\binom{2n}{n} - \binom{2n-2}{n-1} = \frac{n-1}{2(2n-1)}\binom{2n}{n}, \qquad n > 0.$$

Proof. Consider the triangle of the trinomial distribution. The number $P_n$ is the weighted sum of row $n$. In other words, if $\tau_n(w)$ is the generating function of row $n$, then $P_n = \tau_n'(w)|_{w=1}$, i.e., the value at $w = 1$ of the derivative of $\tau_n(w)$. Globally, we obtain the generating function $P(t) = \mathcal{G}\{P_n\}_{n\in\mathbb{N}}$ by differentiating the bivariate generating function $\tau(t,w)$ with respect to $w$ and then substituting $w = 1$. We find

$$\frac{\partial\tau(t,w)}{\partial w} = \frac{1}{2t}\left(\frac{t - t^2 - t^2w}{\sqrt{1 - 2tw - 2t + t^2w^2 + 2t^2w - 3t^2}} - t\right),$$

and therefore

$$P(t) = \left[\frac{\partial\tau(t,w)}{\partial w}\right]_{w=1} = \frac{1 - 2t - \sqrt{1-4t}}{2\sqrt{1-4t}}.$$

From this generating function, we find

$$P_n = [t^n]\left(\frac{1}{2\sqrt{1-4t}} - \frac{t}{\sqrt{1-4t}} - \frac{1}{2}\right) = \frac{1}{2}\binom{2n}{n} - \binom{2n-2}{n-1} - \frac{1}{2}\delta_{n,0},$$

valid for every $n$. The last step is obvious.

The announced average value is now easily found:

Theorem 4.4. The average number of occurrences of 010 in the Dyck paths of semilength $n$ is given by

$$\mathrm{ave}_n = \frac{n^2-1}{2(2n-1)} = \frac{n}{4} + \frac{1}{8} + O\!\left(\frac{1}{n}\right).$$

Proof. The total number of Dyck words of semilength $n$ is the $n$th Catalan number $C_n = \binom{2n}{n}/(n+1)$. Dividing the $P_n$ of the previous lemma by $C_n$ gives $\mathrm{ave}_n$.
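Lemma 4.3 and Theorem 4.4 are easy to confirm for small $n$ by enumerating Dyck words. The Python sketch below counts all occurrences of 010 and compares the total with $P_n$, using exact fractions. It then compares the average with $(n^2-1)/(2(2n-1))$. It is illustrative, and the helper names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def dyck_words(n):
    """Dyck words of semilength n, from D ::= eps | 0 D 1 D."""
    if n == 0:
        return ("",)
    return tuple("0" + a + "1" + b for i in range(n)
                 for a in dyck_words(i) for b in dyck_words(n - 1 - i))


def total_010(n):
    """Total number of (overlapping) occurrences of 010 over all Dyck words of semilength n."""
    return sum(word.startswith("010", i) for word in dyck_words(n) for i in range(len(word)))


def P(n):
    """Lemma 4.3, valid for n >= 1."""
    return Fraction(comb(2 * n, n), 2) - comb(2 * n - 2, n - 1)


def average_010(n):
    """Theorem 4.4."""
    return Fraction(n * n - 1, 2 * (2 * n - 1))


for n in range(1, 9):
    catalan = comb(2 * n, n) // (n + 1)
    print(n, total_010(n), P(n), Fraction(total_010(n), catalan) == average_010(n))
```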

We conclude the paper by solving the following problem. In the previous section, we saw that the trinomial distribution attains its maximum at $k \approx n/4$ in row $n$. We now find the asymptotic value of this maximum as $n \to \infty$. This is not difficult if we use the asymptotic value of $\bar{T}_n$ from Theorem 4.2 together with Stirling's approximation for the binomial coefficients in the formula for $\tau_{n,k}$.

Theorem 4.5. The asymptotic value of the maximum in row $n$ is given by

$$\tau_{n,(n-3)/4} \sim 2\sqrt{\frac{2}{3}}\,\frac{4^n}{\pi n^2}.$$

Proof. The formula for $\tau_{n,(n-3)/4}$ is

$$\tau_{n,(n-3)/4} = \frac{1}{n}\binom{n}{(n-3)/4}\bar{T}_{n-(n-3)/4} = \frac{1}{n}\binom{n}{(n-3)/4}\bar{T}_{3(n+1)/4}.$$

Theorem 4.2 now gives

$$\bar{T}_{3(n+1)/4} \sim \frac{\sqrt{3}}{6}\binom{3(n+1)/2+2}{3(n+1)/4+1}\left(\frac{3}{4}\right)^{3(n+1)/4+1} \sim \frac{\sqrt{3}}{6}\,\frac{4^{3(n+1)/4+1}}{\sqrt{\pi(1+3(n+1)/4)}}\left(\frac{3}{4}\right)^{3(n+1)/4+1} \sim \frac{3^{3(n+1)/4}}{\sqrt{\pi(n+7/3)}}.$$

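The most difficult part is the binomial coefficient:

$$\binom{n}{(n-3)/4} \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi(n-3)/4}\,\left(\frac{n-3}{4e}\right)^{(n-3)/4}\sqrt{2\pi\cdot 3(n+1)/4}\,\left(\frac{3(n+1)}{4e}\right)^{3(n+1)/4}}$$
$$= \frac{\sqrt{n}\,n^n}{\sqrt{6\pi(n-3)(n+1)/16}\,\left(\frac{n-3}{4}\right)^{(n-3)/4}\left(\frac{3(n+1)}{4}\right)^{3(n+1)/4}} = \sqrt{\frac{8n}{3\pi(n^2-2n-3)}}\;\frac{n^n\,4^n}{(n-3)^{(n-3)/4}\,(n+1)^{3(n+1)/4}\,3^{3(n+1)/4}}.$$

We now take the logarithm of the parts with exponents:

$$\exp\left(n\ln n - \frac{n-3}{4}\ln(n-3) - \frac{3}{4}(n+1)\ln(n+1)\right) = \exp\left(n\ln n - \frac{n-3}{4}\ln\left(n\left(1-\frac{3}{n}\right)\right) - \frac{3}{4}(n+1)\ln\left(n\left(1+\frac{1}{n}\right)\right)\right)$$
$$= \exp\left(n\ln n - \frac{n-3}{4}\ln n + \frac{n-3}{4}\left(\frac{3}{n} + \frac{9}{2n^2} + \cdots\right) - \frac{3}{4}(n+1)\ln n - \frac{3}{4}(n+1)\left(\frac{1}{n} - \frac{1}{2n^2} + \cdots\right)\right) = \exp\left(-\frac{3}{2n} + O\!\left(\frac{1}{n^2}\right)\right),$$

since the logarithmic terms cancel. Because the prefactor is $\sqrt{8/(3\pi n)}\,(1 + O(1/n))$, we obtain

$$\binom{n}{(n-3)/4} \sim \sqrt{\frac{8}{3\pi n}}\;\frac{4^n}{3^{3(n+1)/4}}.$$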

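Finally, we can join these partial results:

$$\tau_{n,(n-3)/4} \sim \frac{1}{n}\cdot\frac{3^{3(n+1)/4}}{\sqrt{\pi(n+7/3)}}\cdot\sqrt{\frac{8}{3\pi n}}\;\frac{4^n}{3^{3(n+1)/4}} \sim 2\sqrt{\frac{2}{3}}\,\frac{4^n}{\pi n^2},$$

which is our final result.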
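The maximum of row $n$ is attained at an integer $k$ near $n/4$ rather than exactly at $(n-3)/4$, so a numerical comparison is a useful sanity check of Theorem 4.5. The Python sketch below is our own illustration, and its names are hypothetical. It computes the exact row maximum from formula (4), generating $T_m$ by the recurrence of Theorem 4.1 and setting $\bar{T}_m = (T_{m+1} - T_m)/2$ as in Lemma 3.1. It then divides this maximum by $2\sqrt{2/3}\,4^n/(\pi n^2)$; the ratio should approach 1 as $n$ grows.

```python
from math import comb, pi, sqrt


def row_maximum(n):
    """max_k tau_{n,k}, with tau_{n,k} = C(n,k) Tbar_{n-k} / n (formula (4))."""
    T = [1, 1]
    for m in range(2, n + 2):  # m T_m = (2m-1) T_{m-1} + 3(m-1) T_{m-2}
        T.append(((2 * m - 1) * T[m - 1] + 3 * (m - 1) * T[m - 2]) // m)
    tbar = [(T[m + 1] - T[m]) // 2 for m in range(n + 1)]  # Lemma 3.1
    return max(comb(n, k) * tbar[n - k] // n for k in range(n + 1))


def asymptotic_maximum(n):
    """Theorem 4.5: 2 sqrt(2/3) 4^n / (pi n^2)."""
    return 2 * sqrt(2 / 3) * 4 ** n / (pi * n ** 2)


for n in (25, 50, 100, 200, 400):
    print(n, row_maximum(n) / asymptotic_maximum(n))
```

The exact maximum is an integer with hundreds of digits for $n = 400$. It is converted to a float only in the final division, which stays well within floating-point range here.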


5. Uncited References

Sulanke, 1998.

References

Barcucci, E., Pinzani, R., Sprugnoli, R., 1991. The Motzkin family. Pure Math. Appl. 2, 249–279.
Donaghey, R., Shapiro, L.W., 1977. Motzkin numbers. J. Combin. Theory Ser. A 23, 291–301.
Gould, H.W., 1977. Research Bibliography of Two Special Sequences. Rev. ed. Combinatorial Research Institute, Morgantown, WV.
Goulden, I.P., Jackson, D.M., 1983. Combinatorial Enumeration. Wiley, New York.
Labelle, J., 1993. On pairs of non-crossing generalized Dyck paths. J. Statist. Plann. Inference 34, 209–217.
Labelle, J., Yeh, Y., 1990. Generalized Dyck paths. Discrete Math. 82, 1–6.
Merlini, D., Sprugnoli, R., Verri, M.C., 1994. Algebraic and combinatorial properties of simple, coloured walks. In: Proceedings of CAAP'94, Lecture Notes in Computer Science, Vol. 787. Springer, Berlin, pp. 218–233.
Merlini, D., Sprugnoli, R., Verri, M.C., 1996. The area determined by underdiagonal lattice paths. In: Proceedings of CAAP'96, Lecture Notes in Computer Science, Vol. 787. Springer, Berlin, pp. 59–71.
Merlini, D., Rogers, D.G., Sprugnoli, R., Verri, M.C., 1997. On some alternative characterizations of Riordan arrays. Canad. J. Math. 49 (2), 301–320.
Schützenberger, M.P., 1963. Context-free language and pushdown automata. Inform. and Control 6, 246–264.
Sprugnoli, R., 1994. Riordan arrays and combinatorial sums. Discrete Math. 132, 267–290.
Stanley, R.P., 1999. Enumerative Combinatorics, Vol. 2. Cambridge University Press, Cambridge.
Sulanke, R.A., 1998. Catalan path statistics having the Narayana distribution. Discrete Math. 180, 369–389.
Sulanke, R.A., 1999. Constraint sensitive Catalan path statistics having the Narayana distribution. Discrete Math. 204, 397–414.
Viennot, X., 1983. Une théorie combinatoire des polynômes orthogonaux généraux. UQAM.