# 4.4: Orthogonality and Normalization - Mathematics

Consider the series

\[\frac{a_0}{2} + \sum_{n=1}^{\infty}\bigg[a_n\cos\bigg(\frac{n\pi x}{L}\bigg) + b_n\sin\bigg(\frac{n\pi x}{L}\bigg)\bigg], \hspace{3cm} -L \leq x \leq L.\]

This is called a trigonometric series. If the series approximates a function \(f\) (as will be discussed) it is called a Fourier series, and the \(a_n\) and \(b_n\) are the Fourier coefficients of \(f\).

In order for all of this to make sense we first study the functions

[{1,cosigg(frac{npi x}{L}igg), sinigg( frac{npi x}{L}igg)}, hspace{3 cm} n=1,2,dots,]

and especially their properties under integration. We find that

\[ \int_{-L}^L 1\cdot 1 \,dx = 2L,\]

\[ \int_{-L}^L 1 \cdot \cos\bigg(\frac{n\pi x}{L}\bigg)\, dx = 0,\]

\[ \int_{-L}^L 1 \cdot \sin\bigg(\frac{n\pi x}{L}\bigg)\, dx = 0,\]

\[ \begin{align} \int_{-L}^L \cos\bigg(\frac{m\pi x}{L}\bigg) \cdot \cos\bigg(\frac{n\pi x}{L}\bigg)\, dx & = \frac{1}{2}\int_{-L}^L \cos\bigg(\frac{(m+n)\pi x}{L}\bigg) + \cos\bigg(\frac{(m-n)\pi x}{L}\bigg)\, dx \\ & = \bigg\{ \begin{array}{lr} 0 & \mbox{if } n \neq m \\ L & \mbox{if } n=m \end{array} \end{align} \]

\[ \begin{align} \int_{-L}^L \sin\bigg(\frac{m\pi x}{L}\bigg) \cdot \sin\bigg(\frac{n\pi x}{L}\bigg)\, dx & = \frac{1}{2}\int_{-L}^L \cos\bigg(\frac{(m-n)\pi x}{L}\bigg) - \cos\bigg(\frac{(m+n)\pi x}{L}\bigg)\, dx \\ & = \bigg\{ \begin{array}{lr} 0 & \mbox{if } n \neq m \\ L & \mbox{if } n=m \end{array} \end{align} \]

\[ \begin{align} \int_{-L}^L \cos\bigg(\frac{m\pi x}{L}\bigg) \cdot \sin\bigg(\frac{n\pi x}{L}\bigg)\, dx & = \frac{1}{2}\int_{-L}^L \sin\bigg(\frac{(m+n)\pi x}{L}\bigg) + \sin\bigg(\frac{(n-m)\pi x}{L}\bigg)\, dx \\ & = 0 \quad \mbox{for all } m, n, \end{align} \]
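These orthogonality relations are easy to verify numerically. The sketch below is not part of the original text; the value \(L = 2\) and the indices \(m = 2\), \(n = 3\) are arbitrary choices, and the integrals are approximated by the trapezoidal rule:

```python
import numpy as np

L = 2.0
x = np.linspace(-L, L, 200_001)

def trapz(y):
    # trapezoidal rule on the uniform grid x
    return float(np.sum(y[1:] + y[:-1]) * (x[1] - x[0]) / 2)

cos2 = np.cos(2 * np.pi * x / L)
cos3 = np.cos(3 * np.pi * x / L)
sin2 = np.sin(2 * np.pi * x / L)

cc_distinct = trapz(cos2 * cos3)  # m != n : should be 0
cc_equal = trapz(cos2 * cos2)     # m == n : should be L
cs_mixed = trapz(cos2 * sin2)     # cos times sin : always 0
```

All three values agree with the table above to high accuracy.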

If we consider these integrals as some kind of inner product between functions (like the standard vector inner product) we see that we could call these functions orthogonal. This is indeed standard practice, where for functions the general definition of inner product takes the form

\[(f,g) = \int_a^b w(x)f(x)g(x)\,dx.\]

If this is zero we say that the functions \(f\) and \(g\) are orthogonal on the interval \([a,b]\) with weight function \(w\). If the weight function is 1, as is the case for the trigonometric functions, we simply say that the functions are orthogonal on \([a,b]\).

The norm of a function is now defined as the square root of the inner-product of a function with itself (again, as in the case of vectors),

\[ \|f\| = \sqrt{\int_a^b w(x)f(x)^2\,dx}.\]

If we define a normalised form of \(f\) (like a unit vector) as \( f/\|f\| \), we have

\[ \left\|\frac{f}{\|f\|}\right\| = \sqrt{\frac{\int_a^b w(x)f(x)^2\,dx}{\|f\|^2}}=\frac{\sqrt{\int_a^b w(x)f(x)^2\,dx}}{\|f\|}=\frac{\|f\|}{\|f\|}=1.\]

Exercise \(\PageIndex{1}\)

What is the normalised form of \(\big\{1, \cos\big(\frac{n\pi x}{L}\big), \sin\big(\frac{n\pi x}{L}\big)\big\}\)?

Answer

\( \big\{\frac{1}{\sqrt{2L}}, \big(\frac{1}{\sqrt{L}}\big)\cos\big(\frac{n \pi x}{L}\big),\big(\frac{1}{\sqrt{L}}\big)\sin\big(\frac{n \pi x}{L}\big) \big\}\)

A set of mutually orthogonal functions that are all normalised is called an orthonormal set.
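As a numerical sanity check (my own sketch, not from the text; \(L = 1.5\) and \(n = 3\) are arbitrary), each function of the normalised set from the exercise has norm 1:

```python
import numpy as np

L, n = 1.5, 3
x = np.linspace(-L, L, 200_001)

def norm(f):
    # sqrt of the trapezoidal approximation of the integral of f^2 over [-L, L]
    return float(np.sqrt(np.sum(f[1:] ** 2 + f[:-1] ** 2) * (x[1] - x[0]) / 2))

n_const = norm(np.full_like(x, 1 / np.sqrt(2 * L)))
n_cos = norm(np.cos(n * np.pi * x / L) / np.sqrt(L))
n_sin = norm(np.sin(n * np.pi * x / L) / np.sqrt(L))
# all three norms should come out as 1
```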

## Chapter 3 Orthogonality

if vector (ar) is orthogonal to every vector in a subspace (W) of (mathbb) , then (ar) is said to be orthogonal to (W) . The subspace that contains the set of vectors that are orthogonal to (W) is called the orthogonal complement, denoted by (W^) .

This corresponds to discussions in Section 2.4. It's easy to verify that \(W^\perp\) is closed under scalar multiplication and under vector addition, and that any vector in \(W^\perp\) has \(n\) components, so \(W^\perp\) is a subspace of \(\mathbb{R}^n\).

### 3.1.2 Orthogonal Sets and Orthogonal Basis

An orthogonal set is a set of vectors \(\{\bar{u}_1, \dots, \bar{u}_p\}\) in \(\mathbb{R}^n\), in which each pair of distinct vectors is orthogonal: \(\bar{u}_i^T \bar{u}_j = 0\) for \(i \neq j\). Note that the set does not necessarily span the whole of \(\mathbb{R}^n\), but possibly only a subspace \(W\).

Since the vectors in an orthogonal set are mutually perpendicular, they are also linearly independent (provided none of them is the zero vector) and can form a basis for a subspace \(W\). In that case, they are called an orthogonal basis.

There is a particular advantage in using an orthogonal basis rather than an arbitrary basis, because we can find an easy representation of any vector in \(W\).

Theorem 3.1 For each \(\bar{y}\) in \(W\), there exists a linear combination

\[ \bar{y} = c_1\bar{u}_1 + \cdots + c_p\bar{u}_p \]

with

\[ c_i = \frac{\bar{y}\cdot \bar{u}_i}{\bar{u}_i \cdot \bar{u}_i}, \quad i = 1, \dots, p, \]

where \(\{\bar{u}_1, \dots, \bar{u}_p\}\) is an orthogonal basis.

\[ \begin{align} \bar{u}_1 \cdot \bar{y} &= \bar{u}_1 \cdot (c_1\bar{u}_1 + \cdots + c_p\bar{u}_p) \\ &= c_1\, \bar{u}_1 \cdot \bar{u}_1 \end{align} \] So:

\[ c_1 = \frac{\bar{y} \cdot \bar{u}_1}{\bar{u}_1 \cdot \bar{u}_1}. \]

Derivations for the other \(c_i\) are similar.
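A small numerical illustration of Theorem 3.1 (the vectors below are my own example, not from the text): for a vector built to lie in the span of an orthogonal pair, the formula recovers its coefficients.

```python
import numpy as np

# an orthogonal (not normalised) basis of a plane W in R^3
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 2.0])
assert abs(u1 @ u2) < 1e-12  # the pair is orthogonal

y = 2.0 * u1 - 3.0 * u2      # a vector known to lie in W

# coefficients via c_i = (y . u_i) / (u_i . u_i)
c1 = (y @ u1) / (u1 @ u1)
c2 = (y @ u2) / (u2 @ u2)
# c1 recovers 2, c2 recovers -3
```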

### 3.1.3 Orthogonal Decomposition

Orthogonal decomposition splits \(\bar{y}\) in \(\mathbb{R}^n\) into two vectors, one in \(W\) and one in its orthogonal complement \(W^\perp\).

Theorem 3.2 Let \(\mathbb{R}^n\) be an inner product space and \(W\) a subspace of \(\mathbb{R}^n\). Then every \(\bar{y}\) in \(\mathbb{R}^n\) can be written uniquely in the form \(\bar{y} = \bar{y}_W + \bar{y}_{W^\perp}\).

Let (ar_1, . ar_m) be a orthonormal basis for (W) , there exists linear combination according to Section 3.1.2

[ ar_w = (ar cdot ar_1)ar_1 + cdots + (ar cdot ar_m)ar_m ] and

[ ar_ = ar - ar_w ] It is clear that (ar_W in W) . And we can also show that (ar_) is perpendicular to (W)

[ egin ar_ cdot ar_i &= [ar- (ar cdot ar_1)ar_1 - cdots - (ar cdot ar_m)ar_m] cdot ar_i &= (ar cdot ar_1) - [(ar cdot ar_i)ar_i cdot ar_i] &= 0 end ]

To prove that (ar_w) and (ar_) are unique (does not depend on the choice of basis), let (ar_1', . ar_m') be another orthonormal basis for (W) , and define (ar_w') and (ar_') similarly we want to get (ar_w' = ar_w) and (ar_' = ar_) .

[ underbrace<ar_w - ar_w'>_ = underbrace<ar_' - ar_>_> ] From the orthogonality of these subspaces, we have

The existence and uniqueness of the decomposition above mean that
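The decomposition of Theorem 3.2 can be sketched numerically (the basis and vector below are illustrative choices of mine, not from the text):

```python
import numpy as np

# an orthonormal basis for a subspace W of R^3
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])

y = np.array([3.0, -2.0, 5.0])

y_w = (y @ u1) * u1 + (y @ u2) * u2  # projection of y onto W
y_perp = y - y_w                      # component in the orthogonal complement

# y_w + y_perp reconstructs y, and y_perp is orthogonal to both basis vectors
```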

## Computational Methods for Modelling of Nonlinear Systems

### Theorem 61

Let \(v_1, \dots, v_p\) be orthogonal vectors determined by Lemma 34 of Section 5.7.4. Then \(f^0\) and \(F_1^0, \dots, F_p^0\), satisfying (7.140)–(7.141), are determined by

with an arbitrary matrix \(Q_k \in \mathbb{R}^{t \times (n-t)}\).

The accuracy associated with the transform \(T_p^0\) given by (7.142) and (7.147)–(7.176) is such that

Proof. If \(v_1, \dots, v_p\) are determined by Lemma 34, then \(J(f, \mathcal{F}_1, \dots, \mathcal{F}_p)\) is still represented by (7.150). Let us consider \(J_0\), \(J_1\) and \(J_2\) given by

with \(J(f, \mathcal{F}_1, \dots, \mathcal{F}_p)\) defined by (7.150), we use the relationships (see Section 4.4.1)

due to the orthogonality of the vectors \(v_1, \dots, v_{s_k}\).

On the basis of (7.183)–(7.185), and similarly to (7.157), we establish that (7.182) is true. Hence,

It follows from the last two terms in (7.187) that the constrained minimum (7.140)–(7.141) is achieved if \(f = f^0\) with \(f^0\) given by (7.174), and \(F_k^0\) is such that

Therefore, the constrained minimum (7.140)–(7.141) is achieved if \(f = f^0\), where \(f^0\) is defined by (7.174), and if

The latter follows from Theorem 54 and Remarks 29 and 30 . Thus, (7.175) – (7.176) are true.

Then (7.178) follows from (7.187) , (7.189) , (7.174) and (7.190) .

## What's the difference between Normalization and Standardization?

At work we were discussing this, as my boss had never heard of normalization. In Linear Algebra, Normalization seems to refer to dividing a vector by its length. And in statistics, Standardization seems to refer to subtracting the mean and then dividing by the SD. But they seem interchangeable with other possibilities as well.

When creating some kind of universal score that combines $2$ different metrics, which have different means and different SDs, would you Normalize, Standardize, or something else? One person told me it's just a matter of taking each metric and dividing them by their SD, individually. Then summing the two. And that will result in a universal score that can be used to judge both metrics.

For instance, say you had the number of people who take the subway to work (in NYC) and the number of people who drove to work (in NYC).

$\text{subway} \longrightarrow x, \qquad \text{cars} \longrightarrow y$

If you wanted to create a universal score to quickly report traffic fluctuations, you can't just add $x$ and $y$ because there will be a LOT more people who ride the train. There's 8 million people living in NYC, plus tourists. That's millions of people taking the train every day versus hundreds of thousands of people in cars. So they need to be transformed to a similar scale in order to be compared.

Would you normalize $x$ & $y$ then sum? Would you standardize $x$ & $y$ then sum? Or would you divide each by their respective SD then sum? In order to get a number that, when it fluctuates, represents total traffic fluctuations.

Any article or chapters of books for reference would be much appreciated. THANKS!

Also here's another example of what I'm trying to do.

Imagine you're a college dean, and you're discussing admission requirements. You may want students with at least a certain GPA and a certain test score. It'd be nice if they were both on the same scale because then you could just add the two together and say, "anyone with at least a 7.0 can get admitted." That way, if a prospective student has a 4.0 GPA, they could get as low as a 3.0 test score and still get admitted. Inversely, if someone had a 3.0 GPA, they could still get admitted with a 4.0 test score.

But it's not like that. The ACT is on a 36-point scale and most GPAs are on 4.0 (some are 4.3, yes, annoying). Since I can't just add an ACT score and a GPA to get some kind of universal score, how can I transform them so they can be added, thus creating a universal admission score? Then, as a Dean, I could just automatically accept anyone with a score above a certain threshold, or even automatically accept everyone whose score is within the top 95%. Those sorts of things.

Would that be normalization? standardization? or just dividing each by their SD then summing?
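For what it's worth, here is a sketch in Python of the three transformations being discussed. The data is made up (`subway` and `cars` are hypothetical daily counts of my own invention): min-max normalization, z-score standardization, and plain division by the SD all put the two metrics on comparable scales before summing.

```python
import numpy as np

rng = np.random.default_rng(0)
subway = rng.normal(4_000_000, 300_000, 30)  # hypothetical daily subway riders
cars = rng.normal(400_000, 50_000, 30)       # hypothetical daily drivers

def normalize(v):     # min-max: rescales to [0, 1]
    return (v - v.min()) / (v.max() - v.min())

def standardize(v):   # z-score: mean 0, SD 1
    return (v - v.mean()) / v.std()

score_norm = normalize(subway) + normalize(cars)
score_std = standardize(subway) + standardize(cars)
score_sd = subway / subway.std() + cars / cars.std()  # divide-by-SD only
```

Note that standardizing and dividing by the SD differ only by a constant shift, so those two composite scores rank days identically; min-max normalization behaves differently and is sensitive to extreme values.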

The length (or norm) of a vector \( \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \), written as \( \|\mathbf{x}\| \), is given by
\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{\mathbf{x} \cdot \mathbf{x}}. \]
From the above definition, we can easily conclude that
\( \|\mathbf{x}\| \ge 0 \) and \( \|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x} \).
A unit vector is a vector whose length (or norm) is equal to 1.

Vectors \( \mathbf{x} \) and \( \mathbf{y} \) are orthogonal if and only if
\[ \|\mathbf{x}+\mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2. \]
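A quick numerical illustration of both definitions (the example vectors are my own choice):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
length = np.sqrt(x @ x)        # ||x|| = sqrt(1 + 4 + 4) = 3
unit = x / length              # unit vector: length 1

y = np.array([2.0, -1.0, 0.0]) # x . y == 0, so x and y are orthogonal
lhs = np.linalg.norm(x + y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
# for orthogonal vectors the Pythagorean identity holds: lhs == rhs
```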

## Orthogonal polynomials

whereby the degree of every polynomial $P_n$ is equal to its index $n$, and the weight function (weight) $h(x) \geq 0$ on the interval $(a, b)$ or (when $a$ and $b$ are finite) on $[a, b]$. Orthogonal polynomials are said to be orthonormalized, and are denoted by $\{\widehat{P}_n\}$, if every polynomial has positive leading coefficient and if the normalizing condition

$\int\limits_a^b \widehat{P}_n^{\,2}(x) h(x)\, dx = 1$

is fulfilled. If the leading coefficient of each polynomial is equal to 1, then the system of orthogonal polynomials is denoted by $\{\widetilde{P}_n\}$.

The system of orthogonal polynomials $\{\widehat{P}_n\}$ is uniquely defined if the weight function (differential weight) $h$ is Lebesgue integrable on $(a, b)$, is not equivalent to zero and, in the case of an unbounded interval $(a, b)$, has finite moments

$h_n = \int\limits_a^b x^n h(x)\, dx.$

Instead of a differential weight $h$, an integral weight $d sigma ( x)$ can be examined, where $sigma$ is a bounded non-decreasing function with an infinite set of points of growth (in this case, the integral in the condition of orthogonality is understood in the Lebesgue–Stieltjes sense).

For the polynomial $P_n$ of degree $n$ to be part of the system $\{P_n\}$ with weight $h$, it is necessary and sufficient that, for any polynomial $Q_m$ of degree $m < n$, the condition

$\int\limits_a^b P_n(x) Q_m(x) h(x)\, dx = 0$

is fulfilled. If the interval of orthogonality $(a, b)$ is symmetric with respect to the origin and the weight function $h$ is even, then every polynomial $P_n$ contains only those powers of $x$ which have the parity of the number $n$, i.e. one has the identity $P_n(-x) = (-1)^n P_n(x)$.

The zeros of orthogonal polynomials in the case of the interval $(a, b)$ are all real, distinct and distributed within $(a, b)$, while between two neighbouring zeros of the polynomial $P_n$ there is one zero of the polynomial $P_{n-1}$. Zeros of orthogonal polynomials are often used as interpolation points and in quadrature formulas.

Any three consecutive polynomials of a system of orthogonal polynomials are related by a recurrence formula

$P_{n+1}(x) = (a_n x + b_n) P_n(x) - c_n P_{n-1}(x), \quad n = 1, 2, \dots,$

where

$P_1(x) = \mu_1 x + \nu_1,$

$P_n(x) = \mu_n x^n + \nu_n x^{n-1} + \dots .$

The number $d_n^{-1}$ is a normalization factor of the polynomial $P_n$, such that the system $\{d_n^{-1} P_n\}$ is orthonormalized, i.e.

$d_n^{-1} P_n(x) = \widehat{P}_n(x).$
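The recurrence is easy to exercise numerically. Chebyshev polynomials of the first kind, for instance, satisfy it with $a_n = 2$, $b_n = 0$, $c_n = 1$; the sketch below (my own illustration, not from the article) builds them from the recurrence and checks against the closed form $T_n(x) = \cos(n \arccos x)$:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 201)

# T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), starting from T_0 = 1, T_1 = x
T = [np.ones_like(xs), xs.copy()]
for n in range(1, 6):
    T.append(2 * xs * T[n] - T[n - 1])

# compare against the closed form T_n(x) = cos(n arccos x)
closed = [np.cos(n * np.arccos(xs)) for n in range(7)]
max_err = max(float(np.max(np.abs(T[n] - closed[n]))) for n in range(7))
```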

For orthogonal polynomials one has the Christoffel–Darboux formula:

Orthogonal polynomials are represented in terms of the moments $\{h_n\}$ of the weight function $h$ by the formula

$\psi_n(x) = \left| \begin{array}{ccccc} h_0 & h_1 & \dots & h_n \\ h_1 & h_2 & \dots & h_{n+1} \\ \vdots & \vdots & \dots & \vdots \\ h_{n-1} & h_n & \dots & h_{2n-1} \\ 1 & x & \dots & x^n \end{array} \right| ,$

while the determinant $\Delta_{n-1}$ is obtained from $\psi_n(x)$ by cancelling the last row and column, and $\Delta_n$ is defined in the same way from $\psi_{n+1}(x)$.

On the set of polynomials $\widetilde{Q}_n$ of degree $n$ with leading coefficient equal to one, the minimum of the functional

$F(\widetilde{Q}_n) = \int\limits_a^b \widetilde{Q}_n^{\,2}(x) h(x)\, dx$

is achieved if and only if

$\widetilde{Q}_n(x) \equiv \widetilde{P}_n(x);$

moreover, this minimum is equal to $d_n^{\,2}$.

If the polynomials $\{\widehat{P}_n\}$ are orthonormal with weight $h$ on the segment $[a, b]$, then when $p > 0$, the polynomials

$\widehat{Q}_n(t) = \sqrt{p}\, \widehat{P}_n(pt + q), \quad n = 0, 1, \dots,$

are orthonormal with weight $h(pt + q)$ on the segment $[A, B]$ which transfers to the segment $[a, b]$ as a result of the linear transformation $x = pt + q$. For this reason, when studying the asymptotic properties of orthogonal polynomials, the case of the standard segment $[-1, 1]$ is considered first, while the results thus obtained cover other cases as well.

The most important orthogonal polynomials encountered in solving boundary problems of mathematical physics are the so-called classical orthogonal polynomials: the Laguerre polynomials $\{L_n(x; \alpha)\}$ (for which $h(x) = x^\alpha e^{-x}$, $\alpha > -1$, with interval of orthogonality $(0, \infty)$); the Hermite polynomials $\{H_n(x)\}$ (for which $h(x) = \exp(-x^2)$, with interval of orthogonality $(-\infty, \infty)$); the Jacobi polynomials $\{P_n(x; \alpha, \beta)\}$ (for which $h(x) = (1-x)^\alpha (1+x)^\beta$, $\alpha > -1$, $\beta > -1$, with interval of orthogonality $[-1, 1]$); and their particular cases: the ultraspherical polynomials, or Gegenbauer polynomials, $\{P_n(x; \alpha)\}$ (for which $\alpha = \beta$), the Legendre polynomials $\{P_n(x)\}$ (for which $\alpha = \beta = 0$), the Chebyshev polynomials of the first kind $\{T_n(x)\}$ (for which $\alpha = \beta = -1/2$) and of the second kind $\{U_n(x)\}$ (for which $\alpha = \beta = 1/2$).

The weight function $h$ of the classical orthogonal polynomials $\{K_n\}$ satisfies the Pearson differential equation

whereby, at the ends of the interval of orthogonality, the conditions

$\lim\limits_{x \to a} h(x) B(x) = \lim\limits_{x \to b} h(x) B(x) = 0$

The polynomial $y = K _ ( x)$ satisfies the differential equation

$B(x) y'' + [A(x) + B'(x)] y' - n[p_1 + (n+1) q_2] y = 0.$

For classical orthogonal polynomials one has the generalized Rodrigues formula

where $c_n$ is a normalization coefficient, and the differentiation formulas

$\frac{d}{dx} L_n(x; \alpha) = -L_{n-1}(x; \alpha + 1), \qquad \frac{d}{dx} H_n(x) = 2n H_{n-1}(x),$

$\frac{d}{dx} P_n(x; \alpha, \beta) = \frac{1}{2}(\alpha + \beta + n + 1) P_{n-1}(x; \alpha + 1, \beta + 1).$
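The Hermite differentiation formula above can be checked with `numpy.polynomial.hermite`, which implements the physicists' Hermite polynomials used here (the check itself is my own sketch):

```python
import numpy as np
from numpy.polynomial.hermite import Hermite

n = 4
lhs = Hermite.basis(n).deriv()       # d/dx H_4(x)
rhs = 2 * n * Hermite.basis(n - 1)   # 2n H_{n-1}(x)

xs = np.linspace(-2.0, 2.0, 9)
match = bool(np.allclose(lhs(xs), rhs(xs)))  # True: the formula holds
```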

For particular cases of the classical orthogonal polynomials one has representations using the hypergeometric function

$P _ ( x alpha , eta ) = left ( egin n+ a n end ight ) F left ( - n, n + alpha + eta + 1 alpha + 1 1- frac <2> ight ) ,$

$P _ ( x) = F left ( - n, n+ 1 1 1- frac <2> ight ) ,$

$T _ ( x) = F left ( - n, n frac<1> <2> 1- frac <2> ight ) ,$

$U _ ( x) = ( n+ 1) F left ( - n, n+ 2 frac<3> <2> 1- frac <2> ight )$

$L _ ( x alpha ) = left ( egin n end ight ) Phi (- n alpha + 1 x),$

$H _ <2n>( x) = (- 1) ^ ( 2n)! over Phi left ( - n frac<1> <2> x ^ <2> ight ) ,$

$H _ <2n+1>( x) = (- 1) ^ ( 2n+ 1)! over 2 x Phi left ( - n frac<3> <2> x ^ <2> ight ) .$

Historically, the first orthogonal polynomials were the Legendre polynomials. Then came the Chebyshev polynomials, the general Jacobi polynomials, the Hermite and the Laguerre polynomials. All these classical orthogonal polynomials play an important role in many applied problems.

The general theory of orthogonal polynomials was formulated by P.L. Chebyshev. The basic research apparatus used was the continued fraction expansion of the integral $\int\limits_a^b \frac{h(t)}{x - t}\, dt$; the denominators of the convergents of this continued fraction form a system of orthogonal polynomials on the interval $(a, b)$ with weight $h$.

In the study of orthogonal polynomials, great attention is paid to their asymptotic properties, since the conditions of convergence of Fourier series in orthogonal polynomials depend on these properties.

The asymptotic properties of the classical orthogonal polynomials were first studied by V.A. Steklov in 1907 (see [8]). He used and perfected the Liouville method, which was previously used in the study of solutions of the Sturm–Liouville equation. The Liouville–Steklov method was subsequently widely used, as a result of which the asymptotic properties of the Jacobi, Hermite and Laguerre orthogonal polynomials have been studied extensively.

In the general case of orthogonality on $[- 1, 1]$ with arbitrary weight satisfying certain qualitative conditions, asymptotic formulas for orthogonal polynomials were first discovered by G. Szegö in 1920–1924. He introduced polynomials which were orthogonal on the circle, studied their basic properties and found an extremely important formula, representing polynomials orthogonal on $[- 1, 1]$ by polynomials orthogonal on the circle. In his study of the asymptotic properties of polynomials orthogonal on the circle, Szegö developed a method based on a special generalization of the Fejér theorem on the representation of non-negative trigonometric polynomials by using methods and results of the theory of analytic functions.

In 1930, S.N. Bernstein [S.N. Bernshtein] [2], in his research on the asymptotic properties of orthogonal polynomials, used methods and results of the theory of approximation of functions. He examined the case of a weight function of the form

where the function $h_0(x)$, called a trigonometric weight, satisfies the condition

$0 < c_1 \leq h_0(x) \leq c_2 < \infty.$

If on the whole segment $[-1, 1]$ the function $h_0(x)$ satisfies a Dini–Lipschitz condition of order $\gamma = 1 + \epsilon$, where $\epsilon > 0$, i.e. if

$| h_0(x + \delta) - h_0(x) | \leq \frac{c}{|\ln |\delta||^\gamma}, \quad x,\, x + \delta \in [-1, 1],$

then for the polynomials $\{\widehat{P}_n\}$ orthonormal with weight (1) on the whole segment $[-1, 1]$, one has the asymptotic formula

$\widehat{P}_n(x) = \sqrt{\frac{2}{\pi h_0(x)}} \cos(n\theta + q) + O \left[ \frac{1}{(\ln n)^\epsilon} \right],$

where $\theta = \arccos x$ and $q$ depends on $\theta$.

In the study of the convergence of Fourier series in orthogonal polynomials the question arises of the conditions of boundedness of the orthogonal polynomials, either at a single point, on a set $A subset [- 1, 1]$ or on the whole interval of orthogonality $[- 1, 1]$, i.e. conditions are examined under which an inequality of the type

$\tag{2} | \widehat{P}_n(x) | \leq M, \quad x \in A \subseteq [-1, 1],$

occurs. Steklov first posed this question in 1921. If the trigonometric weight $h_0(x)$ is bounded away from zero on a set $A$, i.e. if

$\tag{3} h_0(x) \geq c_3 > 0, \quad x \in A \subseteq [-1, 1],$

and satisfies certain extra conditions, then the inequality (2) holds. In the general case,

$\tag{4} | \widehat{P}_n(x) | \leq \epsilon_n \sqrt{n}, \quad \epsilon_n \rightarrow 0, \quad x \in [-1, 1],$

follows from (3), when $A = [-1, 1]$, without extra conditions.

The zeros of the weight function are singular points in the sense that the properties of the sequence $\{\widehat{P}_n\}$ are essentially different at the zeros and at other points of the interval of orthogonality. For example, let the weight function have the form

If the function $h_1(x)$ is positive and satisfies a Lipschitz condition on $[-1, 1]$, then the sequence $\{\widehat{P}_n\}$ is bounded on every segment $[a, b] \subset [-1, 1]$ which does not contain the points $\{x_k\}$, while the inequalities

$| \widehat{P}_n(x_k) | \leq c_4 (n+1)^{\gamma_k / 2}, \quad k = 1, \dots, m,$

hold at these points.

The case where the zeros of the weight function are positioned at the ends of the segment of orthogonality was studied by Bernstein [2]. One of the results is that if the weight function has the form

$h(x) = h_1(x)(1-x)^\alpha (1+x)^\beta, \quad x \in [-1, 1],$

where the function $h_1(x)$ is positive and satisfies a Lipschitz condition, then for $\alpha > -1/2$, $\beta > -1/2$, the orthogonal polynomials admit a weighted estimate,

while at the points $x = \pm 1$ they increase at the rates $n^{\alpha + 1/2}$ and $n^{\beta + 1/2}$, respectively.

In the theory of orthogonal polynomials, so-called comparison theorems are often studied. One such is the Korous comparison theorem: if the polynomials $\{\widehat{\omega}_n\}$ are orthogonal with weight $p$ on the segment $[a, b]$ and are uniformly bounded on a set $A \subset [a, b]$, then the polynomials $\{\widehat{P}_n\}$, orthogonal with weight $h = p \cdot q$, are also bounded on this set, provided $q$ is positive and satisfies a Lipschitz condition of order $\alpha = 1$ on $[a, b]$. Similarly, given certain conditions on $q$, asymptotic formulas or other asymptotic properties can be transferred from the system $\{\widehat{\omega}_n\}$ to the system $\{\widehat{P}_n\}$. Moreover, if $q$ is a non-negative polynomial of degree $m$ on $[a, b]$, then the polynomials $\{\widehat{P}_n\}$ can be represented by the polynomials $\{\widehat{\omega}_n\}$ using determinants of order $m+1$ (see [8]). Effective formulas for orthogonal polynomials have also been obtained for weight functions of the form

where $Q_m$ is an arbitrary positive polynomial on $[-1, 1]$ (see [8]). In most cases, the calculation of orthogonal polynomials with arbitrary weight is difficult for large numbers $n$.

#### References

[1] P.L. Chebyshev, "Complete collected works", 2, Moscow-Leningrad (1947) pp. 103–126; 314–334; 335–341; 357–374 (In Russian)
[2] S.N. Bernshtein, "Collected works", 2, Moscow (1954) pp. 7–106 (In Russian)
[3] Ya.L. Geronimus, "Orthogonal polynomials" Transl. Amer. Math. Soc., 108 (1977) pp. 37–130
[4] P.K. Suetin, "Classical orthogonal polynomials", Moscow (1979) (In Russian)
[5] V.B. Uvarov, "Special functions of mathematical physics", Birkhäuser (1988) (Translated from Russian)
[6] H. Bateman (ed.), A. Erdélyi (ed.) et al. (ed.), Higher transcendental functions, 2. Bessel functions, parabolic cylinder functions, orthogonal polynomials, McGraw-Hill (1953)
[7] D. Jackson, "Fourier series and orthogonal polynomials", Carus Math. Monogr., 6, Math. Assoc. Amer. (1971)
[8] G. Szegö, "Orthogonal polynomials", Amer. Math. Soc. (1975)
[9] Guide to special functions, Moscow (1979) (In Russian; translated from English)
[10] J.A. Shohat, E. Hille, J.L. Walsh, "A bibliography on orthogonal polynomials", Nat. Acad. Sci. USA (1940)

See also Fourier series in orthogonal polynomials. Two other textbooks are [a3] and [a2]. See [a1] for some more information on the history of the classical orthogonal polynomials. Regarding the asymptotic properties of the classical orthogonal polynomials it should be observed that many workers (P.S. Laplace, E. Heine, G. Darboux, T.J. Stieltjes, E. Hilb, etc.) preceded Steklov, but he was the first to adapt Liouville's method.

See [a5] for state-of-the-art surveys of many aspects of orthogonal polynomials. In particular, the general theory of orthogonal polynomials with weight functions on unbounded intervals has made big progress, see also [a4].


## Hermitian Operators

Since the eigenvalues of a quantum mechanical operator correspond to measurable quantities, the eigenvalues must be real, and consequently a quantum mechanical operator must be Hermitian. To prove this, we start with the premises that \(\psi\) and \(\phi\) are functions, \(\int d\tau\) represents integration over all coordinates, and the operator \(\hat{A}\) is Hermitian by definition if

\[ \int \psi^* \hat{A} \phi \, d\tau = \int \phi \, (\hat{A}^* \psi^*) \, d\tau. \]

This equation means that the complex conjugate of \(\hat{A}\) can operate on \(\psi^*\) to produce the same result after integration as \(\hat{A}\) operating on \(\phi\), followed by integration. To prove that a quantum mechanical operator \(\hat{A}\) is Hermitian, consider the eigenvalue equation and its complex conjugate.

Note that \(a^* = a\) because the eigenvalue is real. Multiply Equations \(\ref{4-38}\) and \(\ref{4-39}\) from the left by \(\psi^*\) and \(\psi\), respectively, and integrate over the full range of all the coordinates. Note that \(\psi\) is normalized. The results are

Since both integrals equal \(a\), they must be equivalent.

The operator acting on the function,

produces a new function. Since functions commute, Equation \(\ref{4-42}\) can be rewritten as

Eigenfunctions of a Hermitian operator are orthogonal if they have different eigenvalues. Because of this theorem, we can identify orthogonal functions easily without having to integrate or conduct an analysis based on symmetry or other considerations.

\(\psi\) and \(\phi\) are two eigenfunctions of the operator \(\hat{A}\) with real eigenvalues \(a_1\) and \(a_2\), respectively. Since the eigenvalues are real, \(a_1^* = a_1\) and \(a_2^* = a_2\).

Multiply the first equation by \(\phi^*\) and the second by \(\psi\) and integrate.

Subtract the two equations in Equation \(\ref{4-45}\) to obtain

The left-hand side of Equation \(\ref{4-46}\) is zero because \(\hat{A}\) is Hermitian, yielding

\[ 0 = (a_1 - a_2) \int \phi^* \psi \, d\tau \label{4-47} \]

If \(a_1\) and \(a_2\) in Equation \(\ref{4-47}\) are not equal, then the integral must be zero. This result proves that nondegenerate eigenfunctions of the same operator are orthogonal.
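A finite-dimensional analogue of this theorem is easy to check numerically: for a Hermitian matrix (standing in for the operator \(\hat{A}\); the random matrix below is my own illustration), the eigenvalues come out real and eigenvectors belonging to distinct eigenvalues come out orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2          # Hermitian by construction

evals, evecs = np.linalg.eigh(H)  # eigh is the solver for Hermitian matrices

# evals is a real array, and the eigenvector matrix is unitary,
# i.e. eigenvectors with different eigenvalues are mutually orthogonal
gram = evecs.conj().T @ evecs     # should be the 4x4 identity
```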

Two wavefunctions, \(\psi_1(x)\) and \(\psi_2(x)\), are said to be orthogonal if

Multiplying the complex conjugate of the first equation by \(\psi_{a'}(x)\), and the second equation by \(\psi^*_a(x)\), and then integrating over all \(x\), we obtain

\[ \int_{-\infty}^\infty (A \psi_a)^\ast \psi_{a'} \, dx = a \int_{-\infty}^\infty \psi_a^\ast \psi_{a'} \, dx, \label{4.5.4}\]

However, from Equation \(\ref{4-46}\), the left-hand sides of the above two equations are equal. Hence, we can write

By assumption, \(a \neq a'\), yielding

In other words, eigenstates of a Hermitian operator corresponding to different eigenvalues are automatically orthogonal.

The eigenvalues of operators associated with experimental measurements are all real.

Draw graphs and use them to show that the particle-in-a-box wavefunctions for \(\psi(n = 2)\) and \(\psi(n = 3)\) are orthogonal to each other.

The two PIB wavefunctions are qualitatively similar when plotted

These wavefunctions are orthogonal when

and when the PIB wavefunctions are substituted this integral becomes

\[\begin{align} \int_0^L \sqrt{\dfrac{2}{L}} \sin \left( \dfrac{2\pi}{L}x \right) \sqrt{\dfrac{2}{L}} \sin \left( \dfrac{3\pi}{L}x \right) dx &= ? \\[4pt] \dfrac{2}{L} \int_0^L \sin \left( \dfrac{2\pi}{L}x \right) \sin \left( \dfrac{3\pi}{L}x \right) dx &= ? \end{align}\]

We can expand the integrand using trigonometric identities to help solve the integral, but it is easier to take advantage of the symmetry of the integrand: with respect to the center of the box, the \(\psi(n=2)\) wavefunction is odd (blue curve in the figure above) and the \(\psi(n=3)\) wavefunction is even (purple curve). Their product (even times odd) is an odd function about the center, and the integral of an odd function over a symmetric interval is zero. Therefore the \(\psi(n=2)\) and \(\psi(n=3)\) wavefunctions are orthogonal.

This can be repeated an infinite number of times to confirm the entire set of PIB wavefunctions are mutually orthogonal as the Orthogonality Theorem guarantees.
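The symmetry argument can also be confirmed by direct numerical integration (a sketch of my own; \(L = 1\) is an arbitrary box length):

```python
import numpy as np

L = 1.0
x = np.linspace(0.0, L, 100_001)

def psi(n):
    # normalised particle-in-a-box wavefunction
    return np.sqrt(2 / L) * np.sin(n * np.pi * x / L)

def integral(y):
    # trapezoidal rule on the uniform grid x
    return float(np.sum(y[1:] + y[:-1]) * (x[1] - x[0]) / 2)

overlap = integral(psi(2) * psi(3))  # should vanish: orthogonal
norm_sq = integral(psi(2) ** 2)      # should be 1: normalised
```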

From my perspective, two states are orthogonal if you have 0 probability of measuring one of the states when the system is prepared in the other one.

Let us take a 3-level system $\{|\psi_1\rangle, |\psi_2\rangle, |\psi_3\rangle\}$ and consider a state $|\psi\rangle = \sqrt{\frac{3}{10}}|\psi_1\rangle + \sqrt{\frac{7}{10}}|\psi_2\rangle$. The state is not orthogonal to $|\psi_1\rangle$ or $|\psi_2\rangle$, as a measurement would give a 30% and 70% chance of finding those states. By contrast, there is no way to obtain $|\psi_3\rangle$ through a measurement, corresponding to $|\psi\rangle$ and $|\psi_3\rangle$ being orthogonal.

What is the physical interpretation of orthogonality?

Think of the $x$, $y$ and $z$ coordinate system you use every day. No matter what mathematical manipulation we do, we cannot express one direction in terms of the other two. They are orthogonal and linearly independent.

The physical result of orthogonality is that systems can be constructed, in which the components of that system have their individual distinctiveness preserved.

Another example is the spherical harmonics, which are a complete set of orthogonal functions. They can be considered as a mathematical representation of the physical fact that the set of electron orbitals around, say, a hydrogen atom will always retain their distinctive arrangement.

This is just one example of the physical consequences of orthogonality, since as the previous answers state, it means that the states are independent from each other, that is we can always tell them apart.

## 4.7 Applications of Linear Algebra to Dynamical Systems: Markov Chains

Before going to the formal definition of Markov Chains in the textbook, let us introduce the topic with an example from a real life situation. Consider a city with two kinds of populations: the inner city population and the suburb population. We assume that every year 40% of the inner city population moves to the suburbs, while 30% of the suburb population moves to the inner part of the city. Let \(I_0\) and \(S_0\) denote the initial population of the inner city and the suburban area, respectively. Thus, after one year, the population of the inner city is given by

\[ I_1 = 0.6 I_0 + 0.3 S_0, \]

while the population of the suburbs is given by

\[ S_1 = 0.4 I_0 + 0.7 S_0. \]

After two years, the population of the inner city is given by

\[ I_2 = 0.6 I_1 + 0.3 S_1 = 0.6 (0.6 I_0 + 0.3 S_0) + 0.3 (0.4 I_0 + 0.7 S_0), \]

and the suburban population is given by

\[ S_2 = 0.4 I_1 + 0.7 S_1 = 0.4 (0.6 I_0 + 0.3 S_0) + 0.7 (0.4 I_0 + 0.7 S_0). \]

Representing these expressions in matrix notation, the populations after one year are given by

\[ \begin{pmatrix} I_1 \\ S_1 \end{pmatrix} = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix} \begin{pmatrix} I_0 \\ S_0 \end{pmatrix}, \]

after two years by

\[ \begin{pmatrix} I_2 \\ S_2 \end{pmatrix} = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix} \begin{pmatrix} I_1 \\ S_1 \end{pmatrix} = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}^2 \begin{pmatrix} I_0 \\ S_0 \end{pmatrix}, \]

and after \(n\) years by

\[ \begin{pmatrix} I_n \\ S_n \end{pmatrix} = \begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}^n \begin{pmatrix} I_0 \\ S_0 \end{pmatrix}. \]

The matrix \(\begin{pmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{pmatrix}\) has particular characteristics, namely, the entries of each column vector are positive and their sum equals 1. Such vectors are called probability vectors, and a matrix whose column vectors are all probability vectors is called a transition or stochastic matrix. Andrei Markov (1856–1922), a Russian mathematician, was the first to study these matrices. At the beginning of the twentieth century he developed the fundamentals of Markov Chain Theory. In this section we learn about some applications of this theory.
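The long-run behaviour of this example can be sketched numerically (the initial populations below are hypothetical values of my own): iterating the stochastic matrix drives the population vector to the steady state \(I : S = 3 : 4\) (the eigenvector for eigenvalue 1, since \(0.6 I + 0.3 S = I\) gives \(S = \tfrac{4}{3} I\)), while the total population is conserved.

```python
import numpy as np

P = np.array([[0.6, 0.3],
              [0.4, 0.7]])             # columns are probability vectors

pop = np.array([900_000.0, 100_000.0]) # hypothetical I_0, S_0
total = pop.sum()

for _ in range(100):
    pop = P @ pop                      # one year of migration

# pop approaches the steady state (3/7, 4/7) of the total population
steady = np.array([3 / 7, 4 / 7]) * total
```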