Suppose $X_{1}, X_{2}, \ldots$ are independent random variables taking only the values 0 and 1, with $$ P\left[X_{k}=1\right]=p_{k}=1-P\left[X_{k}=0\right] . $$ Then we assert that $$ P\left[X_{n} \rightarrow 0\right]=1 \text { iff } \sum_{n} p_{n}<\infty . $$ To verify this assertion, we merely need to observe that $$ P\left(\left[X_{n}=1\right] \text { i.o. }\right)=0 \text { iff } \sum_{n} P\left[X_{n}=1\right]=\sum_{n} p_{n}<\infty, $$ where the converse direction uses the independence of the $X_{n}$. For the direct direction: if $$ \sum_{n} p_{n}=\sum_{n} P\left[X_{n}=1\right]<\infty, $$ then by the Borel-Cantelli Lemma $$ P\left(\left[X_{n}=1\right] \text { i.o. }\right)=0 . $$ Taking complements, we find $$ 1=P\left(\left(\limsup _{n \rightarrow \infty}\left[X_{n}=1\right]\right)^{c}\right)=P\left(\liminf _{n \rightarrow \infty}\left[X_{n}=0\right]\right), $$ and since $\liminf _{n \rightarrow \infty}\left[X_{n}=0\right]$ is the event that $X_{n}=0$ for all sufficiently large $n$, this says exactly that $$ P\left[\lim _{n \rightarrow \infty} X_{n}=0\right]=1 . $$
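To see the dichotomy numerically, here is a minimal simulation sketch in Python; the sequences $p_{n}=1/n^{2}$ (summable) and $p_{n}=1/n$ (divergent) are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
n = np.arange(1, N + 1)

for label, p in [("p_n = 1/n^2 (summable)", 1 / n**2),
                 ("p_n = 1/n   (divergent)", 1 / n)]:
    x = rng.random(N) < p                   # independent X_n ~ Bernoulli(p_n)
    last_one = n[x][-1] if x.any() else 0   # largest n with X_n = 1 observed
    print(label, "-> total 1s:", x.sum(), "| last 1 at n =", last_one)
```

In the summable case only a handful of 1's appear, all early in the sequence, matching $P\left[X_{n} \rightarrow 0\right]=1$; in the divergent case 1's keep occurring up to the end of the simulated horizon.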
Consider maximization of the function $L(\mathbf{W}, \mathbf{H})$, written here without the matrix notation: $$ L(\mathbf{W}, \mathbf{H})=\sum_{i=1}^{N} \sum_{j=1}^{p}\left[x_{i j} \log \left(\sum_{k=1}^{r} w_{i k} h_{k j}\right)-\sum_{k=1}^{r} w_{i k} h_{k j}\right] . $$ Using the concavity of $\log (x)$, show that for any set of $r$ values $y_{k} \geq 0$ and $0 \leq c_{k} \leq 1$ with $\sum_{k=1}^{r} c_{k}=1$, $$ \log \left(\sum_{k=1}^{r} y_{k}\right) \geq \sum_{k=1}^{r} c_{k} \log \left(y_{k} / c_{k}\right) . $$
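This inequality is the minorization step behind the multiplicative updates commonly used to maximize $L$ (the algorithm of Lee and Seung). Below is a minimal NumPy sketch of those updates; the simulated count data, dimensions, and rank $r=4$ are assumptions for illustration, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, r = 30, 20, 4                       # illustrative dimensions
X = rng.poisson(5.0, size=(N, p)).astype(float)

W = rng.uniform(0.5, 1.5, size=(N, r))    # nonnegative starting values
H = rng.uniform(0.5, 1.5, size=(r, p))

def L(W, H):
    WH = W @ H
    return np.sum(X * np.log(WH) - WH)

for _ in range(200):
    WH = W @ H
    W *= ((X / WH) @ H.T) / H.sum(axis=1)             # update W
    WH = W @ H
    H *= (W.T @ (X / WH)) / W.sum(axis=0)[:, None]    # update H

print(L(W, H))   # L is non-decreasing over the iterations
```

Each sweep multiplies $\mathbf{W}$ and $\mathbf{H}$ by nonnegative factors, so nonnegativity is preserved, and $L(\mathbf{W}, \mathbf{H})$ does not decrease from one iteration to the next.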
In this equation $c_{i 1}$ is the coefficient of $\bar{X}_{i . .}$ when testing for a linear trend, and $c_{i 2}$ is the coefficient for the quadratic trend.
The value of $a_{1}^{*}$ is estimated from $C_{1}$ for the $A$ main effect just as in Chapter 10: $$ \hat{a}_{1}^{*}=C_{1} / \Sigma_{i} c_{i 1}^{2}=100 / 20=5 . $$ To estimate $a^{*}\left(b_{j}\right)_{2}$, we use the same formula (substituting $C_{2}$ for $C_{1}$ and $c_{i 2}$ for $c_{i 1}$), but we use a different $C_{2}$ for each level of $B$. The values we use are in the quadratic column in Table 11.3: $$ \begin{aligned} \hat{a}^{*}\left(b_{1}\right)_{2}&=-2 / \Sigma_{i} c_{i 2}^{2}=-2 / 4=-.50, \\ \hat{a}^{*}\left(b_{2}\right)_{2}&=-13 / \Sigma_{i} c_{i 2}^{2}=-13 / 4=-3.25, \\ \hat{a}^{*}\left(b_{3}\right)_{2}&=-19 / \Sigma_{i} c_{i 2}^{2}=-19 / 4=-4.75 . \end{aligned} $$ The estimate of $\mu_{11}$ would then be $\hat{\mu}_{11}=45.00+5(-3)-.50(1)=29.50$. The estimate of $\mu_{12}$ would be $\hat{\mu}_{12}=52.75+5(-3)-3.25(1)=34.50$. The other estimates, shown and plotted, are obtained similarly.
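The arithmetic can be checked directly; here is a NumPy sketch that takes $C_{1}=100$ and the quadratic $C_{2}$ values from Table 11.3 as given, and assumes four equally spaced levels of $A$ with the standard orthogonal polynomial coefficients $(-3,-1,1,3)$ and $(1,-1,-1,1)$ (consistent with the divisors 20 and 4 used above).

```python
import numpy as np

c1 = np.array([-3, -1, 1, 3])   # linear contrast coefficients
c2 = np.array([1, -1, -1, 1])   # quadratic contrast coefficients

a1 = 100 / np.sum(c1**2)                      # C1 / sum c_i1^2 = 5.0
a2 = {b: C / np.sum(c2**2)                    # C2 / sum c_i2^2, per level of B
      for b, C in {"b1": -2, "b2": -13, "b3": -19}.items()}
# a2 == {'b1': -0.5, 'b2': -3.25, 'b3': -4.75}

# Reconstructed cell means, with the level-of-B baselines from the text:
mu_11 = 45.00 + a1 * c1[0] + a2["b1"] * c2[0]  # 29.50
mu_12 = 52.75 + a1 * c1[0] + a2["b2"] * c2[0]  # 34.50
print(mu_11, mu_12)
```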
Sometimes addition, subtraction, or multiplication of matrices is possible after one or both matrices have been transposed. To transpose a matrix, we simply exchange rows and columns. For example, the transpose of $C$ is $$ C^{t}=\left|\begin{array}{rr} 3 & 4 \\ -2 & 2 \\ 1 & 0 \end{array}\right| $$ One important use of transposing is to enable the multiplication of a matrix by itself. We cannot write $A A$ unless $A$ is a square matrix, but we can always write $A^{t} A$ and $A A^{t}$. For the matrix above, $$ C^{t} C=\left|\begin{array}{rr} 3 & 4 \\ -2 & 2 \\ 1 & 0 \end{array}\right|\left|\begin{array}{rrr} 3 & -2 & 1 \\ 4 & 2 & 0 \end{array}\right|=\left|\begin{array}{rrr} 25 & 2 & 3 \\ 2 & 8 & -2 \\ 3 & -2 & 1 \end{array}\right| . $$
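The same computation can be checked in NumPy (a small sketch reproducing the example above):

```python
import numpy as np

C = np.array([[3, -2, 1],
              [4,  2, 0]])

Ct = C.T          # transpose: rows and columns exchanged
print(Ct @ C)     # the 3x3 product C^t C
# [[25  2  3]
#  [ 2  8 -2]
#  [ 3 -2  1]]
```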
The discriminant function estimator is $$ \hat{\boldsymbol{\beta}}_{D}=\frac{n(n-1)}{N_{0} N_{1}} \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}} \hat{\boldsymbol{\beta}}_{O L S} . $$ Now when the conditions of Definition 10.3 are met and if $\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0}$ is small enough so that there is not perfect classification, then $$ \boldsymbol{\beta}_{L R}=\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0}\right) . $$ Empirically, the OLS ESP and LR ESP are highly correlated for many LR data sets where the conditions are not met, e.g., when some of the predictors are factors. This suggests that $\boldsymbol{\beta}_{L R} \approx d\, \boldsymbol{\Sigma}_{\boldsymbol{x}}^{-1}\left(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0}\right)$ for many LR data sets, where $d$ is some constant depending on the data. Results from Haggstrom (1983) suggest that if a binary regression model is fit using OLS software for MLR, then a rough approximation is $\hat{\boldsymbol{\beta}}_{L R} \approx \hat{\boldsymbol{\beta}}_{O L S} / M S E$. So a rough approximation is LR ESP $\approx(\mathrm{OLS}$ ESP $) / M S E$.
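A quick empirical check of this rough approximation (a simulation sketch; the Gaussian predictors, sample size, and coefficients are assumptions, not from the text):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n, p = 1000, 3
x = rng.normal(size=(n, p))
eta = 0.5 + x @ np.array([1.0, -0.5, 0.5])      # true linear predictor
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

ols = LinearRegression().fit(x, y)
mse = np.sum((y - ols.predict(x)) ** 2) / (n - p - 1)  # MLR MSE
lr = LogisticRegression(C=1e6).fit(x, y)               # large C: ~unpenalized

print(lr.coef_.ravel())        # beta_hat_LR
print(ols.coef_ / mse)         # rough approximation: beta_hat_OLS / MSE
esp_corr = np.corrcoef(x @ lr.coef_.ravel(), x @ ols.coef_)[0, 1]
print(esp_corr)                # LR ESP vs. OLS ESP correlation
```

On data like this the two coefficient vectors are close up to the $MSE$ scaling and the two ESPs are highly correlated; for real data the quality of the approximation varies.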
Under the normality assumption, it can be shown that $$ \frac{\hat{\beta}_{i}-\beta_{i}}{s_{\hat{\beta}_{i}}} \sim t_{n-p}, $$ although we will not derive this result. It follows that a $100(1-\alpha) \%$ confidence interval for $\beta_{i}$ is $$ \hat{\beta}_{i} \pm t_{n-p}(\alpha / 2)\, s_{\hat{\beta}_{i}} . $$ To test the null hypothesis $H_{0}: \beta_{i}=\beta_{i 0}$, where $\beta_{i 0}$ is a fixed number, we can use the test statistic $$ t=\frac{\hat{\beta}_{i}-\beta_{i 0}}{s_{\hat{\beta}_{i}}} . $$
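A minimal sketch of the interval and test in Python; the simulated design, true coefficients, and noise level are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # p = 2 columns
y = X @ np.array([2.0, 1.5]) + rng.normal(scale=0.5, size=n)

p = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                # estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))         # s_{beta_hat_i}

i, alpha = 1, 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - p)
ci = (beta_hat[i] - tcrit * se[i], beta_hat[i] + tcrit * se[i])
print("95% CI for beta_1:", ci)

t_stat = (beta_hat[i] - 0.0) / se[i]        # test H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)
print("t =", t_stat, "p-value =", p_value)
```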