# Foundations of Data Science | DATA1001/COMP615/CS910/MATHS 7027


Keep in mind that the leading terms here for large $k$ are the last two and, in fact, at $k=n$, they cancel each other so that our argument does not prove the fallacious statement for $c \geq 1$ that there is no connected component of size $n$, since there is. Let
$$f(k)=\ln n+k+k \ln \ln n-2 \ln k+k \ln c-c k \ln n+c k^{2} \frac{\ln n}{n}$$
Differentiating with respect to $k$,
$$f^{\prime}(k)=1+\ln \ln n-\frac{2}{k}+\ln c-c \ln n+\frac{2 c k \ln n}{n}$$
and
$$f^{\prime \prime}(k)=\frac{2}{k^{2}}+\frac{2 c \ln n}{n}>0 .$$
Thus, the function $f(k)$ attains its maximum over the range $[2, n / 2]$ at one of the extreme points 2 or $n / 2$. At $k=2, f(2) \approx(1-2 c) \ln n$ and at $k=n / 2, f(n / 2) \approx-c \frac{n}{4} \ln n$. So
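
The endpoint argument is easy to check numerically. The sketch below evaluates $f$ and $f''$ directly over the integer range $[2, n/2]$; the values $n = 10{,}000$ and $c = 2$ are made-up illustrations, not taken from the text.

```python
# Numerical sanity check of the convexity argument above.
# n and c are hypothetical illustration values.
import math

n, c = 10_000, 2.0

def f(k):
    # f(k) = ln n + k + k ln ln n - 2 ln k + k ln c - c k ln n + c k^2 ln n / n
    return (math.log(n) + k + k * math.log(math.log(n)) - 2 * math.log(k)
            + k * math.log(c) - c * k * math.log(n) + c * k * k * math.log(n) / n)

def f2(k):
    # f''(k) = 2/k^2 + 2 c ln n / n, strictly positive
    return 2 / k**2 + 2 * c * math.log(n) / n

ks = range(2, n // 2 + 1)
assert all(f2(k) > 0 for k in ks)              # f is convex on [2, n/2]
interior_max = max(f(k) for k in range(3, n // 2))
assert max(f(2), f(n // 2)) >= interior_max    # maximum attained at an endpoint
```

Since $f'' > 0$ everywhere on the interval, the maximum over any sub-range is attained at the extreme points, which is exactly what the check confirms.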

## DATA1001/COMP615/CS910/MATHS 7027 COURSE NOTES:

$$\sum_{i=0}^{\infty} \frac{i}{i !}=\sum_{i=1}^{\infty} \frac{i}{i !}=\sum_{i=1}^{\infty} \frac{1}{(i-1) !}=\sum_{i=0}^{\infty} \frac{1}{i !}$$
and
$$\sum_{i=0}^{\infty} \frac{i^{2}}{i !}=\sum_{i=1}^{\infty} \frac{i}{(i-1) !}=\sum_{i=0}^{\infty} \frac{i+1}{i !}=\sum_{i=0}^{\infty} \frac{i}{i !}+\sum_{i=0}^{\infty} \frac{1}{i !}=2 \sum_{i=0}^{\infty} \frac{1}{i !}$$
Thus,
$$\sum_{i=0}^{\infty} \frac{i(i-2)}{i !}=\sum_{i=0}^{\infty} \frac{i^{2}}{i !}-2 \sum_{i=0}^{\infty} \frac{i}{i !}=0$$
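
These factorial-series identities can be verified by truncating the infinite sums; a minimal sketch (50 terms is far more than enough, since the tail is negligible):

```python
# Quick numerical check of the factorial-series identities above.
import math

S  = sum(1 / math.factorial(i) for i in range(50))          # = e
S1 = sum(i / math.factorial(i) for i in range(50))          # = e
S2 = sum(i * i / math.factorial(i) for i in range(50))      # = 2e
S3 = sum(i * (i - 2) / math.factorial(i) for i in range(50))

assert abs(S - math.e) < 1e-12
assert abs(S1 - S) < 1e-12
assert abs(S2 - 2 * S) < 1e-12
assert abs(S3) < 1e-12
```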

# Longitudinal Data Analysis | BSTT537/BST 5441/STAT 6289/STAT 36900/Stat 771/STATS 768


$$\ell\left(\boldsymbol{Y}_{i} \mid \boldsymbol{\theta}\right)=\prod_{j=1}^{n_{i}} \prod_{c=1}^{C}\left(p_{i j c}\right)^{y_{i j c}}$$
where $y_{i j c}=1$ if $Y_{i j}=c$, and 0 otherwise. The marginal log-likelihood from the $N$ level-2 units is
$$\log L=\sum_{i=1}^{N} \log h\left(\boldsymbol{Y}_{i}\right)=\sum_{i=1}^{N} \int_{\boldsymbol{\theta}} \ell\left(\boldsymbol{Y}_{i} \mid \boldsymbol{\theta}\right) g(\boldsymbol{\theta}) d \boldsymbol{\theta}$$

## CS 57800/3860/COMP 540 001/COMP_SCI 396/STAT3888/YCBS 255 COURSE NOTES:

$$\mathrm{E}\left(y_{i}\right)=\operatorname{Var}\left(y_{i}\right)=\lambda_{i}$$
The likelihood for $N$ independent observations from (12.1) is
$$L=\prod_{i=1}^{N} \frac{\exp \left(-\lambda_{i}\right)\left(\lambda_{i}^{y_{i}}\right)}{y_{i} !}$$
and the corresponding log-likelihood function is
$$\log L=\sum_{i=1}^{N}\left[-\lambda_{i}+y_{i} \log \lambda_{i}-\log \left(y_{i} !\right)\right]$$
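
As a hedged sketch of the Poisson log-likelihood (the counts and rates below are made-up illustration values), `math.lgamma(y + 1)` supplies the $\log(y_i!)$ term, and the result can be cross-checked against the product form of the likelihood:

```python
# Poisson log-likelihood sketch; data y and rates lam are made-up examples.
import math

def poisson_loglik(y, lam):
    """log L = sum_i [ -lam_i + y_i log(lam_i) - log(y_i!) ]"""
    return sum(-l + yi * math.log(l) - math.lgamma(yi + 1)
               for yi, l in zip(y, lam))

y   = [2, 0, 3, 1]
lam = [1.5, 0.5, 2.0, 1.0]
ll = poisson_loglik(y, lam)

# cross-check against the product form L = prod exp(-lam) lam^y / y!
L = math.prod(math.exp(-l) * l**yi / math.factorial(yi)
              for yi, l in zip(y, lam))
assert abs(ll - math.log(L)) < 1e-12
```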

# Statistical Machine Learning | CS 57800/3860/COMP 540 001/COMP_SCI 396/STAT3888/YCBS 255


We will use $\theta_{X_{i} \mid \mathrm{Pa}_{i}}$ to denote the subset of parameters that determine $P\left(X_{i} \mid \mathrm{Pa}_{i}\right)$. In the case where the parameters are disjoint (each CPD is parameterized by a separate set of parameters that do not overlap), we can maximize each parameter set independently. We can write the likelihood as follows:
$$L\left(\boldsymbol{\theta}_{\mathcal{G}}: \mathcal{D}\right)=\prod_{i=1}^{n} L_{i}\left(\boldsymbol{\theta}_{X_{i} \mid \mathbf{Pa}_{i}}: \mathcal{D}\right)$$
where the local likelihood function for $X_{i}$ is
$$L_{i}\left(\theta_{X_{i} \mid \mathrm{Pa}_{i}}: \mathcal{D}\right)=\prod_{j=1}^{m} P\left(x_{i}^{j} \mid \mathbf{pa}_{i}^{j}: \theta_{X_{i} \mid \mathrm{Pa}_{i}}\right).$$
The simplest parameterization for the CPDs is as a table. Suppose we have a variable $X$ with parents $\boldsymbol{U}$. If we represent the CPD $P(X \mid \boldsymbol{U})$ as a table, then we will have a parameter $\theta_{x \mid \boldsymbol{u}}$ for each combination of $x \in \operatorname{Val}(X)$ and $\boldsymbol{u} \in \operatorname{Val}(\boldsymbol{U})$. In this case, we can write the local likelihood function as follows:
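
For a table CPD, maximizing the local likelihood yields the standard count ratio $\hat{\theta}_{x \mid \boldsymbol{u}} = N[x, \boldsymbol{u}] / N[\boldsymbol{u}]$. A toy sketch with made-up data (the parent/child values are hypothetical labels):

```python
# Table-CPD maximum-likelihood estimate: theta_{x|u} = N[x, u] / N[u].
from collections import Counter

# toy data: samples of (u, x), where u is the parent assignment
data = [("u0", "a"), ("u0", "a"), ("u0", "b"), ("u1", "b"), ("u1", "b")]

joint  = Counter(data)                    # N[u, x]
parent = Counter(u for u, _ in data)      # N[u]
theta  = {(u, x): joint[(u, x)] / parent[u] for (u, x) in joint}

assert theta[("u0", "a")] == 2 / 3
assert theta[("u1", "b")] == 1.0
# each conditional distribution sums to one
for u in parent:
    assert abs(sum(p for (uu, _), p in theta.items() if uu == u) - 1) < 1e-12
```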

## CS 57800/3860/COMP 540 001/COMP_SCI 396/STAT3888/YCBS 255 COURSE NOTES:

$$P(\mathcal{G} \mid \mathcal{D})=\frac{P(\mathcal{D} \mid \mathcal{G}) P(\mathcal{G})}{P(\mathcal{D})}$$
where, as usual, the denominator is simply a normalizing factor that does not help distinguish between different structures. Then, we define the Bayesian score as
$$\operatorname{score}_{B}(\mathcal{G}: \mathcal{D})=\log P(\mathcal{D} \mid \mathcal{G})+\log P(\mathcal{G}).$$
The ability to ascribe a prior over structures gives us a way of preferring some structures over others. For example, we can penalize dense structures more than sparse ones. It turns out, however, that this prior term is almost irrelevant compared to the first term. The first term, $P(\mathcal{D} \mid \mathcal{G})$, takes into consideration our uncertainty over the parameters:
$$P(\mathcal{D} \mid \mathcal{G})=\int_{\Theta_{\mathcal{G}}} P\left(\mathcal{D} \mid \boldsymbol{\theta}_{\mathcal{G}}, \mathcal{G}\right) P\left(\boldsymbol{\theta}_{\mathcal{G}} \mid \mathcal{G}\right) d \boldsymbol{\theta}_{\mathcal{G}}$$

# (Generalized) Linear Models | STAT3030/STATS5019/STAT 504/STA600/Stat 539/STAT*6802/ST411/STAT 7430/SS 3860B


$$\mu_{t}=\sum_{i} \beta_{i} x_{i t}$$
However, this yields a conditional model of the form
$$\mu_{t \mid t-1}=\rho\left(y_{t-1}-\sum_{i} \beta_{i} x_{i, t-1}\right)+\sum_{i} \beta_{i} x_{i t}$$
This may also be written
$$\mu_{t \mid t-1}-\sum_{i} \beta_{i} x_{i t}=\rho\left(y_{t-1}-\sum_{i} \beta_{i} x_{i, t-1}\right)$$

## STA 144/STAT 451/STAT 506/STA 317 COURSE NOTES:

where, again, $K$ is the unknown asymptotic maximum value. And again, we can obtain a linear structure, this time for a complementary log log link:
$$\log \left[-\log \left(\frac{K-y}{K}\right)\right]=\log (\alpha)+\beta t$$
We can use the same iterative procedure as before.
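
Inverting the link shows the growth curve implied above is $y = K\left(1-\exp\left(-\alpha e^{\beta t}\right)\right)$, so the complementary log-log transform of $(K-y)/K$ is exactly linear in $t$. A quick numerical check (the values of $K$, $\alpha$, $\beta$ are made-up illustrations):

```python
# Check: if y = K * (1 - exp(-alpha * exp(beta * t))), then
# log(-log((K - y) / K)) = log(alpha) + beta * t exactly.
import math

K, alpha, beta = 100.0, 0.3, 0.5   # hypothetical illustration values

for t in [0.0, 1.0, 2.5, 4.0]:
    y = K * (1 - math.exp(-alpha * math.exp(beta * t)))
    lhs = math.log(-math.log((K - y) / K))
    rhs = math.log(alpha) + beta * t
    assert abs(lhs - rhs) < 1e-9
```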

# Sampling Theory of Surveys | STA 144/STAT 451/STAT 506/STA 317


$$-\sum_{i} \log \left(1-y_{i} a\right)=s_{1} a+s_{2} \frac{a^{2}}{2}+s_{3} \frac{a^{3}}{3}+\ldots$$
where $a$ is any constant such that
$$|a|<\frac{1}{\max_{i} y_{i}}$$
giving us
$$g\left(p_{1}^{n_{1}} p_{2}^{n_{2}} \ldots\right)=\sum g_{1}(P, Q)\, s\left(q_{1}^{x_{1}} q_{2}^{x_{2}} \ldots\right)$$
and
$$s\left(p_{1}^{n_{1}} p_{2}^{n_{2}} \ldots\right)=\sum s_{0}(P, Q)\, g\left(q_{1}^{x_{1}} q_{2}^{x_{2}} \ldots\right)$$

## STA 144/STAT 451/STAT 506/STA 317 COURSE NOTES:

$N^{i} e_{j}=0$ for $i<j$
and
$$N^{j} e_{i}=n(n-1) \ldots(n-j+1)$$
Hence
$$V\left(s^{2}\right)=\frac{\mu_{4}-\mu_{2}^{2}}{n}+\frac{2 \mu_{2}^{2}}{n(n-1)} .$$
Using the Pearsonian notation for departure from normality, this can be written as
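
As a sanity check of the variance formula for $s^2$: plugging in the normal-case moments $\mu_4 = 3\mu_2^2$ should collapse it to the familiar $2\sigma^4/(n-1)$. The specific $\mu_2$ and $n$ values below are illustrative only.

```python
# V(s^2) = (mu_4 - mu_2^2)/n + 2*mu_2^2 / (n*(n-1)); for a normal
# population (mu_4 = 3*mu_2^2) this reduces to 2*sigma^4 / (n - 1).
def var_s2(mu2, mu4, n):
    return (mu4 - mu2**2) / n + 2 * mu2**2 / (n * (n - 1))

for n in [5, 10, 100]:
    mu2 = 2.0            # sigma^2, made-up value
    mu4 = 3 * mu2**2     # normal fourth central moment
    assert abs(var_s2(mu2, mu4, n) - 2 * mu2**2 / (n - 1)) < 1e-12
```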

# Bayesian Statistical Inference | STA 145/STAT 625/ECON 7960/STAT 6574


$$y_{t} \sim \operatorname{Normal}\left(\log \left(q K P_{t}\right), \sigma^{2}\right), \quad t=1, \ldots, n=23,$$
where $q, K$, and $P_{t}$ denote the “catchability parameter,” “carrying capacity” of the environment, and biomass in year $t$ expressed as a proportion of the carrying capacity, respectively. In addition, the biomass dynamics are given by
$$\log P_{t}=f\left(P_{t-1}\right)+u_{t}, \quad f\left(P_{t-1}\right)=\log \left[P_{t-1}+r P_{t-1}\left(1-P_{t-1}\right)-\frac{C_{t-1}}{K}\right],$$
where $u_{t} \sim \operatorname{Normal}\left(0, \omega^{2}\right), r$ is the “intrinsic growth rate,” and $C_{t-1}$ denotes the total catch, in kilotonnes, during year $t-1$. To avoid the issues discussed above, we will express this, instead, as
$$\log P_{t} \sim \operatorname{Normal}\left(f\left(P_{t-1}\right), \omega^{2}\right), \quad t=1, \ldots, n .$$

## STAT 770/BIOS 805/BIOSTAT695/PSY 525/625/SOCI612/STA 4504/STA 517 COURSE NOTES:

$$\psi_{i j}=\frac{D}{V_{i}} \exp \left(-\frac{C L_{i}}{V_{i}} t_{i j}\right)$$
This is the unique solution to the following differential equation and initial condition, at time $t=t_{i j}$ :
$$\frac{d C(t)}{d t}=-\frac{C L_{i}}{V_{i}} C(t), \quad C(t=0)=\frac{D}{V_{i}}$$
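
A quick numerical check that the stated closed form solves the differential equation and initial condition (the dose $D$ and parameters $V_i$, $CL_i$ below are made-up illustration values):

```python
# Check that C(t) = (D/V) * exp(-(CL/V) * t) solves dC/dt = -(CL/V) C
# with C(0) = D/V. D, V, CL are hypothetical values.
import math

D, V, CL = 100.0, 20.0, 5.0
k = CL / V                          # elimination rate constant

def C(t):
    return (D / V) * math.exp(-k * t)

assert C(0.0) == D / V              # initial condition
# central-difference derivative vs. -k * C(t)
h = 1e-6
for t in [0.5, 1.0, 3.0]:
    dCdt = (C(t + h) - C(t - h)) / (2 * h)
    assert abs(dCdt + k * C(t)) < 1e-6
```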

# Analysis of Categorical Data | STAT 770/BIOS 805/BIOSTAT695/PSY 525/625/SOCI612/STA 4504/STA 517


$$\operatorname{Pr}\left(T_{i}>t \mid T_{i} \geq t\right)=1-p_{i t} .$$
The discrete-time survivor function can be expressed as the product of the conditional probabilities of having “survived” all previous time points or time intervals, as
$$S_{i t}=\operatorname{Pr}\left(T_{i} \geq t\right)=\prod_{s=1}^{t-1}\left(1-p_{i s}\right) .$$
The unconditional event probability (or probability of experiencing the event at time $t$ ) is the discrete-time analog of the continuous-time probability distribution function and may be written as
$$\begin{aligned} \operatorname{Pr}\left(T_{i}=t\right) &=\operatorname{Pr}\left(T_{i}=t \mid T_{i} \geq t\right) \operatorname{Pr}\left(T_{i} \geq t\right) \\ &=p_{i t} S_{i t} \\ &=p_{i t} \prod_{s=1}^{t-1}\left(1-p_{i s}\right) \end{aligned}$$
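
A toy numerical illustration of these discrete-time identities, with made-up hazard values $p_t$ for three time points:

```python
# S_t = prod_{s<t} (1 - p_s) and Pr(T = t) = p_t * S_t; hazards are made-up.
import math

p = {1: 0.2, 2: 0.3, 3: 0.5}        # hazards p_t for t = 1, 2, 3

def S(t):
    # Pr(T >= t): survived all time points before t (empty product = 1)
    return math.prod(1 - p[s] for s in range(1, t))

pr = {t: p[t] * S(t) for t in p}

assert abs(pr[1] - 0.2) < 1e-12                  # p_1 * S_1, with S_1 = 1
assert abs(pr[2] - 0.3 * 0.8) < 1e-12            # p_2 * (1 - p_1)
assert abs(pr[3] - 0.5 * 0.8 * 0.7) < 1e-12
# event probabilities plus Pr(survive past t = 3) sum to one
assert abs(sum(pr.values()) + S(4) - 1) < 1e-12
```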

## STAT 770/BIOS 805/BIOSTAT695/PSY 525/625/SOCI612/STA 4504/STA 517 COURSE NOTES:

We assume that, conditional on $v$, the hazard rate is a product of an underlying hazard $\lambda(t)$ and the multiplicative frailty,
$$\lambda(t \mid v)=\lambda(t) v,$$
and that the frailty $v$ follows a gamma distribution,
$$g(v)=\frac{\alpha^{\alpha} v^{\alpha-1}}{\Gamma(\alpha)} \exp (-\alpha v) \quad \text { where } \alpha>0,$$
with mean $\mathrm{E}(v)=1$ and $\operatorname{var}(v)=1 / \alpha=\phi$.

# Advanced Mathematical Statistics | MAST90123/STAT 550/Math 776


For each $i$ compute $f\left(\Psi\left(t_{i}\right)\right) \delta t_{i}$. A little algebraic trickery yields
$$\begin{aligned} f\left(\Psi\left(t_{i}\right)\right) \delta t_{i} &=f\left(\Psi\left(t_{i}\right)\right) \delta t_{i} \frac{\Delta t_{i}}{\Delta t_{i}} \\ &=f\left(\Psi\left(t_{i}\right)\right) \frac{\delta t_{i}}{\Delta t_{i}} \Delta t_{i} \end{aligned}$$
Sum over all $i$.
The integral $\int_{C} f(x, y) d \mathbf{s}$ is defined to be the limit of the sum from the previous step as $n \rightarrow \infty$. But as $n \rightarrow \infty$ it also follows that $\Delta t_{i} \rightarrow 0$. Our integrand contains the term $\frac{\delta t_{i}}{\Delta t_{i}}$. As $\Delta t_{i} \rightarrow 0$ this converges to $\left|\frac{\partial \Psi}{\partial t}\right|$. Hence, our integral has become
$$\int_{C} f(x, y) d \mathbf{s}=\int_{a}^{b} f(\Psi(t))\left|\frac{\partial \Psi}{\partial t}\right| d t$$
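
As a sketch of this formula in action, the midpoint-rule approximation below integrates $f(x, y) = x^2$ over the unit circle $\Psi(t) = (\cos t, \sin t)$, where the exact value is $\int_0^{2\pi} \cos^2 t \, dt = \pi$; the helper name `line_integral` is ours, not from the text.

```python
# Riemann-sum approximation of the arc-length line integral
# ∫_C f ds = ∫_a^b f(Psi(t)) |dPsi/dt| dt, with |dPsi/dt| from secants.
import math

def line_integral(f, psi, a, b, n=20_000):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x0, y0 = psi(t - h / 2)
        x1, y1 = psi(t + h / 2)
        speed = math.hypot(x1 - x0, y1 - y0) / h   # |dPsi/dt|
        x, y = psi(t)
        total += f(x, y) * speed * h
    return total

psi = lambda t: (math.cos(t), math.sin(t))         # unit circle
val = line_integral(lambda x, y: x * x, psi, 0.0, 2 * math.pi)
assert abs(val - math.pi) < 1e-4
```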

## MAST90123/STAT 550/Math 776 COURSE NOTES:

We now compute the area of each parallelogram.
Observe that each parallelogram is spanned by the vectors
$$V_{u}=\Psi\left(u_{i+1}, v_{j}\right)-\Psi\left(u_{i}, v_{j}\right)$$
and
$$V_{v}=\Psi\left(u_{i}, v_{j+1}\right)-\Psi\left(u_{i}, v_{j}\right)$$
The desired area is thus the magnitude of the cross product of these vectors:
$$\text { Area }=\left|V_{u} \times V_{v}\right|$$
We now do some algebraic tricks:
$$\begin{aligned} \left|V_{u} \times V_{v}\right| &=\left|V_{u} \times V_{v}\right| \frac{\Delta u \Delta v}{\Delta u \Delta v} \\ &=\left|\frac{V_{u}}{\Delta u} \times \frac{V_{v}}{\Delta v}\right| \Delta u \Delta v \end{aligned}$$
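
The parallelogram construction can be sketched directly. Below it is applied to the planar patch $\Psi(u, v) = (u, v, 2u + 3v)$ on $[0,1] \times [0,1]$ (a made-up example): because the surface is flat, the secant parallelograms are exact and the total area equals $\sqrt{1 + 2^2 + 3^2} = \sqrt{14}$.

```python
# Sum of |V_u x V_v| over a grid of secant-vector parallelograms.
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def surface_area(psi, n=50):
    du = dv = 1.0 / n
    area = 0.0
    for i in range(n):
        for j in range(n):
            u, v = i * du, j * dv
            Vu = sub(psi(u + du, v), psi(u, v))   # parallelogram side 1
            Vv = sub(psi(u, v + dv), psi(u, v))   # parallelogram side 2
            area += math.sqrt(sum(c * c for c in cross(Vu, Vv)))
    return area

psi = lambda u, v: (u, v, 2 * u + 3 * v)
assert abs(surface_area(psi) - math.sqrt(14)) < 1e-9
```

For a curved surface the same sum converges to the true area as the grid is refined.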

# Nonparametric Statistics | STAT 8560/STAT 261/MATH335/PSY 610.01W/STAT 368/STAT 425/STAT 7610/MATH 494


If $f \in L_{2}(a, b)$ then
$$f(x)=\sum_{j=1}^{\infty} \theta_{j} \phi_{j}(x)$$
where
$$\theta_{j}=\int_{a}^{b} f(x) \phi_{j}(x) d x .$$
Furthermore,
$$\int_{a}^{b} f^{2}(x) d x=\sum_{j=1}^{\infty} \theta_{j}^{2}$$
which is known as Parseval’s identity.
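
Parseval's identity is easy to check numerically. The sketch below uses the orthonormal sine basis $\phi_j(x) = \sqrt{2/\pi}\,\sin(jx)$ on $(0, \pi)$ and the made-up function $f = \phi_1 + 0.5\,\phi_2$, so the exact coefficients are $\theta = (1, 0.5, 0, \ldots)$ and $\int f^2 = 1.25$.

```python
# Numerical check of theta_j = ∫ f phi_j and ∫ f^2 = sum theta_j^2.
import math

def phi(j, x):
    return math.sqrt(2 / math.pi) * math.sin(j * x)

def f(x):
    return phi(1, x) + 0.5 * phi(2, x)

def integrate(g, a, b, n=20_000):
    # midpoint rule
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

theta = [integrate(lambda x, j=j: f(x) * phi(j, x), 0, math.pi) for j in (1, 2, 3)]
assert abs(theta[0] - 1.0) < 1e-6
assert abs(theta[1] - 0.5) < 1e-6
assert abs(theta[2]) < 1e-6

energy = integrate(lambda x: f(x) ** 2, 0, math.pi)
assert abs(energy - sum(t * t for t in theta)) < 1e-6   # Parseval
```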

## STAT 8560/STAT 261/MATH335/PSY 610.01W/STAT 368/STAT 425/STAT 7610/MATH 494 COURSE NOTES:

where $c_{\alpha}$ is the upper $\alpha$ quantile of a $\chi_{1}^{2}$ random variable,
$$\ell(\theta)=2 \sum_{i=1}^{n} \log \left(1+\lambda(\theta) W_{i}\left(Y_{i}-\theta\right)\right),$$
$\lambda(\theta)$ is defined by
$$\begin{gathered} \sum_{i=1}^{n} \frac{W_{i}\left(Y_{i}-\theta\right)}{1+\lambda(\theta) W_{i}\left(Y_{i}-\theta\right)}=0, \\ W_{i}=K\left(\frac{x-X_{i}}{h}\right)\left(s_{n, 2}-\frac{\left(x-X_{i}\right) s_{n, 1}}{h}\right), \end{gathered}$$
and
$$s_{n, j}=\frac{1}{n h} \sum_{i=1}^{n} \frac{K\left(\frac{x-X_{i}}{h}\right)\left(x-X_{i}\right)^{j}}{h^{j}} .$$

# Multivariate Statistical Analysis | MA3066/STAT 505/STAT 530/EXST 7037/STAT 450/550/STA3200/MATH5855


Providing the breeders’ equation holds over several generations, the cumulative response to $m$ generations of selection is just
$$R^{(m)}=\sum_{t=1}^{m} R(t)=\sum_{t=1}^{m} \mathbf{G}(t) \beta(t) .$$
In particular, if the genetic covariance matrix remains constant, then
$$R^{(m)}=\mathbf{G}\left(\sum_{t=1}^{m} \beta(t)\right).$$
Hence, if we observe a total response to selection of $\mathbf{R}_{\text {total}}$, then we can estimate the cumulative selection differential as
$$\beta_{\text {total}}=\sum \beta(t)=\mathbf{G}^{-1} \mathbf{R}_{\text {total}} .$$
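
A toy $2 \times 2$ illustration of this identity: with a constant $\mathbf{G}$, applying $\mathbf{G}^{-1}$ to the total response recovers the summed differentials. The covariance matrix and per-generation differentials below are made-up numbers.

```python
# beta_total = G^{-1} R_total, where R_total = G * sum_t beta(t).

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

G = [[2.0, 0.5],
     [0.5, 1.0]]                                 # genetic covariance (constant)
betas = [[0.3, 0.1], [0.2, -0.1], [0.1, 0.4]]    # beta(t) for t = 1..3

# total response R_total = G * sum_t beta(t)
beta_total = [sum(b[0] for b in betas), sum(b[1] for b in betas)]
R_total = matvec(G, beta_total)

# recover beta_total = G^{-1} R_total via the 2x2 inverse
det = G[0][0]*G[1][1] - G[0][1]*G[1][0]
Ginv = [[ G[1][1]/det, -G[0][1]/det],
        [-G[1][0]/det,  G[0][0]/det]]
recovered = matvec(Ginv, R_total)

assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, beta_total))
```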

## MA3066/STAT 505/STAT 530/EXST 7037/STAT 450/550/STA3200/MATH5855 COURSE NOTES:

If one has access to a sample of generation means (as opposed to only the starting and end points), then a more powerful test can be obtained (from the theory of random walks) by considering $\Delta \mu^{*}$, the largest absolute deviation from the starting mean anywhere along the series of $t$ generation means. Here, the rate of evolution is too fast for drift if
$$\sigma_{m}^{2}<\frac{\left(\Delta \mu^{*}\right)^{2}}{2 t(2.50)^{2}}=0.080 \frac{\left(\Delta \mu^{*}\right)^{2}}{t}$$
while it is too slow for drift when
$$\sigma_{m}^{2}>\frac{\left(\Delta \mu^{*}\right)^{2}}{2 t(0.56)^{2}}=1.59 \frac{\left(\Delta \mu^{*}\right)^{2}}{t} .$$
Note by comparison that the tests for too fast a divergence are very similar, while Bookstein's test for too slow a divergence (a potential signature of stabilizing selection) is much less stringent than the Turelli-Gillespie-Lande test.
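
The numerical constants in the two bounds follow directly from the stated factors $2.50$ and $0.56$; a one-line arithmetic check:

```python
# 1 / (2 * 2.50^2) ≈ 0.080 and 1 / (2 * 0.56^2) ≈ 1.59
fast = 1 / (2 * 2.50 ** 2)
slow = 1 / (2 * 0.56 ** 2)
assert abs(fast - 0.080) < 1e-3
assert round(slow, 2) == 1.59
```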