# 统计学习 Statistical Learning MATH5743M01


An iterative descent algorithm for solving
$$C^{*}=\min _{C} \sum_{k=1}^{K} N_{k} \sum_{C(i)=k}\left\|x_{i}-\bar{x}_{k}\right\|^{2}$$
can be obtained by noting that for any set of observations $S$
$$\bar{x}_{S}=\operatorname{argmin}_{m} \sum_{i \in S}\left\|x_{i}-m\right\|^{2} .$$
Hence we can obtain $C^{*}$ by solving the enlarged optimization problem
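In practice this alternating minimization is Lloyd's K-means iteration: hold the centres fixed and minimize over the assignment $C$, then hold $C$ fixed and minimize over the centres, so the objective never increases. A minimal NumPy sketch (the data and names are illustrative, not from the notes):

```python
import numpy as np

def kmeans(X, init_centres, n_iter=50):
    """Lloyd's algorithm: alternate the assignment step C(i) = argmin_k |x_i - m_k|^2
    with the update step m_k = mean{x_i : C(i) = k}."""
    centres = np.array(init_centres, dtype=float)
    K = len(centres)
    for _ in range(n_iter):
        # Assignment step: squared distance of every point to every centre
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centre becomes the mean of its cluster
        for k in range(K):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),    # blob around (0, 0)
               rng.normal(10.0, 0.5, (20, 2))])  # blob around (10, 10)
centres, labels = kmeans(X, init_centres=[X[0], X[20]])
```

With one initial centre in each blob, the first assignment already separates the two clusters and the centres converge to the blob means.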

## MATH5743M01 COURSE NOTES:

$\rho>0$ is the (constant) density of the fluid and $\Omega$ is some element of $C^{2}(I, \mathbb{R}) \cap$ $C(\bar{I}, \mathbb{R})$. Here $I:=\left(R_{1}, R_{2}\right)$. Note that
$$v_{0}=\nabla \times\left(0,0, \psi_{0}\right)$$
where
$$\psi_{0}(x, y, z):=-\int_{R_{1}}^{\sqrt{x^{2}+y^{2}}} r Q(r) d r$$
for all $(x, y, z) \in U \times \mathbb{R}$ and that the vorticity $\omega_{0}:=\nabla \times v_{0}$ is given by
$$\omega_{0}(x, y, z)=\left(0,0,-\left(\Delta \psi_{0}\right)(x, y, z)\right)=\left(0,0, \omega_{Q}\left(\sqrt{x^{2}+y^{2}}\right)\right)$$
for all $(x, y, z) \in U \times \mathbb{R}$. Here $\omega_{Q}: I \rightarrow \mathbb{R}$ is defined by
$$\omega_{Q}(r):=r \Omega^{\prime}(r)+2 \Omega(r)$$
for all $r \in I$. In the vorticity formulation, the reduced governing equation for small axial variations of such an $\omega_{0}$ takes the form
$$(0,0, \omega(r, z) \exp (i m \varphi))$$
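Treating the profile $Q$ in the integrand as the angular velocity $\Omega$ (so that $\psi_{0}$ is radial with $\psi_{0}^{\prime}(r)=-r \Omega(r)$; this identification is our reading of the notation), the identity $-\Delta \psi_{0}=r \Omega^{\prime}(r)+2 \Omega(r)=\omega_{Q}(r)$ can be verified symbolically. A sketch assuming sympy:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
Omega = sp.Function('Omega')

# psi_0 is radial with psi_0'(r) = -r * Omega(r)
psi0_prime = -r * Omega(r)

# Laplacian of a radial function: Delta psi = (1/r) d/dr (r * psi'(r))
laplacian_psi0 = sp.diff(r * psi0_prime, r) / r

# -Delta psi_0 should equal omega_Q(r) = r Omega'(r) + 2 Omega(r)
omega_Q = r * sp.diff(Omega(r), r) + 2 * Omega(r)
assert sp.simplify(-laplacian_psi0 - omega_Q) == 0
```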

# 高级汉密尔顿系统 Advanced Hamiltonian Systems MATH5356M01


Proof. Consider an arbitrary power series of the form
$$\lambda+a_{n} w^{n}+a_{n+1} w^{n+1}+\ldots$$
Let $n$ be even, and let $a_{n}>0$. Then this series is formally conjugate to the polynomial $\lambda+z^{n}$. The formula of the corresponding change is as follows:
$$z=w\left(a_{n}+a_{n+1} w+\ldots\right)^{1 / n}$$
Here, of course, we mean the formal Taylor expansion of the radical into power series in $w$. If $a_{n}<0$, then we set $z=-w\left(-a_{n}-a_{n+1} w-\ldots\right)^{1 / n}$.
If $n$ is odd, then the formula is analogous.
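The substitution can be checked on a truncation: with hypothetical coefficients $a_{2}, a_{3}, a_{4}$ (so $n=2$), the formal expansion of $z=w\left(a_{2}+a_{3} w+a_{4} w^{2}\right)^{1 / 2}$ satisfies $z^{2}=a_{2} w^{2}+a_{3} w^{3}+a_{4} w^{4}$ up to the truncation order, so $\lambda+z^{2}$ reproduces the original series. A sketch assuming sympy:

```python
import sympy as sp

w = sp.symbols('w')
a2, a3, a4 = sp.symbols('a2 a3 a4', positive=True)

N = 5  # truncation order of the formal Taylor expansion of the radical
radical = sp.sqrt(a2 + a3 * w + a4 * w ** 2).series(w, 0, N).removeO()
z = w * radical

# Up to order w^6, z^2 must agree with a2 w^2 + a3 w^3 + a4 w^4
lhs = sp.series(z ** 2, w, 0, 7).removeO()
assert sp.simplify(lhs - (a2 * w ** 2 + a3 * w ** 3 + a4 * w ** 4)) == 0
```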

## MATH5356M01 COURSE NOTES:

where $\min$ and $\max$ are taken over the circle $\{F=c\}$ centered at $P_{0}$. Taking the limit as $c \rightarrow 0$, we obtain the equality
$$\Pi(0)=2 \pi \omega(0,0) .$$
On the other hand, the linearization of $w=\frac{\xi}{\omega(x, y)}$ at the equilibrium point $P_{0}$ has the form $\frac{\xi}{\omega(0,0)}=\frac{(-y, x)}{\omega(0,0)}$. Therefore, the eigenvalues of the linearized mapping $\sigma$ equal
$$\nu=\exp \left(\frac{\pm i}{\omega(0,0)}\right)$$
Comparing this expression with $\Pi(0)$, we get the desired formula:
$$\nu=\exp \left(\frac{\pm 2 \pi i}{\Pi(0)}\right) .$$

# 精算学中的模型 Models in Actuarial Science MATH5325M01


Continuous random variables. In this case $\Omega$ is an interval of the real line. Probabilities are specified using a probability density function $f(y)$ with $f(y) \geq 0$ for $y \in \Omega$ and 0 otherwise. Areas under $f(y)$ correspond to probabilities:
$$P(a \leq y \leq b)=\int_{a}^{b} f(y) d y .$$
Hence the probability of $y$ taking on any given value is zero and
$$\int_{-\infty}^{\infty} f(y) d y=1 .$$
Analogous to the discrete case, the mean and variance of a continuous random variable $y$ are defined as
$$\mu=\mathrm{E}(y) \equiv \int_{-\infty}^{\infty} y f(y) d y, \quad \operatorname{Var}(y) \equiv \int_{-\infty}^{\infty}(y-\mu)^{2} f(y) d y$$
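These definitions can be checked numerically for a concrete density. The sketch below uses the exponential density $f(y)=e^{-y}$ on $[0, \infty)$ (our choice of example), for which $\mu=\operatorname{Var}(y)=1$, with a plain midpoint rule so that only the standard library is needed:

```python
import math

# Exponential density f(y) = e^{-y} on [0, inf); its mean and variance are both 1
f = lambda y: math.exp(-y)

def integrate(g, a, b, n=200000):
    """Midpoint rule for a definite integral (a rough numerical check)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0.0, 50.0)                             # ~ 1: f is a density
mu = integrate(lambda y: y * f(y), 0.0, 50.0)               # ~ E(y) = 1
var = integrate(lambda y: (y - mu) ** 2 * f(y), 0.0, 50.0)  # ~ Var(y) = 1
```

The upper limit 50 stands in for $\infty$; the truncated tail is of order $e^{-50}$ and negligible.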

## MATH5325M01 COURSE NOTES:

The above formulation supposes $r$ is a positive integer. However, the negative binomial distribution can be defined for any positive values of $r$, by using the gamma function in place of factorials:
$$f(y)=\frac{\Gamma(y+r)}{y ! \Gamma(r)} \pi^{r}(1-\pi)^{y}, \quad y=0,1,2, \ldots .$$
In generalized linear modeling the following parametrization is convenient:
$$\mu=\frac{r(1-\pi)}{\pi}, \quad \kappa=\frac{1}{r} .$$
Using this notation, the probability function of $y$ is
$$f(y)=\frac{\Gamma\left(y+\frac{1}{\kappa}\right)}{y ! \Gamma\left(\frac{1}{\kappa}\right)}\left(\frac{1}{1+\kappa \mu}\right)^{\frac{1}{\kappa}}\left(\frac{\kappa \mu}{1+\kappa \mu}\right)^{y}, \quad y=0,1,2, \ldots$$
with
$$\mathrm{E}(y)=\mu, \quad \operatorname{Var}(y)=\mu(1+\kappa \mu)$$
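The $(\mu, \kappa)$ parametrization is easy to check numerically via the log-gamma function, which also handles non-integer $r=1 / \kappa$. A stdlib-only sketch (the parameter values are ours):

```python
import math

def negbin_pmf(y, mu, kappa):
    """Negative binomial probability with E(y) = mu, Var(y) = mu*(1 + kappa*mu).
    Uses log-gamma for numerical stability; r = 1/kappa need not be an integer."""
    r = 1.0 / kappa
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + kappa * mu))
             + y * math.log(kappa * mu / (1.0 + kappa * mu)))
    return math.exp(log_p)

mu, kappa = 3.0, 0.5
probs = [negbin_pmf(y, mu, kappa) for y in range(2000)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
# total ~ 1, mean ~ mu, var ~ mu * (1 + kappa * mu)
```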

# 高级微分几何学 Advanced Differential Geometry MATH5113M01


• Let $Y \rightarrow X$ be a vector bundle with a typical fibre $V$. By $Y^{*} \rightarrow X$ is denoted the dual vector bundle with the typical fibre $V^{*}$ dual of $V$. The interior product of $Y$ and $Y^{*}$ is defined as a fibred morphism $$J: Y \otimes Y^{*} \underset{X}{\longrightarrow} X \times \mathbb{R} .$$
• Let $Y \rightarrow X$ and $Y^{\prime} \rightarrow X$ be vector bundles with typical fibres $V$ and $V^{\prime}$, respectively. Their Whitney sum $Y \underset{X}{\oplus} Y^{\prime}$ is a vector bundle over $X$ with the typical fibre $V \oplus V^{\prime}$.
• Let $Y \rightarrow X$ and $Y^{\prime} \rightarrow X$ be vector bundles with typical fibres $V$ and $V^{\prime}$, respectively. Their tensor product $Y \otimes Y^{\prime}$ is a vector bundle over $X$ with the typical fibre $V \otimes V^{\prime}$. Similarly, the exterior product of vector bundles $Y \underset{X}{\wedge} Y^{\prime}$ is defined. The exterior product of a vector bundle with itself is called the exterior bundle.

## MATH5113M01 COURSE NOTES:

Vector fields on a manifold $Z$ are global sections of the tangent bundle $T Z \rightarrow Z$.

The set $\mathcal{T}(Z)$ of vector fields on $Z$ is both a $C^{\infty}(Z)$-module and a real Lie algebra with respect to the Lie bracket
$$u=u^{\lambda} \partial_{\lambda}, \quad v=v^{\lambda} \partial_{\lambda}, \qquad[v, u]=\left(v^{\lambda} \partial_{\lambda} u^{\mu}-u^{\lambda} \partial_{\lambda} v^{\mu}\right) \partial_{\mu} .$$
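The component formula for the bracket can be checked on concrete fields. A small sympy sketch with the Euler field $u=x \partial_{x}+y \partial_{y}$ and the rotation field $v=-y \partial_{x}+x \partial_{y}$ (our illustrative choice), whose bracket vanishes:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)

def lie_bracket(v, u):
    """[v, u]^mu = v^lam d_lam u^mu - u^lam d_lam v^mu, componentwise."""
    return tuple(
        sum(v[l] * sp.diff(u[m], coords[l]) - u[l] * sp.diff(v[m], coords[l])
            for l in range(len(coords)))
        for m in range(len(coords)))

u = (x, y)    # Euler (dilation) field
v = (-y, x)   # rotation field
bracket = lie_bracket(v, u)
# Rotations commute with dilations, so every component is 0
```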
Given a vector field $u$ on $Z$, a curve
$$c:(a, b) \subset \mathbb{R} \rightarrow Z$$

# 高级证明和计算 Advanced Proof and Computation MATH5104M01


where $A=\left|\mathbf{a}_{1} \times \mathbf{a}_{2}\right|$ denotes the area of the unit cell. For analytic purposes, it is convenient to use the values $\mathbf{p}=0, z=\sqrt{A}=d$ (remember that $d$ is the pitch of the direct array). The above formula is characterized by faster convergence via integration with respect to $z$. The integer parameter $q$ gives the number of times the convergence of the lattice sums has been accelerated through integration and is thus called the convergence acceleration index. The reciprocal unit cell is defined by the vectors[1]
$$\mathbf{a}^{1}=2 \pi \frac{\mathbf{a}_{2} \times \mathbf{e}_{2}}{A}, \quad \mathbf{a}^{2}=2 \pi \frac{\mathbf{e}_{2} \times \mathbf{a}_{1}}{A},$$
with the reciprocal lattice vectors
$$\mathbf{Q}_{\mathbf{p}}=p_{1} \mathbf{a}^{1}+p_{2} \mathbf{a}^{2}+\mathbf{k}, \quad \theta_{\mathbf{p}}=\arg \left(\mathbf{Q}_{\mathbf{p}}\right) .$$
The lattice sums satisfy the identity
$$S_{-l}^{Y}\left(k_{\perp}, \mathbf{k}\right)=\overline{S_{l}^{Y}}\left(k_{\perp}, \mathbf{k}\right),$$

## MATH5104M01 COURSE NOTES:

$$M_{n}^{\xi \xi}=\mathcal{O}\left(\Gamma^{2}(n) n\left(\frac{1}{2} k_{\perp} r_{c}\right)^{-2 n}\right)$$
Similarly, one can show that, for the lattice sums,
$$S_{l}^{Y}\left(k_{\perp}, \mathbf{k}\right)=\mathcal{O}\left(\Gamma(l)\left(\frac{1}{2} k_{\perp} d\right)^{-l}\right), \quad \text { as } l \rightarrow+\infty .$$
This causes numerical difficulties when
$$\frac{k_{\perp} d}{2} \leqslant 1,$$
since the off-diagonal terms increase extremely rapidly with index $l$ :
$$z_{l}^{\xi}+\sum_{m=-\infty}^{+\infty} D_{l m}^{\xi \xi} z_{m}^{\xi}=0, \quad \forall \xi \in\{E, H\},$$
where
$$z_{l}^{\xi}=b_{l}^{\xi} \sqrt{\left|M_{l}^{\xi \xi}\right|}$$

# 统计理论 Statistical Theory MATH372301


If we use a Dirichlet parameter prior for all parameters in our network, then, as $M \rightarrow \infty$, we have that
$$\log P(\mathcal{D} \mid \mathcal{G})=\ell\left(\hat{\boldsymbol{\theta}}_{\mathcal{G}}: \mathcal{D}\right)-\frac{\log M}{2} \operatorname{Dim}[\mathcal{G}]+O(1),$$
where $\operatorname{Dim}[\mathcal{G}]$ is the number of independent parameters in $\mathcal{G}$.

From this we see that the Bayesian score tends to trade off the likelihood (the fit to data) on the one hand against the model complexity on the other.
This approximation is called the Bayesian information criterion (BIC) score:
$$\operatorname{score}_{B I C}(\mathcal{G}: \mathcal{D})=\ell\left(\hat{\boldsymbol{\theta}}_{\mathcal{G}}: \mathcal{D}\right)-\frac{\log M}{2} \operatorname{Dim}[\mathcal{G}]$$
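The likelihood–complexity trade-off is easy to see on a toy instance. Instead of a full network $\mathcal{G}$, the sketch below (data and numbers are ours, not from the notes) scores two models of a coin: a fixed fair coin with no free parameters, and a Bernoulli model with one fitted parameter:

```python
import math

def bic_score(loglik, n_params, M):
    """score_BIC = l(theta_hat : D) - (log M / 2) * Dim, as in the notes."""
    return loglik - 0.5 * math.log(M) * n_params

# Hypothetical data: 60 heads out of M = 100 tosses
M, heads = 100, 60

# Model 0: fixed fair coin, zero free parameters
ll0 = heads * math.log(0.5) + (M - heads) * math.log(0.5)

# Model 1: Bernoulli with fitted p_hat = heads / M, one free parameter
p_hat = heads / M
ll1 = heads * math.log(p_hat) + (M - heads) * math.log(1 - p_hat)

score0 = bic_score(ll0, 0, M)
score1 = bic_score(ll1, 1, M)
```

The fitted model always has the higher likelihood, but with this much data the $\frac{\log M}{2}$ penalty outweighs the gain, so the simpler model wins on BIC.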

## MATH372301 COURSE NOTES:

$$n I(p)=-E\left(\ell^{\prime \prime}(p)\right)=-E\left(-\frac{Y}{p^{2}}-\frac{(n-Y)}{(1-p)^{2}}\right)=\frac{n}{p(1-p)}$$
$(E(Y)=n E(X)=n p)$, and the Cramer-Rao lower bound is $p(1-p) / n$. Since
$$E(\widehat{p})=p \text { and } \operatorname{Var}(\widehat{p})=\frac{p(1-p)}{n},$$
the ML estimator $\widehat{p}$ is an efficient estimator of $p$.
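Efficiency here means the variance of $\widehat{p}$ attains the Cramer-Rao bound exactly, which a quick simulation confirms. A stdlib-only sketch (sample sizes and seed are our choices):

```python
import random

random.seed(42)
n, p, trials = 100, 0.3, 10000

# Empirical distribution of the ML estimator p_hat = Y / n over many samples
estimates = []
for _ in range(trials):
    y = sum(1 for _ in range(n) if random.random() < p)
    estimates.append(y / n)

mean_hat = sum(estimates) / trials
var_hat = sum((e - mean_hat) ** 2 for e in estimates) / trials
crlb = p * (1 - p) / n  # Cramer-Rao lower bound p(1-p)/n
```

The empirical variance of $\widehat{p}$ should match the bound up to Monte Carlo error.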

# 线性和非线性波 Linear and Non-Linear Waves MATH337409


We now look within the framework of this more accurate theory for the solution to replace the one shown in Fig. 2.5. One obvious idea is to look for a steady profile solution in which
$$\rho=\rho(X), \quad X=x-U t$$
where $U$ is a constant still to be determined. Then from $(2.20)$,
$$\{c(\rho)-U\} \rho_{X}=\nu \rho_{X X}$$

Integrating once, we have
$$Q(\rho)-U \rho+A=\nu \rho_{X}$$
where $A$ is a constant of integration. An implicit relation for $\rho(X)$ is obtained in the form
$$\frac{X}{\nu}=\int \frac{d \rho}{Q(\rho)-U \rho+A}$$

## MATH337409 COURSE NOTES:

$$Q-U \rho+A=-\alpha\left(\rho-\rho_{1}\right)\left(\rho_{2}-\rho\right)$$
where
$$U=\beta+\alpha\left(\rho_{1}+\rho_{2}\right), \quad A=\alpha \rho_{1} \rho_{2}-\gamma$$
Then $(2.22)$ becomes
$$\frac{X}{\nu}=-\int \frac{d \rho}{\alpha\left(\rho-\rho_{1}\right)\left(\rho_{2}-\rho\right)}=\frac{1}{\alpha\left(\rho_{2}-\rho_{1}\right)} \log \frac{\rho_{2}-\rho}{\rho-\rho_{1}}$$
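Solving this logarithmic relation for $\rho$ gives the smooth shock profile $\rho(X)=\left(\rho_{2}+\rho_{1} e^{s}\right) /\left(1+e^{s}\right)$ with $s=\alpha\left(\rho_{2}-\rho_{1}\right) X / \nu$, joining $\rho_{2}$ as $X \rightarrow-\infty$ to $\rho_{1}$ as $X \rightarrow+\infty$. A sympy check that this profile satisfies the first integral $\nu \rho_{X}=-\alpha\left(\rho-\rho_{1}\right)\left(\rho_{2}-\rho\right)$:

```python
import sympy as sp

X, nu, alpha, r1, r2 = sp.symbols('X nu alpha rho1 rho2', positive=True)
s = alpha * (r2 - r1) * X / nu
rho = (r2 + r1 * sp.exp(s)) / (1 + sp.exp(s))  # inverts the log relation

# The profile must satisfy nu * drho/dX = -alpha * (rho - rho1) * (rho2 - rho)
lhs = nu * sp.diff(rho, X)
rhs = -alpha * (rho - r1) * (r2 - rho)
assert sp.simplify(lhs - rhs) == 0
```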

# 汉密尔顿系统 Hamiltonian Systems MATH335501


Suppose that the position of a mechanical system with $d$ degrees of freedom is described by $q=\left(q_{1}, \ldots, q_{d}\right)^{T}$ as generalized coordinates (this can be for example Cartesian coordinates, angles, arc lengths along a curve, etc.). Consider the Lagrangian
$$L=T-U,$$
where $T=T(q, \dot{q})$ denotes the kinetic energy and $U=U(q)$ the potential energy. The motion of the system is described by Lagrange’s equation ${ }^{2}$
$$\frac{d}{d t}\left(\frac{\partial L}{\partial \dot{q}}\right)=\frac{\partial L}{\partial q},$$

which are just the Euler-Lagrange equations of the variational problem $S(q)=$ $\int_{a}^{b} L(q(t), \dot{q}(t)) \mathrm{d} t \rightarrow \min .$

Hamilton ${ }^{3}$ simplified the structure of Lagrange’s equations and turned them into a form that has remarkable symmetry, by

• introducing Poisson’s variables, the conjugate momenta
$$p_{k}=\frac{\partial L}{\partial \dot{q}_{k}}(q, \dot{q}) \quad \text { for } k=1, \ldots, d,$$
• considering the Hamiltonian
$$H:=p^{T} \dot{q}-L(q, \dot{q})$$
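For a single degree of freedom with $L=\frac{1}{2} m \dot{q}^{2}-U(q)$ (a standard special case, our choice rather than the notes'), this Legendre transform can be carried out symbolically, yielding $H=p^{2} / 2 m+U(q)$:

```python
import sympy as sp

q, qdot, p, m = sp.symbols('q qdot p m', positive=True)
U = sp.Function('U')

L = sp.Rational(1, 2) * m * qdot ** 2 - U(q)        # L = T - U
p_of_qdot = sp.diff(L, qdot)                        # conjugate momentum p = dL/dqdot
qdot_of_p = sp.solve(sp.Eq(p, p_of_qdot), qdot)[0]  # invert: qdot = p / m
H = p * qdot_of_p - L.subs(qdot, qdot_of_p)         # H = p * qdot - L
assert sp.simplify(H - (p ** 2 / (2 * m) + U(q))) == 0
```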

## MATH335501 COURSE NOTES:

$$\ddot{q}_{1}=-\frac{q_{1}}{\left(q_{1}^{2}+q_{2}^{2}\right)^{3 / 2}}, \quad \ddot{q}_{2}=-\frac{q_{2}}{\left(q_{1}^{2}+q_{2}^{2}\right)^{3 / 2}}$$
This is equivalent to a Hamiltonian system with the Hamiltonian
$$H\left(p_{1}, p_{2}, q_{1}, q_{2}\right)=\frac{1}{2}\left(p_{1}^{2}+p_{2}^{2}\right)-\frac{1}{\sqrt{q_{1}^{2}+q_{2}^{2}}}, \quad p_{i}=\dot{q}_{i} .$$
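The Hamiltonian structure shows up numerically: a symplectic method integrating these equations keeps the energy error bounded over long times. A minimal sketch of symplectic Euler on the Kepler problem (the eccentricity-0.6 initial data is a common test case, not taken from the notes):

```python
import math

def symplectic_euler(p, q, h, steps):
    """Symplectic Euler for H(p, q) = |p|^2/2 - 1/|q|:
    update p with the force at the current q, then update q with the new p."""
    for _ in range(steps):
        r3 = (q[0] ** 2 + q[1] ** 2) ** 1.5
        p = (p[0] - h * q[0] / r3, p[1] - h * q[1] / r3)
        q = (q[0] + h * p[0], q[1] + h * p[1])
    return p, q

def energy(p, q):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])

# Eccentric orbit: q0 = (1 - e, 0), p0 = (0, sqrt((1 + e) / (1 - e))), so H = -1/2
e = 0.6
p0, q0 = (0.0, math.sqrt((1 + e) / (1 - e))), (1 - e, 0.0)
E0 = energy(p0, q0)
p1, q1 = symplectic_euler(p0, q0, h=0.001, steps=20000)
E1 = energy(p1, q1)
# For a symplectic method the energy error stays small over many periods
```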

# 拓扑学 Topology MATH322501


Suppose a virtually torsion-free group $G$ acts properly and cocompactly on an acyclic complex $Y$ whose cohomology with compact supports is given by
$$H_{c}^{i}(Y) \cong \begin{cases}0 & \text { if } i \neq n \\ Z & \text { if } i=n\end{cases}$$
Then $G$ is a virtual $P D^{n}$-group.

Since $Y / G$ is compact, $G$ is type $V F L$. By Lemma F.2.2,
$$H_{c}^{i}(G, Z G) \cong H_{c}^{i}(Y) \cong \begin{cases}0 & \text { if } i \neq n, \\ Z & \text { if } i=n,\end{cases}$$
and the same formula holds for any torsion-free subgroup $\pi$ of finite index in $G$.

## MATH322501 COURSE NOTES:

where $a=t_{0}<t_{1}<\cdots<t_{n}=b$ runs over all possible subdivisions of $[a, b]$. $(X, d)$ is a length space if
$$d(x, y)=\inf \{\Omega(\gamma) \mid \gamma \text { is a path from } x \text { to } y\} .$$
(Here we allow $\infty$ as a possible value of $d$.) Thus, a length space is a geodesic space if the above infimum is always realized and is $\neq \infty$.

# 组合学 Combinatorics MATH314301


Let $y \in G$, and write $y$ in the form (1). We can write
$$\varphi(x)-\varphi(x y)=\left[\varphi(x)-\varphi\left(x s_{1}\right)\right]+\ldots+\left[\varphi\left(x s_{1} \ldots s_{\ell-1}\right)-\varphi(x y)\right] .$$
It follows, for example, by the Cauchy-Schwarz inequality that
$$(\varphi(x)-\varphi(x y))^{2} \leq \ell^{*} \sum_{i=1}^{\ell}\left(\varphi\left(x s_{1} \ldots s_{i-1}\right)-\varphi\left(x s_{1} \ldots s_{i}\right)\right)^{2}$$

where $\ell^{*}$ is the number of nonzero terms in the sum, and is bounded above by $d=\operatorname{diam}(\bar{C})$, since $\gamma$ is geodesic. Summing this inequality over $x \in G$ we get
$$\sum_{x \in G}(\varphi(x)-\varphi(x y))^{2} \leq d \sum_{z \in G, s \in S} N_{\gamma}(s, \bar{C})(\varphi(z)-\varphi(z s))^{2}$$
Since this holds for all $y \in G$, we may average the left-hand side with respect to $y$ with weights $\tilde{\pi}(y)$ to get
$$\sum_{x, y \in G}(\varphi(x)-\varphi(x y))^{2} \tilde{\pi}(y) \leq d \sum_{z \in G, s \in S} N_{\gamma}(s, \bar{C})(\varphi(z)-\varphi(z s))^{2}$$

## MATH314301 COURSE NOTES:

Last time, we proved
Theorem 1 Let $C$ be a subset of the group $G, S=S^{-1}$ a symmetric generating set, $\pi=U(S)$ the uniform distribution on $S$, and $p_{\pi}$ the “one-step evolution” of the random walk (i.e. $p_{\pi} \varphi=U(S) * \varphi$ ). Then for any probability distribution $\varphi$,
$$\left|p_{\pi} \varphi\right|^{2} \leq\left(1-\frac{|G \backslash C|}{2 \cdot A \cdot|G|}\right) \cdot|\varphi|^{2}$$
where $A=d \cdot|S| \cdot \max _{s \in S} \max _{g \in C} \mu_{s}(g)$ and $d=\operatorname{diam}(\bar{C}, G)$.
We will use this theorem to bound the escape time of a random walk $X_{t}$ generated by $S$. For a subset $C$ of $G$, set
$$\varphi_{t}(g)=\operatorname{Pr}\left[X_{t}=g \text { and } X_{i} \in C \forall i=1 \ldots t\right]$$
Obviously, supp $\varphi_{t} \subset C,\left|\varphi_{0}\right| \leq 1$ ( 1 if $C$ contains $1_{G}, 0$ otherwise) and