Extreme value theory

Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.

Data analysis
Two approaches exist for practical extreme value analysis.

The first method relies on deriving block maxima (minima) series as a preliminary step. In many situations it is customary and convenient to extract the annual maxima (minima), generating an "Annual Maxima Series" (AMS).
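The block-maxima step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `block_maxima` and the (year, value) records are hypothetical, and each block is assumed to be labelled by its year.

```python
def block_maxima(records):
    """Return {block_label: maximum value} for (block_label, value) pairs."""
    maxima = {}
    for label, value in records:
        # Keep the largest value seen so far in each block.
        if label not in maxima or value > maxima[label]:
            maxima[label] = value
    return maxima


# Hypothetical (year, daily river level) observations.
records = [
    (2001, 3.1), (2001, 4.7), (2001, 2.9),
    (2002, 5.2), (2002, 3.8),
    (2003, 2.2), (2003, 6.0), (2003, 4.1),
]

ams = block_maxima(records)
print(ams)  # {2001: 4.7, 2002: 5.2, 2003: 6.0}
```

For block minima, the comparison is reversed (or the data are negated before calling the function).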

The second method relies on extracting, from a continuous record, the peak values reached during each period in which values exceed a certain threshold (or fall below a certain threshold). This method is generally referred to as the "Peak Over Threshold" method (POT).
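A minimal sketch of the peak extraction follows. It assumes that an "event" is a run of consecutive values above the threshold; practical analyses often add further declustering rules (for example a minimum separation in time), which are not shown. The function name and the data are hypothetical.

```python
def peaks_over_threshold(series, threshold):
    """Return the peak of each run of consecutive values above the threshold."""
    peaks = []
    current_peak = None  # None means we are not inside an exceedance run
    for value in series:
        if value > threshold:
            if current_peak is None or value > current_peak:
                current_peak = value
        elif current_peak is not None:
            # The run has ended: record its peak.
            peaks.append(current_peak)
            current_peak = None
    if current_peak is not None:
        # The record ended while still above the threshold.
        peaks.append(current_peak)
    return peaks


series = [1.0, 2.5, 3.2, 2.8, 0.9, 1.1, 4.0, 0.5, 2.1, 2.6]
peaks = peaks_over_threshold(series, threshold=2.0)
print(peaks)  # [3.2, 4.0, 2.6]
```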

For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem, leading to the generalized extreme value distribution (GEVD) being selected for fitting. However, in practice, various procedures are applied to select between a wider range of distributions. The theorem relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. Given that the number of relevant random events within a year may be rather limited, it is unsurprising that analyses of observed AMS data often lead to distributions other than the GEVD being selected.
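As an illustration of fitting an annual maxima series, the sketch below uses the Gumbel distribution, the zero-shape special case of the generalized extreme value family, and estimates its parameters by the method of moments. Both choices are simplifying assumptions made here for brevity: real analyses typically fit the full three-parameter family, usually by maximum likelihood. The data and function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments estimates (loc, scale) for a Gumbel distribution."""
    # Gumbel moments: variance = (pi * scale)^2 / 6, mean = loc + gamma * scale.
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    loc = statistics.mean(maxima) - EULER_GAMMA * scale
    return loc, scale


def gumbel_return_level(loc, scale, period):
    """Level exceeded with probability 1/period in any one block (e.g. year)."""
    return loc - scale * math.log(-math.log(1 - 1 / period))


ams = [4.7, 5.2, 6.0, 4.1, 5.5, 7.3, 4.9, 5.8, 6.4, 5.1]  # hypothetical annual maxima
loc, scale = fit_gumbel_moments(ams)
level_50 = gumbel_return_level(loc, scale, 50)
print(f"loc={loc:.3f} scale={scale:.3f} 50-year level={level_50:.3f}")
```

With only ten observations, the estimated 50-year level is an extrapolation well beyond the data and carries large uncertainty.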

For POT data, the analysis may involve fitting two distributions: one for the number of events in a time period considered and a second for the size of the exceedances.

A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances. A tail-fitting can be based on the Pickands–Balkema–de Haan theorem.
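The two-part POT model can be sketched as follows, assuming a Poisson process for the number of events (summarized by its rate per year) and a generalized Pareto distribution for the excesses over the threshold. The moment estimators used here are one simple option, valid only when the shape parameter is below 1/2. They are an assumption of this sketch, not a recommended procedure, and the data and names are hypothetical.

```python
import statistics


def fit_pot_moments(peaks, threshold, years):
    """Return (events per year, GPD shape, GPD scale) by the method of moments."""
    excesses = [p - threshold for p in peaks if p > threshold]
    rate = len(excesses) / years  # Poisson rate estimate
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    # GPD moments: mean = scale / (1 - shape),
    # variance = scale^2 / ((1 - shape)^2 * (1 - 2 * shape)).
    ratio = m * m / v
    shape = 0.5 * (1 - ratio)
    scale = 0.5 * m * (1 + ratio)
    return rate, shape, scale


peaks = [3.2, 4.0, 2.6, 5.1, 2.9, 3.5, 2.3, 6.2]  # hypothetical peaks over 4 years
rate, shape, scale = fit_pot_moments(peaks, threshold=2.0, years=4)
print(f"rate={rate:.2f}/year shape={shape:.3f} scale={scale:.3f}")
```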

Novak reserves the term “POT method” for the case where the threshold is non-random, and distinguishes it from the case where one deals with exceedances of a random threshold.

Applications
Applications of extreme value theory include predicting the probability distribution of:
 * Extreme floods; the size of freak waves
 * Tornado outbreaks
 * Maximum sizes of ecological populations
 * Side effects of drugs
 * The magnitudes of large insurance losses
 * Equity risks; day-to-day market risk
 * Mutation events during evolution
 * Large wildfires
 * Environmental loads on structures
 * The fastest time humans are capable of running the 100 metres sprint, and performances in other athletic disciplines
 * Pipeline failures due to pitting corrosion
 * Anomalous IT network traffic, to prevent attackers from reaching important data

History
The field of extreme value theory was pioneered by Leonard Tippett (1902–1985). Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes assuming independent variables. Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name. These results can be extended to allow for slight correlations between variables, but the classical theory does not extend to strong correlations of the order of the variance. One universality class of particular interest is that of log-correlated fields, where the correlations decay logarithmically with the distance.

A summary of historically important publications relating to extreme value theory can be found in the article List of publications in statistics.

Univariate theory
Let $$X_1, \dots, X_n$$ be a sequence of independent and identically distributed random variables with cumulative distribution function $$F$$ and let $$M_n =\max(X_1,\dots,X_n)$$ denote the maximum.

In theory, the exact distribution of the maximum can be derived:

$$ \begin{align} \Pr(M_n \leq z) & = \Pr(X_1 \leq z, \dots, X_n \leq z) \\ & = \Pr(X_1 \leq z) \cdots \Pr(X_n \leq z) = (F(z))^n. \end{align} $$
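The identity can be checked numerically. The sketch below assumes independent Uniform(0, 1) variables, for which $$F(z)=z$$ and the exact probability is $$z^n$$. The sample size, seed and function name are arbitrary choices for illustration.

```python
import random


def simulate_max_cdf(n, z, trials, seed=0):
    """Estimate Pr(M_n <= z) for the maximum of n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if max(rng.random() for _ in range(n)) <= z:
            hits += 1
    return hits / trials


n, z = 5, 0.8
exact = z ** n  # F(z) = z for Uniform(0, 1), so (F(z))^n = z^n
estimate = simulate_max_cdf(n, z, trials=100_000)
print(f"exact={exact:.5f} estimate={estimate:.5f}")
```

The two numbers should agree to within Monte Carlo error, which is of the order of 0.002 for this number of trials.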

The associated indicator function $$I_n = I(M_n>z)$$ is a Bernoulli process with a success probability $$p(z)=1-(F(z))^n$$ that depends on the magnitude $$z$$ of the extreme event. The number of extreme events within $$n$$ trials thus follows a binomial distribution and the number of trials until an event occurs follows a geometric distribution with expected value and standard deviation of the same order $$O(1/p(z))$$.
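To make the waiting-time statement concrete, the sketch below computes $$p(z)$$ for one block and the mean and standard deviation of the geometric waiting time (counted in blocks, including the block in which the event occurs). The daily non-exceedance probability 0.9999 and the block length 365 are hypothetical inputs.

```python
import math


def block_exceedance_probability(cdf_value, n):
    """p(z) = 1 - F(z)^n: probability that the maximum of n draws exceeds z."""
    return 1.0 - cdf_value ** n


def waiting_time_moments(p):
    """Mean and standard deviation of a geometric waiting time with success probability p."""
    mean = 1.0 / p
    std = math.sqrt(1.0 - p) / p
    return mean, std


# Hypothetical: F(z) = 0.9999 for a single day, blocks of n = 365 days.
p = block_exceedance_probability(0.9999, 365)
mean_wait, std_wait = waiting_time_moments(p)
print(f"p={p:.4f} mean wait={mean_wait:.1f} blocks, std={std_wait:.1f} blocks")
```

Because the standard deviation is nearly as large as the mean when $$p$$ is small, the time until the next extreme event is highly variable.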

In practice, we might not have the distribution function $$F$$ but the Fisher–Tippett–Gnedenko theorem provides an asymptotic result. If there exist sequences of constants $$a_n>0 $$ and $$b_n\in \mathbb R $$ such that


 * $$ \Pr\{(M_n-b_n)/a_n \leq z\} \rightarrow G(z) $$

as $$n \rightarrow \infty$$ then


 * $$ G(z) \propto \exp \left[-(1+\zeta z)^{-1/\zeta} \right] $$

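where $$\zeta$$ depends on the tail shape of the distribution. When normalized, G belongs to one of the following non-degenerate distribution families:

Weibull law:
 * $$ G(z) = \begin{cases} \exp\left\{-\left( -\left( \frac{z-b}{a} \right) \right)^\alpha\right\} & z<b \\ 1 & z\geq b \end{cases}$$ when the distribution of $$M_n$$ has a light tail with finite upper bound. Also known as Type 3.

Gumbel law:
 * $$ G(z) = \exp\left\{-\exp\left(-\left( \frac{z-b}{a} \right) \right)\right\} \text{ for } z\in\mathbb R $$ when the distribution of $$M_n$$ has an exponential tail. Also known as Type 1.

Fréchet law:
 * $$ G(z) = \begin{cases} 0 & z\leq b \\ \exp\left\{-\left( \frac{z-b}{a} \right)^{-\alpha}\right\} & z>b \end{cases}$$ when the distribution of $$M_n$$ has a heavy tail (including polynomial decay). Also known as Type 2.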

In all cases, $$\alpha>0$$.
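The convergence can be illustrated with a case where everything is available in closed form. Assuming independent Exponential(1) variables and the normalizing constants $$a_n = 1$$, $$b_n = \ln n$$ (a standard textbook choice, taken here as an assumption), the exact distribution of the normalized maximum is $$(1 - e^{-z}/n)^n$$, which tends to $$\exp(-e^{-z})$$, the $$\zeta \to 0$$ case of the limit above. The sketch compares the two for increasing $$n$$.

```python
import math


def exact_cdf_normalized_max(z, n):
    """Pr(M_n - ln(n) <= z) for n independent Exponential(1) variables."""
    x = z + math.log(n)  # undo the normalization: M_n <= z + ln(n)
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-x)) ** n


def gumbel_cdf(z):
    """Limiting distribution exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


z = 1.0
sizes = (10, 100, 1000, 10_000)
errors = [abs(exact_cdf_normalized_max(z, n) - gumbel_cdf(z)) for n in sizes]
for n, err in zip(sizes, errors):
    print(f"n={n:>6} |exact - limit| = {err:.2e}")
```

The error shrinks roughly in proportion to $$1/n$$ in this example. Convergence can be much slower for other parent distributions.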

Multivariate theory
Extreme value theory in more than one variable introduces additional issues that have to be addressed. One problem that arises is that one must specify what constitutes an extreme event. Although this is straightforward in the univariate case, there is no unambiguous way to do this in the multivariate case. The fundamental problem is that although it is possible to order a set of real-valued numbers, there is no natural way to order a set of vectors.

As an example, in the univariate case, given a set of observations $$x_i $$ it is straightforward to find the most extreme event simply by taking the maximum (or minimum) of the observations. However, in the bivariate case, given a set of observations $$ (x_i, y_i) $$, it is not immediately clear how to find the most extreme event. Suppose that one has measured the values $$(3, 4)$$ at a specific time and the values $$(5, 2)$$ at a later time. Which of these events would be considered more extreme? There is no universal answer to this question.
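The ordering problem can be made concrete with the two observations above. The sketch below uses two common conventions, componentwise maxima and componentwise dominance. Both are illustrative choices rather than the only possible definitions, and the function names are hypothetical.

```python
def dominates(u, v):
    """True if u is at least as large as v in every component and larger in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def componentwise_max(points):
    """Vector of per-component maxima; it need not be an observed point."""
    return tuple(max(component) for component in zip(*points))


observations = [(3, 4), (5, 2)]
first, second = observations

print(dominates(first, second))         # False
print(dominates(second, first))         # False
print(componentwise_max(observations))  # (5, 4)
```

Neither observation dominates the other, and the componentwise maximum $$(5, 4)$$ was never actually observed, which illustrates why "the most extreme event" has no single definition in the multivariate case.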

Another issue in the multivariate case is that the limiting model is not as fully prescribed as in the univariate case. In the univariate case, the model contains three parameters whose values are not predicted by the theory and must be obtained by fitting the distribution to the data. In the multivariate case, the model not only contains unknown parameters, but also a function whose exact form is not prescribed by the theory. However, this function must obey certain constraints.

As an example of an application, bivariate extreme value theory has been applied to ocean research.

Software

 * Extreme Value Statistics in R - Packages for extreme value statistics in R
 * ExtremeStats.jl - Extreme Value Statistics in Julia