# Probability

- Random variable \(x\): takes values from a set of outcomes \(S=\{x_1,x_2,\cdots,x_N\}\); outcomes can be discrete or continuous. E.g. \(S_{coin}=\{h,t\}\), \(S_{V_x}=\{V_x:|V_x|\lt\infty\}\).
- Event is any subset of outcomes \(E\subset S\) and is assigned a probability \(p(E)\).
- E.g. \(p_{dice}(\{1\})=\frac{1}{6}\), \(p_{dice}(\{1,3\})=\frac{1}{3}\).

- Union of events: consider events \(A\) (\(B\)) where the sum of 2 dice is divisible by 3 (4). \(A\cup B\), "A or B", is the set of outcomes where the sum of 2 dice is divisible by either 3 or 4.
- Intersection of events: \(A\cap B\) (often abbreviated \(AB\)), "A and B", would be where the sum of 2 dice is divisible by both 3 and 4, i.e. divisible by 12 (only the outcome \(6+6\)).
- Disjoint events: \(A\cap B=\varnothing\) (no shared outcomes)
- Complement: \(E^c=S\setminus E\)
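These event operations can be checked directly with Python sets; a small sketch using the divisible-by-3/4 events from the two-dice example:

```python
from itertools import product

# Sample space for two dice: all ordered pairs (d1, d2).
S = set(product(range(1, 7), repeat=2))

# Event A: sum divisible by 3; event B: sum divisible by 4.
A = {(d1, d2) for (d1, d2) in S if (d1 + d2) % 3 == 0}
B = {(d1, d2) for (d1, d2) in S if (d1 + d2) % 4 == 0}

union = A | B          # A or B
intersection = A & B   # A and B: sum divisible by 12, i.e. only (6, 6)
complement_A = S - A   # S \ A

print(len(S), len(A), len(B), len(union), len(intersection))
```

Note that \(A\) and \(B\) are not disjoint here, since they share the outcome \((6,6)\).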

## Axioms

- \(\forall E\subseteq S.p(E)\geq 0\)
- \(p(S)=1\)
- \(p(A\cup B)=p(A)+p(B)\) if \(A\) and \(B\) are disjoint events
- \(p(E^c)=1-p(E)\)
- Objective probabilities: \(p(A)=\lim_{N\to\infty}\frac{N_A}{N}\), obtained from observation
- Frequentist \(\leftrightarrow\) Bayesian statistics
- Subjective probabilities: theoretical estimates for the probabilities using a model
- Computing probabilities: \(S\) discrete and finite, \(S=\{x_1,x_2,\cdots,x_N\}\), assume \(p(\{x_i\})=\frac{1}{N}\), \(1\le i\le N\). Then, \(p(E)=\frac{\#\text{ outcomes in }E}{\#\text{ outcomes in }S}\).
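A minimal sketch of this equally-likely counting rule for two fair dice (the helper name `prob` is illustrative):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice: 36 equally likely outcomes

def prob(event):
    # p(E) = (# outcomes in E) / (# outcomes in S) for equally likely outcomes
    return Fraction(sum(1 for x in S if event(x)), len(S))

p_div3 = prob(lambda d: sum(d) % 3 == 0)
print(p_div3)  # 12/36 reduces to 1/3
```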

## Combinatorics

### Exercises

#### Question 1

How many distinct ways can you arrange 24 distinct letters? \(W=24!\)

\(N\) distinct objects can be arranged in \(W=N!\) ways.

Letters in ’WHAT’: \(W=4!\)

#### Question 2

Letters in ’CHEESE’: \(W=\frac{6!}{3!}\) = (number of ways to arrange all the letters)/(number of ways to rearrange the non-distinct letters among themselves). Permuting the three identical Es does not produce a new arrangement.

#### Question 3

Letters in ’FREEZER’: W=7!/(3!2!1!1!).

3! from the Es, 2! from the Rs, 1! from the F, 1! from the Z.

The multinomial coefficient gives the number of arrangements \(W\) of \(N\) objects from \(k\) distinct categories, each of which appears \(N_j\) times, where \(N=\sum_{j=1}^kN_j\): \(W=\frac{N!}{N_1!\cdots N_{k}!}=\frac{N!}{\prod_j N_j!}\).

For \(k=2\) we get a special case, the binomial coefficient: \(W=\frac{N!}{N_1!N_2!}=\frac{N!}{N_1!(N-N_1)!}={N\choose N_1}\), read "\(N\) choose \(N_1\)".
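The multinomial coefficient can be cross-checked against brute-force enumeration of the word examples above (a sketch; `multinomial_arrangements` is an illustrative helper name):

```python
from math import factorial
from itertools import permutations

def multinomial_arrangements(word):
    # W = N! / (N_1! * ... * N_k!), one N_j per distinct letter
    counts = {c: word.count(c) for c in set(word)}
    W = factorial(len(word))
    for n in counts.values():
        W //= factorial(n)
    return W

# Formula vs. brute-force count of distinct orderings
for word in ["WHAT", "CHEESE", "FREEZER"]:
    assert multinomial_arrangements(word) == len(set(permutations(word)))

print(multinomial_arrangements("FREEZER"))  # 7!/(3!2!) = 420
```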

#### Question 4

For \(N\) coin tosses, what is the probability of getting \(N_T\) tails?

\(N_A:=\) number of times outcome \(A\) occurs in \(N\) trials.

We want \(p_N(N_T)\).

For event \(E\): \(N_T=5\), \(N=12\). Then \(W_E={12\choose 5}\), and \(W_S=2^{12}\) since each toss has 2 outcomes. \(P(5)=\frac{{12\choose 5}}{2^{12}}\).

So, with \(p_T=\frac{1}{2}\) for a fair coin, \(P(N_T)=\frac{W_E}{W_S}=\frac{N!}{N_T!(N-N_T)!}\cdot\frac{1}{2^N}={N\choose N_T}(p_T)^{N_T}(1-p_T)^{N-N_T}\).

## Binomial Distribution

2 outcomes with probabilities \(p_a\) and \(p_b=1-p_a\) in \(N\) trials: \(p_N(N_a)={N\choose N_a}(p_a)^{N_a}(1-p_a)^{N-N_a}\).
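A minimal sketch of the binomial distribution, reproducing the \(P(5)\) computation from Question 4 (`binom_pmf` is an illustrative name):

```python
from math import comb

def binom_pmf(N, n, p):
    # p_N(n) = C(N, n) * p^n * (1 - p)^(N - n)
    return comb(N, n) * p**n * (1 - p)**(N - n)

# 5 tails in 12 fair coin tosses: C(12, 5) / 2^12 = 792/4096
print(binom_pmf(12, 5, 0.5))

# The distribution is normalized: summing over all n gives 1.
total = sum(binom_pmf(12, n, 0.5) for n in range(13))
print(total)
```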

## Multinomial Distribution

\(N\) trials with \(k\) outcomes with probabilities \(p_1,\cdots,p_k\). The probability of finding \(N_j\) occurrences of outcome \(j\), with \(N=\sum_{j=1}^kN_j\), is \(p_N(N_1,N_2,\cdots,N_k)=N!\prod_{j=1}^k \frac{p_j^{N_j}}{N_j!}\).
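The multinomial formula can be sketched the same way (the fair-die example below is illustrative, not from the notes):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # p_N(N_1, ..., N_k) = N! * prod_j p_j^(N_j) / N_j!
    N = sum(counts)
    p = factorial(N)
    for n_j, p_j in zip(counts, probs):
        p *= p_j**n_j / factorial(n_j)
    return p

# A fair die rolled 6 times, each face appearing exactly once: 6!/6^6
print(multinomial_pmf([1] * 6, [1 / 6] * 6))
```

For \(k=2\) this reduces to the binomial distribution, e.g. `multinomial_pmf([5, 7], [0.5, 0.5])` equals \({12\choose 5}/2^{12}\).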

### Stirling’s Approximation

\(\ln N! \approx N\ln N - N\) (crude form, keeping only the leading terms)

\(N! \approx \exp(N\ln N-N) = \frac{N^N}{e^N}=\left(\frac{N}{e}\right)^{N}\).

Stirling’s approximation: \(N! \approx \sqrt{2\pi N}\left(\frac{N}{e}\right)^N\).

How? \(N! = \int_0^\infty x^N e^{-x}\,dx = \Gamma(N+1)\).

A sharper estimate: \(\ln N! = \sum_{n=1}^N \ln n \approx \int_{1/2}^{N+1/2}\ln x\,dx\) (midpoint rule). The difference between this and the crude \(N\ln N - N\) is

\((N+\tfrac{1}{2})\ln(N+\tfrac{1}{2}) - (N+\tfrac{1}{2}) - (\tfrac{1}{2}\ln\tfrac{1}{2}-\tfrac{1}{2}) - (N\ln N - N)\)

\(= (N+\tfrac{1}{2})\left(\ln N + \ln(1+\tfrac{1}{2N})\right) - N - \tfrac{1}{2} - \tfrac{1}{2}\ln\tfrac{1}{2} + \tfrac{1}{2} - N\ln N + N\)

\(= \tfrac{1}{2}\ln N + (N+\tfrac{1}{2})\ln(1+\tfrac{1}{2N}) - \tfrac{1}{2}\ln\tfrac{1}{2}\)

\(\approx \ln\sqrt{N} + \text{const}\)

since \((N+\tfrac{1}{2})\ln(1+\tfrac{1}{2N})\to\tfrac{1}{2}\) as \(N\to\infty\). Thus, the correction is a multiplicative factor of order \(\sqrt{N}\), which is what Stirling’s formula has.
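The quality of the two approximations can be checked numerically; a sketch comparing \(\ln N!\) with the crude estimate and the full Stirling formula:

```python
from math import factorial, log, pi

# Compare ln N! with N ln N - N (crude) and the full Stirling formula
for N in [10, 100, 1000]:
    exact = log(factorial(N))
    crude = N * log(N) - N
    stirling = 0.5 * log(2 * pi * N) + N * log(N) - N  # ln of sqrt(2 pi N) (N/e)^N
    print(N, exact - crude, exact - stirling)
```

The crude error grows like \(\ln\sqrt{N}\), while the full Stirling error shrinks toward zero as \(N\) grows.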

## Ways

\(W = \frac{N!}{\prod_{j=1}^kN_j!}\), \(\sum_{j=1}^kN_j=N\). \(p_j = N_j/N\).

### Approximating

#### Ways

\(W = \frac{N!}{\prod_j N_j!} \approx \frac{(N/e)^N}{\prod_j (N_j/e)^{N_j}} = \frac{N^N}{\prod_j N_j^{N_j}} = \frac{1}{\prod_j p_j^{N_j}}\)

using the crude Stirling form for every factorial and \(N_j = p_j N\); the factors of \(e\) cancel since \(\sum_j N_j = N\).

\(\ln W = -\sum_j N_j\ln p_j\)

\(\frac{\ln W}{N} = -\sum_j p_j\ln p_j = \frac{S}{Nk_B}\Rightarrow S = k_B\ln W = -Nk_B\sum_j p_j\ln p_j\); per trial this is \(-k_B\sum_j p_j\ln p_j\). Thus, we recovered our typical entropy! This makes sense, since entropy is related to the number of ways a system can be arranged.

The entropy of a fair die is \(S = k_B\ln 6\).

No knowledge implies a fair die: the uniform distribution maximizes the entropy.

Let’s say we have a maximally unfair die, \(p_6=1\), \(p_{i\neq 6}=0\). Then the entropy is minimized (\(S=0\)): there is no missing information.

Let’s say someone tells us \(\langle N\rangle_{dice} = 3.5\). Then we would assume it is a fair die.

If someone says \(\langle N\rangle_{dice} = 3.0\), we would assume it is an unfair die (weighted toward lower faces). Many different sets of face probabilities are consistent with this mean, e.g.:

- \(p_3 = 1\), \(p_{i\neq 3} = 0\)
- \(p_2=p_4=\frac{1}{2}\), \(p_{i\neq 2,4}=0\)

Thus, there is an increase in missing information: we know the mean, but not which distribution produced it.
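The entropy comparison in this section can be sketched numerically, in units of \(k_B\) (the distribution names are illustrative):

```python
from math import log

def entropy(probs):
    # S / k_B = -sum_j p_j ln p_j  (terms with p_j = 0 contribute nothing)
    return -sum(p * log(p) for p in probs if p > 0)

fair = [1 / 6] * 6                     # <N> = 3.5, maximal entropy ln 6
only3 = [0, 0, 1, 0, 0, 0]             # <N> = 3.0, entropy 0 (no missing information)
two_four = [0, 1 / 2, 0, 1 / 2, 0, 0]  # <N> = 3.0, entropy ln 2

print(entropy(fair), entropy(only3), entropy(two_four))
```

The two distributions with mean 3.0 have different entropies, illustrating that the mean alone does not pin down the distribution.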