In this series of posts, we're going to discuss stochastic processes and Markov models (as you might have guessed). There's some pretty fancy machinery being used currently which utilizes these tools, making it an important topic to know about.

I'll attempt to motivate these topics using intuition as a guide, but I will be using some mathematical notation. It will help significantly if you've seen some probability and statistics before reading this.

There are a number of ways one can define a stochastic process, but we'll pick one that's more straight-forward. More interested math-enthusiasts can find the measure-theoretic definition in most undergrad textbooks covering the subject.

Breaking down the term, the word *stochastic* is related to the Greek word *stokhastikos*, meaning, "capable of guessing" — essentially, this word has a notion of randomness and paints stochastic processes as somehow random. The word *process* gives us an idea that this will somehow involve steps, separated in time, which follow one-another.

These are basically true things, but let's actually write the definition down here.

A **stochastic process** $X(t)$ consists of events with probability measure $P$ defined on some sample space $S$, as well as a function $x(t,s)$ which, for each outcome $s\in S$, gives us a time function. This latter function $x(t,s)$ is called the **sample function**.

There's a lot in here, and it may not be all clear — and in fact, if you haven't seen stochastic processes before, you may still have no idea what they are. Let's do an example first to clear up some of these ideas.

Let's say we have a single, fair 6-sided die. Suppose at $T = 1, 2, 3, 4, \dots$ we roll the die and then record whatever we get; let's just call this value $f(T)$, since it's a function of $T$ (we do a new roll at every new time tick). We could have $X(1) = 4$ meaning that we rolled a 4 during the first second, for example.

Let's define the random process $X(t)$ such that if $T \leq t \lt T+1$ we will define $X(t) = f(T)$.

At $T = 1$ we roll the die, record the value, and plot that value for all $t$ between 1 and 2, including 1. At $T = 2$ we roll again and then record that value for all $t$ between 2 and 3, including 2.

What about that strange $x(t,s)$ function? That's going to be exactly the function we created above. Let's let $s = (1, 2, 3, 1, 2, 3, \dots)$ which will just stand for the sequence of values we get when we roll the die a bunch. Then $x(s,t)$ maps these values like we did in the previous paragraph: on $[1,2)$, which is all values between 1 and 2, including 1 and excluding 2, we have the value $1$; on $[2,3)$ we have the value $2$; on $[3,4)$ we have the value $3$, and so forth.

Put another way, $x(t,s)$ will take a sequence $s$ of rolls defined by the random process $X(t)$, and "glue" them to time steps $t$. For any $t_{0}$, we can find a value of $s$ by just checking $x(t_{0},s)$.

Let's do one more example.

Suppose that our values for $T$ are every nanosecond and $X(t)$ is the value of a particular stock (that we've already chosen) on a particular day (that we've already chosen). Because these intervals are so close together we can assume this is nearly continuous. We then have that $x(t,s)$ is the function which takes in $s$, which stands for a pre-chosen stock's values-per-nanosecond on some particular day, and assigns the values to their corresponding times.

For this previous example, we expect that the value of the stock at some time $t_{0}$ probably is dependent on the value of the stock slightly before $t_{0}$; we don't just have wild outlandish fluctuations of stocks constantly from nanosecond to nanosecond, stocks which are $40$ one second will probably stay around $40$ the next second.

This property, that the value at a future time depends on the values of the times that came before it, can be irritating to work with — especially if there's no easy way to represent the function that determines the values across all time.

Oftentimes it's good to look at simple things before we look at more complicated things so it would be nice if we were able to look at just those stochastic processes where, for each point, there was no dependency on the points that came before it. This would be nice, but it's a little *too weird*, not many things happen like this. But we can get pretty close. We can look at processes where the next step is only dependent on the current step. In particular,

Suppose that $X(t)$ is a stochastic process. Then this process is a **Markov process** if for each $n > 0$, for each time points
\[0 \leq t_{0} \lt t_{1} \lt \cdots \lt t_{n} \lt t_{n+1}\]
and for possible values $y_{0}, y_{1}, \dots, y_{n}$ we have
\[P(X(t_{n+1}) = y_{n+1}\,|\, X(t_{n}) = y_{n}, X(t_{n-1}) = y_{n-1}, \dots, X(t_{0}) = y_{0})\]
\[= P(X(t_{n+1}) = y_{n+1}\,|\, X(t_{n}) = y_{n})\]

In other words, the probability that $X(t_{n+1}) = y_{n+1}$ given that $X(t_{n}) = y_{n}$ and $X(t_{n-1}) = y_{n-1}$ and $\dots$ and $X(t_{0}) = y_{0}$ is the same as the probability that $X(t_{n+1}) = y_{n+1}$ given that $X(t_{n}) = y_{n}$; the other values of $t_{i}$ for $i \lt n$ don't affect $t_{n+1}$.

This is often said that this is the Markov "memoryless" property: that these processes can only remember the steps they're on ($n$) in order to figure out the next step ($n+1$) but don't "remember" the other steps.

This post just served to introduce stochastic and Markov processes. In the next post, we'll do a few examples of Markov processes and learn how to represent them in a nice way.