Data Philly, Feb. 2024
Dante Gates
Husband and Father
Philly data scientist
wrote a haiku once
Objective function
\begin{align*} \underset{\theta}{\text{argmin}}\ \mathcal{L}(y,f(\theta)) \end{align*}
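For concreteness, a minimal sketch of this setup on made-up data: squared-error loss \mathcal{L} and a linear model f(\theta), minimized with scipy. Everything here (the data, the model choice) is illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data: y = X @ [2, -1] plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

def loss(theta):
    # L(y, f(theta)): squared error with a linear model f.
    return np.sum((y - X @ theta) ** 2)

result = minimize(loss, x0=np.zeros(2))
print(result.x)  # approximately [2, -1]
```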
Penalties (constraints, sort of)
\begin{align*} \underset{\theta}{\text{argmin}}\ &\mathcal{L}(y,f(\theta))\color{blue}{+\sum_{i=1}^{l}{g_{i}(\theta, \lambda_{i})}}\\ \text{where}&\\ &\color{blue}{g_{i}(\theta,\lambda_{i})>0,\ i=1\ldots l} \end{align*}
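The same sketch with one penalty term added; a ridge penalty g(\theta, \lambda) = \lambda\lVert\theta\rVert^{2} is one common choice among many:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

lam = 0.1  # the lambda above; penalty strength, chosen by hand here

def penalized_loss(theta):
    # L(y, f(theta)) + g(theta, lambda): squared error plus a ridge penalty.
    return np.sum((y - X @ theta) ** 2) + lam * np.sum(theta ** 2)

print(minimize(penalized_loss, x0=np.zeros(2)).x)
```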
Continuous inputs, continuous outputs
\begin{align*} \underset{\theta}{\text{argmin}}\ &\mathcal{L}(y,f(\theta))+\sum_{i=1}^{l}{g_{i}(\theta, \lambda_{i})}\\ \text{where}&\\ &g_{i}(\theta,\lambda_{i})>0,\ i=1\ldots l \\ &\color{blue}{f(\theta): \mathbb{R}^{n}\to \mathbb{R}^{m}} \end{align*}
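Because everything is continuous (and, here, differentiable), plain gradient descent works. A hand-rolled sketch on the same made-up least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

theta = np.zeros(2)
lr = 0.001  # step size, tuned by hand for this toy problem
for _ in range(500):
    grad = -2 * X.T @ (y - X @ theta)  # gradient of the squared-error loss
    theta -= lr * grad
print(theta)  # approximately [2, -1]
```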
Objective no longer explicitly references a target variable
\begin{align*} &\underset{\theta}{\text{argmin}}\ \color{blue}{f({\theta, X})} \\ \end{align*}
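For example (a made-up objective): find the point \theta minimizing the total distance to the rows of X, i.e. the geometric median. No target y anywhere:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # made-up points

def f(theta):
    # f(theta, X): total distance from theta to each row of X -- no y in sight.
    return np.sum(np.linalg.norm(X - theta, axis=1))

res = minimize(f, x0=X.mean(axis=0))
print(res.x)  # the geometric median of X
```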
Ability to impose arbitrary (sort of) constraints
\begin{align*} \underset{\theta}{\text{argmin}}\ &f({\theta, X}) \\ \text{s.t.}&\\ &\color{blue}{g_{i}(\theta, X) < C_{i},\ i=1\ldots n} \end{align*}
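A sketch of adding one such constraint in scipy (solvers enforce \le rather than strict <; the radius 0.5 is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # made-up points

def f(theta):
    return np.sum(np.linalg.norm(X - theta, axis=1))

# g(theta, X) < C: keep theta within distance 0.5 of the origin.
# scipy's "ineq" convention is fun(theta) >= 0.
cons = [{"type": "ineq", "fun": lambda theta: 0.5 - np.linalg.norm(theta)}]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)
```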
Model parameters no longer restricted to the reals
\begin{align*} \underset{\theta}{\text{argmin}}\ &f({\theta, X}) \\ \text{s.t.}&\\ &g_{i}(\theta, X) < C_{i},\ i=1\ldots n \\ &\color{blue}{f(\theta, X): \{\mathbb{R},\mathbb{Z},\ldots\}^{n}\to \mathbb{R}\ \text{(Usually)}} \end{align*}
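Once some variables are integers, gradient-based solvers no longer apply. A tiny, made-up integer program via scipy.optimize.milp:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Maximize x0 + 2*x1 (milp minimizes, so negate the objective)
# subject to x0 + x1 <= 4, with x0, x1 nonnegative integers.
c = np.array([-1.0, -2.0])
constraints = LinearConstraint(np.array([[1.0, 1.0]]), ub=4.0)
res = milp(c, constraints=constraints,
           integrality=np.ones(2),          # 1 = integer-valued variable
           bounds=Bounds(lb=0, ub=np.inf))
print(res.x)  # [0., 4.]
```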
| Formulation | Interpretation |
| --- | --- |
| $\underset{x_{i,j}}{\text{argmin}}\ \sum_{i=1}^{n}{\sum_{j=1,i\ne j}^{n}{c_{i,j}x_{i,j}}}$ | Minimize distance traveled |
| $x_{i,j}\in \{0, 1\}$ | Decision variable represents path assignments |
| $\sum_{i=1,i\ne j}^{n}{x_{i,j}}=1,\ \forall j$ | All cities have exactly one incoming path |
| $\sum_{j=1,i\ne j}^{n}{x_{i,j}}=1,\ \forall i$ | All cities have exactly one outgoing path |
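That formulation written out in PuLP (assumed installed; the distance matrix is made up). Note that without subtour-elimination constraints this is really the assignment relaxation of the TSP, which is all the table above specifies:

```python
import pulp

# Made-up symmetric distances between 4 cities.
c = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 8],
     [10, 4, 8, 0]]
n = len(c)
cities = range(n)

prob = pulp.LpProblem("tsp_relaxation", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (cities, cities), cat="Binary")

# Objective: minimize total distance traveled.
prob += pulp.lpSum(c[i][j] * x[i][j] for i in cities for j in cities if i != j)

for j in cities:  # exactly one incoming path per city
    prob += pulp.lpSum(x[i][j] for i in cities if i != j) == 1
for i in cities:  # exactly one outgoing path per city
    prob += pulp.lpSum(x[i][j] for j in cities if i != j) == 1

prob.solve()
print([(i, j) for i in cities for j in cities if i != j and x[i][j].value() == 1])
```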
“ML optimization” \begin{align*} \underset{\theta}{\text{argmin}}\ &\mathcal{L}(y,f(\theta))+\sum_{i=1}^{l}{g_{i}(\theta, \lambda_{i})}\\ \text{where}&\\ &g_{i}(\theta,\lambda_{i})>0,\ i=1\ldots l \\ &f(\theta): \mathbb{R}^{n}\to \mathbb{R}^{m} \end{align*}
“Mathematical optimization” \begin{align*} \underset{\theta}{\text{argmin}}\ &f({\theta, X}) \\ \text{s.t.}&\\ &g_{i}(\theta, X) < C_{i}, i\ldots n \\ &f(\theta, X): \{\mathbb{R},\mathbb{Z},\ldots\}^{n}\to \mathbb{R} \end{align*}
Mathematical optimization is not special
(or, the truth, a half-truth and nothing but a lie, but not necessarily in that order)
x: 14 pages contain underlining
t_{x}: Last underline appears on page 84
T: 324 pages total
What was the final page read?
❓ Objective: ??
✅ Objective: Maximize likelihood of observed data (number of underlines and final page with an underline)
✅ Decision variables: Rate of underlining (\lambda), final page read (\tau)
✅ Constraints: Parameters are positive
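A hedged sketch of the single-book version, assuming underlines arrive as a Poisson process with rate \lambda per page up to the final page read \tau, so the log-likelihood of x underlines with the last on page t_{x} is x\ln\lambda-\lambda\tau for \tau\ge t_{x}:

```python
import numpy as np
from scipy.optimize import minimize

x, t_x, T = 14, 84, 324  # underlined pages, last underline, total pages

def neg_log_likelihood(params):
    lam, tau = params
    # Poisson-process log-likelihood of x underlines in [0, tau], last at t_x:
    # log L = x*log(lam) - lam*tau (valid for tau >= t_x).
    return -(x * np.log(lam) - lam * tau)

res = minimize(neg_log_likelihood, x0=[0.1, 100.0],
               bounds=[(1e-6, None), (t_x, T)])  # lam > 0, t_x <= tau <= T
print(res.x)  # lam -> x/t_x ~= 0.167, tau -> t_x = 84
```

Note the MLE pins \tau exactly to t_{x}, the last observed underline, which is one motivation for pooling across many books with the hierarchical likelihood below.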
\begin{equation}\tag{7} \text{LL}(r,\alpha,a,b)=\sum_{i=1}^{n}{\ln \left[\text{L}(r,\alpha,a,b\vert X_{i}=x_{i},t_{x_{i}},T_{i})\right]} \end{equation}
“This is very easy to code in Excel—see Figure 1 for complete details.”
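Equation (7) matches eq. (7) of Fader, Hardie & Lee's BG/NBD paper (“Counting Your Customers the Easy Way,” 2005), which is presumably also the source of the Excel quote above. If Excel isn't handy, a Python sketch, assuming their individual-level likelihood (their eq. 6); the three books' observations are made up, so the fit itself is not meaningful, only the mechanics:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def bgnbd_ll(params, x, t_x, T):
    # Sum of individual-level BG/NBD log-likelihoods (Fader, Hardie & Lee 2005,
    # eqs. 6-7), written in log space with gammaln for numerical stability.
    r, alpha, a, b = params
    A1 = gammaln(r + x) - gammaln(r) + r * np.log(alpha)
    A2 = gammaln(a + b) + gammaln(b + x) - gammaln(b) - gammaln(a + b + x)
    A3 = -(r + x) * np.log(alpha + T)
    A4 = np.full_like(x, -np.inf)  # this term only exists when x > 0
    m = x > 0
    A4[m] = np.log(a) - np.log(b + x[m] - 1) - (r + x[m]) * np.log(alpha + t_x[m])
    return np.sum(A1 + A2 + np.logaddexp(A3, A4))

# Made-up observations for three books: (x, t_x, T) as defined above.
x   = np.array([14.0, 3.0, 0.0])
t_x = np.array([84.0, 120.0, 0.0])
T   = np.array([324.0, 256.0, 180.0])

res = minimize(lambda p: -bgnbd_ll(p, x, t_x, T),
               x0=[1.0, 1.0, 1.0, 1.0],
               bounds=[(1e-4, None)] * 4)  # r, alpha, a, b all positive
print(res.x)
```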