Mathematically define the Transformer. Your definition must include all variables and equations given in the paper, plus any additional ones needed to make the definition complete (e.g., the Layer Norm, ReLU, and label smoothing equations). To be complete, the definition must cover every component and specify how the components are connected and work together. Submit scanned hand-written notes that are clearly written and legible.
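As an illustration of the level of detail expected for the auxiliary equations named above, here is a sketch of their standard forms. The notation is an assumption chosen for this example rather than taken from the paper: x is the d-dimensional vector at one position, gamma and beta are learned per-feature gain and bias vectors, epsilon is a small numerical-stability constant, y is the index of the correct token, K is the vocabulary size, and epsilon_ls is the label smoothing rate. The residual-plus-normalization wrapper and the ReLU feed-forward form follow the Transformer's usual description.

```latex
\begin{align}
\operatorname{ReLU}(z) &= \max(0, z) \\
\operatorname{FFN}(x) &= \operatorname{ReLU}(x W_1 + b_1)\, W_2 + b_2 \\
\mu &= \frac{1}{d}\sum_{i=1}^{d} x_i, \qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2 \\
\operatorname{LayerNorm}(x) &= \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \\
\text{sublayer output} &= \operatorname{LayerNorm}\bigl(x + \operatorname{Sublayer}(x)\bigr) \\
q'(k) &= (1 - \epsilon_{ls})\,\delta_{k,y} + \frac{\epsilon_{ls}}{K}, \qquad k = 1, \dots, K \\
\mathcal{L} &= -\sum_{k=1}^{K} q'(k)\,\log p(k)
\end{align}
```

Here p(k) is the model's softmax probability for token k. With label smoothing, the one-hot target is replaced by q', which places 1 - epsilon_ls + epsilon_ls/K on the correct token and epsilon_ls/K on every other token.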