# Neural network models
Let us begin with a simple illustration. A Facebook user uploads some
image files containing photographs of family and friends. The system
automatically highlights faces and can identify many of the individuals
in those photos by name. How? In fact, how does it even know what part
of a complex, multifaceted image represents a human face? The short
answer is that the system combines biometric data with specialized deep
learning algorithms based on artificial neural network (ANN) models.
These algorithms attempt to mimic the current scientific understanding
of how the human brain works and how it learns new things.
In late 2021, Facebook announced that it was shutting down its facial
recognition feature for a variety of socio-political reasons. But the
technological innovations behind it continue to grow, and to power many
other cutting-edge, real-world applications, such as self-driving cars,
voice recognition, credit card fraud detection, targeted advertising,
and more. This chapter introduces the foundational concepts underlying
ANNs and their use in modern, data-centric applications. Our focus is
primarily on methods that belong to the category known as “supervised
learning” in machine learning nomenclature.
## Conceptual overview
An artificial neural network (ANN) is essentially a mathematical model that receives a set of numeric inputs and produces an output from them. To a mathematician this, of course, describes a function, which is certainly one way to conceptualize an ANN. From a model-building perspective, an ANN is a set of very simple neuron-like components that are interconnected in the form of a network, whose structure and parameters determine what precise output is generated. It is common to think of the structure of these networks as composed of layers (see Figure 1.1). Typically, there is an input layer and an output layer, together with optional layers in between, usually referred to as hidden layers.
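To make the "network as a function" picture concrete, here is a minimal NumPy sketch of a forward pass through such a layered structure. The layer sizes, random weights, and sigmoid activation are illustrative assumptions, not choices prescribed by the text:

```python
import numpy as np

def sigmoid(z):
    # A common activation function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # Feed the input through each layer in turn: multiply by the
    # layer's weight matrix, subtract its bias vector, and activate.
    for W, b in layers:
        x = sigmoid(W @ x - b)
    return x

rng = np.random.default_rng(seed=1)
# Illustrative architecture: a 3-value input, one hidden layer of
# 4 neurons, and a single output neuron.
layers = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),  # input  -> hidden
    (rng.normal(size=(1, 4)), rng.normal(size=1)),  # hidden -> output
]
y_hat = forward(np.array([0.5, -1.2, 3.0]), layers)  # one output value in (0, 1)
```

Viewed this way, the whole network is just a composition of simple functions, one per layer; the weights and biases are the parameters that determine which particular function the network computes.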
To understand the functioning of a neural network, let us look at how a single component, which we will call a neuron, works. Figure 1.2 shows a schematic of a neuron receiving \(n\) inputs and, following a sequence of steps, producing an output \(\hat{y}\). Neurons typically take in multiple inputs and produce a single output. The effect of each input \(x_i\) is moderated by a numerical weight coefficient \(w_i\), which represents the strength of the connection between \(x_i\) and the output. Typically, we want to compare the sum \(\sum_{i=1}^n x_i \cdot w_i\) with some specified threshold value \(b\), known as the bias, to determine the neuron’s output. This is done by an activation function that takes the input \(\sum_{i=1}^n x_i \cdot w_i - b\) and produces the output \(\hat{y}\).
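As a concrete instance of this computation, the following sketch implements a single neuron with a simple threshold (step) activation; the particular input values, weights, and bias are made-up numbers for demonstration:

```python
import numpy as np

def step(z):
    # Threshold activation: fire (output 1) if the weighted sum
    # exceeds the bias, otherwise output 0.
    return 1.0 if z > 0 else 0.0

def neuron(x, w, b):
    # Weighted sum of the inputs, shifted by the bias b, then
    # passed through the activation function.
    z = np.dot(x, w) - b
    return step(z)

x = np.array([0.2, 0.7, 0.1])    # n = 3 inputs
w = np.array([0.9, -0.4, 0.3])   # connection weights w_i
b = 0.05                         # bias (threshold)
y_hat = neuron(x, w, b)          # 0.0 here: the weighted sum -0.07 falls below b
```

Other activation functions (such as the sigmoid used earlier) replace the hard threshold with a smooth transition, which becomes important when we want to train the network.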