The basic idea of physics-based sound
synthesis is to model the sound generation mechanism of the
instrument rather than the generated sound itself. The latter approach, signal-based
modeling, is the more common one in sound synthesis. The difference is illustrated in Fig. 1.
Fig. 1. Signal- and physics-based modeling
Although research on physics-based sound synthesis has been going
on for three decades, its commercial application is still quite rare, mostly
because of its higher computational complexity compared to signal modeling.
However, with the increase of computational power and the appearance of better
models, it is quite probable that physics-based sound synthesis will be able to
compete with the most common signal modeling technique, namely, sampling
synthesis. Sampling synthesis is based on playing back recorded samples of
instrument sounds. A serious shortcoming of sampling synthesis is that it cannot
model the interaction of the different parts of the instrument (e.g., the
coupling of different strings). Moreover, all the variations of a single note
that the musician can generate (different bow velocity,
bow force, etc.) have to be stored. These problems are automatically avoided in physics-based
sound synthesis, where the model blocks correspond to the main parts of the
instrument (in the case of string instruments: excitation, string, instrument
body; see Fig. 2). The parameters of the model are physically meaningful
(e.g., string length, bow velocity), therefore the control of the virtual
instrument is straightforward. A further advantage of physics-based sound
synthesis is that it can tell acousticians
which phenomena are the most important during sound production and how the
sound of the instrument would change if its physical properties were varied.
Fig. 2. The piano and its physical model
The first step of physics-based sound synthesis is to
understand how the instrument works, that is, to establish the equations describing the main
parts of the instrument and the interactions between them.
Naturally, most of this knowledge is obtainable from the literature,
as musical acoustics has a long tradition. However, for some specific parts of
the instrument model further investigations are necessary. The resulting precise
instrument model can be directly used for sound synthesis after spatial and
temporal discretization. However, the required computational complexity of such
a model is usually too high for real-time implementation. Therefore, efficient
sound synthesis algorithms have to be developed by neglecting the less important
features of the precise model. To do so, one has to estimate which
phenomena are less relevant in producing the characteristic sound of the
instrument.
The most important part of string instruments is
the string, as the string generates the periodic vibration in
the sound. The equation describing the ideal, infinite string is the wave
equation
$$\mu \frac{\partial^2 y}{\partial t^2} = T \frac{\partial^2 y}{\partial x^2},$$
where y is the transverse displacement, x is
the position along the string, t is the time, T is the
tension, and μ is the mass per unit length. In real strings losses and
dispersion also occur, which can be modeled by adding further terms. The
solution of the wave equation can be calculated by spatial and temporal
discretization, i.e., by substituting the derivatives with differences. This is
the finite difference method. While it is closely connected to
the physical reality, a drawback of the approach is its high computational
demand.
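As a minimal sketch of the finite difference method for the ideal string (without the loss and dispersion terms), the Python code below replaces both second derivatives of the wave equation with central differences. The function name `fdm_string`, the rigidly fixed ends, the zero initial velocity and the triangular initial shape are assumptions made for illustration only. The parameter `courant` stands for c·Δt/Δx, where c = sqrt(T/μ) is the propagation speed; the explicit scheme is stable only if this ratio does not exceed 1.

```python
def fdm_string(initial_shape, courant=1.0, num_steps=1):
    """Advance an ideal string with rigidly fixed ends by num_steps time steps.

    initial_shape: displacement samples y[0..N] at t = 0
                   (zero initial velocity is assumed).
    courant:       c * dt / dx; must lie in (0, 1] for stability.
    Returns the displacement samples after num_steps steps.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant must be in (0, 1] for a stable scheme")
    lam2 = courant * courant
    n = len(initial_shape) - 1
    y_now = list(initial_shape)
    y_now[0] = y_now[n] = 0.0          # fixed terminations
    if num_steps == 0:
        return y_now

    def second_difference(y, m):
        # Central difference approximating the second spatial derivative.
        return y[m + 1] - 2.0 * y[m] + y[m - 1]

    # First step: Taylor expansion around t = 0 with zero initial velocity.
    y_next = [0.0] * (n + 1)
    for m in range(1, n):
        y_next[m] = y_now[m] + 0.5 * lam2 * second_difference(y_now, m)
    y_prev, y_now = y_now, y_next

    # Remaining steps: central differences in both time and space.
    for _ in range(num_steps - 1):
        y_next = [0.0] * (n + 1)
        for m in range(1, n):
            y_next[m] = (2.0 * y_now[m] - y_prev[m]
                         + lam2 * second_difference(y_now, m))
        y_prev, y_now = y_now, y_next
    return y_now


# Hypothetical usage: triangular "pluck" at the middle of a 20-segment string.
N = 20
pluck = [min(m, N - m) / (N / 2) for m in range(N + 1)]
after_half_period = fdm_string(pluck, courant=1.0, num_steps=N)
after_full_period = fdm_string(pluck, courant=1.0, num_steps=2 * N)
```

With `courant=1.0` the scheme reproduces the traveling-wave solution exactly on the grid, so the string returns to its initial shape after 2N steps. Every time step updates every spatial sample, which is where the computational cost of the method comes from.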
Another common string modeling technique is modal
synthesis, where the motion of each string mode is computed and the
shape of the string is calculated as the sum of these modes,
$$y(x,t) = \sum_{k=1}^{\infty} y_k(t) \sin\left(\frac{k \pi x}{L}\right),$$
where sin(kπx/L) is
the modal shape of mode k and L is the length of the string.
The instantaneous amplitudes of the modes are given by the functions
y_k(t), which are typically exponentially decaying sinusoidal
functions implemented by second-order resonators in discrete time.
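The modal sum can be sketched in Python as follows, with each mode amplitude generated by a two-pole resonator whose impulse response is an exponentially decaying sinusoid. This is only an illustration under stated assumptions: the function name `modal_string`, the truncation to a finite number of modes, and the mode frequencies, amplitudes and decay times in the usage example are hypothetical values, not measured data.

```python
import math


def modal_string(mode_freqs_hz, mode_amps, decay_times_s,
                 x_rel, sample_rate, num_samples):
    """Displacement at relative position x_rel = x / L as a sum of modes.

    The k-th entry of each list describes mode k (k = 1, 2, ...).
    Each mode is a second-order resonator excited by a unit impulse, so
    that its output is amp * r**n * sin(omega * n).
    """
    out = [0.0] * num_samples
    modes = zip(mode_freqs_hz, mode_amps, decay_times_s)
    for k, (freq, amp, tau) in enumerate(modes, start=1):
        omega = 2.0 * math.pi * freq / sample_rate
        r = math.exp(-1.0 / (tau * sample_rate))   # pole radius from decay time
        a1 = 2.0 * r * math.cos(omega)             # feedback coefficients
        a2 = -r * r
        b1 = amp * r * math.sin(omega)             # scales the impulse response
        shape = math.sin(k * math.pi * x_rel)      # modal shape at the pickup
        y_nm1 = 0.0
        y_nm2 = 0.0
        for n in range(num_samples):
            x_nm1 = 1.0 if n == 1 else 0.0         # unit impulse delayed by one
            y_n = a1 * y_nm1 + a2 * y_nm2 + b1 * x_nm1
            out[n] += shape * y_n
            y_nm2, y_nm1 = y_nm1, y_n
    return out


# Hypothetical usage: ten harmonic modes of a 220 Hz string, observed at x = 0.3 L.
fs = 44100
freqs = [220.0 * k for k in range(1, 11)]
amps = [1.0 / k for k in range(1, 11)]
taus = [1.0 / k for k in range(1, 11)]
signal = modal_string(freqs, amps, taus, 0.3, fs, 4410)
```

The cost grows linearly with the number of modes, and each mode can be given its own frequency and decay time independently of the others.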
The most efficient approach to string modeling is the
digital waveguide. The time-domain solution of the wave
equation is the superposition of two functions
$$y(x,t) = y^{+}(x - ct) + y^{-}(x + ct),$$
where y+ and y- can be
considered as two traveling waves, which retain their shape during their
movement. The function y+ is the wave traveling to the right,
the function y- is the wave traveling to the left, and
c is the propagation speed. If the spatially sampled values of the two
components (y+ and y-) are stored in two
vectors, then the next state can be computed by shifting the two vectors to the
right and to the left. This corresponds to two delay lines, which can be
efficiently implemented by circular buffers. This is depicted in Fig. 3. The
reflections from the ideally rigid terminations of the string can
be realized by multiplying by -1 at the end of the delay lines. The
losses and dispersion of the string, and the nonideality of the termination are
modeled by a digital filter H(z) in the delay loop. Thus, the
distributed losses and dispersion are lumped to one point of the structure. The
magnitude response of H(z) controls the decay times of the
generated partials, while its phase response sets the frequencies of the
partials, together with the delay line length M. Parameter estimation
for the digital waveguide consists of designing a filter
H(z) that yields the required partial frequencies
and decay times.
Fig. 3. Digital waveguide string model
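The structure described above can be sketched in Python with two circular-buffer delay lines, sign-inverting terminations and a loop filter. Several choices here are assumptions made for illustration: the names `DelayLine` and `waveguide_string`, the placement of the filter at the right termination, and in particular the one-zero lowpass with the parameters `loop_gain` and `smoothing`, which merely stands in for a properly designed H(z). The initial condition is a plucked shape with zero velocity, split equally between the two traveling waves.

```python
class DelayLine:
    """Fixed-length delay line implemented as a circular buffer."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.pos = 0                     # index of the oldest stored sample

    def oldest(self):
        return self.buf[self.pos]

    def push(self, value):
        # Overwrite the oldest sample and advance the pointer (no data is moved).
        self.buf[self.pos] = value
        self.pos = (self.pos + 1) % len(self.buf)

    def tap(self, age):
        # Sample pushed `age` steps ago (1 = newest, len(buf) = oldest).
        return self.buf[(self.pos - age) % len(self.buf)]


def waveguide_string(shape, pickup, num_samples, loop_gain=0.996, smoothing=0.5):
    """Displacement at spatial index `pickup` (0 .. len(shape) - 1)."""
    m_len = len(shape)
    right = DelayLine(m_len)             # y+ : wave traveling to the right
    left = DelayLine(m_len)              # y- : wave traveling to the left
    for p in range(m_len):
        right.push(0.5 * shape[m_len - 1 - p])
        left.push(0.5 * shape[p])
    prev_in = 0.0                        # state of the one-zero loop filter
    out = []
    for _ in range(num_samples):
        # The physical displacement is the sum of the two traveling waves.
        out.append(right.tap(pickup + 1) + left.tap(m_len - pickup))
        at_right_end = right.oldest()
        at_left_end = left.oldest()
        # Assumed loop filter: gain times a weighted two-point average.
        filtered = loop_gain * ((1.0 - smoothing) * at_right_end
                                + smoothing * prev_in)
        prev_in = at_right_end
        left.push(-filtered)             # reflection at the right termination
        right.push(-at_left_end)         # reflection at the left termination
    return out


# Hypothetical usage: triangular pluck on a string of 50 spatial samples.
M = 50
pluck_shape = [min(p + 1, M - p) / (M / 2) for p in range(M)]
tone = waveguide_string(pluck_shape, pickup=10, num_samples=2000)
```

With `loop_gain=1.0` and `smoothing=0.0` the loop is lossless and the output repeats every 2M samples. Each output sample costs only a few additions and multiplications regardless of M, which is why the approach is so efficient.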
The string gains energy from the excitation, which can be
impulse-like (plucking, striking) or continuous (bowing). In all
cases the interaction of the string and the exciter is bidirectional,
i.e., the excitation force is a function of the string shape. The
excitation is modeled by discretizing the (generally
zero-dimensional) differential equation of the excitation. As the excitation is
nonlinear in most cases, the discretization is nontrivial and often leads
to numerical instabilities.
The string cannot radiate sound efficiently, because its radiation
impedance is not of the same order of magnitude as the impedance of the air. The role of the
instrument body is to provide an impedance match, thus increasing the efficiency
of sound radiation. The most time-consuming operation in physics-based sound
synthesis is body modeling, because it requires the calculation of a two- or
three-dimensional vibration, in contrast to the string (one dimension)
and the excitation (zero dimension). Therefore, it is common to model the effect
of the body as a force-pressure transfer function instead of a precise physical
model.
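One simple way to apply such a force-pressure transfer function is to convolve the force signal at the bridge with the impulse response of the body. The Python sketch below shows this under loud assumptions: the function names `body_filter` and `toy_body_response` are hypothetical, and the impulse response is a made-up sum of a few decaying sinusoids rather than a measured or computed body response.

```python
import math


def body_filter(force, impulse_response):
    """Sound pressure as the convolution of the force with the body response.

    The output is truncated to the length of the force signal.
    """
    pressure = [0.0] * len(force)
    for n in range(len(force)):
        acc = 0.0
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * force[n - k]
        pressure[n] = acc
    return pressure


def toy_body_response(sample_rate, length, resonances):
    """Hypothetical impulse response built from (freq_hz, amp, decay_s) triples."""
    return [
        sum(amp * math.exp(-n / (tau * sample_rate))
            * math.sin(2.0 * math.pi * freq * n / sample_rate)
            for freq, amp, tau in resonances)
        for n in range(length)
    ]


# Hypothetical usage: three invented body resonances driven by a short force pulse.
fs_body = 8000
h_body = toy_body_response(fs_body, 400,
                           [(110.0, 1.0, 0.02), (230.0, 0.6, 0.015), (410.0, 0.3, 0.01)])
force_pulse = [1.0, 0.5] + [0.0] * 398
pressure = body_filter(force_pulse, h_body)
```

Direct convolution costs one multiplication per impulse-response sample for every output sample, so long body responses are expensive in this form.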