Creating Synthetic Earthquake Data

B Birdsell
10 min read · Dec 27, 2020

Background

My work experience involves looking at a lot of structural connections, and, as an Albertan who’s lived in the seismically active regions of Japan and the west coast of Canada, I will admit the possibility of earthquakes is always in the back of my mind. Putting these two things together often leads me to wonder what exactly happens in a seismic event to the designs I draw and submit.

In 2019 I was introduced to the Pacific Earthquake Engineering Research Center (PEER) Ground Motion Database at the University of California, Berkeley, which, in keeping with its academic mandate, makes its data freely available for research purposes. The database contains thousands of records from seismic events all over the world. Personally I find these seismic traces quite wonderful to look at, and the database offers the only records of this kind I'm aware of. In fact, despite many searches, I had never seen this type of visualization until I constructed it myself.

Databases of this type are used in architecture and engineering to simulate the effects of an earthquake on a building's structural safety. The displacement, velocity, and acceleration records of a project's local earthquake history, or records of similar earthquakes selected by strict criteria, can be downloaded and fed into advanced simulation software to predict a building's behaviour during a seismic event. In theory at least, synthetic seismic data indistinguishable from the real thing could help simulate regions with relatively sparse historical records. That, however, was not the main aim of the project, though it could be extended in that direction should time and resources allow.

Ultimately the project had two goals: create synthetic earthquake data, and learn how to apply deep learning techniques to 3D geometry. While I can't claim full success on the first goal, looking back on the project, the struggle to solve my model's problems forced me to look deeply at every facet of the topic, and for that I am grateful.

Anatomy of Earthquake Harmonics

XYZ plots of Niigata event Japan 2004 — FKSH01 Station

It's common practice in engineering to decompose forces into their XYZ components, which produces graphs of complex harmonic motion. One can see the violent movement start and then slowly decay over time. Remember, too, that seismic forces act on an object only through the motion of the ground: the force a structure feels is proportional to the acceleration it undergoes. An animation of a point's displacement and acceleration in 3D space strengthens this relationship visually, and a short plotting sketch follows the animations below.

Displacement and acceleration animation. Niigata Japan 2004 — TCGH17 Station
Displacement and acceleration animation. Niigata Japan 2004 — FKSH01 Station
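To make the decomposition concrete, here is a minimal Python sketch of the kind of XYZ plot shown above. It assumes the record has already been exported to a CSV with columns t, x, y, z; those column names and the file name are my own stand-ins, not the PEER file format, which needs its own parser.

```python
# Minimal sketch: plot the three displacement components of one record.
# Assumes a pre-exported CSV with columns "t", "x", "y", "z" (hypothetical).
import pandas as pd
import matplotlib.pyplot as plt

trace = pd.read_csv("FKSH01_niigata_2004.csv")   # hypothetical file name

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 6))
for ax, component in zip(axes, ["x", "y", "z"]):
    ax.plot(trace["t"], trace[component], linewidth=0.6)
    ax.set_ylabel(f"{component} displacement")
axes[-1].set_xlabel("time (s)")
plt.tight_layout()
plt.show()
```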

The sheer variety of traces in the data set surprised me. Below is a selection of 140 Japanese traces. The human eye can see that some samples have certain harmonics emphasized. I never took my research far enough to determine whether these were due to the type of earthquake or an artifact of the measurement. One often expects to see such harmonic effects in a building's response, which must then be controlled or damped.

Sample of Displacement plots from various Japanese Seismic Events

Attempts to Reproduce

It would not be unkind to call these traces "random", but for the purpose of reproducing them it was important to view them instead as merely very complicated: mysterious, but solvable. One of the first steps earlier this year, to help scope out the challenge, was to approximate the traces using different methods. Again decomposing the displacement records into their XYZ components, I tried three different ways of approximating their movement.

Comparison of different approximation methods. Chuetsu event Japan 2007

The results were less than satisfying. Wolfram Mathematica's out-of-the-box interpolation did well because it is state of the art and can be configured to pass through every point; however, such a black-box algorithm doesn't readily lend itself to analysis or creativity. The other two show how much information can be lost in an approximation: the Fourier approximation uses 75 harmonic modes, while the nonlinear model fit used a 10-degree polynomial. This reveals how big a challenge it would be to let a neural network find the traces' latent characteristics.
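For readers who want to try the two simpler approximations themselves, the sketch below applies a 75-mode truncated Fourier series and a degree-10 polynomial fit to a single component. The damped sine used as the stand-in signal is synthetic; with a real trace you would substitute one XYZ column.

```python
# Two simple approximations of one displacement component:
# (1) truncated Fourier series with 75 modes, (2) degree-10 polynomial fit.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 60.0, 4000)          # 4000 samples, as in the data set
signal = np.sin(2 * np.pi * 0.8 * t) * np.exp(-0.05 * t) \
         + 0.1 * rng.standard_normal(t.size)   # stand-in trace

# (1) Keep only the first 75 harmonic modes of the spectrum.
spectrum = np.fft.rfft(signal)
truncated = spectrum.copy()
truncated[75:] = 0.0
fourier_approx = np.fft.irfft(truncated, n=signal.size)

# (2) Least-squares degree-10 polynomial (domain-scaled internally).
poly = np.polynomial.Polynomial.fit(t, signal, deg=10)
poly_approx = poly(t)

print("Fourier MSE:   ", np.mean((signal - fourier_approx) ** 2))
print("Polynomial MSE:", np.mean((signal - poly_approx) ** 2))
```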

Reducing Error

Much of the computational effort in a machine learning algorithm is spent making tiny changes over time to reduce the measured error on a given distribution. In my examples below, I've taken an increasing number of modes of a Fourier transform over a sample of training data; as the predicted trace converges on the original, the error decreases. Later in this piece the same idea is complicated by many dimensions and calculus: instead of simply adding harmonic modes to reduce error, we calculate and recalculate, at great computational cost, a multivariable gradient using matrix multiplication. That behaviour has no straightforward three-dimensional analogy, but it nonetheless "slopes" toward the minimum error, as the toy sketch after the figures below illustrates.

Niigata Japan 2004 — NIG011 Station. Red/Orange represents the measured error and purple the approximation.
Niigata Japan 2004 — NIG018 Station
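As a toy illustration of that gradient-based approach, the sketch below fits a small linear model by repeatedly computing the gradient with matrix multiplication and stepping downhill. It is deliberately simplified; none of the data or dimensions correspond to the actual seismic model.

```python
# Toy gradient descent: compute a multivariable gradient via matrix
# multiplication, step downhill, and watch the error shrink.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((256, 8))          # 256 samples, 8 features
true_w = rng.standard_normal(8)
y = X @ true_w + 0.1 * rng.standard_normal(256)

w = np.zeros(8)                            # start from a flat guess
lr = 0.05                                  # learning rate
for step in range(200):
    residual = X @ w - y                   # prediction error
    grad = X.T @ residual / len(y)         # multivariable gradient
    w -= lr * grad                         # slide toward lower error
    if step % 50 == 0:
        print(f"step {step:3d}  MSE {np.mean(residual ** 2):.4f}")
```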

Caveats

In the interest of transparency, I want to be open about some of the liberties I took to make the project manageable for an individual researcher. These choices had varying degrees of effect on the outcome. First, I used only 788 training samples, which in my experience is on the low side for most deep neural nets; without formal academic backing there was no timely way to get more data, and more training samples would have increased the project's chance of success. Second, as part of preprocessing, I rescaled every sample in the training data to lie between -1 and 1 inclusive. This keeps the gradients well behaved during training, but it mars the data's use for engineering simulations, where the magnitude of the movement is an important property of the prediction. Third, I could have been more careful about sorting my training data into the different types of earthquakes. It's unclear how much of an effect this had on the results, but it was certainly less than optimal.
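The exact rescaling scheme isn't spelled out above, so the sketch below shows one common choice, assuming each sample is a (4000, 3) displacement array: divide the sample by its peak absolute value so everything lands in [-1, 1]. This is also exactly where the absolute magnitude of the motion is lost.

```python
# One possible per-sample rescaling to [-1, 1] (my assumption, not
# necessarily the project's exact scheme).
import numpy as np

def rescale_to_unit(trace: np.ndarray) -> np.ndarray:
    """Scale a (4000, 3) trace so its largest absolute value becomes 1,
    preserving the sign and shape of the motion but discarding amplitude."""
    peak = np.max(np.abs(trace))
    return trace / peak if peak > 0 else trace

# Example on random stand-in data shaped like one training sample.
sample = np.random.default_rng(3).standard_normal((4000, 3))
scaled = rescale_to_unit(sample)
print(scaled.min(), scaled.max())   # both within [-1, 1]
```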

The Instability of GANs

Anyone who has tried to train a Generative Adversarial Network will be familiar with its notorious instability during training. Nonlinear systems are well known for their unpredictable, chaotic behaviour, and balancing one against the other, as generator and discriminator, so that they converge neatly nearly leads one to the edge of madness. It's the drive and curiosity to fix these issues that leads to so much learning. In the following error and accuracy graphs, one sees very little convergence. Some of the best advice I got as the project progressed was to stop worrying about the curves and concentrate instead on how the output looked to the human eye and how its distributions compared with the real data.

Sample of loss from training the discriminator. Lots of fluctuations can be seen with little convergence.
Here we see the model getting stuck in a local minimum during training.
A sample of min and max weights from the generator changing during the training process. Numerically they are nearly flat, and if all arrays were plotted together it is likely no changes would be seen to the human eye.

Static 3D Geometry Method

A 128x128x128 array of training data represents over 2 million points to analyze per sample.

Of the two main approaches I tried, treating the samples volumetrically seemed to have the most potential based on others' experience, but it also had one major downside. The method transforms the series of XYZ points into a volumetric array of ones and zeros, discarding time: a voxel is set to one if a point lands in it and left zero if it is empty. Here the convolution kernel is itself a cube, scanning the volume one step at a time in the X, Y, and Z directions as it identifies features. The major downside is that the model is immediately faced with an N³ computation. Resisting the urge to downsample too aggressively, so as to keep as much of the fanciful detail as possible, I chose an array of 128x128x128, which gives the kernel more than 2 million points to scan inside each volume. In timing experiments this meant processing less than one sample per second, so running enough evolutions for useful structure to emerge (more than 2,000 batches of 64 samples each) would have taken weeks on my laptop. That was not reasonable given the resources available, and the method was abandoned. Downsampling to 10x10x10 would have sped up the computation at the expense of losing interesting detail; a sketch of the voxelization step follows the figures below.

Traces downsampled to a 10x10x10 array.
Unfortunately, because the computation was not efficient, it limited the number of evolutions possible, and therefore the results were less than helpful.
This array plot (downsampled) shows slices through the results. Some structure can be seen developing.
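A rough sketch of that voxelization step, assuming each trace has already been rescaled to [-1, 1] and stored as an (N, 3) array, might look like this; the resolution of 128 matches the array size discussed above.

```python
# Voxelization sketch: drop time and mark each visited XYZ position
# in a binary occupancy volume.
import numpy as np

def voxelize(trace: np.ndarray, resolution: int = 128) -> np.ndarray:
    """Convert an (N, 3) trace with coordinates in [-1, 1] into a
    resolution^3 occupancy grid of ones and zeros."""
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    # Map [-1, 1] to integer voxel indices [0, resolution - 1].
    idx = ((trace + 1.0) * 0.5 * (resolution - 1)).round().astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

volume = voxelize(np.random.default_rng(4).uniform(-1, 1, size=(4000, 3)))
print(volume.shape, int(volume.sum()), "occupied voxels")
```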

XYZ Time Series Method

Keeping the data as XYZ points in series was much more amenable to computation. Here the kernel treats the data as essentially one-dimensional: each of the 4,000 time steps carries 3 features, leaving only 12,000 data points to analyze per sample, far fewer than above. In testing this led to roughly 25–30 examples analyzed per second, making more than 2,000 evolutions achievable overnight. The network structure implemented is a DCGAN, short for deep convolutional GAN; a hedged Keras sketch of such a network pair follows the figure below. The kernel of a convolution is the mysterious mathematical object that scans the data, translating features into an abstract numeric form that is then stored in an array. Much like the human brain, this is how successively higher levels of meaning are discovered.

Example of DCGAN network. Left: Discriminator and Right: Generator. From Mathematica 12.2
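For reference, here is a TensorFlow 2 / Keras sketch of a 1D DCGAN pair for (4000, 3) traces. It is not the exact network from the project, which went through hundreds of iterations, only an illustration of the general shape: strided Conv1D layers in the discriminator, Conv1DTranspose layers in the generator, and a tanh output to match the [-1, 1] rescaling.

```python
# Hedged sketch of a 1D DCGAN for (4000 timesteps, 3 features) traces.
# Layer sizes and kernel widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4000, 3)),
        layers.Conv1D(32, kernel_size=16, strides=4, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv1D(64, kernel_size=16, strides=4, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv1D(128, kernel_size=16, strides=4, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),   # real vs. synthetic
    ])

def build_generator(latent_dim=100):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(125 * 64),
        layers.Reshape((125, 64)),                # 125 steps -> upsample to 4000
        layers.Conv1DTranspose(64, kernel_size=16, strides=4, padding="same"),
        layers.ReLU(),
        layers.Conv1DTranspose(32, kernel_size=16, strides=4, padding="same"),
        layers.ReLU(),
        layers.Conv1DTranspose(3, kernel_size=16, strides=2, padding="same",
                               activation="tanh"),   # output in [-1, 1]
    ])
```

With the training traces rescaled to [-1, 1] as described in the Caveats section, the generator's tanh output naturally matches the range of the real data.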

I've attached above a graph of one of the more advanced networks I developed for reference, but I ultimately went through hundreds of model iterations in both Mathematica and TensorFlow 2 that aren't captured in this piece, tweaking the configuration again and again to work around obstacles. Progress was finally being made as I added more hidden layers with wider kernels, allowing the net to uncover larger structures in the data. Unfortunately this came at a cost: as more and more large numeric arrays were added to the model, it saturated the 16GB of RAM available on my laptop and slowed throughput significantly, to around 2 samples per second, meaning that for the model to see 200,000 samples would take over 110 hours. (My quad-core processor, on the other hand, was a champ. It parallelized everything automatically without extra code and was never the bottleneck.) Another challenge was that, for whatever reason, the output was especially susceptible to a type of GAN failure called "mode collapse": after training, the results from the generator tended to converge on a single similar pattern that was able to trick the discriminator. This is normally addressed with a minibatch standard deviation layer, which penalizes the generator when there is little variance in the synthetic output, but I was unable to implement it correctly in time for publication; a sketch of the usual recipe follows the figures below. As alluded to in the final section, there may one day be a part two of this project, as several methods and strategies remain untried.

Both of the above examples are synthetic data output from two different model configurations after about 1,800 rounds of training. Some structure is starting to emerge, and the mode collapse of the model is also evident.
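For completeness, here is a sketch of the usual minibatch standard deviation recipe, adapted from the Progressive Growing of GANs paper to 1D traces. This is not the project's code, only the standard idea: append a channel carrying the average across-batch standard deviation so the discriminator can spot a generator whose outputs all look alike.

```python
# Sketch of a minibatch standard deviation layer for 1D traces
# (standard recipe, not this project's implementation).
import tensorflow as tf

class MinibatchStdDev(tf.keras.layers.Layer):
    """Append one channel holding the average across-batch standard
    deviation, giving the discriminator a direct signal about variety."""
    def call(self, x):
        # x has shape (batch, timesteps, channels).
        std = tf.math.reduce_std(x, axis=0, keepdims=True)  # per position/channel
        mean_std = tf.reduce_mean(std)                       # collapse to one scalar
        extra = tf.ones_like(x[:, :, :1]) * mean_std         # broadcast as a channel
        return tf.concat([x, extra], axis=-1)
```

The layer would typically sit near the end of the discriminator, just before the final dense layer, so low variety in a batch of synthetic samples is easy for it to detect.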

Summary

Having stared at seismic traces on and off for a year now, I feel like I've gotten to know them really well, and I'm still not tired of their delicate wandering paths. It's unfortunate I didn't get the results I expected, but reflecting on the work I realize it was original enough that there was no Googling answers for this type of project, and posting a question on Stack Overflow or similar was often met with silence because of the complicated background required to respond fully. So how to judge my progress? For one, I can now think of many ways to extend this research. Setting aside hardware solutions, I'd love to implement both a minibatch standard deviation layer (mentioned above) and a Progressive Growing GAN, which has shown good results when upscaling complex functions such as the one the generator must learn. Another possible strategy is to decompose the training samples into vectors of Fourier coefficients and run the convolutions on those, since Fourier analysis suits the complex harmonics seen in the traces. There's also an opportunity to gather more training samples and classify them better before training. Mostly it boils down to subjecting the training data to far more convolutions: there's a reason these nets are called deep, and some projects have extended the number of hidden layers into the hundreds or thousands.

Resources

Below is a list of the main resources I used for this project.

  1. TensorFlow 2 / Python 3.6 / Pandas 1.1.2. The Git repository is available to the public.
  2. Wolfram Research Mathematica 12.2. Code for the main neural network is available as of December 2020.
  3. Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play by David Foster
  4. Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
  5. Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
  6. Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville
  7. UC Berkeley’s PEER Ground Motion Database

Blair is a 3D modeler and computational designer in Vancouver B.C. Follow him on Instagram or connect with him on Linkedin.

