So even amidst a surging global pandemic, machine learning in seismology is difficult to avoid. I’m going to use the terms “machine learning” and “deep learning” in this talk to mean a data analysis technique that involves taking a model, or a parameterized functional form, and optimizing the parameters in that model to produce some output, given an input.

The reason that people are so excited about these new data analysis techniques is that they are both very efficient, because once trained, they’re essentially analytic, and also high-performing. In many cases, machine learning or deep learning techniques can exceed the performance of humans when they’re given the same sort of task. And so this enables bulk analysis of tremendous quantities of data. As we all know, seismology is a very data-rich field. On the right is the size of the IRIS DMC archive as a function of time. As of this month, it’s approaching around 800 tebibytes, so just a huge quantity of data that we want to analyze, right?

A number of ML algorithms have been developed in geophysics and seismology. They tackle everything from earthquake early warning to ground motion prediction. But what we’re going to focus on in this talk is using deep learning for earthquake detection and phase picking.

A typical deep learning approach to seismic phase picking would involve the following steps. First, you need to assemble a training data set, much like the one on the right. That would contain three-component seismic data and the associated P and S wave arrival times for each one of these records. Next, you set aside some small portion of the data, called the testing data, which we use later to assess model performance. You then use the lion’s share of the data for what we call training, which is this parameter optimization process.

We define a model architecture. In this particular situation, we use something called a U-Net, which is a particular type of convolutional neural network. We selected it because it’s been shown in previous studies to work really well for this particular application. We define our model inputs, which are these three-component seismograms, and we also define model outputs.
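[As a concrete illustration of the kind of architecture being described, here is a minimal sketch of a 1-D U-Net-style picker in PyTorch. The depth, filter counts, kernel sizes, window length, and the three-class output (P, S, noise) are all illustrative assumptions, not the exact configuration used in this work.]

```python
import torch
import torch.nn as nn

class UNet1D(nn.Module):
    """Toy 1-D U-Net: one downsampling step, one skip connection."""
    def __init__(self, in_ch=3, base=8, n_classes=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(in_ch, base, 7, padding=3), nn.ReLU())
        self.down = nn.Sequential(nn.Conv1d(base, base * 2, 7, stride=4, padding=3), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(base * 2, base * 2, 7, padding=3), nn.ReLU())
        self.up = nn.ConvTranspose1d(base * 2, base, 8, stride=4, padding=2)
        self.dec = nn.Sequential(nn.Conv1d(base * 2, base, 7, padding=3), nn.ReLU())
        self.head = nn.Conv1d(base, n_classes, 1)  # per-sample P / S / noise logits

    def forward(self, x):
        s1 = self.enc1(x)             # full-resolution features (kept for the skip)
        b = self.enc2(self.down(s1))  # downsample by 4, convolve at the coarse scale
        u = self.up(b)                # learned upsampling back to full resolution
        u = self.dec(torch.cat([u, s1], dim=1))  # fuse skip and upsampled features
        return self.head(u)

model = UNet1D()
x = torch.randn(2, 3, 3000)             # a batch of 3-component, 3000-sample windows
probs = torch.softmax(model(x), dim=1)  # (2, 3, 3000) probability traces
```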
The model output is something you have some freedom to select, but previous studies have found that using a Gaussian functional form, where the Gaussian is centered on the analyst’s P wave arrival pick and the analyst’s S wave arrival pick, works really well. What we’re trying to do here is develop a tool that we can take and apply to continuous data streams. Anytime it thinks there’s a P or S wave arrival, the output will be a peak, okay? And the amplitude of that peak is a measure of how confident the network is that there’s an arrival there.

Next, we train the model, and then we need to measure how well it performs. For the purposes of this talk, we’re going to use two different metrics. The first is relevant to the classification problem: if you give the network a particular window of continuous seismic data, how well can it tell you whether there’s an earthquake in that window or not? In particular, we’re going to focus on the accuracy, which is maybe the simplest metric. It’s just a measure of how often the network got it right, whether it said there was an earthquake or there wasn’t one.

The next thing we’re going to focus on are picking metrics. For the picking metrics, what you typically do is take where the network thought the P wave arrival was and compare that to where the analyst thought the P wave arrival was. You do this for many different examples, and you can make histograms of the time differences between the network and the analyst, and that gives you some estimate of how well the network is doing.

Okay, so that’s deep learning and phase picking in a nutshell. But, as I said at the beginning of the talk, deep learning is just a tool, and the reason we want to use it is that it paves the way, maybe, for a next generation of really accurate earthquake catalogs. And these catalogs can be assembled really quickly, okay? One of the places that we’re data mining, i.e., trying to find as many earthquakes as possible, is in Cascadia. But you have to walk before you run, and thinking about Cascadia, it’s hard to ignore the fact that a tremendous amount of the seismic data in Cascadia is kind of non-traditional seismicity. In particular, what I’m talking about are low-frequency earthquakes.
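[To make the Gaussian-label idea concrete, here is a minimal sketch of how such training targets could be built, assuming picks are given as sample indices. The width sigma is a free choice; the talk returns to widening it later for emergent LFE arrivals.]

```python
import numpy as np

def gaussian_targets(n_samples, p_pick, s_pick, sigma=20.0):
    """Build (3, n_samples) target traces: P, S, and the 'noise' remainder.

    p_pick / s_pick are analyst arrival times in samples; sigma sets the
    Gaussian width in samples.
    """
    t = np.arange(n_samples)
    p = np.exp(-0.5 * ((t - p_pick) / sigma) ** 2)
    s = np.exp(-0.5 * ((t - s_pick) / sigma) ** 2)
    noise = np.clip(1.0 - p - s, 0.0, 1.0)  # remainder class away from the picks
    return np.stack([p, s, noise])

# Hypothetical picks at samples 1000 (P) and 1800 (S) in a 3000-sample window:
y = gaussian_targets(3000, p_pick=1000, s_pick=1800)
```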
Cascadia is a type locality for slow earthquakes, as many of you may know. The Cascadia subduction zone has slow earthquakes approximately every year in different parts of the margin. They occur down-dip of the locked section of the megathrust in Cascadia. These slow-slip events are accompanied by a very weak seismic signature known as non-volcanic, or tectonic, tremor. If a slow earthquake were going on and you were to grab the seismic data across a network, what you would see is something like what’s up here on the right. This is seismic data from many different stations in Oregon during a slow earthquake that happened in 2011. You can take and filter these traces and plot them up in this way. What you can see is, first, if you were to compare this to a time when there was no slow earthquake going on, there’s far more energy in this frequency band than there otherwise would be. But you can also see these coherent amplitude changes that tend to manifest across the network. If you scrutinize these further, you can see that they have a particular move-out. And scrutinizing them even further, you can tell that some fraction of this tremor signal is made up of repeating seismic events known as low-frequency earthquakes.

Low-frequency earthquakes look a lot like typical small-magnitude earthquakes, so you have things like P and S wave arrivals, which you can see in the trace here. But they’re a little bit different, just because they tend to be really, really low amplitude. Signal-to-noise ratios are around 1 in many cases. They are also depleted in high-frequency content relative to traditional earthquakes of the same size.

So I’m going to show you something that I thought was kind of interesting. What we did was take our earthquake data set with the selected P and S waves. We trained a network just like PhaseNet, which is Zhu and Beroza’s convolutional neural network that was trained on data from California. Then we applied it to a bunch of different examples of known earthquakes in the catalog, and that’s what I’m going to show you on the right-hand side. On the top are the three traces: Channel 1, 2, and 3.
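[For anyone who wants to try this kind of experiment themselves, pretrained PhaseNet weights are distributed through the SeisBench package, and ObsPy can fetch waveforms. A minimal sketch follows; the network/station codes, time window, and pretrained-weights name here are placeholders, not the data used in this talk, and the exact SeisBench API may differ slightly between versions.]

```python
import seisbench.models as sbm
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Fetch a short window of broadband data (codes and times are placeholders).
client = Client("IRIS")
t0 = UTCDateTime("2011-08-20T00:00:00")
stream = client.get_waveforms("UW", "LON", "*", "BH?", t0, t0 + 600)

# Load PhaseNet-style pretrained weights (Zhu & Beroza's California training).
model = sbm.PhaseNet.from_pretrained("original")

probs = model.annotate(stream)   # continuous P/S/noise probability traces
picks = model.classify(stream)   # discrete picks above a threshold
print(picks)
```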
Sometimes they’re identical because, if it’s only a vertical-component station, we copy the vertical to the other two components. So this is the input, and then down here is the output. What you can see, hopefully, is that the network seems to be doing pretty well. When you have clear P wave arrivals, you see a purple Gaussian that’s very near 1 in amplitude. And when you have clear S wave arrivals, you see a purple Gaussian that’s very near 1 in amplitude. So this seems to work really, really well. Wonderful.

One thing that we noticed, [chuckles] though, is that if we take that network and apply it to low-frequency earthquakes, it definitely does not do great. Just to guide your eye, I’m showing you a stack of LFEs up here. This does not change: this is the Channel 1 stack, the Channel 2 stack, and the Channel 3 stack. This is just to show you that there is a P wave here and an S wave here. Down here, though, are the individual traces that go into making up the stack, and this is what actually goes into the network. When we dump these into the network, there actually is a prediction down here; you can hopefully see that occasionally they’re non-zero. But, by and large, the network is saying, this is mostly a zero, guys. Like, this is noise. It’s not signal. Okay. [laughs]

But we know that there’s signal there, right? So what we want to ask is, can we use the same deep learning approach that we’ve already talked about, the one that’s been applied to traditional earthquakes, and train it to detect low-frequency earthquakes? It may seem obvious that you can do this, but it’s definitely not obvious that it’ll work. The reason is just that LFEs are really low-amplitude. I know I’ve said that before. But each one of these light gray traces here (this panel is Channel 1, this is Channel 2, and this is Channel 3) actually contains an LFE, though you may not believe me. It’s just that they’re so low-amplitude, you can’t see them in the raw data. What you have to do is take and stack these guys. Ultimately, in the stack, you will start to see a P wave arrival and an S wave arrival pop out. So what we’re asking is, can we put this into a neural network and have it give us something meaningful?
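[A quick synthetic sketch of why stacking reveals what single traces hide: with signal-to-noise near 1, an aligned average of N windows suppresses incoherent noise by roughly sqrt(N). Everything below, the waveform shape, amplitudes, and counts, is invented purely for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)
n_win, n_samp = 500, 2000
t = np.arange(n_samp)

# A toy "LFE": a low-amplitude, band-limited wavelet centered at sample 800.
wavelet = 0.1 * np.sin(2 * np.pi * t / 50) * np.exp(-0.5 * ((t - 800) / 60) ** 2)

# 500 aligned windows, each with noise comparable to the signal (SNR ~ 1).
windows = wavelet + rng.normal(0.0, 0.1, size=(n_win, n_samp))

stack = windows.mean(axis=0)  # noise shrinks ~ 1/sqrt(n_win); the wavelet stays
print(f"single-trace SNR ~ {np.abs(wavelet).max() / 0.1:.1f}")       # ~ 1
print(f"stack SNR       ~ {np.abs(stack).max() / stack[:400].std():.1f}")  # ~ 20+
```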
The first place that we tried this is Parkfield, just because it has a really great LFE catalog that I’ve worked with a lot. It was assembled by David Shelly in 2017, and it has over a million LFEs in it. What we did is follow that procedure I talked about at the beginning of the talk. We put in three-component seismic data and asked the network to predict when the LFE S wave arrival is. The only real modification we made is that we expanded the width of the Gaussian that we used, just to give it a little more allowance for LFE arrivals being emergent and low-amplitude. So, okay, that’s that. [laughs]

Okay. So we did this. We trained it. Again, things are really low-amplitude; it’s really hard to see individual LFEs in the continuous seismic data. And this is the result in a nutshell. The peak accuracy is about 86%, meaning, if you use this network for the classification problem, so you give it some window and you ask, is there an LFE in this window or not, it gets it right 86% of the time. That is pretty impressive considering that PhaseNet, which is one of many deep learning pickers, but trained on earthquakes, has a peak accuracy of, I think, about 95%. So we’re doing worse, but we’re not doing that much worse. [laughs]

You can also compare the picking performance of the two, and that’s what I’ve got here. PhaseNet was trained on 100-hertz seismic data, so this is minus 20 samples to 20 samples. If you compare the same interval on this plot down here, minus 20 to 20, you can see that the distribution is not quite as sharp, but it still is peaked. So it’s definitely getting in the neighborhood, but probably because of the low amplitude and the emergent nature of LFEs, it’s just not quite doing as well.

After you have these trained networks, you can go and apply them to continuous seismic data. I’ve got one example here for you. Each row here, each panel, is a different seismic station over some short time period. The black diamonds mark arrival times of known LFEs, so LFEs from the Shelly catalog. And the circles are when the network thinks that there could be an LFE.
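[Here is a minimal sketch of the two evaluation metrics just described: window classification accuracy and pick-residual histograms. The numbers in the usage lines are invented, just to show the shapes involved.]

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of windows where the LFE / no-LFE call matched the catalog."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

def pick_residuals(network_picks, analyst_picks):
    """Pick-time offsets (network minus analyst), in samples, for matched events."""
    return np.asarray(network_picks) - np.asarray(analyst_picks)

# Invented example values, just to show usage:
acc = accuracy([1, 1, 0, 1], [1, 0, 0, 1])                     # -> 0.75
resid = pick_residuals([1003, 998, 1010], [1000, 1000, 1000])  # -> [3, -2, 10]
hist, edges = np.histogram(resid, bins=np.arange(-20.5, 21.5, 1.0))
```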
What’s really neat about this is you can see that the network seems to be identifying low-frequency earthquakes that are in the catalog, but it also appears to be identifying a lot of things that actually do look like LFEs. They have the right frequency content. They get detected, in many cases, across the network. And if you go the next step and actually template-match them, you can get a P wave to emerge. So this network is detecting known, but also what I think are new, LFEs in Parkfield. We’re continuing to explore this today.

Just as a very last note here, we did the same thing in Cascadia. We trained a different network to detect LFEs using Michael Bostock’s catalog from southern Vancouver Island. So this is Vancouver Island. Each circle is an LFE epicenter, color-coded by depth. The triangles here are the stations of the network that Michael Bostock used to detect LFEs. What we do is go through the same training process as we did in Parkfield. And we’ve gotten a little bit further along in Cascadia: we’ve assessed network performance, and we’re actually starting to do the data mining, where we take and apply this network to chunks of continuous seismic data and scan through each day of data. Then we have catalog detections that we can explore.

This is a figure from Michel et al., where they did slip inversions for slow earthquakes in Cascadia. I just want to bring your attention to this red box right here. This is essentially the latitude band that southern Vancouver Island is in, and we’re going to use it in the next plot.

When we mine through continuous seismic data, we see some pretty exciting things. First of all, each row here is a different station. Time is on the x-axis, and we’re plotting the number of detections per day, comparing that to Michael Bostock’s template-matched catalog. What you see is that there are near-simultaneous peaks across the network in many cases. And if we go back to those Michel results, these near-simultaneous peaks line up with known slow-slip events. So this thing is detecting LFEs on a grand scale. And it’s doing exactly what we hoped it would do, which is detect them during times of known slow-slip events.
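[Since template matching comes up both here and in the wrap-up, here is a minimal sketch of its core operation: sliding normalized cross-correlation of a template against continuous data, with detections declared above a threshold. The synthetic data and the 8-times-MAD threshold are illustrative choices, not the settings used in the catalogs discussed.]

```python
import numpy as np

def normalized_xcorr(template, data):
    """Sliding normalized cross-correlation of `template` against `data`."""
    m = len(template)
    t = (template - template.mean()) / (template.std() * m)
    cc = np.empty(len(data) - m + 1)
    for i in range(len(cc)):
        w = data[i:i + m]
        denom = w.std()
        cc[i] = 0.0 if denom == 0 else np.dot(t, w - w.mean()) / denom
    return cc  # values in [-1, 1]; 1 means an exact (scaled) waveform match

# Usage with synthetic data; a threshold of several MADs is a common choice.
rng = np.random.default_rng(1)
data = rng.normal(size=20000)
template = data[5000:5600].copy()  # "plant" a known event in the noise
cc = normalized_xcorr(template, data)
mad = np.median(np.abs(cc - np.median(cc)))
detections = np.flatnonzero(cc > 8 * mad)  # recovers the lag at sample 5000
```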
But also, if you look at these, they’re really rich in character. There’s really a lot here that should be explored. It looks like there may be smaller slow-slip events that generate LFEs, and so on and so forth. So this is just a first cut, and it all looks very promising.

I don’t have much time left, but just briefly: deep learning does seem like it can successfully identify LFEs in a way that is useful for us. It’s capable of identifying new and known LFEs. And early tests applying these networks to continuous seismic data yield really promising results, i.e., we see lots of LFE detections at times when we know we should see lots of LFE detections. And there’s even more rich stuff in the catalog that we haven’t yet explored. It’s probably best to use an approach that combines these deep neural networks with template matching, and we can talk about that in the discussion section after this set of talks.

But I just wanted to leave you with this over here. If you remember the dismal performance on LFEs of the CNN picker trained on earthquakes, this is our LFE-trained picker looking at LFEs, and you can see that it seems to be doing a little bit better. So thank you for your attention.