So even amidst a surging global pandemic, machine learning in seismology is difficult to avoid. I’m going to use the terms “machine learning” and “deep learning” in this talk to mean a data analysis technique that involves taking a model, or a parameterized functional form, and optimizing the parameters in that model to produce some output, given an input.

The reason that people are so excited about these new data analysis techniques is that they are both very efficient, because once trained, they’re essentially analytic, and also high-performing. In many cases, machine learning or deep learning techniques can exceed the performance of humans when they’re given the same sort of task. And so this enables bulk analysis of tremendous quantities of data. As we all know, seismology is a very data-rich field. On the right is the size of the IRIS DMC archive as a function of time. As of this month, it’s approaching around 800 tebibytes, so just a huge quantity of data that we want to analyze, right?

A number of ML algorithms have been developed in geophysics and seismology. They tackle everything from earthquake early warning to ground motion prediction. But what we’re going to focus on in this talk is using deep learning for earthquake detection and phase picking.

A typical deep learning approach to seismic phase picking would involve the following steps. First, you need to assemble a training data set, much like the one on the right. That would contain three-component seismic data and the associated P and S wave arrival times for each one of these records. Next, you set aside some small portion of the data, called the testing data, which we use later to assess model performance. You then use the lion’s share of the data for what we call training, which is this parameter optimization process.

We define a model architecture. In this particular situation, we use something called a U-Net, which is a particular type of convolutional neural network. We selected it because it’s been shown in previous studies to work really well for this particular application. We define our model inputs, which are these three-component seismograms, and we also define model outputs.
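[As a concrete illustration of the kind of architecture being described, here is a minimal sketch of a 1-D U-Net-style picker in PyTorch. The depth, filter counts, kernel sizes, window length, and the three-class output (P, S, noise) are all illustrative assumptions, not the exact configuration used in this work.]

```python
import torch
import torch.nn as nn

class UNet1D(nn.Module):
    """Toy 1-D U-Net: one downsampling step, one skip connection."""
    def __init__(self, in_ch=3, base=8, n_classes=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(in_ch, base, 7, padding=3), nn.ReLU())
        self.down = nn.Sequential(nn.Conv1d(base, base * 2, 7, stride=4, padding=3), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(base * 2, base * 2, 7, padding=3), nn.ReLU())
        self.up = nn.ConvTranspose1d(base * 2, base, 8, stride=4, padding=2)
        self.dec = nn.Sequential(nn.Conv1d(base * 2, base, 7, padding=3), nn.ReLU())
        self.head = nn.Conv1d(base, n_classes, 1)  # per-sample P / S / noise logits

    def forward(self, x):
        s1 = self.enc1(x)             # full-resolution features (kept for the skip)
        b = self.enc2(self.down(s1))  # downsample by 4, convolve at the coarse scale
        u = self.up(b)                # learned upsampling back to full resolution
        u = self.dec(torch.cat([u, s1], dim=1))  # fuse skip and upsampled features
        return self.head(u)

model = UNet1D()
x = torch.randn(2, 3, 3000)             # a batch of 3-component, 3000-sample windows
probs = torch.softmax(model(x), dim=1)  # (2, 3, 3000) probability traces
```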
The model output is something you have some freedom to select, but previous studies have found that using a Gaussian functional form, where the Gaussian is centered on the analyst’s P wave arrival pick and the analyst’s S wave arrival pick, works really well. What we’re trying to do here is develop a tool that we can take and apply to continuous data streams. Anytime it thinks there’s a P or S wave arrival, the output will be a peak, okay? And the amplitude of that peak is a measure of how confident the network is that there’s an arrival there.

Next, we train the model, and then we need to measure how well it performs. For the purposes of this talk, we’re going to use two different metrics. The first is relevant to the classification problem: if you give the network a particular window of continuous seismic data, how well can it tell you whether there’s an earthquake in that window or not? In particular, we’re going to focus on the accuracy, which is maybe the simplest metric. It’s just a measure of how often the network got it right, whether it said there was an earthquake or there wasn’t one.

The next thing we’re going to focus on are picking metrics. For the picking metrics, what you typically do is take where the network thought the P wave arrival was and compare that to where the analyst thought the P wave arrival was. You do this for many different examples, and you can make histograms of the time differences between the network and the analyst, and that gives you some estimate of how well the network is doing.

Okay, so that’s deep learning and phase picking in a nutshell. But, as I said at the beginning of the talk, deep learning is just a tool, and the reason we want to use it is that it paves the way, maybe, for a next generation of really accurate earthquake catalogs. And these catalogs can be assembled really quickly, okay? One of the places that we’re data mining, i.e., trying to find as many earthquakes as possible, is in Cascadia. But you have to walk before you run, and thinking about Cascadia, it’s hard to ignore the fact that a tremendous amount of the seismic data in Cascadia is kind of non-traditional seismicity. In particular, what I’m talking about are low-frequency earthquakes.
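[To make the Gaussian-label idea concrete, here is a minimal sketch of how such training targets could be built, assuming picks are given as sample indices. The width sigma is a free choice; the talk returns to widening it later for emergent LFE arrivals.]

```python
import numpy as np

def gaussian_targets(n_samples, p_pick, s_pick, sigma=20.0):
    """Build (3, n_samples) target traces: P, S, and the 'noise' remainder.

    p_pick / s_pick are analyst arrival times in samples; sigma sets the
    Gaussian width in samples.
    """
    t = np.arange(n_samples)
    p = np.exp(-0.5 * ((t - p_pick) / sigma) ** 2)
    s = np.exp(-0.5 * ((t - s_pick) / sigma) ** 2)
    noise = np.clip(1.0 - p - s, 0.0, 1.0)  # remainder class away from the picks
    return np.stack([p, s, noise])

# Hypothetical picks at samples 1000 (P) and 1800 (S) in a 3000-sample window:
y = gaussian_targets(3000, p_pick=1000, s_pick=1800)
```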
Cascadia is a type locality for slow earthquakes, as many of you may know. The Cascadia subduction zone has slow earthquakes approximately every year in different parts of the margin. They occur down-dip of the locked section of the megathrust in Cascadia. These slow-slip events are accompanied by a very weak seismic signature known as non-volcanic, or tectonic, tremor. If a slow earthquake were going on and you were to grab the seismic data across a network, what you would see is something like what’s up here on the right. This is seismic data from many different stations in Oregon during a slow earthquake that happened in 2011. You can take and filter these traces and plot them up in this way. What you can see is, first, if you were to compare this to a time when there was no slow earthquake going on, there’s far more energy in this frequency band than there otherwise would be. But you can also see these coherent amplitude changes that tend to manifest across the network. If you scrutinize these further, you can see that they have a particular move-out. And scrutinizing them even further, you can tell that some fraction of this tremor signal is made up of repeating seismic events known as low-frequency earthquakes.

Low-frequency earthquakes look a lot like typical small-magnitude earthquakes, so you have things like P and S wave arrivals, which you can see in the trace here. But they’re a little bit different, just because they tend to be really, really low amplitude. Signal-to-noise ratios are around 1 in many cases. They are also depleted in high-frequency content relative to traditional earthquakes of the same size.

So I’m going to show you something that I thought was kind of interesting. What we did was take our earthquake data set with the selected P and S waves. We trained a network just like PhaseNet, which is Zhu and Beroza’s convolutional neural network that was trained on data from California. Then we applied it to a bunch of different examples of known earthquakes in the catalog, and that’s what I’m going to show you on the right-hand side. On the top are the three traces: Channel 1, 2, and 3.
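[For anyone who wants to try this kind of experiment themselves, pretrained PhaseNet weights are distributed through the SeisBench package, and ObsPy can fetch waveforms. A minimal sketch follows; the network/station codes, time window, and pretrained-weights name here are placeholders, not the data used in this talk, and the exact SeisBench API may differ slightly between versions.]

```python
import seisbench.models as sbm
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Fetch a short window of broadband data (codes and times are placeholders).
client = Client("IRIS")
t0 = UTCDateTime("2011-08-20T00:00:00")
stream = client.get_waveforms("UW", "LON", "*", "BH?", t0, t0 + 600)

# Load PhaseNet-style pretrained weights (Zhu & Beroza's California training).
model = sbm.PhaseNet.from_pretrained("original")

probs = model.annotate(stream)   # continuous P/S/noise probability traces
picks = model.classify(stream)   # discrete picks above a threshold
print(picks)
```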
Sometimes they’re identical because, if it’s only a vertical-component station, we copy the vertical to the other two components. So this is the input, and then down here is the output. What you can see, hopefully, is that the network seems to be doing pretty well. When you have clear P wave arrivals, you see a purple Gaussian that’s very near 1 in amplitude. And when you have clear S wave arrivals, you see a purple Gaussian that’s very near 1 in amplitude. So this seems to work really, really well. Wonderful.

One thing that we noticed, [chuckles] though, is that if we take that network and apply it to low-frequency earthquakes, it definitely does not do great. Just to guide your eye, I’m showing you a stack of LFEs up here. This does not change: this is the Channel 1 stack, the Channel 2 stack, and the Channel 3 stack. This is just to show you that there is a P wave here and an S wave here. Down here, though, are the individual traces that go into making up the stack, and this is what actually goes into the network. When we dump these into the network, there actually is a prediction down here; you can hopefully see that occasionally they’re non-zero. But, by and large, the network is saying, this is mostly a zero, guys. Like, this is noise. It’s not signal. Okay. [laughs]

But we know that there’s signal there, right? So what we want to ask is, can we use the same deep learning approach that we’ve already talked about, the one that’s been applied to traditional earthquakes, and train it to detect low-frequency earthquakes? It may seem obvious that you can do this, but it’s definitely not obvious that it’ll work. The reason is just that LFEs are really low-amplitude. I know I’ve said that before. But each one of these light gray traces here (this panel is Channel 1, this is Channel 2, and this is Channel 3) actually contains an LFE, though you may not believe me. It’s just that they’re so low-amplitude, you can’t see them in the raw data. What you have to do is take and stack these guys. Ultimately, in the stack, you will start to see a P wave arrival and an S wave arrival pop out. So what we’re asking is, can we put this into a neural network and have it give us something meaningful?
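[A quick synthetic sketch of why stacking reveals what single traces hide: with signal-to-noise near 1, an aligned average of N windows suppresses incoherent noise by roughly sqrt(N). Everything below, the waveform shape, amplitudes, and counts, is invented purely for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)
n_win, n_samp = 500, 2000
t = np.arange(n_samp)

# A toy "LFE": a low-amplitude, band-limited wavelet centered at sample 800.
wavelet = 0.1 * np.sin(2 * np.pi * t / 50) * np.exp(-0.5 * ((t - 800) / 60) ** 2)

# 500 aligned windows, each with noise comparable to the signal (SNR ~ 1).
windows = wavelet + rng.normal(0.0, 0.1, size=(n_win, n_samp))

stack = windows.mean(axis=0)  # noise shrinks ~ 1/sqrt(n_win); the wavelet stays
print(f"single-trace SNR ~ {np.abs(wavelet).max() / 0.1:.1f}")       # ~ 1
print(f"stack SNR       ~ {np.abs(stack).max() / stack[:400].std():.1f}")  # ~ 20+
```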
The first place that we tried this is Parkfield, just because it has a really great LFE catalog that I’ve worked with a lot. It was assembled by David Shelly in 2017, and it has over a million LFEs in it. What we did is follow that procedure I talked about at the beginning of the talk. We put in three-component seismic data and asked the network to predict when the LFE S wave arrival is. The only real modification we made is that we expanded the width of the Gaussian that we used, just to give it a little more allowance for LFE arrivals being emergent and low-amplitude. So, okay, that’s that. [laughs]

Okay. So we did this. We trained it. Again, things are really low-amplitude; it’s really hard to see individual LFEs in the continuous seismic data. And this is the result in a nutshell. The peak accuracy is about 86%, meaning, if you use this network for the classification problem, so you give it some window and you ask, is there an LFE in this window or not, it gets it right 86% of the time. That is pretty impressive considering that PhaseNet, which is one of many deep learning pickers, but trained on earthquakes, has a peak accuracy of, I think, about 95%. So we’re doing worse, but we’re not doing that much worse. [laughs]

You can also compare the picking performance of the two, and that’s what I’ve got here. PhaseNet was trained on 100-hertz seismic data, so this is minus 20 samples to 20 samples. If you compare the same interval on this plot down here, minus 20 to 20, you can see that the distribution is not quite as sharp, but it still is peaked. So it’s definitely getting in the neighborhood, but probably because of the low amplitude and the emergent nature of LFEs, it’s just not quite doing as well.

After you have these trained networks, you can go and apply them to continuous seismic data. I’ve got one example here for you. Each row here, each panel, is a different seismic station over some short time period. The black diamonds mark arrival times of known LFEs, so LFEs from the Shelly catalog. And the circles are when the network thinks that there could be an LFE.
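[Here is a minimal sketch of the two evaluation metrics just described: window classification accuracy and pick-residual histograms. The numbers in the usage lines are invented, just to show the shapes involved.]

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of windows where the LFE / no-LFE call matched the catalog."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

def pick_residuals(network_picks, analyst_picks):
    """Pick-time offsets (network minus analyst), in samples, for matched events."""
    return np.asarray(network_picks) - np.asarray(analyst_picks)

# Invented example values, just to show usage:
acc = accuracy([1, 1, 0, 1], [1, 0, 0, 1])                     # -> 0.75
resid = pick_residuals([1003, 998, 1010], [1000, 1000, 1000])  # -> [3, -2, 10]
hist, edges = np.histogram(resid, bins=np.arange(-20.5, 21.5, 1.0))
```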
What’s really neat about this is you can see that the network seems to be identifying low-frequency earthquakes that are in the catalog, but it also appears to be identifying a lot of things that actually do look like LFEs. They have the right frequency content. They get detected, in many cases, across the network. And if you go the next step and actually template-match them, you can get a P wave to emerge. So this network is detecting known, but also what I think are new, LFEs in Parkfield. We’re continuing to explore this today.

Just as a very last note here, we did the same thing in Cascadia. We trained a different network to detect LFEs using Michael Bostock’s catalog from southern Vancouver Island. So this is Vancouver Island. Each circle is an LFE epicenter, color-coded by depth. The triangles here are the stations of the network that Michael Bostock used to detect LFEs. What we do is go through the same training process as we did in Parkfield. And we’ve gotten a little bit further along in Cascadia: we’ve assessed network performance, and we’re actually starting to do the data mining, where we take and apply this network to chunks of continuous seismic data and scan through each day of data. Then we have catalog detections that we can explore.

This is a figure from Michel et al., where they did slip inversions for slow earthquakes in Cascadia. I just want to bring your attention to this red box right here. This is essentially the latitude band that southern Vancouver Island is in, and we’re going to use it in the next plot.

When we mine through continuous seismic data, we see some pretty exciting things. First of all, each row here is a different station. Time is on the x-axis, and we’re plotting the number of detections per day, comparing that to Michael Bostock’s template-matched catalog. What you see is that there are near-simultaneous peaks across the network in many cases. And if we go back to those Michel results, these near-simultaneous peaks line up with known slow-slip events. So this thing is detecting LFEs on a grand scale. And it’s doing exactly what we hoped it would do, which is detect them during times of known slow-slip events.
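[Since template matching comes up both here and in the wrap-up, here is a minimal sketch of its core operation: sliding normalized cross-correlation of a template against continuous data, with detections declared above a threshold. The synthetic data and the 8-times-MAD threshold are illustrative choices, not the settings used in the catalogs discussed.]

```python
import numpy as np

def normalized_xcorr(template, data):
    """Sliding normalized cross-correlation of `template` against `data`."""
    m = len(template)
    t = (template - template.mean()) / (template.std() * m)
    cc = np.empty(len(data) - m + 1)
    for i in range(len(cc)):
        w = data[i:i + m]
        denom = w.std()
        cc[i] = 0.0 if denom == 0 else np.dot(t, w - w.mean()) / denom
    return cc  # values in [-1, 1]; 1 means an exact (scaled) waveform match

# Usage with synthetic data; a threshold of several MADs is a common choice.
rng = np.random.default_rng(1)
data = rng.normal(size=20000)
template = data[5000:5600].copy()  # "plant" a known event in the noise
cc = normalized_xcorr(template, data)
mad = np.median(np.abs(cc - np.median(cc)))
detections = np.flatnonzero(cc > 8 * mad)  # recovers the lag at sample 5000
```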
But also, if you look at these, they’re really rich in character. There’s really a lot here that should be explored. It looks like there may be smaller slow-slip events that generate LFEs, and so on and so forth. So this is just a first cut, and it all looks very promising.

I don’t have much time left, but just briefly: deep learning does seem like it can successfully identify LFEs in a way that is useful for us. It’s capable of identifying new and known LFEs. And early tests applying these networks to continuous seismic data yield really promising results, i.e., we see lots of LFE detections at times when we know we should see lots of LFE detections. And there’s even more rich stuff in the catalog that we haven’t yet explored. It’s probably best to use an approach that combines these deep neural networks with template matching, and we can talk about that in the discussion section after this set of talks.

But I just wanted to leave you with this over here. If you remember the dismal performance on LFEs of the CNN picker trained on earthquakes, this is our LFE-trained picker looking at LFEs, and you can see that it seems to be doing a little bit better. So thank you for your attention.