subtitle extraction

grqx_wsl 2024-10-06 00:24:46 +13:00
parent 5b1a168cea
commit 772e292c33
6 changed files with 4535 additions and 10 deletions

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,896 @@
WEBVTT FILE
1
00:00:03.800 --> 00:00:12.480
After only his second weekend off in
two months, Corporal Willie Apiata,
VC, is hard at work for the NZSAS.
2
00:00:12.520 --> 00:00:14.520
BOMBS EXPLODE
3
00:00:15.080 --> 00:00:17.080
GUNS FIRE
4
00:00:18.120 --> 00:00:20.120
MEN SHOUT INDISTINCTLY
5
00:00:20.680 --> 00:00:26.400
His time is a constant juggle
between the intense training
required of a Special Forces soldier
6
00:00:26.440 --> 00:00:30.480
and the demands placed on
being the country's newest hero.
7
00:00:30.520 --> 00:00:38.160
This calibre of this award is
something you can't hide, something
you couldn't keep under wraps.
8
00:00:38.200 --> 00:00:44.840
Among a stream of invitations
Willie receives, one has great
personal significance.
9
00:00:44.880 --> 00:00:48.960
It's from the 375 residents
of a tiny coastal community
10
00:00:49.000 --> 00:00:54.200
in the Eastern Bay of Plenty
called Te Kaha, Willie's hometown.
11
00:00:54.240 --> 00:01:00.840
They're very forthcoming and humble
people, and they really looked after
us when we moved there.
12
00:01:00.880 --> 00:01:06.800
They teach you to value the land
where you live, appreciate what
you have.
13
00:01:06.840 --> 00:01:13.520
Living down there is unspoilt and
hasn't been commercialised, as you
could say.
14
00:01:16.280 --> 00:01:19.800
Up until now, Willie has been
supported by a close team,
15
00:01:19.840 --> 00:01:25.440
personally led by his commanding
officer. But this weekend, that all
changes.
16
00:01:25.480 --> 00:01:29.400
It wasn't about us. It wasn't about
the SAS. It was wider than that.
17
00:01:29.440 --> 00:01:31.880
It was about Willie
Apiata and his iwi.
18
00:01:31.920 --> 00:01:36.280
The homecoming at Te Kaha is
a trip Willie must take alone.
19
00:01:36.320 --> 00:01:38.520
MAN CHANTS MAORI GREETING
20
00:01:51.040 --> 00:01:57.640
The elders of this community have
pulled out all the stops, as the
population swells to nearly 4000,
21
00:01:57.680 --> 00:02:03.000
making Willie's visit home the
biggest gathering of Maori in the
district in over a century.
22
00:02:03.040 --> 00:02:05.040
WOMAN CHANTS KARANGA
23
00:02:12.240 --> 00:02:14.240
It was a huge thing.
24
00:02:14.520 --> 00:02:17.720
All those people
down there, you know.
25
00:02:18.480 --> 00:02:23.160
It sort of took something like this
to really bring them all together.
26
00:02:23.200 --> 00:02:25.280
ALL SING 'WHAKAARIA MAI'
27
00:02:26.280 --> 00:02:32.600
Decreed a hui of national
significance, the homecoming is not
just to honour Willie.
28
00:02:32.640 --> 00:02:38.920
You see all those photos of the
fallen soldiers. They're there
watching the occasion as well.
29
00:02:38.960 --> 00:02:45.120
Soldiers like Lieutenant
Te Moana-Nui-a-Kiwa Ngarimu -
awarded the Victoria Cross,
30
00:02:45.160 --> 00:02:48.560
but killed in action
during World War II.
31
00:02:49.400 --> 00:02:55.480
You could feel in the air... at
every stage, you could feel our
ancestors.
32
00:02:55.640 --> 00:02:57.720
You feel their presence.
33
00:02:59.040 --> 00:03:02.240
ALL CONTINUE SINGING
'WHAKAARIA MAI'
34
00:03:03.520 --> 00:03:09.560
Tomorrow will be the official
celebration, but tonight, a local
son is welcomed home.
35
00:03:09.600 --> 00:03:14.440
Before the announcement of the
Victoria Cross, few knew Willie
was a soldier,
36
00:03:14.480 --> 00:03:18.920
let alone a decorated member of
the country's elite fighting unit.
37
00:03:18.960 --> 00:03:20.640
Welcome home, boy.
38
00:03:20.680 --> 00:03:26.360
For them to find out like that,
I think they're still trying to get
over the shock of it, as well.
39
00:03:26.400 --> 00:03:28.120
Really proud people.
40
00:03:28.160 --> 00:03:34.960
The one-time mischievous kid has
returned a mature and much-loved
warrior hero.
41
00:03:36.640 --> 00:03:40.320
Tena koe. We're proud
of you. So proud of you.
42
00:03:47.000 --> 00:03:49.000
MACHINE GUNS FIRE
43
00:03:49.720 --> 00:03:53.600
A NZ Special Forces patrol in
Afghanistan was on the offensive
44
00:03:53.640 --> 00:04:00.840
after being attacked in the early
hours of the morning by a large
number of insurgents.
45
00:04:05.000 --> 00:04:08.080
For us, it was like a
blind hit from nowhere,
46
00:04:08.120 --> 00:04:13.920
and it's a quick orientation to the
threat, which is what the guys did.
47
00:04:14.200 --> 00:04:19.280
Despite his brush with death,
Corporal Willie Apiata threw
himself into the fray.
48
00:04:19.320 --> 00:04:23.560
What I saw was Willie covered in
blood, wrapped up with a machine-gun
belt,
49
00:04:23.600 --> 00:04:30.240
carrying a GPMG - General Purpose
Machine Gun - looking like he wanted
to do business.
50
00:04:30.280 --> 00:04:32.280
MACHINE GUNS FIRE
51
00:04:41.720 --> 00:04:44.400
Started getting some fire back.
52
00:04:50.760 --> 00:04:56.040
I honestly believed that they
bit off more than they could chew.
53
00:05:04.600 --> 00:05:08.400
They'd stopped firing...
long before we did.
54
00:05:17.560 --> 00:05:22.680
On Saturday morning, Willie Apiata
is formally welcomed on to the
marae,
55
00:05:22.720 --> 00:05:26.280
and the emotion of the
occasion is overwhelming.
56
00:05:26.320 --> 00:05:33.320
I didn't really realise the scale
the event was going to be until I
turned up that day.
57
00:05:35.240 --> 00:05:37.240
WOMEN CHANT POWHIRI
58
00:05:49.400 --> 00:05:56.280
Those people were so proud to have
me come back home. That will hang
with me forever.
59
00:06:04.800 --> 00:06:07.800
Looking good, Wills. Looking good.
60
00:06:07.960 --> 00:06:12.800
Wearing a precious cloak and seated
at the forefront of all dignitaries,
61
00:06:12.840 --> 00:06:20.920
Willie receives a line-up of
speakers whose oratory represents
the feelings of the many guests.
62
00:06:21.000 --> 00:06:23.000
(SPEAKS MAORI)
63
00:06:37.440 --> 00:06:39.440
(SPEAKS MAORI)
64
00:06:45.120 --> 00:06:47.120
(SPEAKS MAORI)
65
00:07:01.920 --> 00:07:07.600
Emotions that I was feeling that
day,... there are no words for them.
66
00:07:08.800 --> 00:07:13.320
Willie is called to the whare nui,
where he is handed a centuries-old
mere,
67
00:07:13.360 --> 00:07:19.680
carved from the highest strike of
greenstone. Flanked by the images of
past generations of soldiers,
68
00:07:19.720 --> 00:07:26.600
Willie turns and formally
acknowledges the people,
his tears barely kept in check.
69
00:07:26.640 --> 00:07:28.640
CROWD CHANTS HAKA
70
00:08:09.320 --> 00:08:11.800
Willie. Well done. Well done.
71
00:08:12.560 --> 00:08:14.560
Welcome home.
72
00:08:15.200 --> 00:08:20.040
Warrior to warrior, Willie now
greets the surviving war veterans.
73
00:08:20.080 --> 00:08:23.720
Thank you for the honour.
Thank you for knowing you.
74
00:08:23.760 --> 00:08:30.840
Their respect is such that the
sight of the Victoria Cross is
too much for some to bear.
75
00:08:34.800 --> 00:08:42.280
I feel all the mana coming off
those men. They're so proud.
They're so honoured to be there.
76
00:08:43.320 --> 00:08:49.320
It takes Willie several minutes
to make his way through the
well-wishers to the official dinner.
77
00:08:49.360 --> 00:08:53.640
Can we move back? More space
to move straight through.
78
00:08:53.680 --> 00:08:59.080
It must seem a long way for Willie,
from wherever it was in Afghanistan,
79
00:08:59.120 --> 00:09:02.920
back home to be celebrated
for what he's done.
80
00:09:04.880 --> 00:09:12.280
In government, we hear whenever our
people are involved in some kind of
action overseas.
81
00:09:13.080 --> 00:09:20.360
Word comes back. 'There's been an
incident, an action.' Eventually,
we hear a bit more.
82
00:09:21.560 --> 00:09:28.560
In this case, eventually, the Chief
of Defence came in and he said,
'Something extraordinary happened.'
83
00:09:28.600 --> 00:09:35.480
We started documenting this story,
looking at the precedents, and,
on that basis,
84
00:09:37.560 --> 00:09:45.160
I was advised to recommend to
Her Majesty the Queen that Corporal
Willie Apiata be recognised.
85
00:09:45.200 --> 00:09:47.200
APPLAUSE
86
00:09:52.960 --> 00:09:57.880
It wasn't just me there that day.
There was a lot of other men on the
ground.
87
00:09:57.920 --> 00:10:03.760
And... we all know we are a tight
family and we just look after each
other,
88
00:10:03.800 --> 00:10:06.280
and that's just the way we are.
89
00:10:07.120 --> 00:10:14.920
There's an old saying, 'Who are
ye in rags and rotten shoes?
The bearded ones blocking the way.
90
00:10:15.200 --> 00:10:23.800
'We are the pilgrims, master.' We're
just humble men - just ordinary
blokes, just like everybody here.
91
00:10:25.680 --> 00:10:31.360
My heart goes out to everybody here.
Everyone. We are one. One Maori.
92
00:10:33.560 --> 00:10:36.440
Tena koutou. Kia ora koutou katoa.
93
00:10:41.160 --> 00:10:44.160
# Maori Battalion
march to victory,
94
00:10:45.280 --> 00:10:48.280
# Maori Battalion
staunch and true.
95
00:10:49.840 --> 00:10:52.640
# Maori Battalion march to glory,
96
00:10:54.080 --> 00:10:57.360
# take the honour of
the people with you.
97
00:10:58.520 --> 00:11:02.320
# And we will march,
march, march to the enemy.
98
00:11:02.880 --> 00:11:05.880
# And we will fight
right to the end. #
99
00:11:23.480 --> 00:11:28.600
Passchendaele - out of the seasonal
mist unique to this corner of Europe
100
00:11:28.640 --> 00:11:36.520
comes a line-up of Commonwealth war
veterans, high-ranking officials and
national leaders.
101
00:11:36.760 --> 00:11:41.400
They have come from all over to
pay their respects at the
90th anniversary
102
00:11:41.440 --> 00:11:45.240
of one of the bloodiest
conflicts of the First World War.
103
00:11:45.280 --> 00:11:49.040
Passchendaele represents the
greatest loss of life to NZ
servicemen
104
00:11:49.080 --> 00:11:53.800
in any battle or campaign or war
that NZers have ever been involved
in.
105
00:11:53.840 --> 00:11:56.640
It exceeds Gallipoli in
terms of the loss of life.
106
00:11:56.680 --> 00:12:01.160
It's extraordinary, and so many
NZers went to their death at
Passchendaele,
107
00:12:01.200 --> 00:12:04.840
over literally several
hundred metres of dirt.
108
00:12:04.880 --> 00:12:09.560
Among the NZ dignitaries is
Corporal Willie Apiata, VC.
109
00:12:10.680 --> 00:12:16.400
Tremendously important, but Willie,
as our most recent VC winner -
the first since World War II -
110
00:12:16.440 --> 00:12:20.320
was there for the country,
but also for Willie.
111
00:12:20.720 --> 00:12:23.480
This day will prove a
significant turning point
112
00:12:23.520 --> 00:12:28.800
in Willie's understanding of the
medal he now so proudly bears.
113
00:12:29.520 --> 00:12:36.920
I felt very honoured, going over,
but, you know, I was seeking
knowledge at the same time.
114
00:12:37.440 --> 00:12:43.320
When the official party arrives at
Tyne Cot, the largest Commonwealth
cemetery in the world,
115
00:12:43.360 --> 00:12:47.160
the true sense of scale
and loss is felt by all.
116
00:12:49.880 --> 00:12:52.600
This is an occasion to honour
and remember those
117
00:12:52.640 --> 00:12:57.320
who made the supreme sacrifice
in the War to End all Wars.
118
00:12:58.200 --> 00:13:00.200
# Hold thou thy cross
119
00:13:02.600 --> 00:13:04.680
# before my closing eyes.
120
00:13:11.120 --> 00:13:14.800
# Shine through the
gloom and point me to... #
121
00:13:19.920 --> 00:13:26.120
With hundreds of NZers laid to rest
here, Tyne Cot has a profound effect
on Willie.
122
00:13:26.160 --> 00:13:30.080
I saw the walls with all their names
on it. That's when it hits home -
123
00:13:30.120 --> 00:13:34.400
that so many men just walked
out of their trench, eh,
124
00:13:36.920 --> 00:13:39.520
and knew they were going to die.
125
00:13:39.760 --> 00:13:41.760
# Abide with me. #
126
00:13:54.440 --> 00:14:01.920
For the rest of the day, Willie
visits sites where thousands of
young men lost their lives.
127
00:14:02.560 --> 00:14:07.160
The Kiwis took 20 minutes to get
out of the front line, get across
no-man's-land
128
00:14:07.200 --> 00:14:10.040
and to the German front
line, so they moved fast.
129
00:14:10.080 --> 00:14:18.560
The Germans knew where they were and
started to shell them. That's when
we took most of our casualties.
130
00:14:20.640 --> 00:14:24.400
Almighty God, in whose hands
are the living and the dead.
131
00:14:24.440 --> 00:14:31.080
We give you thanks for all your
servants who have laid down their
lives in service of our country.
132
00:14:31.120 --> 00:14:36.000
Grant them your mercy in the
light of your presence. Amene.
133
00:14:36.440 --> 00:14:38.440
CHURCH BELL TOLLS
134
00:14:38.800 --> 00:14:43.680
Confronted with the huge loss of
life on an unprecedented scale at
Passchendaele,
135
00:14:43.720 --> 00:14:48.000
Willie's own brush with death
now takes on a new meaning.
136
00:14:48.040 --> 00:14:54.120
All the people I know that have
been awarded them are just ordinary
blokes, and that's all they are.
137
00:14:54.160 --> 00:15:02.640
Just normal blokes just looking
out for their mates and doing what
people call extraordinary things.
138
00:15:05.360 --> 00:15:11.000
It takes some time to sink in, but I
think Passchendaele was a watershed
time
139
00:15:11.040 --> 00:15:16.320
for him to really understand,
now, what it was really all about.
140
00:15:17.480 --> 00:15:25.880
Departing from their official
schedule, Willie returns to Tyne Cot
Cemetery, this time in private.
141
00:15:33.120 --> 00:15:38.520
We went back to Tyne Cot, and that's
where, you know, I had my time.
142
00:15:42.240 --> 00:15:45.520
Quite heavy on the
chest there, Willie.
143
00:15:52.080 --> 00:15:54.080
WHISPERS: Jesus.
144
00:15:55.040 --> 00:15:57.120
What were you doing at 21?
145
00:15:58.480 --> 00:16:01.680
It's the first Kiwi
one we've seen, eh?
146
00:16:04.720 --> 00:16:06.640
There's another one of the brothers.
147
00:16:06.680 --> 00:16:10.120
It makes you feel quite unworthy,
walking amongst all those
gravestones,
148
00:16:10.160 --> 00:16:15.960
knowing that they put more forward
than any man could ever ask of them.
149
00:16:48.800 --> 00:16:56.000
As dusk falls, the two soldiers find
a grave with the familiar markings
of a Victoria Cross.
150
00:16:56.040 --> 00:16:58.040
Tena koe, e hoa.
151
00:17:07.320 --> 00:17:10.200
That's one of your brothers, mate.
152
00:17:13.000 --> 00:17:21.280
Like so many other Victoria Cross
winners, Canadian Private JP
Robertson died on the battlefield.
153
00:17:31.200 --> 00:17:35.720
It's a tribe that you've
been... basically awarded into.
154
00:17:35.760 --> 00:17:41.360
And that's a forefather that's gone
before you, even though he's not a
Kiwi.
155
00:17:41.400 --> 00:17:47.600
But it's someone that now- That
carried the burden that I have to
carry now.
156
00:18:10.640 --> 00:18:15.600
It's four months since Corporal
Willie Apiata of the NZ
Special Air Service
157
00:18:15.640 --> 00:18:19.240
was awarded the
Victoria Cross for valour,
158
00:18:19.440 --> 00:18:24.960
and in that time he has come to
terms with the fact that he is now
a national hero
159
00:18:25.000 --> 00:18:29.880
and the latest member of a
truly unique association of men.
160
00:18:30.640 --> 00:18:36.520
Suddenly been automatically enlisted
into a club of very few members.
161
00:18:38.360 --> 00:18:43.640
Accompanied by his commanding
officer, Willie has requested
a meeting with his lawyers.
162
00:18:43.680 --> 00:18:45.960
Hi, Jim.
Nice to see you.
And you.
163
00:18:46.000 --> 00:18:49.000
He has reached a momentous decision.
164
00:18:49.040 --> 00:18:53.120
Thought long and hard about
what I wanna do with the medal.
165
00:18:53.160 --> 00:18:59.800
Wasn't just earned by me. It was
earned by all those men that were
out there that day.
166
00:18:59.840 --> 00:19:04.600
In simple terms, it seems to me that
you're giving away more than you've
ever had,
167
00:19:04.640 --> 00:19:06.720
almost before you've got it.
168
00:19:06.760 --> 00:19:10.760
It'll never be sold, or there will
never be any quarrels over it.
169
00:19:10.800 --> 00:19:17.000
As you know, what this does is gift
the Victoria Cross, effectively, to
NZ.
170
00:19:23.640 --> 00:19:25.640
Congratulations.
171
00:19:32.440 --> 00:19:40.520
When my life has passed, my son, his
sons and our bloodline will be able
to wear it and represent me.
172
00:19:43.080 --> 00:19:47.080
The resting place for it
will be here in the unit.

File diff suppressed because it is too large

@@ -0,0 +1,43 @@
WEBVTT FILE
1
00:00:01.800 --> 00:00:04.800
# Maori Battalion
march to victory.
2
00:00:05.680 --> 00:00:08.680
# Maori Battalion
staunch and true.
3
00:00:10.360 --> 00:00:13.160
# Maori Battalion march to glory,
4
00:00:14.880 --> 00:00:18.160
# take the honour of
the people with you.
5
00:00:19.200 --> 00:00:23.000
# And we will march,
march, march to the enemy,
6
00:00:23.640 --> 00:00:26.640
# and we will fight
right to the end. #
7
00:00:26.640 --> 00:00:28.640
Captions by Able.
8
00:00:28.640 --> 00:00:30.640
www.able.co.nz
9
00:00:30.640 --> 00:00:32.640
Copyright Able 2016

@@ -5,6 +5,7 @@
     strip_or_none,
     traverse_obj,
     url_or_none,
+    urlhandle_detect_ext,
 )
@@ -87,12 +88,16 @@ class NZOnScreenIE(InfoExtractor):
             'format_id': 'hi',
             'height': 360,
             'width': 640,
+            'subtitles': {
+                'en': [{'ext': 'SRT', 'data': 'md5:c2469f71020a32e55e228b532ded908f'}],
+            },
             'title': 'Reluctant Hero (clip 1)',
             'description': 'Part one of four from this full length documentary.',
             'display_id': 'reluctant-hero-2008',
             'duration': 1108.0,
             'thumbnail': r're:https://www\.nzonscreen\.com/content/images/.+\.jpg',
         },
+        'params': {'writesubtitles': True},
     }]

     def _extract_formats(self, playlist):
@@ -108,12 +113,19 @@ def _extract_formats(self, playlist):
                 'filesize_approx': float_or_none(traverse_obj(playlist, ('h264', f'{id_}_res_mb')), invscale=1024**2),
             })
         if formats:
-            formats[-1].update({
-                'height': int_or_none(playlist.get('height')),
-                'width': int_or_none(playlist.get('width')),
-            })
+            formats[-1].update(traverse_obj(playlist, {
+                'height': ('height', {int_or_none}),
+                'width': ('width', {int_or_none}),
+            }))
         return formats

+    def _get_subtitles(self, playinfo, video_id):
+        if caption := traverse_obj(playinfo, ('h264', 'caption_url')):
+            subtitle, urlh = self._download_webpage_handle(
+                'https://www.nzonscreen.com' + caption, video_id, 'Downloading subtitles')
+            if subtitle:
+                return {'en': [{'ext': urlhandle_detect_ext(urlh), 'data': subtitle}]}
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
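
Review note: the `_get_subtitles` logic added in this hunk can be approximated outside yt-dlp to see what shape of data it returns. The following is only a minimal sketch under stated assumptions: `detect_ext`, `get_subtitles` and the injected `fetch` callable are hypothetical stand-ins, and `detect_ext` is a deliberate simplification, not the actual behaviour of `urlhandle_detect_ext`.

```python
import posixpath
from urllib.parse import urlparse

BASE_URL = 'https://www.nzonscreen.com'

# Assumption: a tiny MIME table for illustration; the real helper knows many more types.
KNOWN_MIME_EXTS = {'text/vtt': 'vtt', 'application/x-subrip': 'srt'}


def detect_ext(url, content_type=None):
    """Guess a subtitle extension from the Content-Type, falling back to the URL path."""
    if content_type:
        mime = content_type.split(';')[0].strip().lower()
        if mime in KNOWN_MIME_EXTS:
            return KNOWN_MIME_EXTS[mime]
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip('.').lower()
    return ext or None


def get_subtitles(playinfo, fetch):
    """Build a yt-dlp style subtitles dict from playinfo['h264']['caption_url'].

    `fetch` is a hypothetical callable returning (text, content_type) for a URL.
    """
    caption = (playinfo.get('h264') or {}).get('caption_url')
    if not caption:
        return {}
    url = BASE_URL + caption  # caption_url is site-relative in the diff above
    text, content_type = fetch(url)
    if not text:
        return {}
    return {'en': [{'ext': detect_ext(url, content_type), 'data': text}]}


def fake_fetch(url):
    # Stand-in for a network call so the sketch runs offline.
    return 'WEBVTT\n', 'text/vtt; charset=utf-8'


subs = get_subtitles({'h264': {'caption_url': '/captions/example.vtt'}}, fake_fetch)
```

Passing the subtitle body as `data` rather than a `url` mirrors the diff: the file is downloaded once during extraction, and its extension is derived from the response instead of being hard-coded.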
@@ -121,21 +133,21 @@ def _real_extract(self, url):
             self._html_extract_title(webpage, default=None)
             or self._og_search_title(webpage)).rsplit('|', 2)[0])
         playlist = self._download_json(
-            f'https://www.nzonscreen.com/html5/video_data/{video_id}', video_id, 'downloading media data')
+            f'https://www.nzonscreen.com/html5/video_data/{video_id}', video_id, 'Downloading media data')
-        # TODO: extract subtitles
         if len(playlist) == 1:
             playinfo = playlist[0]
             return {
                 'alt_title': title,
                 'display_id': video_id,
-                'formats': list(self._extract_formats(playinfo)),
                 'http_headers': {
                     'Referer': 'https://www.nzonscreen.com/',
                     'Origin': 'https://www.nzonscreen.com/',
                 },
+                'subtitles': self.extract_subtitles(playinfo, video_id),
                 **traverse_obj(playinfo, {
-                    'id': ('uuid'),
+                    'formats': {self._extract_formats},
+                    'id': 'uuid',
                     'title': ('label', {strip_or_none}),
                     'description': ('description', {strip_or_none}),
                     'thumbnail': ('thumbnail', 'path'),
@@ -145,13 +157,14 @@ def _real_extract(self, url):
         else:
             return self.playlist_result([{
                 'display_id': video_id,
-                'formats': list(self._extract_formats(playinfo)),
                 'http_headers': {
                     'Referer': 'https://www.nzonscreen.com/',
                     'Origin': 'https://www.nzonscreen.com/',
                 },
+                'subtitles': self.extract_subtitles(playinfo, video_id),
                 **traverse_obj(playinfo, {
-                    'id': ('uuid'),
+                    'formats': {self._extract_formats},
+                    'id': 'uuid',
                     'title': ('label', {strip_or_none}),
                     'description': ('description', {strip_or_none}),
                     'thumbnail': ('thumbnail', 'path'),
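
Review note: one behavioural difference in the `_extract_formats` change is easy to miss. A plain dict passed to `update()` writes `None` for a missing `height` or `width`, whereas the dict form of `traverse_obj` omits keys whose value resolves to `None`. The sketch below illustrates that difference only; `pick` is a hypothetical stand-in written for this note, not yt-dlp's implementation, and the sample values are invented.

```python
def int_or_none(value):
    """Convert to int, returning None when the value is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def pick(obj, spec):
    """Hypothetical stand-in for dict-style traversal: omit results that are None."""
    result = {}
    for name, (key, convert) in spec.items():
        value = convert(obj.get(key))
        if value is not None:
            result[name] = value
    return result


playlist = {'width': '640'}  # assumed sample data: 'height' is absent

# Old approach: every key is written, so a missing source value becomes None.
fmt_old = {'format_id': 'hi', 'height': 360}
fmt_old.update({
    'height': int_or_none(playlist.get('height')),
    'width': int_or_none(playlist.get('width')),
})

# New approach: only keys that resolved to a real value are written.
fmt_new = {'format_id': 'hi', 'height': 360}
fmt_new.update(pick(playlist, {
    'height': ('height', int_or_none),
    'width': ('width', int_or_none),
}))
```

Under these assumptions `fmt_old` ends up with `'height': None`, while `fmt_new` keeps the value it already had.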