
Stanford OpenIE with a custom NER model (nerlove model)

If you are interested in Stanford OpenIE with a custom NER model and the nerlove model, this article is one you should not miss. We will cover the details of using Stanford OpenIE with a custom NER model and analyze the nerlove model in depth. In addition, there is practical material on the 2013 Stanford course Developing iOS 7 Apps for iPhone and iPad lecture notes, 58同城 AI Lab open-sourcing the Efficient Conformer model in WeNet, the Commencement Address at Stanford University, and the structure and characteristics of the Conformer model.

Stanford OpenIE with a custom NER model (nerlove model)

I am trying to use Stanford OpenIE (version 3.6.0) to extract relation triples based on an NER model I trained for the chemistry domain. However, I cannot get OpenIE to extract relation triples based on my own NER model. It seems that OpenIE only extracts relation triples based on the default NER models shipped with the package.

Here is what I did to train and deploy the NER model:

Trained the NER model following http://nlp.stanford.edu/software/crf-faq.html#a.
Deployed the NER model in the CoreNLP server and restarted the server. I modified the props property in corenlpserver.sh. The props property now looks like this:

props="-Dner.model=$scriptdir/my_own_chemistry.ser.gz,edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"

Please see an example of the NER + OpenIE results here. In this example, I would like OpenIE to build relation triples over the entities recognized by my own NER model (e.g. Cl, Br, and Windjana), but it does not. Can OpenIE extract relation triples based on a self-trained NER model? If so, could you give me some brief instructions?
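For reference, a minimal sketch of how a setup like the one above can be exercised against a locally running CoreNLP server over its HTTP API. The port, the example sentence, and the use of the requests package are assumptions for illustration, not part of the original question.

import json
import requests  # assumed to be installed; any HTTP client works

# Hypothetical local CoreNLP server started with the modified props above.
CORENLP_URL = "http://localhost:9000"

properties = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie",
    "outputFormat": "json",
}

text = "Cl and Br were detected in the Windjana sample."  # illustrative sentence

resp = requests.post(CORENLP_URL,
                     params={"properties": json.dumps(properties)},
                     data=text.encode("utf-8"))
doc = resp.json()

# Print the OpenIE triples returned for each sentence.
for sentence in doc["sentences"]:
    for triple in sentence.get("openie", []):
        print(triple["subject"], "|", triple["relation"], "|", triple["object"])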

Thanks in advance!

Answer 1

I contacted the OpenIE authors, and they confirmed that OpenIE more or less completely ignores NER. Hopefully this helps anyone else with the same problem.

2013 Stanford open course: Developing iOS 7 Apps for iPhone and iPad lecture notes

The latest 2013 open course based on iOS 7 is now available on iTunes. It is still the Stanford course, and the instructor is still that little old guy with the white stubble.


The videos are too large and my home broadband can't keep up, so I suggest you watch them in iTunes. iTunes downloads usually run at my connection's peak speed, since Apple now seems to have a data center in China. If it is slow for you, just configure your DNS: go to http://dns.v2ex.com/ and use their DNS. In my tests the speed was quite good.

Another reason to watch in iTunes is that the videos come with English subtitles; you should be able to follow roughly 80-90% of them.


I use Shanghai Great Wall Broadband; without any special DNS configuration, the speed stays at peak. Not bad.


Lecture notes download link:

http://pan.baidu.com/s/1zit3q

58同城 AI Lab open-sources the Efficient Conformer model in WeNet

1. Model introduction

2. Model implementation

3. Streaming inference

4. Experimental results


In August 2022, the 58同城 TEG-AI Lab speech technology team completed a large-scale deployment of WeNet end-to-end speech recognition, replacing the previous Kaldi-based system, and optimized recognition accuracy and inference speed for business needs with excellent results. The recording-file recognition engine currently processes 10 million hours of audio per year, and the streaming recognition engine supports more than 50 million voice conversations per year. For details, see "58同城: Large-scale deployment of WeNet end-to-end speech recognition" [1].

During this optimization work, we reproduced the Efficient Conformer [2] model. On data from our production scenarios, compared with the best Kaldi model, it reduced CER by 3% absolute and improved decoding speed by 61%. Compared with Conformer, CER dropped from 10.01% to 9.30% and decoding speed improved by 10%; combined with int8 quantization, decoding speed improves by 60%. We also evaluated it on the public AISHELL-1 dataset, obtaining a CER of 4.56% (no LM). The model code has been open-sourced to WeNet [3].

This article describes our reproduction of Efficient Conformer, covering the model, its implementation, streaming inference support, and experimental results.


01

Model introduction

The Conformer architecture has achieved very strong results in speech recognition and has been widely adopted in many model architectures. To reduce Conformer's computational complexity, speed up inference, and reduce the compute required, Efficient Conformer modifies Conformer with several more efficient structures. Experiments show that, compared with Conformer, Efficient Conformer achieves better recognition accuracy as well as faster training and decoding.

The main improvements in Efficient Conformer are:

  • Progressive Downsampling: the Efficient Conformer block adds a downsampling operation to the convolution module, reducing the time dimension and therefore the computational cost of the (Efficient) Conformer blocks that follow the downsampling;

  • Grouped Attention: the Efficient Conformer block modifies multi-head self-attention by adding a grouping operation, reducing the self-attention module's complexity from O(n²d) to O(n²d/g), where n is the time dimension, d is the hidden dimension, and g is the group_size.

 


1.1 Progressive Downsampling

The Efficient Conformer encoder differs from a typical Conformer encoder: the downsampling layer before the Conformer blocks uses 1/2 subsampling (conv2d2) in place of the original Conformer's 1/4 subsampling (conv2d), and the encoder is then divided into three stages, as shown on the right of the figure above. In the first two stages, a Downsampling Block follows N Conformer blocks and downsamples along the time dimension; the last stage stacks N Conformer blocks without a Downsampling Block.

The structure of the Downsampling Block is shown in the figure below. It is implemented by setting the stride of the DepthwiseConv inside the Conformer block to a value greater than 1, which downsamples the time dimension. Because the output shape after downsampling is smaller than the input shape, the residual branch needs an additional Pointwise Projection module to map input and output to the same dimensions.
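As a rough illustration of the shape bookkeeping described above (a minimal sketch, not the WeNet implementation; the tensor sizes and the pooling-based residual projection are assumptions chosen to match the AvgPool1d choice discussed later in this article):

import torch
import torch.nn as nn

batch, channels, time, stride = 4, 256, 100, 2

# Depthwise conv with stride > 1 halves the time dimension.
depthwise = nn.Conv1d(channels, channels, kernel_size=15, stride=stride,
                      padding=7, groups=channels)
x = torch.randn(batch, channels, time)
y = depthwise(x)                      # (4, 256, 50)

# The residual branch must be downsampled the same way before it can be added.
residual = nn.AvgPool1d(kernel_size=stride, stride=stride, ceil_mode=True)(x)
print(y.shape, residual.shape)        # both (4, 256, 50)
out = y + residual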


1.2 Grouped Multi-Head Self-Attention (Grouped MHSA)

In a standard multi-head self-attention module, Q, K, and V have shape (n, d) and the module's complexity is O(n²d). Grouped MHSA first reshapes Q, K, and V to (n/g, d*g), where g is the group_size, then computes attention, and finally reshapes the output back to the original (n, d). After this transformation, the attention module's complexity drops to O(n²d/g).
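A minimal sketch of the reshape behind Grouped MHSA (the shapes and the toy single-head attention call are assumptions for illustration; the actual WeNet code is shown in section 2.2):

import torch
import torch.nn.functional as F

batch, n, d, g = 2, 12, 64, 3          # time dimension n must be divisible by g here

q = torch.randn(batch, n, d)
k = torch.randn(batch, n, d)
v = torch.randn(batch, n, d)

# Group g consecutive frames into one "token": (B, n, d) -> (B, n/g, d*g).
def group(x):
    return x.reshape(batch, n // g, d * g)

qg, kg, vg = group(q), group(k), group(v)

# Attention now runs over n/g positions of width d*g, so the score matrix
# shrinks from (n, n) to (n/g, n/g), i.e. complexity O(n^2 d) -> O(n^2 d / g).
scores = torch.matmul(qg, kg.transpose(-2, -1)) / (d * g) ** 0.5
out = torch.matmul(F.softmax(scores, dim=-1), vg)

# Ungroup back to the original layout: (B, n/g, d*g) -> (B, n, d).
out = out.reshape(batch, n, d)
print(out.shape)  # torch.Size([2, 12, 64])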


In addition, the authors also propose other efficient MHSA variants such as Strided Multi-Head Self-Attention, Relative Multi-Head Self-Attention, and Local Multi-Head Self-Attention; interested readers can consult the original paper and code.


02

Model implementation

Referring to the Efficient Conformer code [4], we reproduced the model in the WeNet open-source project by adding an efficient_conformer module under the wenet folder.


On December 26, 2022 we submitted a PR to the WeNet open-source project, contributing 1,426 lines of code; it was officially merged by WeNet on January 4, 2023. Main developers: 周维, 王亚如, 李咏泽.

The implementation details of the main modules are as follows:

2.1 Strided Convolution

(1) When the depthwise_conv module is initialized, a stride parameter is passed in to downsample along the time dimension (convolution.py).

self.depthwise_conv = nn.Conv1d(
    channels,
    channels,
    kernel_size,
    stride=stride,  # for depthwise_conv in StrideConv
    padding=padding,
    groups=channels,
    bias=bias,
)

(2) Downsampling the mask in step with the input (convolution.py): after the convolution module adds downsampling, the shape of the returned output shrinks, so the returned mask must be downsampled with the same stride to keep the output and the mask aligned.

# mask batch padding
if mask_pad.size(2) > 0:  # time > 0
    if mask_pad.size(2) != x.size(2):
        mask_pad = mask_pad[:, :, ::self.stride]
    x.masked_fill_(~mask_pad, 0.0)

(3) Residual branch with a pointwise projection layer (encoder_layer.py): since the convolution module now downsamples, its output dimension is smaller than its input dimension, so the convolution module's residual branch also needs a downsampling operation.

# add pointwise_conv for efficient conformer
if self.pointwise_conv_layer is not None:
    residual = residual.transpose(1, 2)
    residual = self.pointwise_conv_layer(residual)
    residual = residual.transpose(1, 2)
    assert residual.size(0) == x.size(0)
    assert residual.size(1) == x.size(1)
    assert residual.size(2) == x.size(2)

2.2 Grouped Multi-Head Self-Attention

(1) Pass the group_size parameter in at initialization and redefine the size of the positional-encoding bias matrices (attention.py).

class GroupedRelPositionMultiHeadedAttention(MultiHeadedAttention):
    def __init__(self, n_head, n_feat, dropout_rate, group_size=3):
        """Construct an RelPositionMultiHeadedAttention object."""
        super().__init__(n_head, n_feat, dropout_rate)
        # linear transformation for positional encoding
        self.linear_pos = nn.Linear(n_feat, n_feat, bias=False)
        self.group_size = group_size
        self.d_k = n_feat // n_head  # for GroupedAttention
        self.n_feat = n_feat
        # these two learnable bias are used in matrix c and matrix d
        # as described in https://arxiv.org/abs/1901.02860 Section 3.3
        self.pos_bias_u = nn.Parameter(
            torch.Tensor(self.h, self.d_k * self.group_size))
        self.pos_bias_v = nn.Parameter(
            torch.Tensor(self.h, self.d_k * self.group_size))
        torch.nn.init.xavier_uniform_(self.pos_bias_u)
        torch.nn.init.xavier_uniform_(self.pos_bias_v)

(2) Add a padding-and-reshape function pad4group (attention.py): Q, K, V, and P are padded along the time dimension so that it is divisible by group_size; the padding is zero-filled and no original data is dropped. After padding, Q, K, V, and P can be reshaped according to group_size. Because the time dimension shrinks after the reshape, the mask must shrink as well, which is done by simply subsampling it.

def pad4group(self, Q, K, V, P, mask, group_size: int = 3):
    # Compute Overflows
    overflow_Q = Q.size(2) % group_size
    overflow_KV = K.size(2) % group_size

    padding_Q = (group_size - overflow_Q) * int(
        overflow_Q // (overflow_Q + 0.00000000000000001))
    padding_KV = (group_size - overflow_KV) * int(
        overflow_KV // (overflow_KV + 0.00000000000000001))

    batch_size, _, seq_len_KV, _ = K.size()

    # Input Padding (B, T, D) -> (B, T + P, D)
    Q = F.pad(Q, (0, 0, 0, padding_Q), value=0.0)
    K = F.pad(K, (0, 0, 0, padding_KV), value=0.0)
    V = F.pad(V, (0, 0, 0, padding_KV), value=0.0)

    if mask is not None and mask.size(2) > 0:  # time2 > 0
        mask = mask[:, ::group_size, ::group_size]

    Q = Q.transpose(1, 2).contiguous().view(
        batch_size, -1, self.h, self.d_k * group_size).transpose(1, 2)
    K = K.transpose(1, 2).contiguous().view(
        batch_size, -1, self.h, self.d_k * group_size).transpose(1, 2)
    V = V.transpose(1, 2).contiguous().view(
        batch_size, -1, self.h, self.d_k * group_size).transpose(1, 2)

    # process pos_emb
    P_batch_size = P.size(0)
    overflow_P = P.size(1) % group_size
    padding_P = group_size - overflow_P if overflow_P else 0
    P = F.pad(P, (0, 0, 0, padding_P), value=0.0)
    P = P.view(P_batch_size, -1, self.h,
               self.d_k * group_size).transpose(1, 2)

    return Q, K, V, P, mask, padding_Q

(3) In the forward_attention function (attention.py), after attention is computed, the output is first reshaped back to the padded dimensions and the padded part is then removed, keeping only the valid output.

# n_feat!=h*d_k may be happened in GroupAttention
x = (x.transpose(1, 2).contiguous().view(n_batch, -1, self.n_feat)
     )  # (batch, time1, d_model)
if padding_q is not None:
    # for GroupedAttention in efficent conformer
    x = x[:, :x.size(1) - padding_q]

2.3 Other implementation details

(1) Pointwise projection (encoder.py)

Because the convolution module downsamples, the input and output dimensions no longer match, so the residual branch needs a pointwise projection layer or a downsampling layer to bring them back to the same size. This module can be implemented in several ways, for example with convolutional downsampling or pooling.

In our experiments we compared convolutional downsampling (kernel=3, stride=2, causal=true) with pooling (AvgPool1d). The results showed that convolutional downsampling performed worse than pooling. Moreover, a streaming implementation of the convolution would require setting causal to true and allocating and maintaining a cache, whereas pooling has no parameters and needs no cache, so it can be used for streaming training and decoding with only simple adaptation. We therefore chose AvgPool1d for the residual downsampling of the convolution module.

# conformer module definition
if i in self.stride_layer_idx:
    # conformer block with downsampling
    convolution_layer_args_stride = (
        output_size, self.cnn_module_kernels[index], activation,
        cnn_module_norm, causal, True, self.stride[index])
    layers.append(StrideConformerEncoderLayer(
        output_size,
        encoder_selfattn_layer(*encoder_selfattn_layer_args),
        positionwise_layer(*positionwise_layer_args),
        positionwise_layer(
            *positionwise_layer_args) if macaron_style else None,
        convolution_layer(
            *convolution_layer_args_stride) if use_cnn_module else None,
        torch.nn.AvgPool1d(
            kernel_size=self.stride[index],
            stride=self.stride[index],
            padding=0,
            ceil_mode=True,
            count_include_pad=False),  # pointwise_conv_layer
        dropout_rate,
        normalize_before,
        concat_after,
    ))

(2) The attention dimension is kept the same across encoder layers (encoder.py)

In the original paper, to balance the computation of layers before and after downsampling, the attention dimension differs between stages and grows progressively. In our implementation, for simplicity, flexibility, and to save compute, all layers keep the same attention dimension; only Grouped MHSA is used to balance the computational cost of different layers, and both the positions where Grouped MHSA is inserted and the group_size can be chosen flexibly. We typically add the grouping operation to the N consecutive layers immediately before a Downsampling Block.


03

Streaming inference

During streaming inference, the WeNet framework calls the encoder's forward_chunk interface: audio is split into chunks of chunk_size and fed to the model. Under the Conformer architecture, streaming requires recording the attention K and V as att_cache for the inference of the next chunk. In addition, in streaming scenarios depthwise_conv usually uses causal convolution, so the last kernel_size-1 frames of the hidden state must be recorded as cnn_cache for the convolution of the next chunk, as shown in the figure below.

Efficient Conformer's strided convolution layers, however, downsample the input along the time dimension, which shortens the caches in time. To keep the interface unchanged, forward_chunk has to "pad" each layer's cache appropriately.

We first follow the Squeezeformer implementation and compute each layer's downsampling factor with the following function (encoder.py).

def calculate_downsampling_factor(self, i: int) -> int:
    factor = 1
    for idx, stride_idx in enumerate(self.stride_layer_idx):
        if i > stride_idx:
            factor *= self.stride[idx]
    return factor

For att_cache, we repeat it factor times along the time dimension, and subsample it by the corresponding factor when it is used (encoder.py).

for i, layer in enumerate(self.encoders):
    factor = self.calculate_downsampling_factor(i)
    xs, _, new_att_cache, new_cnn_cache = layer(
        xs, att_mask, pos_emb,
        mask_pad=mask_pad,
        att_cache=att_cache[i:i + 1, :, ::factor, :],
        cnn_cache=cnn_cache[i, :, :, :] if cnn_cache.size(0) > 0 else cnn_cache
    )
    # shape(new_att_cache) = [batch, head, time2, outdim]
    new_att_cache = new_att_cache[:, :, next_cache_start // factor:, :]
    # use repeat_interleave to new_att_cache
    new_att_cache = new_att_cache.repeat_interleave(repeats=factor, dim=2)

For cnn_cache, it is simply padded to kernel_size-1 (encoder.py).

# shape(new_cnn_cache) = [1, batch, outdim, cache_t2]
new_cnn_cache = new_cnn_cache.unsqueeze(0)
# padding new_cnn_cache to cnn.lorder for casual convolution
new_cnn_cache = F.pad(
    new_cnn_cache,
    (self.cnn_module_kernel - 1 - new_cnn_cache.size(3), 0))

Because Grouped MHSA is essentially a reshape operation, new_att_cache must be recorded before the reshape, which keeps the cache dimensions consistent with regular MHSA (attention.py).

if cache.size(0) > 0:
    # use attention cache
    key_cache, value_cache = torch.split(
        cache, cache.size(-1) // 2, dim=-1)
    k = torch.cat([key_cache, k], dim=2)
    v = torch.cat([value_cache, v], dim=2)
new_cache = torch.cat((k, v), dim=-1)

# May be k and p does not match. eg. time2=18+18/2=27 > mask=36/2=18
if mask is not None and mask.size(2) > 0:
    time2 = mask.size(2)
    k = k[:, :, -time2:, :]
    v = v[:, :, -time2:, :]

# q k v p: (batch, head, time1, d_k)
q, k, v, p, mask, padding_q = self.pad4group(q, k, v, p, mask, self.group_size)

The forward_chunk interface takes an offset used to compute relative positional encodings in streaming mode. Because Efficient Conformer downsamples the time dimension, the output length no longer matches the input length, so the offset computed from y.size(1) is a post-downsampling value and must be scaled back by the model's overall downsampling factor when used (encoder.py).

# using downsampling factor to recover offset
offset *= self.calculate_downsampling_factor(self.num_blocks + 1)


04

Experimental results

To use Efficient Conformer, you only need to set the encoder parameter in the config file.

encoder: efficientConformer

On our in-house test sets, Efficient Conformer outperformed Conformer; see article [1] for details.


We also ran two model variants on AISHELL-1. V1 is the structure we use in production: a strided convolution downsamples at the first third of the encoder (12 layers in total), the first 4 layers all use Grouped MHSA, cnn_module_kernel is shrunk by the same factor after the strided convolution (e.g. 15 -> 7), and the front-end downsampling uses 1/4 subsampling conv2d. V1 large increases the output dimension from 256 to 512 and cnn_module_kernel from 15 to 31.

efficient_conf:
    stride_layer_idx: [3]           # layer id with StrideConv
    stride: [2]                     # stride size of each StrideConv
    group_layer_idx: [0, 1, 2, 3]   # layer id with GroupedAttention
    group_size: 3                   # group size of every GroupedAttention layer
    stride_kernel: true             # true: recompute cnn kernels with stride

V2 follows the structure of the original Efficient Conformer paper: the front-end downsampling is 1/2 subsampling conv2d2, downsampling is applied twice, at 1/3 and 2/3 of the encoder, and cnn_module_kernel is kept fixed.

efficient_conf:
    stride_layer_idx: [3, 7]        # layer id with StrideConv
    stride: [2, 2]                  # stride size of each StrideConv
    group_layer_idx: [3, 7]         # layer id with GroupedAttention
    group_size: 3                   # group size of every GroupedAttention layer
    stride_kernel: false            # true: recompute cnn kernels with stride

The results without a language model are as follows:


As the results show, among the models currently merged into the WeNet project (as of 2023-01-09), Efficient Conformer achieves the best CER on the public AISHELL-1 dataset, 4.56%, surpassing the Conformer result of 4.61%.

Future plans:

(1) Further improve results on open-source datasets

(2) Support ONNX export of Efficient Conformer and streaming GPU deployment


References

[1] 58同城: Large-scale deployment of WeNet end-to-end speech recognition (WeNet端到端语音识别大规模落地方案)

[2] Efficient Conformer: https://arxiv.org/pdf/2109.01163.pdf

[3] WeNet Efficient Conformer PR:https://github.com/wenet-e2e/wenet/pull/1636

[4] Efficient Conformer Code: https://github.com/burchim/EfficientConformer


About the authors:

周维, algorithm architect at 58同城 TEG-AI Lab and head of the speech algorithm department, responsible for speech recognition and speech synthesis R&D.

王亚如, senior algorithm engineer in the speech algorithm department of 58同城 TEG-AI Lab, mainly responsible for end-to-end speech recognition R&D.


About 58同城 AI Lab

58同城 AI Lab is part of the TEG technology and engineering platform group. Its mission is to drive the adoption of AI across 58同城 and build AI middle-platform capabilities that improve front-line business efficiency, revenue, and user experience.


This article was originally shared on the WeChat official account 58技术 (architects_58).

Commencement Address at Stanford University

 "You've got to find what you love," Jobs says

  This is the text of the Commencement address by Steve Jobs, CEO of Apple Computer and of Pixar Animation Studios, delivered on June 12, 2005.

  I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college. Truth be told, this is the closest I've ever gotten to a college graduation. Today I want to tell you three stories from my life. That's it. No big deal. Just three stories.

  The first story is about connecting the dots.

  I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?

  It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college. And 17 years later I did go to college. But I naively chose a college that was almost as expensive as Stanford, and all of my working-class parents' savings were being spent on my college tuition. After six months, I couldn't see the value in it. I had no idea what I wanted to do with my life and no idea how college was going to help me figure it out. And here I was spending all of the money my parents had saved their entire life. So I decided to drop out and trust that it would all work out OK. It was pretty scary at the time, but looking back it was one of the best decisions I ever made. The minute I dropped out I could stop taking the required classes that didn't interest me, and begin dropping in on the ones that looked interesting. It wasn't all romantic. I didn't have a dorm room, so I slept on the floor in friends' rooms, I returned coke bottles for the 5¢ deposits to buy food with, and I would walk the 7 miles across town every Sunday night to get one good meal a week at the Hare Krishna temple. I loved it. And much of what I stumbled into by following my curiosity and intuition turned out to be priceless later on. Let me give you one example:

  Reed College at that time offered perhaps the best calligraphy instruction in the country. Throughout the campus every poster, every label on every drawer, was beautifully hand calligraphed. Because I had dropped out and didn't have to take the normal classes, I decided to take a calligraphy class to learn how to do this. I learned about serif and sans serif typefaces, about varying the amount of space between different letter combinations, about what makes great typography great. It was beautiful, historical, artistically subtle in a way that science can't capture, and I found it fascinating.

  None of this had even a hope of any practical application in my life. But ten years later, when we were designing the first Macintosh computer, it all came back to me. And we designed it all into the Mac. It was the first computer with beautiful typography. If I had never dropped in on that single course in college, the Mac would have never had multiple typefaces or proportionally spaced fonts. And since Windows just copied the Mac, it's likely that no personal computer would have them. If I had never dropped out, I would have never dropped in on this calligraphy class, and personal computers might not have the wonderful typography that they do. Of course it was impossible to connect the dots looking forward when I was in college. But it was very, very clear looking backwards ten years later.

  Again, you can't connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something — your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life.

  My second story is about love and loss.

  I was lucky — I found what I loved to do early in life. Woz and I started Apple in my parents' garage when I was 20. We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4000 employees. We had just released our finest creation — the Macintosh — a year earlier, and I had just turned 30. And then I got fired. How can you get fired from a company you started? Well, as Apple grew we hired someone who I thought was very talented to run the company with me, and for the first year or so things went well. But then our visions of the future began to diverge and eventually we had a falling out. When we did, our Board of Directors sided with him. So at 30 I was out. And very publicly out. What had been the focus of my entire adult life was gone, and it was devastating. I really didn't know what to do for a few months. I felt that I had let the previous generation of entrepreneurs down - that I had dropped the baton as it was being passed to me. I met with David Packard and Bob Noyce and tried to apologize for screwing up so badly. I was a very public failure, and I even thought about running away from the valley. But something slowly began to dawn on me — I still loved what I did. The turn of events at Apple had not changed that one bit. I had been rejected, but I was still in love. And so I decided to start over. I didn't see it then, but it turned out that getting fired from Apple was the best thing that could have ever happened to me. The heaviness of being successful was replaced by the lightness of being a beginner again, less sure about everything. It freed me to enter one of the most creative periods of my life.

  During the next five years, I started a company named NeXT, another company named Pixar, and fell in love with an amazing woman who would become my wife. Pixar went on to create the world's first computer animated feature film, Toy Story, and is now the most successful animation studio in the world. In a remarkable turn of events, Apple bought NeXT, I returned to Apple, and the technology we developed at NeXT is at the heart of Apple's current renaissance. And Laurene and I have a wonderful family together. I'm pretty sure none of this would have happened if I hadn't been fired from Apple. It was awful tasting medicine, but I guess the patient needed it. Sometimes life hits you in the head with a brick. Don't lose faith. I'm convinced that the only thing that kept me going was that I loved what I did. You've got to find what you love. And that is as true for your work as it is for your lovers. Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. As with all matters of the heart, you'll know when you find it. And, like any great relationship, it just gets better and better as the years roll on. So keep looking until you find it. Don't settle.

  My third story is about death.

  When I was 17, I read a quote that went something like: "If you live each day as if it was your last, someday you'll most certainly be right." It made an impression on me, and since then, for the past 33 years, I have looked in the mirror every morning and asked myself: "If today were the last day of my life, would I want to do what I am about to do today?" And whenever the answer has been "No" for too many days in a row, I know I need to change something.

  Remembering that I'll be dead soon is the most important tool I've ever encountered to help me make the big choices in life. Because almost everything — all external expectations, all pride, all fear of embarrassment or failure - these things just fall away in the face of death, leaving only what is truly important. Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose. You are already naked. There is no reason not to follow your heart.

  About a year ago I was diagnosed with cancer. I had a scan at 7:30 in the morning, and it clearly showed a tumor on my pancreas. I didn't even know what a pancreas was. The doctors told me this was almost certainly a type of cancer that is incurable, and that I should expect to live no longer than three to six months. My doctor advised me to go home and get my affairs in order, which is doctor's code for prepare to die. It means to try to tell your kids everything you thought you'd have the next 10 years to tell them in just a few months. It means to make sure everything is buttoned up so that it will be as easy as possible for your family. It means to say your goodbyes. I lived with that diagnosis all day. Later that evening I had a biopsy, where they stuck an endoscope down my throat, through my stomach and into my intestines, put a needle into my pancreas and got a few cells from the tumor. I was sedated, but my wife, who was there, told me that when they viewed the cells under a microscope the doctors started crying because it turned out to be a very rare form of pancreatic cancer that is curable with surgery. I had the surgery and I'm fine now. This was the closest I've been to facing death, and I hope it's the closest I get for a few more decades. Having lived through it, I can now say this to you with a bit more certainty than when death was a useful but purely intellectual concept: No one wants to die. Even people who want to go to heaven don't want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life's change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true.

  Your time is limited, so don't waste it living someone else's life. Don't be trapped by dogma — which is living with the results of other people's thinking. Don't let the noise of others' opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.

  When I was young, there was an amazing publication called The Whole Earth Catalog, which was one of the bibles of my generation. It was created by a fellow named Stewart Brand not far from here in Menlo Park, and he brought it to life with his poetic touch. This was in the late 1960s, before personal computers and desktop publishing, so it was all made with typewriters, scissors, and polaroid cameras. It was sort of like Google in paperback form, 35 years before Google came along: it was idealistic, and overflowing with neat tools and great notions.

  Stewart and his team put out several issues of The Whole Earth Catalog, and then when it had run its course, they put out a final issue. It was the mid-1970s, and I was your age. On the back cover of their final issue was a photograph of an early morning country road, the kind you might find yourself hitchhiking on if you were so adventurous. Beneath it were the words: "Stay Hungry. Stay Foolish." It was their farewell message as they signed off. Stay Hungry. Stay Foolish. And I have always wished that for myself. And now, as you graduate to begin anew, I wish that for you. Stay Hungry. Stay Foolish.

Structure and characteristics of the Conformer model

Conformer is a self-attention-based sequence model that has performed very well on tasks such as speech recognition, language modeling, and machine translation. Like the Transformer, the Conformer architecture contains multi-head self-attention layers and feed-forward layers; however, Conformer makes several changes that make it better suited to sequence modeling tasks. One change is the addition of convolutional layers to capture local context, which lets the model handle local features in a sequence better and improves generalization. Conformer also introduces a new positional encoding scheme based on depthwise separable convolutions, which captures positional information in a sequence better than traditional positional encodings and improves the model's ability to model sequence order. In short, these changes give Conformer stronger sequence modeling ability.

Basic structure

The Conformer model is built from a stack of Conformer blocks. Each Conformer block contains two sub-modules: a multi-head self-attention module and a convolution module. The multi-head self-attention module captures interactions between different positions in the sequence, strengthening the representations of important positions through attention weights, while the convolution module extracts local features and captures local context through convolution. Together, these two sub-modules let the Conformer model take both global and local information into account and model sequence data effectively.

The multi-head self-attention module improves on the Transformer's attention mechanism, specifically with relative positional encoding and a position-independent way of exchanging information. Relative positional encoding handles positional information in a sequence better, while the position-independent information exchange is better suited to long sequences. These changes give the multi-head self-attention module better performance on sequence data.

The convolution module consists of depthwise separable convolution layers and residual connections, which both reduces the number of parameters and speeds up training and inference. The residual connections alleviate model degradation and accelerate convergence.
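To make the structure described above concrete, here is a minimal PyTorch sketch of a single simplified Conformer-style block. The layer sizes, the module ordering, and the use of standard nn.MultiheadAttention without relative positional encoding are simplifications and assumptions for illustration, not the exact Conformer definition.

import torch
import torch.nn as nn

class SimpleConformerBlock(nn.Module):
    """Simplified Conformer-style block: FFN -> self-attention -> conv -> FFN."""

    def __init__(self, d_model=256, n_heads=4, kernel_size=15, ff_mult=4):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, ff_mult * d_model),
                                 nn.SiLU(),
                                 nn.Linear(ff_mult * d_model, d_model))
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Depthwise separable convolution: depthwise conv + pointwise conv.
        self.conv_norm = nn.LayerNorm(d_model)
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.ff2 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, ff_mult * d_model),
                                 nn.SiLU(),
                                 nn.Linear(ff_mult * d_model, d_model))

    def forward(self, x):                       # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)               # half-step feed-forward
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h)        # global context via self-attention
        x = x + attn_out
        h = self.conv_norm(x).transpose(1, 2)   # (batch, d_model, time) for conv
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = x + h                               # residual around the conv module
        return x + 0.5 * self.ff2(x)            # second half-step feed-forward

x = torch.randn(2, 100, 256)
print(SimpleConformerBlock()(x).shape)          # torch.Size([2, 100, 256])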

Characteristics

Compared with traditional sequence models, the Conformer model has the following characteristics:

1. Stronger sequence modeling ability

The Conformer model uses multi-head self-attention, which captures interactions between different positions in a sequence better, together with a convolution module for better local feature extraction. These properties give Conformer better performance on sequence modeling tasks.

2. Higher model efficiency

The Conformer model uses depthwise separable convolution layers and residual connections, which effectively reduce the number of parameters and speed up training and inference, making Conformer more efficient in practice.

3. Better generalization

The Conformer model uses relative positional encoding and position-independent information exchange, which handle long sequences better and generalize better, making Conformer more adaptable to complex tasks.

This concludes today's discussion of Stanford OpenIE with a custom NER model and the nerlove model. Thank you for reading. To learn more about the 2013 Stanford course Developing iOS 7 Apps for iPhone and iPad lecture notes, 58同城 AI Lab open-sourcing the Efficient Conformer model in WeNet, the Commencement Address at Stanford University, or the structure and characteristics of the Conformer model, please search this site.
