GLUE 数据集合的介绍:

  • 自然语言处理(NLP)主要自然语言理解(NLU)和自然语言生成(NLG)。GLUE(General Language Understanding Evaluation)由纽约大学,华盛顿大学,Google 联合推出,涵盖不同 NLP 任务类型,截止至 2020 年 1 月其中包括 11 个子任务数据集,成为衡量 NLP 研究发展的衡量标准.
  • GLUE 九项任务涉及到自然语言推断、文本蕴含、情感分析、语义相似等多个任务。像 BERT、XLNet、RoBERTa、ERINE、T5 等知名模型都会在此基准上进行测试。

文章目录


GLUE 数据集合包含以下数据集

  • CoLA 数据集
  • SST-2 数据集
  • MRPC 数据集
  • STS-B 数据集
  • QQP 数据集
  • MNLI 数据集
  • SNLI 数据集
  • QNLI 数据集
  • RTE 数据集
  • WNLI 数据集
  • diagnostics 数据集 (官方未完善)

  • GLUE 数据集合的下载方式:

下载脚本代码:https://gluebenchmark.com/

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
''' Script for downloading all GLUE data.'''
import os
import sys
import shutil
import argparse
import tempfile
import urllib.request
import zipfile

TASKS = ["CoLA", "SST", "MRPC", "QQP", "STS", "MNLI", "SNLI", "QNLI", "RTE", "WNLI", "diagnostic"]
TASK2PATH = {"CoLA":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FCoLA.zip?alt=media&token=46d5e637-3411-4188-bc44-5809b5bfb5f4',
"SST":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FSST-2.zip?alt=media&token=aabc5f6b-e466-44a2-b9b4-cf6337f84ac8',
"MRPC":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2Fmrpc_dev_ids.tsv?alt=media&token=ec5c0836-31d5-48f4-b431-7480817f1adc',
"QQP":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FQQP.zip?alt=media&token=700c6acf-160d-4d89-81d1-de4191d02cb5',
"STS":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FSTS-B.zip?alt=media&token=bddb94a7-8706-4e0d-a694-1109e12273b5',
"MNLI":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FMNLI.zip?alt=media&token=50329ea1-e339-40e2-809c-10c40afff3ce',
"SNLI":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FSNLI.zip?alt=media&token=4afcfbb2-ff0c-4b2d-a09a-dbf07926f4df',
"QNLI": 'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FQNLIv2.zip?alt=media&token=6fdcf570-0fc5-4631-8456-9505272d1601',
"RTE":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FRTE.zip?alt=media&token=5efa7e85-a0bb-4f19-8ea2-9e1840f077fb',
"WNLI":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FWNLI.zip?alt=media&token=068ad0a0-ded7-4bd7-99a5-5e00222e0faf',
"diagnostic":'https://storage.googleapis.com/mtl-sentence-representations.appspot.com/tsvsWithoutLabels%2FAX.tsv?GoogleAccessId=firebase-adminsdk-0khhl@mtl-sentence-representations.iam.gserviceaccount.com&Expires=2498860800&Signature=DuQ2CSPt2Yfre0C%2BiISrVYrIFaZH1Lc7hBVZDD4ZyR7fZYOMNOUGpi8QxBmTNOrNPjR3z1cggo7WXFfrgECP6FBJSsURv8Ybrue8Ypt%2FTPxbuJ0Xc2FhDi%2BarnecCBFO77RSbfuz%2Bs95hRrYhTnByqu3U%2FYZPaj3tZt5QdfpH2IUROY8LiBXoXS46LE%2FgOQc%2FKN%2BA9SoscRDYsnxHfG0IjXGwHN%2Bf88q6hOmAxeNPx6moDulUF6XMUAaXCSFU%2BnRO2RDL9CapWxj%2BDl7syNyHhB7987hZ80B%2FwFkQ3MEs8auvt5XW1%2Bd4aCU7ytgM69r8JDCwibfhZxpaa4gd50QXQ%3D%3D'}

MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'

def download_and_extract(task, data_dir):
print("Downloading and extracting %s..." % task)
data_file = "%s.zip" % task
urllib.request.urlretrieve(TASK2PATH[task], data_file)
with zipfile.ZipFile(data_file) as zip_ref:
zip_ref.extractall(data_dir)
os.remove(data_file)
print("\tCompleted!")

def format_mrpc(data_dir, path_to_data):
print("Processing MRPC...")
mrpc_dir = os.path.join(data_dir, "MRPC")
if not os.path.isdir(mrpc_dir):
os.mkdir(mrpc_dir)
if path_to_data:
mrpc_train_file = os.path.join(path_to_data, "msr_paraphrase_train.txt")
mrpc_test_file = os.path.join(path_to_data, "msr_paraphrase_test.txt")
else:
print("Local MRPC data not specified, downloading data from %s" % MRPC_TRAIN)
mrpc_train_file = os.path.join(mrpc_dir, "msr_paraphrase_train.txt")
mrpc_test_file = os.path.join(mrpc_dir, "msr_paraphrase_test.txt")
urllib.request.urlretrieve(MRPC_TRAIN, mrpc_train_file)
urllib.request.urlretrieve(MRPC_TEST, mrpc_test_file)
assert os.path.isfile(mrpc_train_file), "Train data not found at %s" % mrpc_train_file
assert os.path.isfile(mrpc_test_file), "Test data not found at %s" % mrpc_test_file
urllib.request.urlretrieve(TASK2PATH["MRPC"], os.path.join(mrpc_dir, "dev_ids.tsv"))

dev_ids = []
with open(os.path.join(mrpc_dir, "dev_ids.tsv"), encoding="utf8") as ids_fh:
for row in ids_fh:
dev_ids.append(row.strip().split('\t'))

with open(mrpc_train_file, encoding="utf8") as data_fh, \
open(os.path.join(mrpc_dir, "train.tsv"), 'w', encoding="utf8") as train_fh, \
open(os.path.join(mrpc_dir, "dev.tsv"), 'w', encoding="utf8") as dev_fh:
header = data_fh.readline()
train_fh.write(header)
dev_fh.write(header)
for row in data_fh:
label, id1, id2, s1, s2 = row.strip().split('\t')
if [id1, id2] in dev_ids:
dev_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))
else:
train_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))

with open(mrpc_test_file, encoding="utf8") as data_fh, \
open(os.path.join(mrpc_dir, "test.tsv"), 'w', encoding="utf8") as test_fh:
header = data_fh.readline()
test_fh.write("index\t#1 ID\t#2 ID\t#1 String\t#2 String\n")
for idx, row in enumerate(data_fh):
label, id1, id2, s1, s2 = row.strip().split('\t')
test_fh.write("%d\t%s\t%s\t%s\t%s\n" % (idx, id1, id2, s1, s2))
print("\tCompleted!")

def download_diagnostic(data_dir):
print("Downloading and extracting diagnostic...")
if not os.path.isdir(os.path.join(data_dir, "diagnostic")):
os.mkdir(os.path.join(data_dir, "diagnostic"))
data_file = os.path.join(data_dir, "diagnostic", "diagnostic.tsv")
urllib.request.urlretrieve(TASK2PATH["diagnostic"], data_file)
print("\tCompleted!")
return

def get_tasks(task_names):
task_names = task_names.split(',')
if "all" in task_names:
tasks = TASKS
else:
tasks = []
for task_name in task_names:
assert task_name in TASKS, "Task %s not found!" % task_name
tasks.append(task_name)
return tasks

def main(arguments):
parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', help='directory to save data to', type=str, default='glue_data')
parser.add_argument('--tasks', help='tasks to download data for as a comma separated string',
type=str, default='all')
parser.add_argument('--path_to_mrpc', help='path to directory containing extracted MRPC data, msr_paraphrase_train.txt and msr_paraphrase_text.txt',
type=str, default='')
args = parser.parse_args(arguments)

if not os.path.isdir(args.data_dir):
os.mkdir(args.data_dir)
tasks = get_tasks(args.tasks)

for task in tasks:
if task == 'MRPC':
format_mrpc(args.data_dir, args.path_to_mrpc)
elif task == 'diagnostic':
download_diagnostic(args.data_dir)
else:
download_and_extract(task, args.data_dir)


if __name__ == '__main__':
sys.exit(main(sys.argv[1:]))

运行脚本下载所有数据集:

1
2
3
# 假设你已经将以上代码copy到download_glue_data.py文件中
# 运行这个python脚本, 你将同目录下得到一个glue文件夹
python download_glue_data.py

输出效果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Downloading and extracting CoLA...
Completed!
Downloading and extracting SST...
Completed!
Processing MRPC...
Local MRPC data not specified, downloading data from https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt
Completed!
Downloading and extracting QQP...
Completed!
Downloading and extracting STS...
Completed!
Downloading and extracting MNLI...
Completed!
Downloading and extracting SNLI...
Completed!
Downloading and extracting QNLI...
Completed!
Downloading and extracting RTE...
Completed!
Downloading and extracting WNLI...
Completed!
Downloading and extracting diagnostic...
Completed!

GLUE 数据集合

1 CoLA 数据集

CoLA (The Corpus of Linguistic Acceptability,语言可接受性语料库),单句子分类任务,语料来自语言理论的书籍和期刊,每个句子被标注为是否合乎语法的单词序列。本任务是一个二分类任务,标签共两个,分别是 0 和 1,其中 0 表示不合乎语法,1 表示合乎语法。

样本个数:训练集 8, 551 个,开发集 1, 043 个,测试集 1, 063 个。

任务:可接受程度,合乎语法与不合乎语法二分类。

文件样式

1
2
3
4
5
- CoLA/
- dev.tsv
- original/
- test.tsv
- train.tsv

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
12
...
gj04 1 She coughed herself awake as the leaf landed on her nose.
gj04 1 The worm wriggled onto the carpet.
gj04 1 The chocolate melted onto the carpet.
gj04 0 * The ball wriggled itself loose.
gj04 1 Bill wriggled himself loose.
bc01 1 The sinking of the ship to collect the insurance was very devious.
bc01 1 The ship's sinking was very devious.
bc01 0 * The ship's sinking to collect the insurance was very devious.
bc01 1 The testing of such drugs on oneself is too risky.
bc01 0 * This drug's testing on oneself is too risky.
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 4 列,第一列数据,如 gj04, bc01 等代表每条文本数据的来源即出版物代号;第二列数据,0 或 1, 代表每条文本数据的语法是否正确,0 代表不正确,1 代表正确;第三列数据,'’, 是作者最初的正负样本标记,与第二列意义相同,'' 表示不正确;第四列即是被标注的语法使用是否正确的文本句子.

test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
12
index   sentence
0 Bill whistled past the house.
1 The car honked its way down the road.
2 Bill pushed Harry off the sofa.
3 the kittens yawned awake and played.
4 I demand that the more John eats, the more he pay.
5 If John eats more, keep your mouth shut tighter, OK?
6 His expectations are always lower than mine are.
7 The sooner you call, the more carefully I will word the letter.
8 The more timid he feels, the more people he interviews without asking questions of.
9 Once Janet left, Fred became a lot crazier.
...

test.tsv 数据样式说明:

  • test.tsv 中的数据内容共分为 2 列,第一列数据代表每条文本数据的索引;第二列数据代表用于测试的句子.

CoLA 数据集的任务类型:

  • 二分类任务
  • 评估指标为: MCC (马修斯相关系数,在正负样本分布十分不均衡的情况下使用的二分类评估指标)

2 SST-2 数据集

SST-2 (The Stanford Sentiment Treebank,斯坦福情感树库),单句子分类任务,包含电影评论中的句子和它们情感的人类注释。这项任务是给定句子的情感,类别分为两类正面情感(positive,样本标签对应为 1)和负面情感(negative,样本标签对应为 0),并且只用句子级别的标签。也就是,本任务也是一个二分类任务,针对句子级别,分为正面和负面情感。

样本个数:训练集 67, 350 个,开发集 873 个,测试集 1, 821 个。

任务:情感分类,正面情感和负面情感二分类。

评价准则:accuracy。

文件样式

1
2
3
4
5
- SST-2/
- dev.tsv
- original/
- test.tsv
- train.tsv

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
sentence    label
hide new secretions from the parental units 0
contains no wit , only labored gags 0
that loves its characters and communicates something rather beautiful about human nature 1
remains utterly satisfied to remain the same throughout 0
on the worst revenge-of-the-nerds clichés the filmmakers could dredge up 0
that 's far too tragic to merit such superficial treatment 0
demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop . 1
of saucy 1
a depressed fifteen-year-old 's suicidal poetry 0
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 2 列,第一列数据代表具有感情色彩的评论文本;第二列数据,0 或 1, 代表每条文本数据是积极或者消极的评论,0 代表消极,1 代表积极.

test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
12
index   sentence
0 uneasy mishmash of styles and genres .
1 this film 's relationship to actual tension is the same as what christmas-tree flocking in a spray can is to actual snow : a poor -- if durable -- imitation .
2 by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .
3 director rob marshall went out gunning to make a great one .
4 lathan and diggs have considerable personal charm , and their screen rapport makes the old story seem new .
5 a well-made and often lovely depiction of the mysteries of friendship .
6 none of this violates the letter of behan 's book , but missing is its spirit , its ribald , full-throated humor .
7 although it bangs a very cliched drum at times , this crowd-pleaser 's fresh dialogue , energetic music , and good-natured spunk are often infectious .
8 it is not a mass-market entertainment but an uncompromising attempt by one artist to think about another .
9 this is junk food cinema at its greasiest .
...

test.tsv 数据样式说明: * test.tsv 中的数据内容共分为 2 列,第一列数据代表每条文本数据的索引;第二列数据代表用于测试的句子.


SST-2 数据集的任务类型:

  • 二分类任务
  • 评估指标为: ACC

3 MRPC 数据集

MRPC (The Microsoft Research Paraphrase Corpus,微软研究院释义语料库),相似性和释义任务,是从在线新闻源中自动抽取句子对语料库,并人工注释句子对中的句子是否在语义上等效。类别并不平衡,其中 68% 的正样本,所以遵循常规的做法,报告准确率(accuracy)和 F1 值。

样本个数:训练集 3, 668 个,开发集 408 个,测试集 1, 725 个。

任务:是否释义二分类,是释义,不是释义两类。

评价准则:准确率(accuracy)和 F1 值。

文件样式

1
2
3
4
5
6
7
- MRPC/
- dev.tsv
- test.tsv
- train.tsv
- dev_ids.tsv
- msr_paraphrase_test.txt
- msr_paraphrase_train.txt

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
Quality #1 ID   #2 ID   #1 String   #2 String
1 702876 702977 Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence . Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .
0 2108705 2108831 Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion . Yucaipa bought Dominick 's in 1995 for $ 693 million and sold it to Safeway for $ 1.8 billion in 1998 .
1 1330381 1330521 They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added . On June 10 , the ship 's owners had published an advertisement on the Internet , offering the explosives for sale .
0 3344667 3344648 Around 0335 GMT , Tab shares were up 19 cents , or 4.4 % , at A $ 4.56 , having earlier set a record high of A $ 4.57 . Tab shares jumped 20 cents , or 4.6 % , to set a record closing high at A $ 4.57 .
1 1236820 1236712 The stock rose $ 2.11 , or about 11 percent , to close Friday at $ 21.51 on the New York Stock Exchange . PG & E Corp. shares jumped $ 1.63 or 8 percent to $ 21.03 on the New York Stock Exchange on Friday .
1 738533 737951 Revenue in the first quarter of the year dropped 15 percent from the same period a year earlier . With the scandal hanging over Stewart 's company , revenue the first quarter of the year dropped 15 percent from the same period a year earlier .
0 264589 264502 The Nasdaq had a weekly gain of 17.27 , or 1.2 percent , closing at 1,520.15 on Friday . The tech-laced Nasdaq Composite .IXIC rallied 30.46 points , or 2.04 percent , to 1,520.15 .
1 579975 579810 The DVD-CCA then appealed to the state Supreme Court . The DVD CCA appealed that decision to the U.S. Supreme Court .
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 5 列,第一列数据,0 或 1, 代表每对句子是否具有相同的含义,0 代表含义不相同,1 代表含义相同。第二列和第三列分别代表每对句子的 id, 第四列和第五列分别具有相同 / 不同含义的句子对.

test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
index   #1 ID   #2 ID   #1 String   #2 String
0 1089874 1089925 PCCW 's chief operating officer , Mike Butcher , and Alex Arena , the chief financial officer , will report directly to Mr So . Current Chief Operating Officer Mike Butcher and Group Chief Financial Officer Alex Arena will report to So .
1 3019446 3019327 The world 's two largest automakers said their U.S. sales declined more than predicted last month as a late summer sales frenzy caused more of an industry backlash than expected . Domestic sales at both GM and No. 2 Ford Motor Co. declined more than predicted as a late summer sales frenzy prompted a larger-than-expected industry backlash .
2 1945605 1945824 According to the federal Centers for Disease Control and Prevention ( news - web sites ) , there were 19 reported cases of measles in the United States in 2002 . The Centers for Disease Control and Prevention said there were 19 reported cases of measles in the United States in 2002 .
3 1430402 1430329 A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night . A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night .
4 3354381 3354396 The company didn 't detail the costs of the replacement and repairs . But company officials expect the costs of the replacement work to run into the millions of dollars .
5 1390995 1391183 The settling companies would also assign their possible claims against the underwriters to the investor plaintiffs , he added . Under the agreement , the settling companies will also assign their potential claims against the underwriters to the investors , he added .
6 2201401 2201285 Air Commodore Quaife said the Hornets remained on three-minute alert throughout the operation . Air Commodore John Quaife said the security operation was unprecedented .
7 2453843 2453998 A Washington County man may have the countys first human case of West Nile virus , the health department said Friday . The countys first and only human case of West Nile this year was confirmed by health officials on Sept . 8 .
...

test.tsv 数据样式说明: * test.tsv 中的数据内容共分为 5 列,第一列数据代表每条文本数据的索引;其余列的含义与 train.tsv 中相同.


MRPC 数据集的任务类型:

  • 句子对二分类任务
  • 评估指标为: ACC 和 F1

4 STS-B 数据集

STSB (The Semantic Textual Similarity Benchmark,语义文本相似性基准测试),相似性和释义任务,是从新闻标题、视频标题、图像标题以及自然语言推断数据中提取的句子对的集合,每对都是由人类注释的,其相似性评分为 0-5 (大于等于 0 且小于等于 5 的浮点数,原始 paper 里写的是 1-5,可能是作者失误)。任务就是预测这些相似性得分,本质上是一个回归问题,但是依然可以用分类的方法,可以归类为句子对的文本五分类任务。

样本个数:训练集 5, 749 个,开发集 1, 379 个,测试集 1, 377 个。

任务:回归任务,预测为 1-5 之间的相似性得分的浮点数。但是依然可以使用分类的方法,作为五分类。

评价准则:Pearson and Spearman correlation coefficients。

文件样式

1
2
3
4
5
6
7
- STS-B/
- dev.tsv
- test.tsv
- train.tsv
- LICENSE.txt
- readme.txt
- original/

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
12
index   genre   filename    year    old_index   source1 source2 sentence1   sentence2   score
0 main-captions MSRvid 2012test 0001 none none A plane is taking off. An air plane is taking off. 5.000
1 main-captions MSRvid 2012test 0004 none none A man is playing a large flute. A man is playing a flute. 3.800
2 main-captions MSRvid 2012test 0005 none none A man is spreading shreded cheese on a pizza. A man is spreading shredded cheese on an uncooked pizza. 3.800
3 main-captions MSRvid 2012test 0006 none none Three men are playing chess.Two men are playing chess. 2.600
4 main-captions MSRvid 2012test 0009 none none A man is playing the cello.A man seated is playing the cello. 4.250
5 main-captions MSRvid 2012test 0011 none none Some men are fighting. Two men are fighting. 4.250
6 main-captions MSRvid 2012test 0012 none none A man is smoking. A man is skating. 0.500
7 main-captions MSRvid 2012test 0013 none none The man is playing the piano. The man is playing the guitar. 1.600
8 main-captions MSRvid 2012test 0014 none none A man is playing on a guitar and singing. A woman is playing an acoustic guitar and singing. 2.200
9 main-captions MSRvid 2012test 0016 none none A person is throwing a cat on to the ceiling. A person throws a cat on the ceiling. 5.000
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 10 列,第一列数据是数据索引;第二列代表每对句子的来源,如 main-captions 表示来自字幕;第三列代表来源的具体保存文件名,第四列代表出现时间 (年); 第五列代表原始数据的索引;第六列和第七列分别代表句子对原始来源;第八列和第九列代表相似程度不同的句子对;第十列代表句子对的相似程度由低到高,值域范围是 [0, 5].

test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
12
13
index   genre   filename    year    old_index   source1 source2 sentence1   sentence2
0 main-captions MSRvid 2012test 0024 none none A girl is styling her hair. A girl is brushing her hair.
1 main-captions MSRvid 2012test 0033 none none A group of men play soccer on the beach. A group of boys are playing soccer on the beach.
2 main-captions MSRvid 2012test 0045 none none One woman is measuring another woman's ankle. A woman measures another woman's ankle.
3 main-captions MSRvid 2012test 0063 none none A man is cutting up a cucumber. A man is slicing a cucumber.
4 main-captions MSRvid 2012test 0066 none none A man is playing a harp. A man is playing a keyboard.
5 main-captions MSRvid 2012test 0074 none none A woman is cutting onions. A woman is cutting tofu.
6 main-captions MSRvid 2012test 0076 none none A man is riding an electric bicycle. A man is riding a bicycle.
7 main-captions MSRvid 2012test 0082 none none A man is playing the drums. A man is playing the guitar.
8 main-captions MSRvid 2012test 0092 none none A man is playing guitar. A lady is playing the guitar.
9 main-captions MSRvid 2012test 0095 none none A man is playing a guitar. A man is playing a trumpet.
10 main-captions MSRvid 2012test 0096 none none A man is playing a guitar. A man is playing a trumpet.
...

test.tsv 数据样式说明:

  • test.tsv 中的数据内容共分为 9 列,含义与 train.tsv 前 9 列相同.

STS-B 数据集的任务类型:

  • 句子对多分类任务 / 句子对回归任务
  • 评估指标为: Pearson-Spearman Corr

5 QQP 数据集

QQP (The Quora Question Pairs, Quora 问题对数集),相似性和释义任务,是社区问答网站 Quora 中问题对的集合。任务是确定一对问题在语义上是否等效。与 MRPC 一样,QQP 也是正负样本不均衡的,不同是的 QQP 负样本占 63%,正样本是 37%,所以我们也是报告准确率和 F1 值。我们使用标准测试集,为此我们从作者那里获得了专用标签。我们观察到测试集与训练集分布不同。

样本个数:训练集 363, 870 个,开发集 40, 431 个,测试集 390, 965 个。

任务:判定句子对是否等效,等效、不等效两种情况,二分类任务。

评价准则:准确率(accuracy)和 F1 值。

文件样式

1
2
3
4
5
- QQP/
- dev.tsv
- original/
- test.tsv
- train.tsv

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
id  qid1    qid2    question1   question2   is_duplicate
133273 213221 213222 How is the life of a math student? Could you describe your own experiences?Which level of prepration is enough for the exam jlpt5? 0
402555 536040 536041 How do I control my horny emotions? How do you control your horniness? 1
360472 364011 490273 What causes stool color to change to yellow? What can cause stool to come out as little balls? 0
150662 155721 7256 What can one do after MBBS? What do i do after my MBBS ? 1
183004 279958 279959 Where can I find a power outlet for my laptop at Melbourne Airport? Would a second airport in Sydney, Australia be needed if a high-speed rail link was created between Melbourne and Sydney? 0
119056 193387 193388 How not to feel guilty since I am Muslim and I'm conscious we won't have sex together? I don't beleive I am bulimic, but I force throw up atleast once a day after I eat something and feel guilty. Should I tell somebody, and if so who? 0
356863 422862 96457 How is air traffic controlled? How do you become an air traffic controller?0
106969 147570 787 What is the best self help book you have read? Why? How did it change your life? What are the top self help books I should read? 1
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 6 列,第一列代表文本数据索引;第二列和第三列数据分别代表问题 1 和问题 2 的 id; 第四列和第五列代表需要进行’是否重复’判定的句子对;第六列代表上述问题是 / 不是重复性问题的标签,0 代表不重复,1 代表重复.

test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
id  question1   question2
0 Would the idea of Trump and Putin in bed together scare you, given the geopolitical implications? Do you think that if Donald Trump were elected President, he would be able to restore relations with Putin and Russia as he said he could, based on the rocky relationship Putin had with Obama and Bush?
1 What are the top ten Consumer-to-Consumer E-commerce online? What are the top ten Consumer-to-Business E-commerce online?
2 Why don't people simply 'Google' instead of asking questions on Quora? Why do people ask Quora questions instead of just searching google?
3 Is it safe to invest in social trade biz? Is social trade geniune?
4 If the universe is expanding then does matter also expand? If universe and space is expanding? Does that mean anything that occupies space is also expanding?
5 What is the plural of hypothesis? What is the plural of thesis?
6 What is the application form you need for launching a company? What is the application form you need for launching a company in Austria?
7 What is Big Theta? When should I use Big Theta as opposed to big O? Is O(Log n) close to O(n) or O(1)?
8 What are the health implications of accidentally eating a small quantity of aluminium foil?What are the implications of not eating vegetables?
...

test.tsv 数据样式说明:

  • test.tsv 中的数据内容共分为 3 列,第一列数据代表每条文本数据的索引;第二列和第三列数据代表用于测试的问题句子对.

QQP 数据集的任务类型:

  • 句子对二分类任务
  • 评估指标为: ACC/F1

6 (MNLI/SNLI) 数据集

MNLI (The Multi-Genre Natural Language Inference Corpus, 多类型自然语言推理数据库),自然语言推断任务,是通过众包方式对句子对进行文本蕴含标注的集合。给定前提(premise)语句和假设(hypothesis)语句,任务是预测前提语句是否包含假设(蕴含,entailment),与假设矛盾(矛盾,contradiction)或者两者都不(中立,neutral)。前提语句是从数十种不同来源收集的,包括转录的语音,小说和政府报告。

样本个数:训练集 392, 702 个,开发集 dev-matched 9, 815 个,开发集 dev-mismatched9, 832 个,测试集 test-matched 9, 796 个,测试集 test-dismatched9, 847 个。因为 MNLI 是集合了许多不同领域风格的文本,所以又分为了 matched 和 mismatched 两个版本的数据集,matched 指的是训练集和测试集的数据来源一致,mismached 指的是训练集和测试集来源不一致。

任务:句子对,一个前提,一个是假设。前提和假设的关系有三种情况:蕴含(entailment),矛盾(contradiction),中立(neutral)。句子对三分类问题。

评价准则:matched accuracy/mismatched accuracy。

文件样式

1
2
3
4
5
6
7
- (MNLI/SNLI)/
- dev_matched.tsv
- dev_mismatched.tsv
- original/
- test_matched.tsv
- test_mismatched.tsv
- train.tsv

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev_matched.tsv, dev_mismatched.tsv, test_matched.tsv, test_mismatched.tsv 分别代表训练集,与训练集一同采集的验证集,与训练集不是一同采集验证集,与训练集一同采集的测试集,与训练集不是一同采集测试集。其中 train.tsv 与 dev_matched.tsv 和 dev_mismatched.tsv 数据样式相同,都是带有标签的数据,其中 test_matched.tsv 与 test_mismatched.tsv 数据样式相同,都是不带有标签的数据.

train.tsv 数据样式:

1
2
3
4
5
6
index   promptID    pairID  genre   sentence1_binary_parse  sentence2_binary_parse  sentence1_parse sentence2_parse sentence1   sentence2   label1  gold_label
0 31193 31193n government ( ( Conceptually ( cream skimming ) ) ( ( has ( ( ( two ( basic dimensions ) ) - ) ( ( product and ) geography ) ) ) . ) ) ( ( ( Product and ) geography ) ( ( are ( what ( make ( cream ( skimming work ) ) ) ) ) . ) ) (ROOT (S (NP (JJ Conceptually) (NN cream) (NN skimming)) (VP (VBZ has) (NP (NP (CD two) (JJ basic) (NNS dimensions)) (: -) (NP (NN product) (CC and) (NN geography)))) (. .))) (ROOT (S (NP (NN Product) (CC and) (NN geography)) (VP (VBP are) (SBAR (WHNP (WP what)) (S (VP (VBP make) (NP (NP (NN cream)) (VP (VBG skimming) (NP (NN work)))))))) (. .))) Conceptually cream skimming has two basic dimensions - product and geography. Product and geography are what make cream skimming work. neutral neutral
1 101457 101457e telephone ( you ( ( know ( during ( ( ( the season ) and ) ( i guess ) ) ) ) ( at ( at ( ( your level ) ( uh ( you ( ( ( lose them ) ( to ( the ( next level ) ) ) ) ( if ( ( if ( they ( decide ( to ( recall ( the ( the ( parent team ) ) ) ) ) ) ) ) ( ( the Braves ) ( decide ( to ( call ( to ( ( recall ( a guy ) ) ( from ( ( triple A ) ( ( ( then ( ( a ( double ( A guy ) ) ) ( ( goes up ) ( to ( replace him ) ) ) ) ) and ) ( ( a ( single ( A guy ) ) ) ( ( goes up ) ( to ( replace him ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ( You ( ( ( ( lose ( the things ) ) ( to ( the ( following level ) ) ) ) ( if ( ( the people ) recall ) ) ) . ) ) (ROOT (S (NP (PRP you)) (VP (VBP know) (PP (IN during) (NP (NP (DT the) (NN season)) (CC and) (NP (FW i) (FW guess)))) (PP (IN at) (IN at) (NP (NP (PRP$ your) (NN level)) (SBAR (S (INTJ (UH uh)) (NP (PRP you)) (VP (VBP lose) (NP (PRP them)) (PP (TO to) (NP (DT the) (JJ next) (NN level))) (SBAR (IN if) (S (SBAR (IN if) (S (NP (PRP they)) (VP (VBP decide) (S (VP (TO to) (VP (VB recall) (NP (DT the) (DT the) (NN parent) (NN team)))))))) (NP (DT the) (NNPS Braves)) (VP (VBP decide) (S (VP (TO to) (VP (VB call) (S (VP (TO to) (VP (VB recall) (NP (DT a) (NN guy)) (PP (IN from) (NP (NP (RB triple) (DT A)) (SBAR (S (S (ADVP (RB then)) (NP (DT a) (JJ double) (NNP A) (NN guy)) (VP (VBZ goes) (PRT (RP up)) (S (VP (TO to) (VP (VB replace) (NP (PRP him))))))) (CC and) (S (NP (DT a) (JJ single) (NNP A) (NN guy)) (VP (VBZ goes) (PRT (RP up)) (S (VP (TO to) (VP (VB replace) (NP (PRP him)))))))))))))))))))))))))))) (ROOT (S (NP (PRP You)) (VP (VBP lose) (NP (DT the) (NNS things)) (PP (TO to) (NP (DT the) (JJ following) (NN level))) (SBAR (IN if) (S (NP (DT the) (NNS people)) (VP (VBP recall))))) (. .))) you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the Braves decide to call to recall a guy from triple A then a double A guy goes up to replace him and a single A guy goes up to replace him You lose the things to the following level if the people recall. entailment entailment
2 134793 134793e fiction ( ( One ( of ( our number ) ) ) ( ( will ( ( ( carry out ) ( your instructions ) ) minutely ) ) . ) ) ( ( ( A member ) ( of ( my team ) ) ) ( ( will ( ( execute ( your orders ) ) ( with ( immense precision ) ) ) ) . ) ) (ROOT (S (NP (NP (CD One)) (PP (IN of) (NP (PRP$ our) (NN number)))) (VP (MD will) (VP (VB carry) (PRT (RP out)) (NP (PRP$ your) (NNS instructions)) (ADVP (RB minutely)))) (. .))) (ROOT (S (NP (NP (DT A) (NN member)) (PP (IN of) (NP (PRP$ my) (NN team)))) (VP (MD will) (VP (VB execute) (NP (PRP$ your) (NNS orders)) (PP (IN with) (NP (JJ immense) (NN precision))))) (. .))) One of our number will carry out your instructions minutely. A member of my team will execute your orders with immense precision. entailment entailment
3 37397 37397e fiction ( ( How ( ( ( do you ) know ) ? ) ) ( ( All this ) ( ( ( is ( their information ) ) again ) . ) ) ) ( ( This information ) ( ( belongs ( to them ) ) . ) ) (ROOT (S (SBARQ (WHADVP (WRB How)) (SQ (VBP do) (NP (PRP you)) (VP (VB know))) (. ?)) (NP (PDT All) (DT this)) (VP (VBZ is) (NP (PRP$ their) (NN information)) (ADVP (RB again))) (. .))) (ROOT (S (NP (DT This) (NN information)) (VP (VBZ belongs) (PP (TO to) (NP (PRP them)))) (. .))) How do you know? All this is their information again. This information belongs to them. entailment entailment
...

train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 12 列,第一列代表文本数据索引;第二列和第三列数据分别代表句子对的不同类型 id; 第四列代表句子对的来源;第五列和第六列代表具有句法结构分析的句子对表示;第七列和第八列代表具有句法结构和词性标注的句子对表示,第九列和第十列代表原始的句子对,第十一和第十二列代表不同标准的标注方法产生的标签,在这里,他们始终相同,一共有三种类型的标签,neutral 代表两个句子既不矛盾也不蕴含,entailment 代表两个句子具有蕴含关系,contradiction 代表两个句子观点矛盾.

test_matched.tsv 数据样式:

1
2
3
4
5
6
index   promptID    pairID  genre   sentence1_binary_parse  sentence2_binary_parse  sentence1_parse sentence2_parse sentence1   sentence2
0 31493 31493 travel ( ( ( ( ( ( ( ( Hierbas , ) ( ans seco ) ) , ) ( ans dulce ) ) , ) and ) frigola ) ( ( ( are just ) ( ( a ( few names ) ) ( worth ( ( keeping ( a look-out ) ) for ) ) ) ) . ) ) ( Hierbas ( ( is ( ( a name ) ( worth ( ( looking out ) for ) ) ) ) . ) ) (ROOT (S (NP (NP (NNS Hierbas)) (, ,) (NP (NN ans) (NN seco)) (, ,) (NP (NN ans) (NN dulce)) (, ,) (CC and) (NP (NN frigola))) (VP (VBP are) (ADVP (RB just)) (NP (NP (DT a) (JJ few) (NNS names)) (PP (JJ worth) (S (VP (VBG keeping) (NP (DT a) (NN look-out)) (PP (IN for))))))) (. .))) (ROOT (S (NP (NNS Hierbas)) (VP (VBZ is) (NP (NP (DT a) (NN name)) (PP (JJ worth) (S (VP (VBG looking) (PRT (RP out)) (PP (IN for))))))) (. .))) Hierbas, ans seco, ans dulce, and frigola are just a few names worth keeping a look-out for. Hierbas is a name worth looking out for.
1 92164 92164 government ( ( ( The extent ) ( of ( the ( behavioral effects ) ) ) ) ( ( would ( ( depend ( in ( part ( on ( ( the structure ) ( of ( ( ( the ( individual ( account program ) ) ) and ) ( any limits ) ) ) ) ) ) ) ) ( on ( accessing ( the funds ) ) ) ) ) . ) ) ( ( Many people ) ( ( would ( be ( very ( unhappy ( to ( ( loose control ) ( over ( their ( own money ) ) ) ) ) ) ) ) ) . ) ) (ROOT (S (NP (NP (DT The) (NN extent)) (PP (IN of) (NP (DT the) (JJ behavioral) (NNS effects)))) (VP (MD would) (VP (VB depend) (PP (IN in) (NP (NP (NN part)) (PP (IN on) (NP (NP (DT the) (NN structure)) (PP (IN of) (NP (NP (DT the) (JJ individual) (NN account) (NN program)) (CC and) (NP (DT any) (NNS limits)))))))) (PP (IN on) (S (VP (VBG accessing) (NP (DT the) (NNS funds))))))) (. .))) (ROOT (S (NP (JJ Many) (NNS people)) (VP (MD would) (VP (VB be) (ADJP (RB very) (JJ unhappy) (PP (TO to) (NP (NP (JJ loose) (NN control)) (PP (IN over) (NP (PRP$ their) (JJ own) (NN money)))))))) (. .))) The extent of the behavioral effects would depend in part on the structure of the individual account program and any limits on accessing the funds. Many people would be very unhappy to loose control over their own money.
2 9662 9662 government ( ( ( Timely access ) ( to information ) ) ( ( is ( in ( ( the ( best interests ) ) ( of ( ( ( both GAO ) and ) ( the agencies ) ) ) ) ) ) . ) ) ( It ( ( ( is ( in ( ( everyone 's ) ( best interest ) ) ) ) ( to ( ( have access ) ( to ( information ( in ( a ( timely manner ) ) ) ) ) ) ) ) . ) ) (ROOT (S (NP (NP (JJ Timely) (NN access)) (PP (TO to) (NP (NN information)))) (VP (VBZ is) (PP (IN in) (NP (NP (DT the) (JJS best) (NNS interests)) (PP (IN of) (NP (NP (DT both) (NNP GAO)) (CC and) (NP (DT the) (NNS agencies))))))) (. .))) (ROOT (S (NP (PRP It)) (VP (VBZ is) (PP (IN in) (NP (NP (NN everyone) (POS 's)) (JJS best) (NN interest))) (S (VP (TO to) (VP (VB have) (NP (NN access)) (PP (TO to) (NP (NP (NN information)) (PP (IN in) (NP (DT a) (JJ timely) (NN manner))))))))) (. .))) Timely access to information is in the best interests of both GAO and the agencies. It is in everyone's best interest to have access to information in a timely manner.
3 5991 5991 travel ( ( Based ( in ( ( the ( Auvergnat ( spa town ) ) ) ( of Vichy ) ) ) ) ( , ( ( the ( French government ) ) ( often ( ( ( ( proved ( more zealous ) ) ( than ( its masters ) ) ) ( in ( ( ( suppressing ( civil liberties ) ) and ) ( ( drawing up ) ( anti-Jewish legislation ) ) ) ) ) . ) ) ) ) ) ( ( The ( French government ) ) ( ( passed ( ( anti-Jewish laws ) ( aimed ( at ( helping ( the Nazi ) ) ) ) ) ) . ) ) (ROOT (S (PP (VBN Based) (PP (IN in) (NP (NP (DT the) (NNP Auvergnat) (NN spa) (NN town)) (PP (IN of) (NP (NNP Vichy)))))) (, ,) (NP (DT the) (JJ French) (NN government)) (ADVP (RB often)) (VP (VBD proved) (NP (JJR more) (NNS zealous)) (PP (IN than) (NP (PRP$ its) (NNS masters))) (PP (IN in) (S (VP (VP (VBG suppressing) (NP (JJ civil) (NNS liberties))) (CC and) (VP (VBG drawing) (PRT (RP up)) (NP (JJ anti-Jewish) (NN legislation))))))) (. .))) (ROOT (S (NP (DT The) (JJ French) (NN government)) (VP (VBD passed) (NP (NP (JJ anti-Jewish) (NNS laws)) (VP (VBN aimed) (PP (IN at) (S (VP (VBG helping) (NP (DT the) (JJ Nazi)))))))) (. .))) Based in the Auvergnat spa town of Vichy, the French government often proved more zealous than its masters in suppressing civil liberties and drawing up anti-Jewish legislation. The French government passed anti-Jewish laws aimed at helping the Nazi.
...

test_matched.tsv 数据样式说明:

  • test_matched.tsv 中的数据内容共分为 10 列,与 train.tsv 的前 10 列含义相同.

(MNLI/SNLI) 数据集的任务类型:

  • 句子对多分类任务
  • 评估指标为: ACC

7 (QNLI/RTE/WNLI) 数据集

QNLI (Qusetion-answering NLI,问答自然语言推断),自然语言推断任务。QNLI 是从另一个数据集 The Stanford Question Answering Dataset (斯坦福问答数据集,SQuAD 1.0)[3] 转换而来的。SQuAD 1.0 是有一个问题 - 段落对组成的问答数据集,其中段落来自维基百科,段落中的一个句子包含问题的答案。这里可以看到有个要素,来自维基百科的段落,问题,段落中的一个句子包含问题的答案。通过将问题和上下文(即维基百科段落)中的每一句话进行组合,并过滤掉词汇重叠比较低的句子对就得到了 QNLI 中的句子对。相比原始 SQuAD 任务,消除了模型选择准确答案的要求;也消除了简化的假设,即答案适中在输入中并且词汇重叠是可靠的提示。

样本个数:训练集 104, 743 个,开发集 5, 463 个,测试集 5, 461 个。

任务:判断问题(question)和句子(sentence,维基百科段落中的一句)是否蕴含,蕴含和不蕴含,二分类。

评价准则:准确率(accuracy)。

RTE (The Recognizing Textual Entailment datasets,识别文本蕴含数据集),自然语言推断任务,它是将一系列的年度文本蕴含挑战赛的数据集进行整合合并而来的,包含 RTE1 [4],RTE2,RTE3[5],RTE5 等,这些数据样本都从新闻和维基百科构建而来。将这些所有数据转换为二分类,对于三分类的数据,为了保持一致性,将中立(neutral)和矛盾(contradiction)转换为不蕴含(not entailment)。

样本个数:训练集 2, 491 个,开发集 277 个,测试集 3, 000 个。

任务:判断句子对是否蕴含,句子 1 和句子 2 是否互为蕴含,二分类任务。

评价准则:准确率(accuracy)。

WNLI (Winograd NLI,Winograd 自然语言推断),自然语言推断任务,数据集来自于竞赛数据的转换。Winograd Schema Challenge [6],该竞赛是一项阅读理解任务,其中系统必须读一个带有代词的句子,并从列表中找到代词的指代对象。这些样本都是都是手动创建的,以挫败简单的统计方法:每个样本都取决于句子中单个单词或短语提供的上下文信息。为了将问题转换成句子对分类,方法是通过用每个可能的列表中的每个可能的指代去替换原始句子中的代词。任务是预测两个句子对是否有关(蕴含、不蕴含)。训练集两个类别是均衡的,测试集是不均衡的,65% 是不蕴含。

样本个数:训练集 635 个,开发集 71 个,测试集 146 个。

任务:判断句子对是否相关,蕴含和不蕴含,二分类任务。

评价准则:准确率(accuracy)。

文件样式

1
* QNLI, RTE, WNLI三个数据集的样式基本相同.
1
2
3
4
- (QNLI/RTE/WNLI)/
- dev.tsv
- test.tsv
- train.tsv

文件样式说明:

  • 在使用中常用到的文件是 train.tsv, dev.tsv, test.tsv, 分别代表训练集,验证集和测试集。其中 train.tsv 与 dev.tsv 数据样式相同,都是带有标签的数据,其中 test.tsv 是不带有标签的数据.

QNLI 中的 train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
index   question    sentence    label
0 When did the third Digimon series begin? Unlike the two seasons before it and most of the seasons that followed, Digimon Tamers takes a darker and more realistic approach to its story featuring Digimon who do not reincarnate after their deaths and more complex character development in the original Japanese. not_entailment
1 Which missile batteries often have individual launchers several kilometres from one another? When MANPADS is operated by specialists, batteries may have several dozen teams deploying separately in small sections; self-propelled air defence guns may deploy in pairs. not_entailment
2 What two things does Popper argue Tarski's theory involves in an evaluation of truth? He bases this interpretation on the fact that examples such as the one described above refer to two things: assertions and the facts to which they refer. entailment
3 What is the name of the village 9 miles north of Calafat where the Ottoman forces attacked the Russians? On 31 December 1853, the Ottoman forces at Calafat moved against the Russian force at Chetatea or Cetate, a small village nine miles north of Calafat, and engaged them on 6 January 1854. entailment
4 What famous palace is located in London? London contains four World Heritage Sites: the Tower of London; Kew Gardens; the site comprising the Palace of Westminster, Westminster Abbey, and St Margaret's Church; and the historic settlement of Greenwich (in which the Royal Observatory, Greenwich marks the Prime Meridian, 0° longitude, and GMT). not_entailment
5 When is the term 'German dialects' used in regard to the German language? When talking about the German language, the term German dialects is only used for the traditional regional varieties. entailment
6 What was the name of the island the English traded to the Dutch in return for New Amsterdam? At the end of the Second Anglo-Dutch War, the English gained New Amsterdam (New York) in North America in exchange for Dutch control of Run, an Indonesian island. entailment
7 How were the Portuguese expelled from Myanmar? From the 1720s onward, the kingdom was beset with repeated Meithei raids into Upper Myanmar and a nagging rebellion in Lan Na. not_entailment
8 What does the word 'customer' properly apply to? The bill also required rotation of principal maintenance inspectors and stipulated that the word "customer" properly applies to the flying public, not those entities regulated by the FAA. entailment
...

RTE 中的 train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
index   sentence1   sentence2   label
0 No Weapons of Mass Destruction Found in Iraq Yet. Weapons of Mass Destruction Found in Iraq. not_entailment
1 A place of sorrow, after Pope John Paul II died, became a place of celebration, as Roman Catholic faithful gathered in downtown Chicago to mark the installation of new Pope Benedict XVI.Pope Benedict XVI is the new leader of the Roman Catholic Church. entailment
2 Herceptin was already approved to treat the sickest breast cancer patients, and the company said, Monday, it will discuss with federal regulators the possibility of prescribing the drug for more breast cancer patients. Herceptin can be used to treat breast cancer. entailment
3 Judie Vivian, chief executive at ProMedica, a medical service company that helps sustain the 2-year-old Vietnam Heart Institute in Ho Chi Minh City (formerly Saigon), said that so far about 1,500 children have received treatment. The previous name of Ho Chi Minh City was Saigon.entailment
4 A man is due in court later charged with the murder 26 years ago of a teenager whose case was the first to be featured on BBC One's Crimewatch. Colette Aram, 16, was walking to her boyfriend's house in Keyworth, Nottinghamshire, on 30 October 1983 when she disappeared. Her body was later found in a field close to her home. Paul Stewart Hutchinson, 50, has been charged with murder and is due before Nottingham magistrates later. Paul Stewart Hutchinson is accused of having stabbed a girl. not_entailment
5 Britain said, Friday, that it has barred cleric, Omar Bakri, from returning to the country from Lebanon, where he was released by police after being detained for 24 hours. Bakri was briefly detained, but was released. entailment
6 Nearly 4 million children who have at least one parent who entered the U.S. illegally were born in the United States and are U.S. citizens as a result, according to the study conducted by the Pew Hispanic Center. That's about three quarters of the estimated 5.5 million children of illegal immigrants inside the United States, according to the study. About 1.8 million children of undocumented immigrants live in poverty, the study found. Three quarters of U.S. illegal immigrants have children. not_entailment
7 Like the United States, U.N. officials are also dismayed that Aristide killed a conference called by Prime Minister Robert Malval in Port-au-Prince in hopes of bringing all the feuding parties together. Aristide had Prime Minister Robert Malval murdered in Port-au-Prince. not_entailment
8 WASHINGTON -- A newly declassified narrative of the Bush administration's advice to the CIA on harsh interrogations shows that the small group of Justice Department lawyers who wrote memos authorizing controversial interrogation techniques were operating not on their own but with direction from top administration officials, including then-Vice President Dick Cheney and national security adviser Condoleezza Rice. At the same time, the narrative suggests that then-Defense Secretary Donald H. Rumsfeld and then-Secretary of State Colin Powell were largely left out of the decision-making process. Dick Cheney was the Vice President of Bush. entailment

WNLI 中的 train.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
index   sentence1   sentence2   label
0 I stuck a pin through a carrot. When I pulled the pin out, it had a hole. The carrot had a hole. 1
1 John couldn't see the stage with Billy in front of him because he is so short. John is so short. 1
2 The police arrested all of the gang members. They were trying to stop the drug trade in the neighborhood. The police were trying to stop the drug trade in the neighborhood. 1
3 Steve follows Fred's example in everything. He influences him hugely. Steve influences him hugely. 0
4 When Tatyana reached the cabin, her mother was sleeping. She was careful not to disturb her, undressing and climbing back into her berth. mother was careful not to disturb her, undressing and climbing back into her berth. 0
5 George got free tickets to the play, but he gave them to Eric, because he was particularly eager to see it. George was particularly eager to see it. 0
6 John was jogging through the park when he saw a man juggling watermelons. He was very impressive. John was very impressive. 0
7 I couldn't put the pot on the shelf because it was too tall. The pot was too tall. 1
8 We had hoped to place copies of our newsletter on all the chairs in the auditorium, but there were simply not enough of them. There were simply not enough copies of the newsletter. 1

(QNLI/RTE/WNLI) 中的 train.tsv 数据样式说明:

  • train.tsv 中的数据内容共分为 4 列,第一列代表文本数据索引;第二列和第三列数据代表需要进行’是否蕴含’判定的句子对;第四列数据代表两个句子是否具有蕴含关系,0/not_entailment 代表不是蕴含关系,1/entailment 代表蕴含关系.

QNLI 中的 test.tsv 数据样式:

1
2
3
4
5
6
7
8
9
10
11
index   question    sentence
0 What organization is devoted to Jihad against Israel? For some decades prior to the First Palestine Intifada in 1987, the Muslim Brotherhood in Palestine took a "quiescent" stance towards Israel, focusing on preaching, education and social services, and benefiting from Israel's "indulgence" to build up a network of mosques and charitable organizations.
1 In what century was the Yarrow-Schlick-Tweedy balancing system used? In the late 19th century, the Yarrow-Schlick-Tweedy balancing 'system' was used on some marine triple expansion engines.
2 The largest brand of what store in the UK is located in Kingston Park? Close to Newcastle, the largest indoor shopping centre in Europe, the MetroCentre, is located in Gateshead.
3 What does the IPCC rely on for research? In principle, this means that any significant new evidence or events that change our understanding of climate science between this deadline and publication of an IPCC report cannot be included.
4 What is the principle about relating spin and space variables? Thus in the case of two fermions there is a strictly negative correlation between spatial and spin variables, whereas for two bosons (e.g. quanta of electromagnetic waves, photons) the correlation is strictly positive.
5 Which network broadcasted Super Bowl 50 in the U.S.? CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a 30-second commercial during the game.
6 What did the museum acquire from the Royal College of Science? To link this to the rest of the museum, a new entrance building was constructed on the site of the former boiler house, the intended site of the Spiral, between 1978 and 1982.
7 What is the name of the old north branch of the Rhine? From Wijk bij Duurstede, the old north branch of the Rhine is called Kromme Rijn ("Bent Rhine") past Utrecht, first Leidse Rijn ("Rhine of Leiden") and then, Oude Rijn ("Old Rhine").
8 What was one of Luther's most personal writings? It remains in use today, along with Luther's hymns and his translation of the Bible.
...

  • (RTE/WNLI) 中的 test.tsv 数据样式:
1
2
3
4
5
6
7
8
9
10
11
index   sentence1   sentence2
0 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when Maude and Dora came in sight.
1 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when the trains came in sight.
2 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when the puffs came in sight.
3 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when the roars came in sight.
4 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when the whistles came in sight.
5 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight. Horses ran away when the horses came in sight.
6 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they saw a train coming. Maude and Dora saw a train coming.
7 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they saw a train coming. The trains saw a train coming.
8 Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they saw a train coming. The puffs saw a train coming.
...

(QNLI/RTE/WNLI) 中的 test.tsv 数据样式说明:

  • test.tsv 中的数据内容共分为 3 列,第一列数据代表每条文本数据的索引;第二列和第三列数据代表需要进行’是否蕴含’判定的句子对.

(QNLI/RTE/WNLI) 数据集的任务类型:

  • 句子对二分类任务
  • 评估指标为: ACC

× 请我吃糖~
打赏二维码