Christopher J Fariss, Fridolin J Linder, Zachary M Jones, Charles D Crabtree, Megan A Biek, Ana-Sophia M Ross, Taranamol Kaur, Michael Tsai.
Abstract
We introduce and make publicly available a large corpus of digitized primary source human rights documents, published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we make available and describe document-term matrices: datasets that record, for each unique document, the count of each unique term in the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that were created by human coding of the documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
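As an illustration of the document-term matrix described in the abstract, the following minimal sketch counts each term in each document using only the Python standard library. The function names (`tokenize`, `build_dtm`) and the two toy sentences are hypothetical and are not taken from the corpus or from the authors' code. The sketch also omits preprocessing such as stemming, which the truncated word forms in the tables (for example "polic" and "tortur") suggest was applied to the real data.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dtm(documents):
    """Build a document-term matrix from a list of strings.

    Returns (vocabulary, matrix), where vocabulary is a sorted list of the
    unique terms and matrix[i][j] is the count of vocabulary[j] in documents[i].
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy documents, invented for this example only.
reports = [
    "Police arrested political prisoners.",
    "Security forces killed civilians; police denied it.",
]

vocab, dtm = build_dtm(reports)

# Print one row per document: each term paired with its count.
for row in dtm:
    print(dict(zip(vocab, row)))
```

Each row corresponds to one document and each column to one term, so the matrix can be passed directly to the kinds of statistical and machine learning methods the abstract mentions. For a corpus of realistic size, a sparse representation would likely be preferable to the dense nested lists used here.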
Year: 2015 PMID: 26418817 PMCID: PMC4587949 DOI: 10.1371/journal.pone.0138935
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The number of human rights documents by year from the four publication sources that we have collected.
The figure shows the year-by-year distribution of reports by Amnesty International (grey), Human Rights Watch (orange), the Lawyers Committee for Human Rights (blue), and the United States Department of State (green). The increasing number of reports each year coincides with both expanding coverage in the early years of the series and the increasing number of countries entering the international state system. Some of the older documents are not easily found. We will continue to search for missing documents and plan eventually to expand this corpus to cover a larger number of human rights publications.
Fig 2. The average number of words used per human rights report by year.
The figure shows the average number of words used per human rights report by year for Amnesty International (grey), Human Rights Watch (orange), the Lawyers Committee for Human Rights (blue), and the United States Department of State (green).
Ten most important words for the PTS Amnesty International variable.
| Rank | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | polic | polic | prison | kill | kill |
| 2 | death | offic | arrest | tortur | forc |
| 3 | offic | sentenc | polit | forc | arm |
| 4 | court | illtreat | amnesti | member | group |
| 5 | concern | death | trial | arm | civilian |
| 6 | alleg | court | releas | includ | human |
| 7 | illtreat | alleg | imprison | human | secur |
| 8 | appeal | law | sentenc | execut | disappear |
| 9 | servic | servic | charg | group | attack |
| 10 | committe | concern | conscienc | arrest | execut |
Ten most important words for the PTS State Department variable.
| Rank | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | right | law | prison | kill | forc |
| 2 | law | ha | offici | forc | kill |
| 3 | provid | provid | presid | secur | secur |
| 4 | respect | gener | opposit | state | civilian |
| 5 | freedom | public | howev | dure | militari |
| 6 | employ | constitut | parti | militari | area |
| 7 | women | polic | hi | continu | continu |
| 8 | prohibit | court | author | group | group |
| 9 | constitut | employ | arrest | arrest | attack |
| 10 | public | women | offic | accord | section |
Ten most important words for the CIRI Extrajudicial Killing variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | kill | polic | law |
| 2 | forc | prison | provid |
| 3 | secur | offici | right |
| 4 | state | case | employ |
| 5 | militari | opposit | respect |
| 6 | civilian | presid | freedom |
| 7 | area | arrest | prohibit |
| 8 | group | did | public |
| 9 | arm | ngo | constitut |
| 10 | attack | parti | women |
Ten most important words for the CIRI Disappearance variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | forc | kill | law |
| 2 | kill | arrest | provid |
| 3 | militari | secur | court |
| 4 | civilian | forc | polic |
| 5 | secur | militari | public |
| 6 | human | tortur | employ |
| 7 | group | member | prohibit |
| 8 | area | human | women |
| 9 | member | reportedli | constitut |
| 10 | arm | continu | right |
Ten most important words for the CIRI Torture variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | kill | law | law |
| 2 | forc | provid | right |
| 3 | secur | gener | freedom |
| 4 | tortur | employ | provid |
| 5 | continu | public | respect |
| 6 | arrest | women | ha |
| 7 | human | constitut | employ |
| 8 | militari | labor | public |
| 9 | reportedli | ha | women |
| 10 | dure | court | prohibit |
Ten most important words for the CIRI Political Imprisonment variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | arrest | presid | law |
| 2 | secur | opposit | polic |
| 3 | prison | howev | right |
| 4 | polit | elect | provid |
| 5 | kill | local | respect |
| 6 | detain | ngo | constitut |
| 7 | releas | offici | gener |
| 8 | forc | parti | women |
| 9 | reportedli | presidenti | offic |
| 10 | tortur | region | prohibit |
Ten most important words for the CIRI Freedom of Assembly and Association variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | polit | polic | right |
| 2 | prison | kill | polic |
| 3 | arrest | dure | law |
| 4 | foreign | howev | provid |
| 5 | secur | court | respect |
| 6 | detain | ngo | gener |
| 7 | offici | children | constitut |
| 8 | releas | presid | investig |
| 9 | sentenc | isra | offic |
| 10 | parti | attack | labor |
Ten most important words for the CIRI Freedom of Domestic Movement variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | forc | parti | law |
| 2 | secur | opposit | right |
| 3 | reportedli | presid | polic |
| 4 | foreign | howev | provid |
| 5 | arrest | secur | investig |
| 6 | religi | arrest | case |
| 7 | offici | kill | offic |
| 8 | continu | releas | court |
| 9 | author | member | respect |
| 10 | detain | detain | gener |
Ten most important words for the CIRI Freedom of Foreign Movement and Travel variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | prison | opposit | right |
| 2 | secur | secur | polic |
| 3 | arrest | parti | law |
| 4 | polit | presid | provid |
| 5 | foreign | section | investig |
| 6 | reportedli | detain | offic |
| 7 | islam | howev | case |
| 8 | sentenc | polit | labor |
| 9 | religi | religi | respect |
| 10 | forc | releas | gener |
Ten most important words for the CIRI Freedom of Speech and Press variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | arrest | polic | right |
| 2 | polit | case | polic |
| 3 | prison | court | provid |
| 4 | secur | offic | law |
| 5 | offici | investig | respect |
| 6 | reportedli | law | labor |
| 7 | foreign | kill | constitut |
| 8 | religi | presid | gener |
| 9 | detain | roma | worker |
| 10 | sentenc | constitut | women |
Ten most important words for the CIRI Worker Rights variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | offici | polic | right |
| 2 | arrest | children | ha |
| 3 | polit | offic | respect |
| 4 | prison | law | constitut |
| 5 | foreign | case | law |
| 6 | religi | gener | freedom |
| 7 | reportedli | provid | union |
| 8 | sentenc | child | provid |
| 9 | islam | howev | thi |
| 10 | author | investig | women |
Ten most important words for the CIRI Electoral Self-determination variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | polit | opposit | polic |
| 2 | foreign | dure | right |
| 3 | arrest | presid | law |
| 4 | prison | parti | provid |
| 5 | secur | elect | offic |
| 6 | reportedli | howev | investig |
| 7 | religi | ngo | respect |
| 8 | offici | polic | labor |
| 9 | sentenc | kill | percent |
| 10 | detain | case | case |
Ten most important words for the CIRI Freedom of Religion variable.
| Rank | 0 | 1 | 2 |
|---|---|---|---|
| 1 | religi | state | right |
| 2 | offici | law | polic |
| 3 | reportedli | court | labor |
| 4 | islam | ethnic | provid |
| 5 | foreign | traffick | constitut |
| 6 | author | secur | respect |
| 7 | sentenc | palestinian | union |
| 8 | hi | feder | investig |
| 9 | detain | dure | law |
| 10 | arrest | parliament | gener |
Ten most important words for the Hathaway Torture variable.
| Rank | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | right | law | presid | forc | kill |
| 2 | law | provid | prison | secur | tortur |
| 3 | freedom | women | labor | kill | militari |
| 4 | respect | public | member | soviet | mani |
| 5 | provid | constitut | nation | polic | state |
| 6 | employ | ethnic | polit | state | secur |
| 7 | polit | employ | arrest | militari | regim |
| 8 | practic | respect | parti | dure | area |
| 9 | women | union | opposit | mani | human |
| 10 | work | court | howev | tortur | iraq |