{"corpus_id":62482127,"paper_sha":"021758bcb5f6bcebb392663fce68370ddf4aa722","doi":"10.5281/ZENODO.47258","arxiv_id":null,"pmid":null,"pmcid":null,"mag_id":2299732117,"dblp_id":null,"acl_id":null,"title":"Detecting translingual plagiarism and the backlash against translation plagiarists","year":2014,"publication_date":"2014-06-30","venue":"","journal":{"name":"","pages":null,"volume":"1"},"journal_issn":null,"journal_title":null,"publication_types":[],"pubmed_pub_types":null,"s2_fields_of_study":["Linguistics","Computer Science"],"reference_count":60,"citation_count":14,"influential_citation_count":2,"is_open_access":true,"arxiv_categories":null,"arxiv_license":null,"arxiv_journal_ref":null,"mesh_headings":null,"chemicals":null,"comments_corrections":null,"source_flags":1,"s2_open_access_pdf_url":"https://zenodo.org/record/47258/files/Sousa-Silva.pdf","s2_open_access_landing_url":"https://www.semanticscholar.org/paper/021758bcb5f6bcebb392663fce68370ddf4aa722","s2_open_access_license":"CCBY","s2_open_access_status":"GREEN","pmc_open_access_pdf_url":null,"pmc_open_access_landing_url":null,"pmc_open_access_license":null,"pmc_open_access_status":null,"unpaywall_open_access_pdf_url":null,"unpaywall_open_access_landing_url":null,"unpaywall_open_access_license":null,"unpaywall_open_access_status":null,"abstract":"Plagiarism detection methods have improved significantly over the last decades, and as a result of the advanced research conducted by computational and mostly forensic linguists, simple and sophisticated textual borrowing strategies can now be identified more easily. In particular, simple text comparison algorithms developed by computational linguists allow literal, word-for-word plagiarism (i.e. where identical strings of text are reused across different documents) to be easily detected (semi-)automatically (e.g. Turnitin or SafeAssign), although these methods tend to perform less well when the borrowing is obfuscated by introducing edits to the original text. 
In this case, more sophisticated linguistic techniques, such as an analysis of lexical overlap (Johnson, 1997), are required to detect the borrowing. However, these have limited applicability in cases of ‘translingual’ plagiarism, where a text is translated and borrowed without acknowledgment from an original in another language. Considering that (a) traditionally non-professional translation (e.g. literal or free machine translation) is the method used to plagiarise; (b) the plagiarist usually edits the text for grammar and syntax, especially when machine-translated; and (c) lexical items are those that tend to be translated more correctly, and carried over to the derivative text, this paper proposes a method for ‘translingual’ plagiarism detection that is grounded on translation and interlanguage theories (Selinker, 1972; Bassnett and Lefevere, 1998), as well as on the principle of ‘linguistic uniqueness’ (Coulthard, 2004). Empirical evidence from the CorRUPT corpus (Corpus of Reused and Plagiarised Texts), a corpus of real academic and non-academic texts that were investigated and accused of plagiarising originals in other languages, is used to illustrate the applicability of the methodology proposed for ‘translingual’ plagiarism detection. 
Finally, applications of the method as an investigative tool in forensic contexts are discussed.","claims":[{"public_id":"cl_7ed0599edb6b7e5dd90c5ef2b7657c0d","status":"active","text":"A method for translingual plagiarism detection is proposed, grounded in translation theory, interlanguage theory, and the principle of linguistic uniqueness.","confidence":0.92,"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_7ed0599edb6b7e5dd90c5ef2b7657c0d"},{"public_id":"cl_55a26d74670ccb2b953b311b2e5a0f8c","status":"active","text":"Applications of the proposed method as an investigative tool in forensic linguistic contexts are discussed.","confidence":0.83,"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_55a26d74670ccb2b953b311b2e5a0f8c"},{"public_id":"cl_2b3f6076af35868141a16ba5cdd193e6","status":"active","text":"Lexical items tend to be translated more correctly and carried over to derivative texts, making lexical analysis central to detecting translingual plagiarism.","confidence":0.85,"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri 
(gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_2b3f6076af35868141a16ba5cdd193e6"},{"public_id":"cl_0f188a9a7fbb9f8230e60f036664970a","status":"active","text":"Simple text comparison algorithms can detect literal word-for-word plagiarism semi-automatically but have limited applicability when borrowing involves translation across languages.","confidence":0.88,"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_0f188a9a7fbb9f8230e60f036664970a"},{"public_id":"cl_d7f25eed69a13749c642520fdae07bb0","status":"active","text":"The CorRUPT corpus of real academic and non-academic texts investigated for cross-language plagiarism is used to empirically illustrate the applicability of the proposed methodology.","confidence":0.9,"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_d7f25eed69a13749c642520fdae07bb0"}],"concepts":[{"public_id":"co_3dcf54350bc0720e40bf29ed1ca2cd83","status":"active","name":"literal plagiarism","description":"Word-for-word textual borrowing where identical strings of text appear across different documents, detectable by simple string-matching algorithms.","types":["phenomenon"],"aliases":["word-for-word plagiarism"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_3dcf54350bc0720e40bf29ed1ca2cd83"},{"public_id":"co_486375180a8c5ebdc9fe7f377c9fe5cc","status":"active","name":"text comparison algorithms","description":"Computational methods that compare texts for identical or near-identical strings to detect literal plagiarism, exemplified here by tools such as Turnitin and SafeAssign.","types":["method"],"aliases":["plagiarism detection tools"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_486375180a8c5ebdc9fe7f377c9fe5cc"},{"public_id":"co_72d4f7771d7a13ce9e1402b02126f122","status":"active","name":"translation theory","description":"Theoretical frameworks concerning translation processes 
(Bassnett and Lefevere, 1998) used here to ground the translingual plagiarism detection methodology.","types":["theory"],"aliases":[],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_72d4f7771d7a13ce9e1402b02126f122"},{"public_id":"co_72e68778734a4255d4dc8c580d069335","status":"active","name":"translingual plagiarism detection","description":"The identification of plagiarism where a source text in one language is translated and reused without acknowledgment in a derivative text in another language.","types":["method","research problem"],"aliases":["cross-language plagiarism detection"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_72e68778734a4255d4dc8c580d069335"},{"public_id":"co_8dc93881e9ef6c7233708dc9287ca83c","status":"active","name":"lexical overlap analysis","description":"A linguistic technique for detecting textual borrowing by measuring shared vocabulary items between documents.","types":["method"],"aliases":["lexical overlap"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK 
(4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_8dc93881e9ef6c7233708dc9287ca83c"},{"public_id":"co_a323d3f19cdaa60cf959df173a3d8717","status":"active","name":"interlanguage theory","description":"A linguistic theory (Selinker, 1972) describing the intermediate language system of second-language learners, used here as a theoretical basis for the detection method.","types":["theory"],"aliases":["interlanguage"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_a323d3f19cdaa60cf959df173a3d8717"},{"public_id":"co_d209fbf17230df958357d7f07438af88","status":"active","name":"forensic linguistics","description":"The application of linguistic analysis to legal and investigative contexts, the field in which the proposed translingual plagiarism detection method is applied.","types":["field"],"aliases":["forensic linguistic analysis"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_d209fbf17230df958357d7f07438af88"},{"public_id":"co_f3a2d9c6cdc37f7a3a181f01bfbd2bbe","status":"active","name":"machine translation","description":"Automated translation of text between languages, identified here as a common method used by plagiarists to produce derivative texts that are then edited for grammar and syntax.","types":["method","phenomenon"],"aliases":[],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_f3a2d9c6cdc37f7a3a181f01bfbd2bbe"},{"public_id":"co_f3cc77836b4f1d51a8de0dbf0ab3c773","status":"active","name":"linguistic uniqueness","description":"A principle (Coulthard, 2004) holding that an individual's linguistic choices are sufficiently distinctive to enable text identification, applied here as a grounding for the detection methodology.","types":["principle"],"aliases":[],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_f3cc77836b4f1d51a8de0dbf0ab3c773"},{"public_id":"co_ff768eb9be9a5acf87cb1a7164b44070","status":"active","name":"CorRUPT corpus","description":"A corpus of real academic and non-academic texts investigated 
and accused of plagiarising originals in other languages, used here to provide empirical evidence for the proposed methodology.","types":["dataset","corpus"],"aliases":["Corpus of Reused and Plagiarised Texts"],"contributors":[{"id":170,"public_id":"gsgmdx9r6e","public_label":"pupuri (gsgmdx9r6e)","roles":["extraction"],"url":"https://sah.borca.ai/u/gsgmdx9r6e"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_ff768eb9be9a5acf87cb1a7164b44070"}],"external_ids":{"DOI":"10.5281/ZENODO.47258","ArXiv":null,"PubMed":null,"PubMedCentral":null,"MAG":2299732117,"DBLP":null,"ACL":null},"open_access":{"is_open_access":true,"pdf_url":"https://zenodo.org/record/47258/files/Sousa-Silva.pdf","landing_url":"https://www.semanticscholar.org/paper/021758bcb5f6bcebb392663fce68370ddf4aa722","source":"semantic_scholar","pdf_url_source":"semantic_scholar_open_access_pdf","license":"CCBY","status":"GREEN","reason":null},"reference_availability":{"status":"available","references_indexed":true,"full_text_available":false,"full_text_source":null,"count_basis":"semantic_scholar_metadata","extraction_status":"not_applicable","reason":null},"source":{"provider":"episteme2","base_corpus":"semantic_scholar_dump","freshness_mode":"unknown","basis":["semantic_scholar_metadata","postgres_metadata"],"limits":["paper metadata is based on indexed upstream scholarly datasets","claims and concepts are available only for extracted papers","absence of claims or concepts means no extracted graph data is available in this 
response"],"status":"available","degraded":false,"degraded_reasons":[],"diagnostics":{"status":"available","degraded":false,"degraded_reasons":[],"metadata_status":"available","graph_status":"available","abstract_status":"available"},"source_flags":1},"paper_id":630794,"paper_uid":"717a9e4f-e6cb-4393-8f13-b8a59b28c98b","canonical_identity":{"paper_id":630794,"paper_uid":"717a9e4f-e6cb-4393-8f13-b8a59b28c98b","identity_status":"available","lookup_basis":"semantic_scholar_external_id","compatibility_path":"corpus_id"},"url":"https://sah.borca.ai/papers/62482127"}