Legal Docket Classification: Where Machine Learning Stumbles

Published in 2008 at the Conference on Empirical Methods in Natural Language Processing (EMNLP)

ABSTRACT

We investigate the problem of binary text classification in the domain of legal docket entries. This work presents an illustrative instance of a domain-specific problem where the state-of-the-art Machine Learning (ML) classifiers such as SVMs are inadequate. Our investigation into the reasons for the failure of these classifiers revealed two types of prominent errors which we call conjunctive and disjunctive errors. We developed simple heuristics to address one of these error types and improve the performance of the SVMs. Based on the intuition gained from our experiments, we also developed a simple propositional logic based classifier using hand-labeled features, that addresses both types of errors simultaneously. We show that this new, but simple, approach outperforms all existing state-of-the-art ML models, with statistically significant gains. We hope this work serves as a motivating example of the need to build more expressive classifiers beyond the standard model classes, and to address text classification problems in such non-traditional domains.
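The abstract does not spell out the form of the propositional logic classifier, so the following is only an illustrative sketch of the general idea: a rule written as a disjunction of conjunctions over hand-chosen phrase features, with a set of vetoing cues. The phrases, the names `POSITIVE_RULE`, `NEGATIVE_CUES`, and `classify`, and the sample docket entries are all hypothetical and are not taken from the paper.

```python
from typing import List

# Hypothetical hand-labeled features (not from the paper).
# Each inner list is a conjunction: every phrase in it must appear.
# The outer list is a disjunction: any one satisfied clause is enough.
POSITIVE_RULE: List[List[str]] = [
    ["order", "granting", "motion to dismiss"],
    ["case", "dismissed"],
]

# Hypothetical cues that veto a positive decision when present.
NEGATIVE_CUES: List[str] = ["denying", "denied"]


def classify(entry: str) -> bool:
    """Return True if the docket entry satisfies the propositional rule."""
    # Lowercase and collapse whitespace so phrase matching is predictable.
    text = " ".join(entry.lower().split())
    if any(cue in text for cue in NEGATIVE_CUES):
        return False
    return any(all(phrase in text for phrase in clause)
               for clause in POSITIVE_RULE)


if __name__ == "__main__":
    for sample in [
        "ORDER granting Motion to Dismiss filed by defendant",
        "ORDER denying Motion to Dismiss",
        "MOTION to Dismiss by defendant",
    ]:
        print(classify(sample), "-", sample)
```

A rule of this shape can require several features to co-occur and can accept several alternative phrasings at once, which a single linear decision over bag-of-words features does not express directly. Substring matching is used here only for brevity; a real system would need proper tokenization.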

PUBLICATION RECORD

Publication year: 2008
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2008-10-25
Fields of study: Law, Computer Science
Identifiers: DOI 10.3115/1613715.1613771
Source metadata: Semantic Scholar

REFERENCES

Comparison (2018, cited by this paper)
Learning from labeled features using generalized expectation criteria (2008, influential reference)
Active Learning with Feedback on Features and Instances (2006, cited by this paper)
Inductive learning algorithms and representations for text categorization (2006, influential reference)
Incorporating prior knowledge with weighted margin support vector machines (2004, cited by this paper)
Distributional Word Clusters vs. Words for Text Categorization (2003, cited by this paper)
Text Categorization Based on Regularized Linear Classification Methods (2001, cited by this paper)
Text Categorization with Support Vector Machines: Learning with Many Relevant Features (1999, cited by this paper)
A re-examination of text categorization methods (1999, cited by this paper)
A comparison of event models for Naive Bayes text classification (1998, cited by this paper)
Inductive Logic Programming: 6th International Workshop, ILP-96, Stockholm, Sweden, August 26-28, 1996, Selected Papers (1997, cited by this paper)
A Comparative Study on Feature Selection in Text Categorization (1997, cited by this paper)
A comparison of two learning algorithms for text categorization (1994, cited by this paper)
Automated learning of decision rules for text categorization (1994, cited by this paper)
FOIL: A Midterm Report (1993, cited by this paper)

CITED BY

A Multi-Level Feature Fusion Network Integrating BERT and TextCNN (2025, cites this paper)
Cracking the code: untangling the chessboard of (de)legitimization in the International Court of Justice 2024 using computational linguistics approaches (2025, cites this paper)
Cost–benefit analysis of deploying shallow, deep learning and generative models for legal text classification (2025, cites this paper)
Evaluating rule-based and generative data augmentation techniques for legal document classification (2025, cites this paper)
Evaluating Shallow and Deep Learning Strategies for Legal Text Classification of Clauses in Non-Disclosure Agreements (2025, cites this paper)
Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents (2024, cites this paper)
ignore at SemEval-2024 Task 5: A Legal Classification Model with Summary Generation and Contrastive Learning (2024, cites this paper)
Fine-Tuning MultiFit for Enhanced Legal Sentence Basis Classification (2023, cites this paper)
A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents (2023, cites this paper)
The Ethics of Automating Legal Actors (2023, cites this paper)
A Benchmark Dataset for Legal Language Understanding in English (2022, cites this paper)
From RoBERTa to aLEXa: Automated Legal Expert Arbitrator for Neural Legal Judgment Prediction (2022, cites this paper)
DeepParliament: A Legal domain Benchmark & Dataset for Parliament Bills Prediction (2022, cites this paper)
Issue Area Discovery from Legal Opinion Summaries using Neural Text Processing (2022, cites this paper)
Effectively Leveraging BERT for Legal Document Classification (2021, cites this paper)
Multi-granular Legal Topic Classification on Greek Legislation (2021, cites this paper)
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English (2021, cites this paper)
Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code (2021, cites this paper)
Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector (2021, cites this paper)
On the Ethical Limits of Natural Language Processing on Legal Text (2021, cites this paper)
Analysis and Multilabel Classification of Quebec Court Decisions in the Domain of Housing Law (2020, cites this paper)
Online publication of court records: circumventing the privacy-transparency trade-off (2020, cites this paper)
LegalOps: A Summarization Corpus of Legal Opinions (2020, cites this paper)
Publication of Court Records: Circumventing the Privacy-Transparency Trade-Off (2020, cites this paper)
Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation (2019, cites this paper)
Neural Legal Judgment Prediction in English (2019, cites this paper)
Large-Scale Multi-Label Text Classification on EU Legislation (2019, cites this paper)
Litigation Analytics: Extracting and querying motions and orders from US federal courts (2019, influential citation)
Deep Learning for French Legal Data Categorization (2019, cites this paper)
An Efficient Approach to Learning Chinese Judgment Document Similarity Based on Knowledge Summarization (2018, cites this paper)
A Comparative Study of Classifying Legal Documents with Neural Networks (2018, cites this paper)
An Ontology Driven Knowledge Block Summarization Approach for Chinese Judgment Document Classification (2018, cites this paper)
A sequence approach to case outcome detection (2017, cites this paper)
Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach (2017, cites this paper)
Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods (2015, cites this paper)
Cost-effective conceptual design using taxonomies (2015, cites this paper)
Improved Pattern Learning for Bootstrapped Entity Extraction (2014, cites this paper)
Classifying Legal Questions into Topic Areas Using Machine Learning (2014, cites this paper)
Identifying patent monetization entities (2013, influential citation)