Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
The paper explores using large language models (LLMs) to generate synthetic training data for natural language processing (NLP) tasks, with prompts that specify particular attributes of the data to be generated. The authors present an empirical study on data generation encompa
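As a rough illustration of attribute-specified prompting, the sketch below enumerates one prompt per combination of attribute values, so that the generated data covers each combination explicitly. The attribute names, values, and template are hypothetical and not taken from the paper, and the actual LLM call is omitted.

```python
from itertools import product

# Hypothetical attribute dimensions for a topic-classification task;
# names and values are illustrative only.
ATTRIBUTES = {
    "topic": ["sports", "politics"],
    "length": ["short", "long"],
    "style": ["news report", "opinion piece"],
}

# Hypothetical prompt template with one slot per attribute.
TEMPLATE = "Write a {length} {style} about {topic}."


def attributed_prompts(attributes, template):
    """Yield one prompt for every combination of attribute values."""
    keys = list(attributes)
    for values in product(*(attributes[k] for k in keys)):
        yield template.format(**dict(zip(keys, values)))


prompts = list(attributed_prompts(ATTRIBUTES, TEMPLATE))
print(len(prompts))  # 8
print(prompts[0])    # Write a short news report about sports.
```

Each prompt would then be sent to an LLM to produce one or more synthetic examples labeled with its topic; enumerating the full grid is the simplest design choice, and sampling a subset of combinations is an alternative when the grid is large.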
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
AutoWS-Bench-101 is a framework for evaluating automated weak supervision techniques against baseline methods such as zero-shot foundation models and supervised learning, to help practitioners choose the best method to generate addition
Alfred: A System for Prompted Weak Supervision
The paper introduces Alfred, a system for programmatic weak supervision (PWS) that creates training data for machine learning by prompting. It enables users to encode their subject matter expertise via natural language prompts for language and vision-lang
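To make the idea of a prompt acting as a labeling function concrete, here is a minimal sketch under stated assumptions: a natural-language prompt is wrapped into a function that queries a model and maps its reply to a class label, or abstains when the reply is not recognized. This is not Alfred's API; `make_prompted_lf`, `stub_model`, and the answer mapping are hypothetical, and the stub stands in for a real language model.

```python
from typing import Callable, Dict

ABSTAIN = -1  # convention: the labeling function declines to vote


def make_prompted_lf(
    prompt_template: str,
    answer_map: Dict[str, int],
    model: Callable[[str], str],
) -> Callable[[str], int]:
    """Turn a natural-language prompt into a labeling function.

    The model's reply is normalized and looked up in answer_map;
    any reply not in the map becomes ABSTAIN.
    """
    def lf(example: str) -> int:
        reply = model(prompt_template.format(text=example))
        return answer_map.get(reply.strip().lower(), ABSTAIN)
    return lf


# Hypothetical stand-in for a language model: a keyword stub.
def stub_model(prompt: str) -> str:
    return "Yes" if "refund" in prompt.lower() else "unsure"


is_complaint = make_prompted_lf(
    "Is the following message a complaint? Answer yes or no.\n{text}",
    {"yes": 1, "no": 0},
    stub_model,
)

print(is_complaint("I want a refund for this broken lamp."))  # 1
print(is_complaint("Thanks, the lamp works great."))          # -1 (abstain)
```

Mapping unrecognized replies to an abstain value, rather than forcing a label, keeps noisy model outputs from being counted as votes downstream.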
Utilizing Weak Supervision to Infer Complex Objects and Situations in Autonomous Driving Data
The paper explores weak supervision, specifically data programming, as a way to label training data when no ground-truth labels are available. In a cyclist-detection case study, the weak supervision method, built on rules over the outputs of separate detectors, outperforms CyDet, achieving 96
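The sketch below shows the general shape of rule-based labeling over detector outputs. The candidate fields (`person`, `bicycle`, `overlap`, `speed_mps`) and all three rules are hypothetical, not the paper's. Votes are combined here by plain majority vote with ties treated as abstentions; data programming proper instead fits a label model that weights each rule by its estimated accuracy, which this sketch does not implement.

```python
from collections import Counter
from typing import Dict

ABSTAIN, NOT_CYCLIST, CYCLIST = -1, 0, 1

# Each candidate is a hypothetical summary of what separate detectors reported.


def lf_person_on_bike(c: Dict) -> int:
    # Person and bicycle boxes overlap heavily: likely a cyclist.
    if c["person"] and c["bicycle"] and c["overlap"] > 0.5:
        return CYCLIST
    return ABSTAIN


def lf_no_bicycle(c: Dict) -> int:
    # No bicycle detected at all: likely not a cyclist.
    return NOT_CYCLIST if not c["bicycle"] else ABSTAIN


def lf_fast_person(c: Dict) -> int:
    # Hypothetical heuristic: a person moving faster than 3 m/s is likely riding.
    return CYCLIST if c["person"] and c["speed_mps"] > 3.0 else ABSTAIN


LFS = [lf_person_on_bike, lf_no_bicycle, lf_fast_person]


def majority_vote(candidate: Dict) -> int:
    """Combine non-abstaining votes; ties and no votes yield ABSTAIN."""
    votes = [v for v in (lf(candidate) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (label, count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == count:
        return ABSTAIN  # tie: refuse to guess
    return label


riding = {"person": True, "bicycle": True, "overlap": 0.8, "speed_mps": 4.0}
walking = {"person": True, "bicycle": False, "overlap": 0.0, "speed_mps": 1.0}
print(majority_vote(riding))   # 1
print(majority_vote(walking))  # 0
```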
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
The paper introduces Snorkel DryBell, a system that uses existing organizational knowledge as weak supervision to significantly reduce machine learning development time and cost. Building on the Snorkel framework, Snorkel DryBell offers flexible ingestion
Training Complex Models with Multi-Task Weak Supervision
The paper addresses the challenges of collecting hand-labeled training sets for complex machine learning models. Instead of manual labeling, the study uses weaker, noisier supervision sources. The proposed framework views these sources as labeling related
Multi-Resolution Weak Supervision for Sequential Data
The study introduces Dugong, the first framework for multi-resolution weak supervision on sequential data: it models weak supervision sources that label at different resolutions, along with complex correlations among them. Theoretically, Dugong accurately recovers the unobserved parameters. In practice, it significantly outperforms tra
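To illustrate what "multi-resolution" sources means, the sketch below has some sources vote once per sequence and others vote once per frame, and lifts the frame-level votes to the sequence level so all votes share one scale. The any-positive lifting rule and the function names are hypothetical; Dugong itself models the sources and their correlations jointly rather than applying a fixed rule like this.

```python
from typing import List

ABSTAIN = -1


def sequence_vote_from_frames(frame_votes: List[int]) -> int:
    """Lift frame-level votes (1, 0, or ABSTAIN) to one sequence-level vote.

    Hypothetical rule: positive if any frame is voted positive,
    negative if every non-abstaining frame is negative, else abstain.
    """
    cast = [v for v in frame_votes if v != ABSTAIN]
    if not cast:
        return ABSTAIN
    return 1 if 1 in cast else 0


def align_resolutions(
    sequence_source_votes: List[int],
    frame_source_votes: List[List[int]],
) -> List[int]:
    """Return all sources' votes for one sequence on the sequence scale."""
    lifted = [sequence_vote_from_frames(f) for f in frame_source_votes]
    return sequence_source_votes + lifted


# One sequence-level source voted 1; two frame-level sources voted per frame.
print(align_resolutions([1], [[ABSTAIN, 0, 1], [0, 0, ABSTAIN]]))  # [1, 1, 0]
```

A fixed rule like this discards the finer-grained information and ignores correlations between sources, which is the limitation a joint model over all resolutions is meant to address.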