alpaca-gpt4
This dataset contains about 52,000 unique instruction-following examples generated by GPT-4 using the same prompts as Alpaca. Each example has the fields `instruction`, `input`, and `output`, plus `text`, which concatenates the other three. The dataset is formatted for use with Hugging Face's `datasets` library.
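The `text` field combines the other three fields into a single training string. The sketch below shows one way such a string can be built. It assumes prompt templates modeled on the original Stanford Alpaca format, so the exact wording and whitespace of `text` in this dataset may differ. The `build_text` helper and the example record are hypothetical.

```python
# Minimal sketch: rebuild an Alpaca-style `text` string from one record.
# ASSUMPTION: templates are modeled on the Stanford Alpaca prompt format; the
# dataset's actual `text` field may use different wording or whitespace.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_text(record: dict[str, str]) -> str:
    """Hypothetical helper: join instruction, optional input, and output."""
    # Records may have an empty `input`, so choose the template accordingly.
    context = record.get("input", "")
    template = PROMPT_WITH_INPUT if context else PROMPT_NO_INPUT
    prompt = template.format(instruction=record["instruction"], input=context)
    return prompt + record["output"]


if __name__ == "__main__":
    # Illustrative record made up for this example (not taken from the dataset).
    example = {"instruction": "Translate the word into French.", "input": "cat", "output": "chat"}
    print(build_text(example))
```

Using two templates keeps the prompt free of an empty "### Input:" section when a record has no `input`.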
The downloaded dataset files total 48.4 MB and contain 52,002 rows.
The dataset can be used to train or fine-tune large language models (LLMs), particularly for tasks involving text generation, conversation, and question answering. It can help improve models that must understand and follow a wide variety of instructions and generate appropriate responses, which is relevant to many applications in industry and AI research.
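For supervised fine-tuning, a common approach is to feed the model the prompt (the instruction plus any input) followed by the `output`, but to compute the loss only on the `output` tokens. The sketch below assumes each record has already been formatted and tokenized into integer ids by the model's own tokenizer. The `build_sft_example` helper, the toy ids, and `max_len` are hypothetical. The mask value `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
# Minimal sketch of prompt masking for supervised fine-tuning.
# ASSUMPTION: token ids come from the model's tokenizer; all names are hypothetical.
from typing import Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(
    prompt_ids: list[int],
    response_ids: list[int],
    eos_id: int,
    max_len: Optional[int] = None,
) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) with prompt positions masked out of the loss."""
    # The model sees prompt + response + end-of-sequence token.
    input_ids = prompt_ids + response_ids + [eos_id]
    # Labels mirror input_ids, but prompt tokens become IGNORE_INDEX so the
    # loss is computed only on the response (the record's `output`).
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Optionally truncate both sequences to the same maximum length.
    if max_len is not None:
        input_ids, labels = input_ids[:max_len], labels[:max_len]
    return input_ids, labels


if __name__ == "__main__":
    # Toy ids for illustration only.
    ids, labels = build_sft_example([11, 12, 13], [21, 22], eos_id=2)
    print(ids)     # [11, 12, 13, 21, 22, 2]
    print(labels)  # [-100, -100, -100, 21, 22, 2]
```

Masking the prompt keeps the model from being trained to reproduce the instruction text itself, so the gradient signal comes only from the responses.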
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).