ACQDIV

Dataset Overview:

The ACQDIV database is a collection of video and audio recordings, transcribed speech, and linguistic annotations. It was created by the ACQDIV project, which aims to identify the universal cognitive processes that enable language acquisition despite the substantial cross-linguistic variation found in the world's languages. The database contains 13 languages and is regularly updated. It began with ten maximally grammatically diverse languages, selected by applying a fuzzy clustering algorithm to a set of languages and their typological feature values (e.g. grammatical case, inflectional categories, nominal synthesis). The creators are Sabine Stoll, Dagmar Jung, Steven Moran, Robert Schikowski, Damián Ezequiel Blasi, Jekaterina Mažara, Guanghao You, and Anna Jancso.

Data Description:

The ACQDIV dataset comprises the video and audio recordings, transcribed speech, and linguistic annotations described above. The data is available in several formats: wav, mp4, Toolbox, ELAN, xml, cha, sqlite3, and RData. It includes variables covering morphemes, syntax, and semantics, along with information on how they were elicited, for example through sentence repetition or story-telling tasks. A downloadable data dictionary describes each variable, its data type, and its possible values.
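Since sqlite3 is one of the listed formats, a first look at a downloaded database file might go along the lines of the sketch below. It makes no assumption about the ACQDIV schema: it only asks SQLite itself which tables exist, what their columns are, and how many rows each holds. The function name `summarize_sqlite` and the file name in the usage comment are hypothetical, chosen for illustration; the data dictionary remains the authoritative description of the variables.

```python
import sqlite3


def summarize_sqlite(path):
    """Return {table_name: (column_names, row_count)} for each user table.

    Schema-agnostic: nothing here assumes particular ACQDIV table names.
    """
    summary = {}
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
            (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            summary[table] = (columns, row_count)
    finally:
        conn.close()
    return summary


# Hypothetical usage (the file name is an assumption, not the real dump name):
# for name, (columns, rows) in summarize_sqlite("acqdiv_corpus.sqlite3").items():
#     print(name, rows, columns)
```

Note that `sqlite3.connect` creates an empty file when the path does not exist, so an empty result usually means the path is wrong rather than that the database has no tables.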

Use Cases:

The ACQDIV dataset can be used for various research purposes, such as studying typological diversity in language acquisition, identifying universal cognitive processes that enable language acquisition, and finding correlations between language features and the environment. It can also be used for training and testing language models and for developing computational methods for linguistic analysis. Use of the data is subject to some restrictions, such as the need to obtain permission from the ACQDIV project team before publishing any research based on the dataset.

Visualizations:

Interactive visualizations of the ACQDIV data are available on the ACQDIV project website. They include charts, graphs, and maps that allow users to explore the data in different ways, and they can be exported as images or downloaded for further analysis.

Data Access:

The ACQDIV dataset can be accessed through the ACQDIV project website, which provides an API sandbox for testing the API endpoints and a download page offering the data as a dump in multiple formats. Sample files in each format are also available for testing purposes.

Documentation:

The ACQDIV project website provides documentation on how to use the dataset, along with information about any relevant studies or publications that have used the data.

Download Links: