# VAD_tutorial

A PyTorch implementation of fully-connected DNN-based voice activity detection (VAD). All the features for training and testing are uploaded. Minimal illustrative sketches of the model and a training step are given at the end of this README.

## Requirements

- python 3.5+
- pytorch 1.0.0
- pandas 0.23.4
- numpy 1.13.3
- pickle 4.0
- matplotlib 2.1.0
- sklearn 0.20.2

## Datasets

We used the dataset collected through the following task:

- No. 10063424, 'Development of distant speech recognition and multi-task dialog processing technologies for in-door conversational robots'

Specification:

- Korean read speech corpus (ETRI read speech)
- Clean speech at a distance of 1 m and a direction of 0 degrees
- 16 kHz, 16 bits

We uploaded multi-resolution cochleagram (MRCG) features extracted from the above dataset. The [python based MRCG feature extraction toolkit](https://github.com/zouxinghao/MRCG) was used.

### Train

10,000 utterances, 100 folders (100 speakers)

Size: 4.4 GB

```
feat_MRCG_nfilt96 - train
```

### Test

20 utterances, 10 folders (10 speakers)

Size: 18 MB

```
feat_MRCG_nfilt96 - test
```

## Usage

### 1. Training

```
python train.py
```

### 2. Testing

```
python test.py
```

## Author

Youngmoon Jung (dudans@kaist.ac.kr) at KAIST, South Korea
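
## Model sketch (illustrative)

The VAD model is a frame-level, fully-connected DNN operating on MRCG features. The sketch below is not the architecture defined in this repository's `train.py`; it is a minimal illustration of the idea, and the input dimension (a placeholder for the MRCG feature size), hidden sizes, and decision threshold are assumptions chosen only for the example.

```python
# Minimal sketch of a frame-level, fully-connected DNN VAD.
# NOTE: this is NOT the architecture defined in train.py; the input
# dimension (placeholder for the MRCG feature size), hidden sizes, and
# decision threshold are assumptions for illustration only.
import torch
import torch.nn as nn

class DnnVad(nn.Module):
    def __init__(self, input_dim=768, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),            # logits: [non-speech, speech]
        )

    def forward(self, x):                        # x: (num_frames, input_dim)
        return self.net(x)

if __name__ == "__main__":
    model = DnnVad()
    frames = torch.randn(4, 768)                 # 4 dummy MRCG frames
    speech_prob = torch.softmax(model(frames), dim=1)[:, 1]
    print(speech_prob > 0.5)                     # frame-level speech decisions
```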
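
## Training step sketch (illustrative)

`train.py` handles the actual training pipeline and feature loading. The function below only sketches the kind of frame-wise cross-entropy training loop such a script typically contains; the batch size, learning rate, and the assumption that features and 0/1 frame labels are already available as tensors (e.g. loaded from the pickled `feat_MRCG_nfilt96` files) are hypothetical.

```python
# Hedged sketch of one training epoch for a frame-level VAD classifier.
# NOT the actual train.py: hyperparameters and the in-memory (feats, labels)
# tensors are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, feats, labels, batch_size=256, lr=1e-3):
    """feats: (N, D) float tensor, labels: (N,) long tensor of 0/1."""
    loader = DataLoader(TensorDataset(feats, labels),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)            # frame-wise cross-entropy
        loss.backward()
        optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy run with random data and a tiny stand-in model.
    toy_model = nn.Sequential(nn.Linear(768, 64), nn.ReLU(), nn.Linear(64, 2))
    feats = torch.randn(1000, 768)
    labels = torch.randint(0, 2, (1000,))
    print(train_one_epoch(toy_model, feats, labels))
```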