# Wikicorpus

**Repository Path**: mirrors_alvations/Wikicorpus

## Basic Information

- **Project Name**: Wikicorpus
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2026-01-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Wikicorpus
==========

This repo records a list of Wikipedia-related corpora.

Off-the-shelf
====

- [wiki.xml](): Extracted file from the English Wiki-dump (10 Oct 2014) using [Wikipedia_Extractor](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor)

Build-It-Yourself
====

- [SeedLing](https://github.com/alvations/SeedLing): a seed corpus for the Human Language Project
- Lucene Wiki: After downloading `wiki.xml`, you can use `WikiIndexer.py` to index the text with [pylucene](http://stackoverflow.com/questions/24278627/building-pylucene-on-ubuntu-14-04trusty-tahr).
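
Before indexing, the text in `wiki.xml` has to be split into individual articles. The sketch below shows one way this might be done with the Python standard library alone. It assumes that `wiki.xml` follows the usual Wikipedia_Extractor layout, where each article is wrapped in a `<doc id="..." url="..." title="...">` ... `</doc>` block. The helper name `iter_docs` and the sample article are hypothetical and are not taken from `WikiIndexer.py`, whose internals are not shown here.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed header format of Wikipedia_Extractor output, e.g.
# <doc id="12" url="http://en.wikipedia.org/wiki?curid=12" title="Anarchism">
DOC_OPEN = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">'
)


def iter_docs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict (id, url, title, text) per <doc>...</doc> block.

    Hypothetical helper: it streams line by line, so a large dump
    never has to be held in memory all at once.
    """
    meta = None
    body = []
    for line in lines:
        line = line.rstrip("\n")
        match = DOC_OPEN.match(line)
        if match:
            # Start of a new article: remember its attributes.
            meta, body = match.groupdict(), []
        elif line.strip() == "</doc>" and meta is not None:
            # End of the article: emit attributes plus collected text.
            yield {**meta, "text": "\n".join(body).strip()}
            meta = None
        elif meta is not None:
            body.append(line)


if __name__ == "__main__":
    # Made-up sample in the assumed format, for illustration only.
    sample = (
        '<doc id="12" url="http://en.wikipedia.org/wiki?curid=12" title="Anarchism">\n'
        "Anarchism\n"
        "\n"
        "Anarchism is a political philosophy.\n"
        "</doc>\n"
    )
    for doc in iter_docs(sample.splitlines()):
        print(doc["id"], doc["title"], len(doc["text"]))
```

For the real file, the same generator could be fed an open file handle, for example `iter_docs(open("wiki.xml", encoding="utf-8"))`, and each yielded dict could then be handed to whatever indexer is in use.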