User:Invitatious/intindex.py - Wikipedia, the free encyclopedia

From Wikipedia, the free encyclopedia

This script generates an index of Wikipedia articles keyed by their initials, that is, the first character of every word in the title. Install Python from www.python.org; the script uses Python 2 syntax (the print statement and raw_input), so it needs a Python 2 interpreter. Save the script as intindex.py in a new folder, and put an uncompressed title data dump (just the title list) in the same folder. Run the script and enter the filename of the data dump at the prompt. The index should be ready in about 15 minutes (measured on a 2.8 GHz computer running Windows XP). To look up an abbreviation, for example GWB, open the "G" folder, then open the "GW" file in Notepad or another text editor. Do a case-sensitive search for the abbreviation in all capitals followed by two spaces, and repeat the search until every occurrence (title) has been found.
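The manual lookup described above could also be automated. The following is only a sketch, written for Python 3 rather than the Python 2 used by the script itself. The function name `lookup`, the `index_root` parameter, and the UTF-8 encoding are assumptions made for illustration and are not part of the original script.

```python
import os


def lookup(index_root, abbreviation):
    """Return the titles stored under an abbreviation (hypothetical helper).

    index_root is assumed to be the folder in which intindex.py was run,
    so it contains one sub-folder per first character ("G") and one file
    per two-character prefix ("GW").
    """
    abbreviation = abbreviation.upper()
    path = os.path.join(index_root, abbreviation[0:1], abbreviation[0:2])
    if not os.path.isfile(path):
        return []  # nothing was indexed under this prefix
    # Each index line is the abbreviation, two spaces, then the title.
    prefix = abbreviation + "  "
    titles = []
    # Assumption: the title dump, and therefore the index, is UTF-8 text.
    with open(path, "r", encoding="utf-8") as index_file:
        for line in index_file:
            if line.startswith(prefix):
                titles.append(line[len(prefix):].rstrip("\n"))
    return titles
```

Calling `lookup(".", "GWB")` in the index folder would then return every title filed under GWB, which replaces the repeated text-editor search.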

I allow anyone to use this script for any purpose.

import sys, os, re

print "intindex.py."
print "This  script makes an index of Wikipedia articles"
print "by initials from the title list file. The list is"
print "sorted based  on  the first two characters of the"
print "abbreviation to reduce file size."

split_regex = re.compile(r"[^A-Za-z0-9]") # matches a word-separating character
# filename_regex = re.compile(r"[^A-Za-z0-9]") # matches a character that should not be used in a filename
input_file = open(raw_input("Input filename:  "), "r") # open the input file
last_filename = "" # no output file open yet
output_file = False # no output file open yet
i = 0 # make a page counter
for page_title in input_file: # for each page title in the file...
    page_title = page_title.replace("_", " ") # convert raw title to display title
    abbreviation = "" # get ready for a new abbreviation
    title_words = split_regex.split(page_title) # split into words
    for word in title_words: # for each word in the title...
        if len(word) > 0: # if the word is not blank...
            abbreviation += word[0].upper() # get the first letter and capitalize it
    if len(abbreviation) > 2: # if the abbreviation is longer than 2 letters...
        # abbreviation = filename_regex.sub("_", abbreviation) # change unallowed characters
        output_dir = abbreviation[0:1] # build path
        if last_filename != abbreviation[0:2]: # if this goes in a different file...
            if output_file: # if a different output file is open...
                output_file.close() # close it
            if not os.path.exists(output_dir): # if the output path doesn't exist...
                os.makedirs(output_dir) # create the directory
            output_file = open(os.path.join(output_dir, abbreviation[0:2]), "a") # open file for appending
        output_file.write(abbreviation + "  " + page_title) # write the title to the file
        last_filename = abbreviation[0:2]
        i = i + 1 # add to page counter
        if i % 5000 == 0: # if divisible by 5000...
            print "%04dK processed" % (i // 1000) # show status
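input_file.close() # close the input file
if output_file: # if an output file is still open...
    output_file.close() # close it so the last entries are flushed to disk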
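
The core of the script is the step that turns a raw title into its initials and then into a folder and file name. As a rough sketch only, that step can be isolated into two small functions that are easy to try out without a title dump. The names `abbreviate` and `index_path` are hypothetical and do not appear in the script, and the block is written for Python 3.

```python
import re

# Same pattern as split_regex in the script: anything that is not an
# ASCII letter or digit separates words.
SPLIT_REGEX = re.compile(r"[^A-Za-z0-9]")


def abbreviate(raw_title):
    """Build the initials for one raw title (hypothetical helper)."""
    display_title = raw_title.replace("_", " ")  # raw title to display title
    words = SPLIT_REGEX.split(display_title)
    # Empty strings appear between adjacent separators and are skipped.
    return "".join(word[0].upper() for word in words if word)


def index_path(abbreviation):
    """Return (folder, file name), or None if the script would skip it."""
    if len(abbreviation) > 2:  # same length test as the script
        return abbreviation[0:1], abbreviation[0:2]
    return None


print(abbreviate("George_W._Bush"))  # GWB
print(index_path("GWB"))             # ('G', 'GW')
```

Because every character outside A-Z, a-z and 0-9 counts as a separator, punctuation and non-ASCII letters never contribute to the initials.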