Python – information retrieval project: help converting a dictionary to postings lists


I have preprocessed the data (input_corpus.txt) and converted the resulting list to a dictionary.

Preprocessing steps:

a. Convert document text to lowercase

b. Remove special characters. Only letters, digits and whitespace should remain in the document.

c. Remove excess whitespace. There should be exactly one space between tokens, and no whitespace at the start or end of the document.

d. Tokenize the document into terms using a whitespace tokenizer.

e. Remove stopwords from the document.

f. Perform Porter stemming on the tokens.

import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

ps = PorterStemmer()
stop = set(stopwords.words('english'))
final = {}  # document ID -> list of stemmed tokens

with open('input_corpus.txt', 'r') as file:
    for line in file:
        # each line holds a document ID and its text, separated by a tab
        col = line.split('\t')
        ID = col[0]
        tokens = col[1]
        # replace every character that is not a letter, digit or space
        tokens = re.sub(r"[^a-zA-Z0-9 ]+", ' ', tokens.lower())
        # collapse runs of whitespace and trim both ends
        tokens = re.sub(r"\s+", " ", tokens).strip()
        # whitespace tokenization followed by stopword removal
        tokens = [i for i in tokens.split() if i not in stop]
        # Porter stemming
        tokens = [ps.stem(x) for x in tokens]
        final[ID] = tokens

print(final)

The current output I have is:

[image: current output]

After preprocessing, I need to create a postings list for each token. Postings lists must be stored as linked lists, and the postings of each term should be ordered by increasing document ID. Can someone help me with how to do this?
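To make the requirement concrete, here is a minimal sketch of one way it could look, not a definitive implementation. It assumes the preprocessing result is a dictionary mapping each document ID to its list of stemmed tokens, and that the IDs are integers (IDs read from the file as strings would need int() first, otherwise "10" sorts before "2"). The names Node, PostingsList and build_index are made up for illustration.

```python
class Node:
    """One posting: a document ID plus a pointer to the next posting."""

    def __init__(self, doc_id, next_node=None):
        self.doc_id = doc_id
        self.next = next_node


class PostingsList:
    """Singly linked list of document IDs kept in increasing order."""

    def __init__(self):
        self.head = None

    def insert(self, doc_id):
        # Empty list, or the new ID belongs before the current head.
        if self.head is None or doc_id < self.head.doc_id:
            self.head = Node(doc_id, self.head)
            return
        # Walk to the last node whose successor is not smaller than doc_id.
        cur = self.head
        while cur.next is not None and cur.next.doc_id < doc_id:
            cur = cur.next
        # Skip the insert if the ID is already present.
        if cur.doc_id == doc_id:
            return
        if cur.next is not None and cur.next.doc_id == doc_id:
            return
        cur.next = Node(doc_id, cur.next)

    def to_list(self):
        # Flatten the linked list into a Python list for printing/checking.
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.doc_id)
            cur = cur.next
        return out


def build_index(docs):
    """Map each term to a PostingsList of the documents containing it."""
    index = {}
    for doc_id, tokens in docs.items():
        # set() so a term repeated in one document is inserted once.
        for term in set(tokens):
            index.setdefault(term, PostingsList()).insert(doc_id)
    return index


# Hypothetical toy data standing in for the preprocessed corpus.
docs = {3: ["cat", "sat"], 1: ["cat", "mat"], 2: ["dog", "sat", "cat"]}
index = build_index(docs)
print(index["cat"].to_list())  # [1, 2, 3]
```

Inserting in sorted position keeps every list ordered regardless of the order in which documents are read. If the documents are already processed in increasing ID order, appending at the tail would be enough.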
