python – "object of type 'float' has no len()" error in spaCy nlp() on a CSV file made of strings

I am getting an "object of type 'float' has no len()" error while running spaCy's nlp() on a CSV file encoded in cp1252. The error seems straightforward, but all the text in this file was written as strings. Most of the text is clearly fine, because nlp() runs for quite a while before raising the error, and because I am able to run my program on parts of my data.
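From what I can tell, the column itself may be the problem rather than spaCy: an empty cell in a CSV is read back by pandas as NaN, which is a Python float, even if every value was originally written as a string. A minimal reproduction (the CSV text here is made up):

```python
import io
import pandas as pd

# One empty Content cell: pandas reads it back as NaN (a float),
# even though every value written to the file was a string.
csv_text = "Title,Content\na,hello\nb,\nc,world\n"
df = pd.read_csv(io.StringIO(csv_text))

print(type(df['Content'][1]))  # <class 'float'>
# Passing that value to nlp() fails the same way, because spaCy
# calls len() on its input:
# TypeError: object of type 'float' has no len()
```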

Here is the code generating the error:

path = r"C:\Users\arthu\Dropbox\Big_data_general_information\Populism\NYT\Testj\CSVFile" + str(year) + ".csv"
df = pd.read_csv(path, encoding='cp1252')
saved_column1 = df['Content']
saved_column2 = df['Title']

for content in saved_column1:
    Article_Counter = Article_Counter + 1
    text = nlp(content)
    Status = False
    for word in text:
        Word_Counter = Word_Counter + 1
        Title_Word_Counter = Title_Word_Counter + 1
        l_word = str(word.lemma_)
        if l_word in L_Keywords:
            Keyword_Counter = Keyword_Counter + 1
            Title_Keyword_Counter = Title_Keyword_Counter + 1
            Status = True
            if l_word in count:
                count[l_word] = count[l_word] + 1
            else:
                count[l_word] = 1
    if Status == True:
        Keyword_Article_Counter = Keyword_Article_Counter + 1
        h.append('True')
    else:
        h.append('False')
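Since I can live with dropping some articles, one fix I am considering is guarding the loop so that non-string values never reach nlp(). A standalone sketch (the Series here is a hypothetical stand-in for saved_column1):

```python
import pandas as pd

# Hypothetical stand-in for the 'Content' column: one row came back as NaN.
saved_column1 = pd.Series(['first article', float('nan'), 'second article'])

processed = []
for content in saved_column1:
    # Empty CSV cells are read back as NaN (a float); skip them
    # instead of passing them to nlp(), which calls len() on its input.
    if not isinstance(content, str):
        continue
    processed.append(content)

print(processed)  # ['first article', 'second article']
```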

Here is the code that downloaded the data:

q = queue.Queue()
for elem in h:
    q.put(elem)

def download():
    while not q.empty():
        link = q.get()
        res = requests.get(link)
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        try:
            time.sleep(0.5)

            date = soup.select('.css-1u1psjv.epjyd6m3')
            date = str(date[0].getText())

            title = soup.select('.css-1vkm6nb.ehdk2mb0')
            title = str(title[0].getText())

            content0 = soup.select('.css-158dogj.evys1bk0')
            content = ''

            for par in content0:
                content = content + '' + str(par.getText())
            writer.writerow((date, title, content))
        except:
            print('Not an article')

def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        [executor.submit(download) for i in range(0, 10, 1)]

if __name__ == '__main__':
    main()
print('Day', i, 'month', j, 'completed.')
file.close()
df = pd.read_csv(path, encoding='cp1252')
saved_column = df['Title']
x = len(saved_column)
for i in range(0, x, 1):
    j = i + 1
    while j < x:
        if saved_column[i] == saved_column[j]:
            df = df.drop([i])
            break
        j = j + 1
new_path = str(path)
df.to_csv(new_path, index=False, encoding='cp1252')
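For the deduplication step, pandas also has a built-in, drop_duplicates, which should match the nested loop above (the loop drops the earlier row when a Title repeats, i.e. it keeps the last occurrence). A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical small frame with a duplicated Title.
df = pd.DataFrame({'Title': ['a', 'b', 'a'], 'Content': ['x', 'y', 'z']})

# Keep the last occurrence of each Title, like the nested loop does.
df = df.drop_duplicates(subset='Title', keep='last')
print(df['Title'].tolist())  # ['b', 'a']
```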

Here is the detailed error message:

runfile('C:/Users/arthu/OneDrive/Documents/Recherche/Word_Count.py', wdir='C:/Users/arthu/OneDrive/Documents/Recherche')
Traceback (most recent call last):

  File "C:\Users\arthu\OneDrive\Documents\Recherche\Word_Count.py", line 43, in <module>
    text=nlp(content)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 437, in __call__
    if len(text) > self.max_length:

TypeError: object of type 'float' has no len()

Note that:

a) Ideally, I wouldn’t have to download the data all over again (it takes days of automated web crawling).

b) I can live with dropping some articles (lines) of my CSV file.
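Given a) and b), my current plan is to clean the existing CSV rather than re-crawl: drop the rows whose Content cell is missing. A sketch on made-up data (for the real file the read would be pd.read_csv(path, encoding='cp1252')):

```python
import io
import pandas as pd

# Stand-in for the downloaded CSV; row 'b' has an empty Content cell.
raw = "Title,Content\na,hello\nb,\nc,world\n"
df = pd.read_csv(io.StringIO(raw))

# Drop rows with missing Content. df['Content'] = df['Content'].astype(str)
# would be the alternative if no row may be lost (empty cells then become
# the literal string 'nan').
df = df.dropna(subset=['Content'])

print(df['Content'].tolist())  # ['hello', 'world']
```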