python – Fastest way to find the DataFrame indexes of column elements stored as lists

I asked this question here: https://stackoverflow.com/q/55640147/5202255 and I was told to post on this forum. I would like to know if my solution can be improved or if there is another approach to the problem. Any help is really appreciated!

I have a pandas DataFrame in which the column values are lists. Each list has several elements, and an element can appear in several rows. An example DataFrame is:

```
X = pd.DataFrame([(1, ['a', 'b', 'c']), (2, ['a', 'b']), (3, ['c', 'd'])], columns=['A', 'B'])

X =
   A          B
0  1  [a, b, c]
1  2     [a, b]
2  3     [c, d]
```

I want to find all the rows, that is, the DataFrame indexes, in which each list element appears, and build a dictionary from them. Column A can be ignored here; column B is the one of interest. For example, the element 'a' appears at indexes 0 and 1, which gives {'a': [0, 1]}. The solution for this sample DataFrame is:

```
Y = {'a': [0, 1], 'b': [0, 1], 'c': [0, 2], 'd': [2]}
```
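For reference, one pandas-native way to build this mapping (a swapped-in technique, not the approach in my code below) is to flatten column `B` with `Series.explode`, which exists since pandas 0.25, so that each element gets its own row while keeping its original index label, and then group those labels by element. A minimal sketch under that assumption:

```python
import pandas as pd

X = pd.DataFrame([(1, ['a', 'b', 'c']), (2, ['a', 'b']), (3, ['c', 'd'])], columns=['A', 'B'])

# one row per list element; the original index label is repeated for each element
exploded = X['B'].explode()

# group the repeated index labels by element value and collect each group into a list
Y = exploded.index.to_series().groupby(exploded.values).agg(list).to_dict()
print(Y)
```

Grouping sorts the keys by default, and an element repeated inside a single list would contribute that row's label more than once.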

I have written code that works and gives the correct result. My problem is the computation speed: my actual DataFrame has about 350,000 rows, and the lists in column 'B' can contain up to 1,000 items. At the moment, the code runs for several hours! I wonder whether my solution is very inefficient.
Any help with a faster and more efficient approach would be really appreciated!
Here is my solution code:

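```python
import itertools
import pandas as pd

X = pd.DataFrame([(1, ['a', 'b', 'c']), (2, ['a', 'b']), (3, ['c', 'd'])], columns=['A', 'B'])

# Step 1: one dict per row, mapping each element of B to [row index]
B_dict = []
for idx, val in X.iterrows():
    B = val['B']
    B_dict.append(dict(zip(B, [[idx]] * len(B))))

# Step 2: for every distinct element, concatenate its index lists from all per-row dicts,
# dropping the None that d.get(k) returns for rows that do not contain the element
B_dict = {k: list(itertools.chain.from_iterable(filter(lambda v: v is not None, [d.get(k) for d in B_dict])))
          for k in set().union(*B_dict)}
print('Result:', B_dict)
```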

Output:

```
Result: {'d': [2], 'c': [0, 2], 'b': [0, 1], 'a': [0, 1]}
```

The code for the merge line after the `for` loop was adapted from https://stackoverflow.com/questions/45649141/combine-values-of-same-keys-in-a-list-of-dicts and https://stackoverflow.com/questions/16096754/remove-none-value-from-a-list-without-removing-the-0-value
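The borrowed merge line is a likely hotspot: for every distinct element `k` it looks `k` up in every per-row dict, so the work grows roughly with (number of distinct elements) × (number of rows) rather than with the total number of list items, and `iterrows()` adds per-row overhead on top. Below is a rough timing sketch on small synthetic data. The sizes and the names `make_frame`, `question_approach`, `single_pass_merge` and `direct_from_column` are hypothetical, not from the post, and `d[k] for d in per_row if k in d` stands in for the `filter`/`d.get(k)` idiom with the same effect:

```python
import itertools
import random
import timeit
from collections import defaultdict

import pandas as pd


def make_frame(n_rows, n_vocab, max_len, seed=0):
    """Hypothetical synthetic data: column B holds lists of distinct string tokens."""
    rng = random.Random(seed)
    vocab = [f"t{i}" for i in range(n_vocab)]
    lists = [rng.sample(vocab, rng.randint(1, max_len)) for _ in range(n_rows)]
    return pd.DataFrame({"A": range(n_rows), "B": lists})


def per_row_dicts(df):
    """First step from the question: one {element: [row index]} dict per row."""
    return [dict(zip(val["B"], [[idx]] * len(val["B"]))) for idx, val in df.iterrows()]


def question_approach(df):
    """Question's merge: for every distinct key, look it up in every per-row dict."""
    per_row = per_row_dicts(df)
    return {
        k: list(itertools.chain.from_iterable(d[k] for d in per_row if k in d))
        for k in set().union(*per_row)
    }


def single_pass_merge(df):
    """Same per-row dicts, but each (key, value) pair is visited exactly once."""
    merged = defaultdict(list)
    for d in per_row_dicts(df):
        for k, v in d.items():
            merged[k].extend(v)
    return dict(merged)


def direct_from_column(df):
    """Skip iterrows() and the per-row dicts: walk the index and column B together."""
    result = defaultdict(list)
    for idx, values in zip(df.index, df["B"]):
        # dict.fromkeys dedupes within a row, matching dict(zip(...)) in the question
        for item in dict.fromkeys(values):
            result[item].append(idx)
    return dict(result)


df = make_frame(n_rows=2_000, n_vocab=500, max_len=20)
expected = question_approach(df)
for fn in (question_approach, single_pass_merge, direct_from_column):
    assert fn(df) == expected  # all three must produce the same mapping
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

The absolute numbers depend on the machine and on how many distinct elements the real data contains, so a sample of the actual 350,000-row frame is a better benchmark than this synthetic one.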
