I’m looking for a way to speed this code up. It took about 4.5 hours to run on ~20k decks. I’m open to restructuring my SQL query, but feel adjusting the python would be more effective. The goal of this code is to see how many times a pair of cards occurs in all the decks. For example if `(“card”, “name”) appears in 27/100 decks, it returns .27.
This code takes every deck in a format, and returns the cards in those decks. This gives me a list of a lot of cards ((cardId, "card name", deckId), ... (cardId "card name", deckId))
. I use the deckId
to make a bunch of lists representing each deck (newList
). Using those lists I get each unique card and put it into a separate list (superSet
).
From there I loop through superSet
twice to get each possible pair. Then loop through newList
to see if a pair is in a deck.
I know it’s somewhat complicated and I probably didn’t explain it the best. I’ll gladly update this post to clarify anything I can.
from classes.general import Database
def main():
dbm = Database()
with dbm.con:
dbm.cur.execute("""SELECT c.id, c.name, ctd.deckId FROM cards c
JOIN cardToDeck ctd ON ctd.cardId = c.id
JOIN deckToEvent dte ON dte.deckId = ctd.deckId
JOIN eventToFormat etf ON etf.eventId = dte.eventId
WHERE etf.formatId = 5
ORDER BY ctd.deckId""")
decks = dbm.cur.fetchall()
#print(decks)
#I'm not entirely sure how this works
#Somehow it takes everything with the same deckId (index:2) and puts them into one list
values = set(map(lambda x:x(2), decks))
newlist = ((y(1) for y in decks if y(2) == x) for x in values)
#print(newlist(0))
superSet = ()
for l in newlist:
for c in l:
if c not in superSet:
superSet.append(c)
#print(superSet)
count = 0
for x in superSet:
for y in superSet:
if x is y:
continue
#print(x)
#print(y)
for d in newlist:
#print(d)
if x in d and y in d:
count += 1
deckPerc = count / len(newlist)
if deckPerc != 0.0:
print(count / len(newlist)) #needs to go into the db
count = 0
if __name__== "__main__":
main()