vba – Document "Search Engine" in Excel

I've created a "Brute-force search engine" based on the file name, as well as an additional keyword describing said document. I had to create one because my company's resources were messy and not fully structured. It was therefore difficult to search for a document using the simple Windows search tool.

What I did in the first place is to simply extract the file path of all the documents using CMD and DIR, and then generate it in a csv file that I processed.
Then, using an excel formula, extract the file name from the path and also delete the file type. I've added an extra column for the Keywords part, which is divided by commas.

Once the search is started, the search term is divided into individual words via spaces. The same goes for the file name and keywords. After that, a simple loop for iterates each word and checks if it matches. If it is, a counter is added. Once done, the data is copied to a temporary sheet.

After checking all available file paths and moving the corresponding result on the temporary sheet, I sort them according to the counter, so that the highest match stays up. Then I copy the results (including the path) to the main sheet using the search box, display them, and add a hyperlink to allow clicking on it.

Secondary Research Result ()
Dim x As Range, and As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Names dim () as variant, namesdup () as variant
Dim search () As String, keyword () As String, namesraw () As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long

OptimizeVBA True & # 39; greatly speeds up treatment

searchval = Worksheets ("Sheet1"). Range ("E8"). Value-Get the text of the search box.
With Worksheets ("Sheet3") & # 39; Prepared to place the results in a table.
Set tbl = .ListObjects ("tblSearch")
Set sortcol = .Range ("tblSearch[sort]")
Finish by

With worksheets ("Sheet2")
search = Split (Trim (searchval), "") divides search terms by spaces
lrow2 = .Cells (Rows.count, 1) .End (xlUp) .Row
For each range x in ("A2: A" & lrow2) Iterate all values ​​in Sheet2
account = 0
lrow = Worksheets ("Sheet3"). Cells (Rows.count, 1) .End (xlUp) .Row + 1 & # 39; get the last line in Sheet2
keyword () = Split (.Range ("d" & x.Row), ",") divides keywords with a comma
namesraw () = Split (Replace (Replace (Replace (Replace (.Range), "c" & x.Row), "-", ""), "", ""), " _ "," ")," ") & # 39; divides names into spaces, removing all unwanted characters

& # 39; This section converts the String array above into a Variant array
ReDim namesdup (LBound (namesraw) to UBound (namesraw))
Dim Index As Long
For index = LBound (namesraw) to UBound (namesraw)
namesdup (index) = namesraw (index)
Next index
end section

names () = RemoveDupesColl (namesdup ())
"We need to remove the duplicates of the name search, because that affects accuracy.
For example, if you search for something with the word "loc", the name of the file that repeats the word multiple times will get the best results.

& # 39; // The search function starts here
& # 39; This first part compares each word entered in the search box and each word in the keyword column of Sheet2.
For i = LBound (keyword) to UBound (keyword) Iterate the number of keywords in a given line.
For j = LBound (search) at UBound (search) - Iterate the number of words in the search box

If UCase (search (j)) as "*" & UCase (keyword (i)) & "*" or UCase (keyword (i)) as "*" & UCase (search (j)) & " * "Then" compare the search term and the keyword
Sheets ("Sheet3"). Range ("A", "B", and .row) .Value = .Range ("A" & x.Row, "B" & x.Row) .Value & Copy A & B to Sheet3.
account = account + 1
Sheets ("Sheet3"). Range ("C" & lrow) .Value = count & put a count on Sheet3
Sheets ("Sheet3"). Range ("D" & lrow) .Value = .Range ("E" & x.Row) .Value & # 39; Copy D to Sheet3
End if


For k = LBound (names) to UBound (names) - Iterates the number of names separated from the document name.
For the LBound (search) at UBound (search) Iterate the number of words in the search box
If Len (names (k)) <= 3 And Len(names(k)) > 1 Then, • Prevents to get the best results to be part of a long word, for example: the word LOC will be found on all the words that have it, like "LOCATION" .
If UCase (search (l)) = UCase (names (k)) or UCase (names (k)) = UCase (search (l)) Then it's about it. a short word, it must be identical to the desired term.
Sheets ("Sheet3"). Range ("A", "B", and): .Value = .Range ("A" & x.Row, "B" & x.Row) .Value
account = account + 1
Sheets ("Sheet3"). Range ("C" & lrow) .Value = count
Sheets ("Sheet3"). Range ("D" & lrow) .Value = .Range ("E" & x.Row) .Value
End if
If (UCase (search (l)) as "*" & UCase (names (k)) & "*" or UCase (names (k)) as "*" & UCase (search (l)) & "*") And Len (names (k))> 2 And Len (search (l))> 2 Then compare the search term and the name of the document
Sheets ("Sheet3"). Range ("A", "B", and): .Value = .Range ("A" & x.Row, "B" & x.Row) .Value
account = account + 1
Sheets ("Sheet3"). Range ("C" & lrow) .Value = count
Sheets ("Sheet3"). Range ("D" & lrow) .Value = .Range ("E" & x.Row) .Value
End if
End if
& # 39; // The search function ends here

Next X

Finish by

With tbl.Sort & # 39; sort everything according to the number to get the best result on top
.SortFields.Add Key: = sortcol, SortOn: = xlSortOnValues, order: = xlDescending
.Header = xlYes
Finish by

End Sub

Under copysearch ()
Dim Linkrange As Range, As Range
Dim namerange As Range
Dim hyp As hyperlink
Hyps as hyperlinks

With worksheets ("Sheet1")
Sheets ("Sheet3"). Range ("A2: D21"). Copy - Copy the first 20 results.
.Range ("D13"). PasteSpecial Paste: = xlPasteValues ​​& # 39; and paste them on Sheet1
Application.CutCopyMode = False

Set linkrange = .Range ("D13: D32")
Set namerange = .Range ("E13: E32")

For each c in the naming range Iterate all cells from E13 to E32
c.ClearHyperlinks & # 39; Remove all hypertext links if there are any
If c <> "" Then "Make sure you do not add a hyperlink on empty cells
c.Hyperlinks.Add c, .Range ("D" & c.Row) Adds a hyperlink based on the value of D.
If .Range ("G" & c.Row) .Value = True Then & # 39; Check if the value of G is True
.Range ("E" & c.Row) .Font.Color = vbWhite & # 39; The link is valid. It is therefore white.
.Range ("E" & c.Row) .Font.Color = vbRed & # 39; The link is invalid, colored in red, must be updated.
End if
End if

.Range ("E13: E32"). Font.Underline = False
.Range ("E13: E32"). Font.name = "Cambria"

Finish by

OptimizeVBA False

End Sub

Everything works as expected, although I think the code can still be optimized. Please note that I only touch VBA and have some VB.Net experience but I am not really a programmer. However, as I understand the formatting, I still made sure that my coding could always be understood. The comments are added everywhere because I am only a temporary employee of the company and I would like to pass it on to someone else just in case.

My difficulties when I started this:

  • Singular / plural terms: This is why there is a statement "Or" which
    inverts the variables of the "Like" statement as a resolution.
  • 2-3 letter words lying inside big words: As we use
    acronyms or shortcuts, there are times when there are results that
    are found that it is not really related to what is supposed to show.
    So, if additional statements have been added specifically for the short
  • Repetition of words in file names: for some reason, there are file names that repeat the same word several times (the acronyms), which has the effect of distorting the final result because it corresponds several times. I've used an online code to remove duplicates via the collection method. Therefore, it was necessary to convert the array to a variant.
  • Opening write-protected documents: Unfortunately, this has not been resolved, but a pop-up window in Word that requires read-only opening does not appear above the Excel program. Excel meanwhile, does not respond until the contextual box is answered. Wait too long and an OLE error will appear. One solution is to open another Word program and the pop-up window will appear.