PhaultLines

How to query an on-disk SearchKit index with PyObjC

OS X comes with a framework called SearchKit that applications can use to index content and support full-text searches. I occasionally run into situations where I want to take advantage of an application’s existing SearchKit index so that I can perform my own full-text searches programmatically. The most recent case arose this weekend while I was attempting to build an Alfred workflow for searching Quiver notes.

Quiver uses SearchKit internally to provide full-text search in its own user interface, but it doesn’t have native Spotlight integration yet. To find notes faster, I wanted to make an Alfred script filter that queries the SearchKit index and displays the results. After a few minutes of poking around the ~/Library folder, I found where Quiver caches its SearchKit index. The only remaining obstacle was that I needed the actual name of the index in order to open the file with the SearchKit APIs.

Apple doesn’t provide a standard method for retrieving the index name from a SearchKit index file. Fortunately, I’m not the first person to tackle the problem. A quick internet search turned up a very helpful blog post by Tim Schröder that explains the issue in detail and includes a snippet of Objective-C code that reads the name directly from the correct position in a SearchKit binary index file. Armed with Tim’s snippet, I found that “index” is the (rather unsurprising) name of Quiver’s index.

I decided to write my SearchKit query script in Python, using PyObjC to access the SearchKit APIs. Even though PyObjC often results in slightly ugly code, I’ve had good luck with it in the past. It seems to work out of the box in most Mac environments and it lets me use native platform APIs without compiling anything. With the help of Apple’s SearchKit programming guide and SearchKit API reference, I put together the following Python code:

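from __future__ import print_function
import Cocoa, SearchKit
from CoreFoundation import CFURLGetString

def search(query, fileName, indexName):
  # Open the existing on-disk index read-only (last argument: no write access).
  file = Cocoa.NSURL.fileURLWithPath_(fileName)
  index = SearchKit.SKIndexOpenWithURL(file, indexName, False)
  if index is None:
    raise IOError("could not open index %r in %s" % (indexName, fileName))

  # SKSearchCreate takes SKSearchOptions; the default options support
  # boolean operators and compute relevance scores.
  results = SearchKit.SKSearchCreate(index, query, SearchKit.kSKSearchOptionDefault)

  # Ask for up to 20 matches, waiting at most 1 second.
  busy, items, scores, count = \
      SearchKit.SKSearchFindMatches(results, 20, None, None, 1.0, None)

  # Only the first `count` entries of the returned arrays hold real matches.
  for score, item in zip(scores[:count], items[:count]):
    doc = SearchKit.SKIndexCopyDocumentForDocumentID(index, item)
    url = CFURLGetString(SearchKit.SKDocumentCopyURL(doc))
    name = SearchKit.SKDocumentGetName(doc)
    props = SearchKit.SKIndexCopyDocumentProperties(index, doc)
    print(score, url, name, props)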
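The function above collects a single batch of matches. Apple documents SKSearchFindMatches as returning true while the search is still in progress, so a caller that wants every match can keep asking until it returns false. Here is a minimal sketch of that loop. To keep it testable without SearchKit, it assumes a hypothetical `find_batch` callable that returns the same four-tuple the PyObjC call above produces.

```python
def drain_matches(find_batch, batch_size=20):
    """Yield (document_id, score) pairs until the search reports completion.

    find_batch is a hypothetical callable: it takes a batch size and returns
    (still_busy, document_ids, scores, found_count), mirroring the tuple
    that PyObjC's SKSearchFindMatches returns in the code above.
    """
    while True:
        busy, ids, scores, count = find_batch(batch_size)
        # Only the first `count` entries of each array hold real results.
        for doc_id, score in zip(ids[:count], scores[:count]):
            yield doc_id, score
        if not busy:
            break
```

With a live search, `find_batch` could be something like `lambda n: SearchKit.SKSearchFindMatches(results, n, None, None, 1.0, None)`. Passing a callable instead of the search object keeps the paging logic separate from PyObjC.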

The function searches the specified index for the provided query string, requesting at most 20 matches and waiting up to one second for them. SearchKit queries support the boolean operators & (AND), | (OR), and ! (NOT), as well as the wildcard symbol (*). For my Alfred workflow, I wrapped the user’s query in wildcard symbols so that it also finds partial matches. Each match comes with a relevance score, and the URL printed for each result points to the file that contains the match.
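The wildcard wrapping can be sketched as a small helper. This is a hypothetical version, not the workflow’s actual code. It wraps each word separately, removes SearchKit’s operator, grouping, phrase, and wildcard characters so that stray input can’t change the query’s logic, and joins the terms with & so that every word has to match. Whether you want that stripping, or AND rather than OR between words, is a design choice.

```python
import re

# Characters that SearchKit query syntax gives special meaning: boolean
# operators, parentheses, phrase quotes, and the wildcard itself.
# Assumption: a simple search box doesn't need to pass these through.
_SYNTAX_CHARS = re.compile(r'[&|!()*"]')

def wildcard_query(user_input):
    """Turn free text into a SearchKit query where every term is *term*."""
    terms = _SYNTAX_CHARS.sub(" ", user_input).split()
    return " & ".join("*%s*" % term for term in terms)
```

For example, `wildcard_query("swift closures")` produces `*swift* & *closures*`, which could then be passed to `search()` along with the index path and the name “index”.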

My Alfred workflow reads the JSON metadata file for each Quiver note that matches the user’s query. When the user selects a note in Alfred, the workflow uses the quiver:// URL scheme to open the note. You can download the complete Alfred workflow here.
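Here is a rough sketch of that step, turning SearchKit result URLs into Alfred items. Several details are assumptions rather than facts from the workflow. It assumes each indexed file sits directly inside its note’s directory, that this directory contains a meta.json file with "uuid" and "title" keys, and that quiver:///notes/<uuid> opens a note. It also emits the JSON script filter format used by Alfred 3 and later.

```python
import json
import os
from urllib.parse import unquote, urlparse

def note_dir_for_url(file_url):
    """Map a file:// result URL to the directory that contains the file.

    Assumption: the indexed file lives directly inside the note's directory.
    """
    return os.path.dirname(unquote(urlparse(file_url).path))

def alfred_items(file_urls):
    """Build Alfred script filter JSON, one item per distinct note."""
    items, seen = [], set()
    for url in file_urls:
        note_dir = note_dir_for_url(url)
        if note_dir in seen:  # several matching files may share one note
            continue
        seen.add(note_dir)
        # Assumed metadata layout: meta.json with "uuid" and "title" keys.
        with open(os.path.join(note_dir, "meta.json")) as f:
            meta = json.load(f)
        items.append({
            "uid": meta["uuid"],
            "title": meta["title"],
            "subtitle": note_dir,
            # Assumed URL format for opening a note in Quiver.
            "arg": "quiver:///notes/" + meta["uuid"],
        })
    return json.dumps({"items": items})
```

The workflow’s script filter would print this string. Alfred then passes the selected item’s "arg" on to an action that opens the URL.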

Besides Apple’s documentation, several other resources helped me while I was writing the search code. NSHipster has a great introductory tutorial by Matt Thompson that explains how to use SearchKit. I also learned a lot from a SearchKit example included in the PyObjC documentation.