{"id":20644,"date":"2025-09-18T18:08:26","date_gmt":"2025-09-18T18:08:26","guid":{"rendered":"https:\/\/www.directimpactsolutions.com\/?p=20644"},"modified":"2025-10-08T13:04:24","modified_gmt":"2025-10-08T13:04:24","slug":"filemaker-ai-and-entity-matching-pt-2","status":"publish","type":"post","link":"https:\/\/www.directimpactsolutions.com\/en\/filemaker-ai-and-entity-matching-pt-2\/","title":{"rendered":"FileMaker AI and Entity Matching \u2013 Pt. 2"},"content":{"rendered":"<h4 class=\"wp-block-heading\">A technical approach for leveraging Semantic Find to solve the difficult problem of fuzzy matching<\/h4><div class=\"wp-block-uagb-info-box uagb-block-1c3842f4 uagb-infobox__content-wrap  uagb-infobox-icon-above-title uagb-infobox-image-valign-top\"><div class=\"uagb-ifb-content\"><div class=\"uagb-ifb-icon-wrap\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\"><path d=\"M0 256C0 114.6 114.6 0 256 0C397.4 0 512 114.6 512 256C512 397.4 397.4 512 256 512C114.6 512 0 397.4 0 256zM371.8 211.8C382.7 200.9 382.7 183.1 371.8 172.2C360.9 161.3 343.1 161.3 332.2 172.2L224 280.4L179.8 236.2C168.9 225.3 151.1 225.3 140.2 236.2C129.3 247.1 129.3 264.9 140.2 275.8L204.2 339.8C215.1 350.7 232.9 350.7 243.8 339.8L371.8 211.8z\"><\/path><\/svg><\/div><div class=\"uagb-ifb-title-wrap\"><h3 class=\"uagb-ifb-title\"><strong>UPCOMING EVENT: Using AI to Handle Duplicates in FileMaker<\/strong>&nbsp;<\/h3><\/div><p class=\"uagb-ifb-desc\">Join Direct Impact Solutions for an exclusive live webinar to learn how AI can simplify searching your data, detect and clean duplicates, and prevent them before they happen.<br><br>\ud83d\udcc5 <strong>Date:<\/strong> September 30, 2025<br><strong>\u23f0 Time:<\/strong> <strong>11:00 AM Pacific \/ 2:00 PM Eastern \/ 8:00 PM Central Europe<\/strong>&nbsp;<br>\ud83d\udd17 <strong>Reserve your place here:<\/strong> <a href=\"https:\/\/zoom.us\/webinar\/register\/WN_iI0mOhzDRNGzCWZAgtOqBQ\" target=\"_blank\" rel=\"noreferrer 
noopener\">Webinar Registration<\/a><br><br><strong>Host: <\/strong>David Weiner, Senior Developer and Project Manager at Direct Impact Solutions<br><strong>Languages:<\/strong> English audio with simultaneous French subtitles<br><br>Don\u2019t miss this unique opportunity to interact with our experts and explore the full potential of AI in FileMaker for your organization!<\/p><\/div><\/div><figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"357\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image.png\" alt=\"\" class=\"wp-image-20645\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image.png 975w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-300x110.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-768x281.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-600x220.png 600w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/figure><p><a href=\"https:\/\/www.directimpactsolutions.com\/en\/filemaker-ai-and-entity-matching-pt-1\/\">In part 1 of this article<\/a>, I described how FileMaker\u2019s <strong>Semantic Find<\/strong> differs from its regular find, what <a href=\"https:\/\/neo4j.com\/blog\/graph-database\/what-is-entity-resolution\/\" target=\"_blank\" rel=\"noreferrer noopener\">Entity Matching<\/a> is (and why it\u2019s important for FileMaker users), and how Semantic Find can make deduplication tasks easier. Here in part 2, I\u2019ll cover the specifics of an entity matching approach in FileMaker using Semantic Find to perform fuzzy matching. 
I\u2019ll only briefly touch on how to set up Semantic Find in a FileMaker solution, since there are <a href=\"https:\/\/blog.beezwax.net\/filemaker-semantic-search-part-1-fundamental-power\/\" target=\"_blank\" rel=\"noreferrer noopener\">plenty<\/a> of <a href=\"https:\/\/medium.com\/transforming-digital\/a-deep-dive-into-filemaker-2024s-new-semantic-search-functionality-e24b2ed24ffd\" target=\"_blank\" rel=\"noreferrer noopener\">other<\/a> <a href=\"https:\/\/www.teamdf.com\/blogs\/semantic-search\/\" target=\"_blank\" rel=\"noreferrer noopener\">articles<\/a> covering this topic in detail, but I <em>will<\/em> go over the design choices specific to entity matching.<\/p><h2 class=\"wp-block-heading\">What Is Entity Matching?<\/h2><p>There\u2019s a great overview of Entity Matching at the link above, but if you aren\u2019t familiar with it (and you haven\u2019t read part 1 of this article), it\u2019s the process of determining if two records refer to the same real-world entity. It\u2019s required in a wide variety of contexts, and is crucial for the deduplication of data. It has also been a continual and vexing problem in the FileMaker systems I work with daily.<\/p><p>The basic problem is to compare two (or more) records and figure out, programmatically, whether they refer to the same thing. Is \u201cHanna C Bueler\u201d the same person as \u201cHannah Beuler\u201d?<\/p><p><strong>Record 1: Hanna C Bueler, born on 03\/12\/1990<\/strong><\/p><p><strong>Record 2: Hannah Beuler, born on 12\/03\/1990<\/strong><\/p><p>In FileMaker, using the normal \u201cexact match\u201d find, it would be difficult to even <em>locate<\/em> these two records to compare to each other (since their spelling and dates of birth are different), let alone determine if they\u2019re a match. A human user could see pretty quickly that these are likely the same person, but a simple FileMaker search can\u2019t do that. 
This is where Semantic Find comes in.<\/p><h2 class=\"wp-block-heading\">Looking for a Match<\/h2><p>The usual process of looking for duplicates would be to search multiple fields (or some kind of concatenated calculation field) for an exact match:<\/p><pre class=\"wp-block-code\"><code>Enter Find Mode &#91; Pause: Off ]\nSet Field &#91; Person::FirstName ; \"Hanna\" ]\nSet Field &#91; Person::LastName ; \"Bueler\" ]\nSet Field &#91; Person::DOB ; \"3\/12\/1990\" ]\nPerform Find &#91; ]<\/code><\/pre><p>Perhaps this goes without saying, but if you find what you\u2019re looking for using this type of standard search, then you don\u2019t need to do anything with Semantic Find! You\u2019ve found a match, and you can move on. However, if your initial search turns up no exact matches, it\u2019s time to try Semantic Find to locate a close (but not exact) match. Semantic Find is a game-changer for finding tough-to-match records once you\u2019ve exhausted the simple \u201cexact match\u201d find.<\/p><h2 class=\"wp-block-heading\">Setting up FileMaker for Entity Matching<\/h2><p>To perform entity matching using Semantic Find in FileMaker, we first need to create <strong>vector embeddings<\/strong> for our match data.<\/p><h2 class=\"wp-block-heading\">Embeddings<\/h2><figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"487\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-1.png\" alt=\"\" class=\"wp-image-20648\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-1.png 975w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-1-300x150.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-1-768x384.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-1-600x300.png 600w\" sizes=\"auto, (max-width: 975px) 100vw, 
975px\" \/><\/figure><p>A vector embedding is a numerical representation of data that captures its semantic meaning. We store embeddings of our search data in a container field. FileMaker\u2019s Semantic Find then compares these vectors with your search query, returning a similarity score. So, the first thing to do is decide what data you\u2019d like to match against and add container fields to store their embeddings.<\/p><p>While embeddings can be stored in the search table itself, I prefer to store them in a separate table in a one-to-one relationship with their data table. Using FileMaker 22\u2019s built-in support for either OpenAI, Anthropic, or Cohere, this is as easy as using the \u2018Insert Embedding in Found Set\u2019 script step and choosing the fields you\u2019d like to create embeddings for:<\/p><pre class=\"wp-block-code\"><code>Insert Embedding in Found Set &#91; Account Name: \"My_Semantic_Find\" ; Embedding Model: $$loaded_model ; Source Field: Person::ConcatenatedFirstLastDOB ; Target Field: EmbeddingTable::VectorizedFirstLastDOB ; Replace target contents ]<\/code><\/pre><p>I opted to use <a href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">Google\u2019s Gemini embedding model<\/a> instead of one of the built-in model providers, since<\/p><ol style=\"list-style-type:lower-alpha\" class=\"wp-block-list\"><li>I like Google\u2019s tools<\/li>\n\n<li>The model\u2019s performance and accuracy are excellent<\/li>\n\n<li>The embeddings are half the size of OpenAI\u2019s (6k vs. 
12k), and<\/li>\n\n<li>Their embedding model is <a href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/pricing#gemini-embedding\" target=\"_blank\" rel=\"noreferrer noopener\">free<\/a><\/li><\/ol><p>Using Gemini requires a bit more work, since it requires making a REST call with the \u2018Insert from URL\u2019 script step, parsing the resulting vector data, and storing the embedding in a container field as a binary file (using the \u2018GetEmbeddingAsFile\u2019 function), but it\u2019s not too difficult if you\u2019re familiar with making API calls in FileMaker:<\/p><figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"481\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2-1024x481.png\" alt=\"\" class=\"wp-image-20651\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2-1024x481.png 1024w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2-300x141.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2-768x361.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2-600x282.png 600w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-2.png 1054w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><p>Once we\u2019ve added our embeddings, we can begin to search against our data table using Semantic Find.<\/p><h2 class=\"wp-block-heading\">Embedding considerations<\/h2><p>Entity matching is different from an ordinary semantic find, since we\u2019re not so much looking for similarity in <em>meaning<\/em> as we are looking for just plain similarity. We want to match \u201c<strong>Hannah Beuler<\/strong>\u201d with \u201c<strong>Hanna C Bueler<\/strong>\u201d, not with [<em>several paragraphs about the life of Hanna C Bueler<\/em>]. 
This being the case, we only need to create vector embeddings of a selection of fields used for matching. And since the fields we select will typically be pretty small (as opposed to large blocks of text), the <a href=\"https:\/\/medium.com\/@bishalmukherjee2\/llm-tokens-what-they-are-and-why-you-should-care-7d97c2130141\" target=\"_blank\" rel=\"noreferrer noopener\">token count<\/a> will remain small as well.<\/p><p>So how do we pick what to vectorize for entity matching? There are probably any number of ways to approach this, but the method I\u2019ve used with success is to create a calculation field containing a short match phrase that, if identical to your query, would represent a perfect match. So, if you\u2019re searching for a person, you might create a calculation field consisting of FirstName, LastName, DateOfBirth, and PhoneNumber. If you\u2019re searching for an organization, you might use OrganizationName, Address, and PhoneNumber. For a catalog product, you might use ItemDescription and Size. Here are some basic guidelines:<\/p><ul class=\"wp-block-list\"><li>Choose discrete fields that may vary slightly from an exact match, but that are also good identifiers of a specific entity.<\/li>\n\n<li>Use small text fields instead of large text strings to get a higher match percentage. This differs from an ordinary semantic find, where we\u2019re looking for meaning &#8212; in which case, we want to see how much our query matches all of the text. 
For entity matching, multiple short text strings that all have a high match will be more effective.<\/li>\n\n<li>Semantic Find is case-sensitive, so for better accuracy, eliminate case differences where case doesn\u2019t matter (\u2018FIRSTNAME LASTNAME\u2019 should always be a 100% match with \u2018Firstname Lastname\u2019, but an acronym like \u2018USIC\u2019 shouldn\u2019t be a good match for \u2018Music\u2019).<\/li><\/ul><h2 class=\"wp-block-heading\">Fuzzy Matching: Step by Step<\/h2><p>Once you\u2019ve chosen your match field(s) and created vector embeddings of them, the process for entity matching using Semantic Find goes like this:<\/p><ol class=\"wp-block-list\"><li><strong>Blocking<\/strong> &#8211; Perform a standard FileMaker find on a single match field to create a subset of records that has a greater likelihood of containing the match you\u2019re looking for.<\/li>\n\n<li><strong>Semantic Find<\/strong> &#8211; Perform Semantic Find on this subset of records, collecting any matches in a variable.<\/li>\n\n<li><strong>Repeat<\/strong> &#8211; Loop back over steps 1-2 several more times, blocking on a <em>different match field each time<\/em>, and appending the new results (skipping any records that come up again).<\/li>\n\n<li><strong>Process Results<\/strong> &#8211; Take the results and process them how you see fit.<\/li><\/ol><p>This process should yield one or more records that were exactly matched to a portion of the search query, but <em>fuzzy matched to the rest of the search query. 
<\/em>Ideally, one of them has a high enough match score that you can safely call it a match!<\/p><figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"974\" height=\"712\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-3.png\" alt=\"\" class=\"wp-image-20654\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-3.png 974w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-3-300x219.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-3-768x561.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-3-600x439.png 600w\" sizes=\"auto, (max-width: 974px) 100vw, 974px\" \/><\/figure><h2 class=\"wp-block-heading\">Step 1 \u2013 Blocking<\/h2><p>Because semantic find performance can be slow against large data sets, it\u2019s generally necessary to reduce the size of your initial found set. This is called <a href=\"https:\/\/towardsdatascience.com\/entity-resolution-identifying-real-world-entities-in-noisy-data-3e8c59f4f41c\/#:~:text=Cluster%20Evaluation-,Overview%20of%20Entity%20Resolution,records%20that%20share%20similar%20attributes%2C%20making%20the%20subsequent%20comparison%20more%20efficient.,-2.%20Block%20Processing\" target=\"_blank\" rel=\"noreferrer noopener\">blocking<\/a>, and it aims to pare down the number of records to search in order to find what we\u2019re looking for:<\/p><blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Blocking is the first step in entity resolution that groups similar records together based on certain attributes. By doing so, the process narrows its search to only consider comparisons within each block, rather than examining all possible record pairs in the dataset. This significantly reduces the number of comparisons and accelerates the ER process. 
As it skips many comparisons, it possibly leads to missed true matches. Therefore, Blocking should achieve a good balance between efficiency and accuracy. (<a href=\"https:\/\/towardsdatascience.com\/entity-resolution-identifying-real-world-entities-in-noisy-data-3e8c59f4f41c\/#:~:text=the%20subsequent%20steps.-,Blocking,Therefore%2C%20Blocking%20should%20achieve%20a%20good%20balance%20between%20efficiency%20and%20accuracy.,-In%20this%20section\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Tomonori Masui, Sept 21, 2023 \u2013 Towards Data Science<\/em><\/a>)<\/p><\/blockquote><p>It takes <em>significantly<\/em> less time to perform a semantic find against 1,000 records than it does against 100,000 records, so this process is essential.<\/p><figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"491\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-4.png\" alt=\"\" class=\"wp-image-20657\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-4.png 975w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-4-300x151.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-4-768x387.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-4-600x302.png 600w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/figure><p>Start by identifying some method to eliminate the largest number of records from your initial found set. The biggest tradeoff between efficiency and accuracy may be in how you decide to do this. The idea is to get a smaller set of records that has a higher likelihood of containing the match you\u2019re looking for, which we can then search further using Semantic Find. 
Some examples:<\/p><ul class=\"wp-block-list\"><li>If you\u2019re looking in a Contacts table, and you know that the contact you\u2019re looking for is in the \u201cBilling\u201d category, then start by reducing your found set to just those records of type \u201cBilling\u201d.<\/li>\n\n<li>If you\u2019re trying to find a duplicate person named \u201cHanna Beuler\u201d, try reducing your found set to all records with first name \u201cHanna\u201d; the second time around you might search for all records that have a last name \u201cBeuler\u201d; a third round could narrow the set to people born in 1990.<\/li>\n\n<li>Searching in an Organizations table, you could narrow your first round of blocking by City, the second by Postal Code, and a third by OrganizationType.<\/li><\/ul><h2 class=\"wp-block-heading\">Step 2 &#8211; Perform semantic find(s) on each blocking set<\/h2><p>This step is pretty straightforward &#8212; in each found set that results from a blocking step, perform a semantic find on your embedding field. Since you want to compare your search text with the embedding field, you need to create an embedding of your search text on the fly. In our \u201cHanna Beuler\u201d example, the embeddings in our large table of 100k Person records are created from a concatenated field that consists of First Name, Middle Initial, Last Name, and Date of Birth (in ISO format) that looks like this:<\/p><p><strong>Hannah C Bueler 1990-12-03<\/strong><\/p><p>When we begin our semantic find, we\u2019ll need to take our search string \u201cHanna Beuler, 3\/12\/1990\u201d, and create an embedding out of it using the exact same format we used for our embedding fields, so it looks like this:<\/p><p><strong>Hanna Beuler 1990-03-12<\/strong><\/p><p>The embedding must also be created using the same model you used to create all of the embeddings in your search table. 
Use the \u2018Insert Embedding\u2019 script step on this single search string, and instead of storing it in a container field in a table, you\u2019ll store this embedding in a global container field:<\/p><figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1017\" height=\"234\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-5.png\" alt=\"\" class=\"wp-image-20660\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-5.png 1017w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-5-300x69.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-5-768x177.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-5-600x138.png 600w\" sizes=\"auto, (max-width: 1017px) 100vw, 1017px\" \/><\/figure><p>\u2026and then use that global field in the semantic find as the \u2018Vector\u2019 parameter in the \u2018Perform Semantic Find\u2019 script step:<\/p><figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1017\" height=\"190\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-6.png\" alt=\"\" class=\"wp-image-20663\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-6.png 1017w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-6-300x56.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-6-768x143.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-6-600x112.png 600w\" sizes=\"auto, (max-width: 1017px) 100vw, 1017px\" \/><\/figure><p>The recommended settings for the \u2018Perform Semantic Find\u2019 step are as follows:<\/p><pre class=\"wp-block-code\"><code>Query by: Vector Data\t\nVector: &#91;your global embedding field]  \/\/this is the search query\nRecord Set: 
Current Found Set  \/\/this is the set of records reduced by blocking\nTarget field: &#91;your embedding field]  \/\/this is where your embeddings are stored\nReturn count: 10 \/\/this may vary, but keep it relatively small\nCosine similarity condition: greater than\nCosine similarity value: .85 \/\/in my tests, matches below 0.85 were not useful\nSave result: $result_variable  \/\/store results in a variable<\/code><\/pre><h2 class=\"wp-block-heading\">Step 3 \u2013 Repeat blocking process<\/h2><p>Now that you\u2019ve searched a single match field for an exact match, performed a semantic find within this found set on the embedding field, and stored the results in a variable, repeat this process with a <em>new<\/em> match field. Each time we come up with a reduced found set, we perform a semantic find on this found set, using the same global embedding field, and append the new results to our <strong>$result_variable<\/strong> variable (skipping over any duplicates).<\/p><p>Our hope is that out of these multiple searches (none of which on their own would yield a strong match), at least one of the found sets will contain a partial match to the actual \u201c<strong>Hannah C Bueler<\/strong>\u201d record, which we can then match to using Semantic Find.<\/p><p>If, after this process, we still haven\u2019t found any close matches, we may assume that either a) any duplicate records are perhaps too different to easily locate, or b) there are no duplicates to be found.<\/p><h2 class=\"wp-block-heading\">Step 4 &#8211; Process the results of the semantic find<\/h2><p>At this point, the process should leave us with a JSON variable containing an array of matches that look something like this:<\/p><pre class=\"wp-block-code\"><code>&#91;\n  {\"recordId\":\"99920\",\"similarity\":0.988967784539129},\n  {\"recordId\":\"87405\",\"similarity\":0.829515648500795},\n  {\"recordId\":\"49850\",\"similarity\":0.802658597565103},\n  
{\"recordId\":\"45721\",\"similarity\":0.817432345089549}\n]<\/code><\/pre><p>What you choose to do with these results will depend on your specific use case. Since the purpose of Entity Matching is to search for duplicates, handling them will depend on your business logic. The cosine similarity score tells you how close the vector embeddings were to each other, and gives a good indication of whether you have a likely match. Embedding models vary, but for the ones I\u2019ve tested that return the best results (<a href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">Gemini \u201ctext-embedding-004\u201d<\/a>, <a href=\"https:\/\/platform.openai.com\/docs\/guides\/embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI \u201ctext-embedding-3-small\u201d<\/a>, and <a href=\"https:\/\/huggingface.co\/jinaai\/jina-embeddings-v4-vllm-text-matching\" target=\"_blank\" rel=\"noreferrer noopener\">JinaAI \u201cjina-embeddings-v4-vllm-text-matching\u201d<\/a>), a cosine similarity below ~0.85 does not seem to be useful at all for entity matching. Given this, you might consider any of the following for matches with a similarity greater than 0.85:<\/p><ul class=\"wp-block-list\"><li>Display the matches for the user in a virtual list.<\/li>\n\n<li>Create a join table that connects an initial search record to its matched records in a separate table.<\/li>\n\n<li>Immediately merge duplicate records that have a high enough similarity score.<\/li>\n\n<li>Automatically create a new record if no high matches are returned.<\/li>\n\n<li>Flag records for a manual review later.<\/li><\/ul><h2 class=\"wp-block-heading\">What I did (my proof-of-concept)<\/h2><p>I built and tested an entity matching proof-of-concept in Claris FileMaker 22 to find Persons in a database of ~102k records. 
The idea is based on a use case in which incoming laboratory results, which contain patient information, need to be matched against a large table of existing Person records for public health investigation.<\/p><figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"843\" height=\"560\" src=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-8.png\" alt=\"\" class=\"wp-image-20669\" srcset=\"https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-8.png 843w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-8-300x199.png 300w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-8-768x510.png 768w, https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/image-8-600x399.png 600w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/figure><h2 class=\"wp-block-heading\">Environment<\/h2><p>The test file has four tables:<\/p><ul class=\"wp-block-list\"><li><strong>Patient<\/strong> \u2013 holds our input data<\/li>\n\n<li><strong>Person<\/strong> \u2013 the full set of Person records to search against<\/li>\n\n<li><strong>Embeddings<\/strong> \u2013 contains vector embeddings of Person records<\/li>\n\n<li><strong>SearchJoin<\/strong> \u2013 a join table to connect the results of our search with the input record<\/li><\/ul><p>There are also a couple of global fields that are in their own table, but they could be located anywhere.<\/p><p>The file is hosted on FileMaker Server 22.0.1, which has recently been optimized specifically for performing semantic find on the server. I tried this on earlier server versions and performance was not good, so I recommend using the latest version of FileMaker Server to take advantage of the recent improvements to semantic find. I also highly recommend offloading your searches to the server using Perform Script on Server (PSoS). 
A search using the techniques described here typically takes only a few seconds, but this can vary with adjustments to blocking and with server conditions.<\/p><p>Starting with a single person record as the \u2018input\u2019 (the Patient record), I search the Person table to locate a likely match. I begin by blocking against four different fields (First Name, Last Name, Date of Birth, and Phone number), and run the semantic find on each smaller found set using an embedding of the combined Full Name and Date of Birth. This yields quite good results, and can even identify likely matches when all three data points (First, Last, DOB) are slightly different! With slight variations in only one or two of these points, performance is remarkable.<\/p><p>When the search returns matches with a similarity above 0.85, I create records in a join table that displays the matching records and their scores to the user. At this point, they can choose to merge the two records or delete one and keep the other.<\/p><h2 class=\"wp-block-heading\">Conclusion<\/h2><p>The Semantic Find features in FileMaker 22, used as described here, can yield excellent results for entity matching as a first step in a data deduplication process. Hopefully, this gives you a starting point to deploy your own \u201cFuzzy Matching\u201d module in your FileMaker systems along the path to solving tough entity matching issues. It can prove to be a true game changer.<\/p><p><\/p>","protected":false},"excerpt":{"rendered":"<p>A technical approach for leveraging Semantic Find to solve the difficult problem of fuzzy matching In part 1 of this article, I described how FileMaker\u2019s Semantic Find differs from its regular find, what Entity Matching is (and why it\u2019s important for FileMaker users), and how Semantic Find can make deduplication tasks easier. 
Here in part &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/www.directimpactsolutions.com\/en\/filemaker-ai-and-entity-matching-pt-2\/\"> <span class=\"screen-reader-text\">FileMaker AI and Entity Matching \u2013 Pt. 2<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":27,"featured_media":20560,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","footnotes":""},"categories":[331,29],"tags":[],"class_list":["post-20644","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-low-code"],"uagb_featured_image_src":{"full":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search.png",800,350,false],"thumbnail":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-150x150.png",150,150,true],"medium":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-300x131.png",300,131,true],"medium_large":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-768x336.png",768,336,true],"large":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search.png",800,350,false],"1536x1536":["https:\/\/www.directimp
actsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search.png",800,350,false],"2048x2048":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search.png",800,350,false],"woocommerce_thumbnail":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-300x300.png",300,300,true],"woocommerce_single":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-600x263.png",600,263,true],"woocommerce_gallery_thumbnail":["https:\/\/www.directimpactsolutions.com\/wp-content\/uploads\/2025\/09\/ai-search-100x100.png",100,100,true]},"uagb_author_info":{"display_name":"Yacine Aklil","author_link":"https:\/\/www.directimpactsolutions.com\/en\/author\/yacine-aklildirectimpactsolutions-com\/"},"uagb_comment_info":0,"uagb_excerpt":"A technical approach for leveraging Semantic Find to solve the difficult problem of fuzzy matching In part 1 of this article, I described how FileMaker\u2019s Semantic Find differs from its regular find, what Entity Matching is (and why it\u2019s important for FileMaker users), and how Semantic Find can make deduplication tasks easier. 
Here in part&hellip;","_links":{"self":[{"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/posts\/20644","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/comments?post=20644"}],"version-history":[{"count":2,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/posts\/20644\/revisions"}],"predecessor-version":[{"id":20707,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/posts\/20644\/revisions\/20707"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/media\/20560"}],"wp:attachment":[{"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/media?parent=20644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/categories?post=20644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.directimpactsolutions.com\/en\/wp-json\/wp\/v2\/tags?post=20644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}