Introduction

The Semantic Filter Pattern uses an LLM to extract a subset of information from a generic text. The filter can be applied either to “filter in”, keeping the specified content, or to “filter out”, removing it. The idea is that:

  1. The LLM will receive a filter rule
  2. The LLM will receive a text
  3. The output of the LLM will be either the filtered text or, if the request is to remove the filtered content, the “remaining” text.

The Prompt

The prompt is based on a contextual statement, given before any input, that can be summarised as

Filter this information to remove [RULE]

when the intention is to remove the filtered content

or, alternatively

Filter this information to keep only [RULE]

when the intention is to keep the filtered content

or, alternatively

Filter this information to [DO SOME ACTION] based on [RULE]

when the request is to alter the data.
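The three templates can be captured in a small helper. The following is a minimal Python sketch; the function name `build_filter_prompt` and the mode names `remove`, `keep` and `alter` are assumptions made for illustration, not part of the pattern itself.

```python
from typing import Optional


def build_filter_prompt(mode: str, rule: str, action: Optional[str] = None) -> str:
    """Return the contextual statement for one of the three template variants."""
    if mode == "remove":
        return f"Filter this information to remove {rule}"
    if mode == "keep":
        return f"Filter this information to keep only {rule}"
    if mode == "alter":
        if not action:
            raise ValueError("the 'alter' mode needs an action")
        return f"Filter this information to {action} based on {rule}"
    raise ValueError(f"unknown mode: {mode!r}")


print(build_filter_prompt("keep", "the critical feedback from the customers"))
```

Keeping the template in one place makes it easier to send the same rule, worded identically, to several models.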

Some examples

The pattern can be used to keep or to remove a set of personal data. For example, in the case of removal or alteration:

Clean and anonymize the following data by removing all personally identifiable information and other details that could lead to identification.

Remove any sensitive information linked with healthcare or private data

Remove all the addresses and the location of the customers

or, in the case of keeping:

Keep only the critical feedback from the customers

Keep only the jokes from this text

Keep only the quotes of famous people from this text

and so on. What is important is that the rule is specifically defined for the domain of the main text.

For example, in healthcare the focus will be on diseases, personal data and private NIN numbers, while in a sales document the focus of the filter will be on customers’ data and revenues.

How to use the prompt

First create the filter rule, aimed at either keeping or removing some information, but always tailored to the domain of the data that needs to be filtered.

Then instruct the LLM to apply the filter and compare the results.
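The two steps can be sketched in Python as follows. The LLM is represented by a plain callable that takes a prompt and returns text, so `fake_llm` below is a hypothetical stand-in rather than a real client, and the helper names and the sample record are also made up for illustration. The rule is placed before the text, and `difflib` from the standard library is used to compare the original with the result.

```python
import difflib
from typing import Callable, List


def apply_filter(llm: Callable[[str], str], rule: str, text: str) -> str:
    """Send the filter rule first, then the text, and return the model's answer."""
    prompt = f"{rule}\n\n{text}"
    return llm(prompt)


def changed_lines(original: str, filtered: str) -> List[str]:
    """Return the lines that the filter removed (-) or added (+)."""
    diff = difflib.ndiff(original.splitlines(), filtered.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: drops lines containing 'Address:'."""
    _rule, _, text = prompt.partition("\n\n")
    return "\n".join(line for line in text.splitlines() if "Address:" not in line)


# Made-up sample record, used only to exercise the helpers.
original = "Customer: C-17\nAddress: 1 Example Road\nFeedback: delivery was late"
rule = "Remove all the addresses and the location of the customers"

filtered = apply_filter(fake_llm, rule, original)
for line in changed_lines(original, filtered):
    print(line)
```

With a real model, the diff is a quick way to see what the filter actually touched before trusting the result.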

Some Tests

To run some tests we need some data. As an example, the following (fake) list of patients, each with a short bio and a disease, provides the basic information.

  1. Maria Thompson (Age 45) – Type 2 Diabetes Maria works as a school administrator and enjoys gardening and cooking for her family. She lives in a suburban neighborhood and is actively trying to improve her lifestyle through daily walks and healthier eating habits.
  2. James Carter (Age 62) – Hypertension James is a retired factory supervisor who now spends his time fishing and woodworking. He lives with his wife and enjoys quiet mornings reading the newspaper and tending to his backyard plants.
  3. Aisha Khan (Age 29) – Asthma Aisha is a graphic designer who works remotely and loves digital art and photography. She lives in an apartment in the city and enjoys yoga, journaling, and café-hopping on weekends.
  4. Robert Miller (Age 54) – Chronic Obstructive Pulmonary Disease (COPD) Robert is a former construction worker who now runs a small home repair business. He enjoys watching classic movies, fixing old radios, and spending time with his grandchildren.
  5. Elena Garcia (Age 37) – Rheumatoid Arthritis Elena is a community librarian who loves organizing reading programs for children. She enjoys painting, nature walks, and volunteering at local cultural events.
  6. David Wilson (Age 71) – Alzheimer’s Disease David is a retired history teacher who once traveled extensively across Europe. He now lives with his daughter and enjoys listening to classical music and looking through old photo albums.
  7. Nina Patel (Age 24) – Iron Deficiency Anemia Nina is a university student studying biomedical engineering. She enjoys volunteering, participating in student clubs, and exploring new cafés around campus.
  8. Michael Brown (Age 48) – Coronary Artery Disease Michael runs a small logistics company and spends most of his free time coaching his son’s soccer team. He enjoys road trips and restoring old cars in his garage.
  9. Sophia Lee (Age 33) – Hypothyroidism Sophia is a freelance translator who works with multiple international clients. She enjoys language learning, journaling, and practicing calligraphy in her free time.
  10. Omar Hassan (Age 58) – Chronic Kidney Disease (Stage 3) Omar owns a neighborhood grocery store and is well known in his community. He enjoys early morning walks, reading historical novels, and spending evenings with his family.

The filtering rules will be:

  1. Anonymize the data by removing all personally identifiable information and other details that could lead to identification of the subjects

  2. Keep only the critical diseases from the following data

Each rule will be given to the LLM, in a separate test, before the list above.

ChatGPT 5.2

Giving the list and the first rule to ChatGPT, the result is the following:

Here’s an anonymized version of your data. I’ve removed names, specific occupations, exact locations, and other identifying personal details while keeping the relevant medical information and general lifestyle context:

  1. Subject 1 (Age 45) – Type 2 Diabetes The individual works in an administrative role and enjoys gardening and cooking. They live in a suburban area and are actively improving their lifestyle through daily walks and healthier eating habits.
  2. Subject 2 (Age 62) – Hypertension The individual is retired and spends time on hobbies like fishing and woodworking. They live with family and enjoy quiet mornings reading and tending to their backyard.
  3. Subject 3 (Age 29) – Asthma The individual works remotely in a creative field and enjoys digital art and photography. They live in an urban apartment and like yoga, journaling, and visiting cafés on weekends.
  4. Subject 4 (Age 54) – Chronic Obstructive Pulmonary Disease (COPD) The individual previously worked in construction and now runs a small home-based business. They enjoy watching classic movies, fixing electronics, and spending time with grandchildren.
  5. Subject 5 (Age 37) – Rheumatoid Arthritis The individual works in a community-focused role and enjoys organizing programs for children. They also like painting, nature walks, and volunteering at local cultural events.
  6. Subject 6 (Age 71) – Alzheimer’s Disease The individual is retired from teaching and previously traveled extensively. They live with family and enjoy listening to classical music and looking through old photo albums.
  7. Subject 7 (Age 24) – Iron Deficiency Anemia The individual is a university student studying biomedical engineering. They enjoy volunteering, participating in student clubs, and exploring new cafés around campus.
  8. Subject 8 (Age 48) – Coronary Artery Disease The individual runs a small business and spends free time coaching youth sports. They enjoy road trips and restoring old vehicles.
  9. Subject 9 (Age 33) – Hypothyroidism The individual works as a freelance translator. They enjoy language learning, journaling, and practicing calligraphy.
  10. Subject 10 (Age 58) – Chronic Kidney Disease (Stage 3) The individual owns a local store and is active in the community. They enjoy morning walks, reading, and spending time with family.

If you want, I can also generalize the ages into ranges and remove even occupations to make it fully de-identified for research purposes. This would make it extremely difficult to trace back to anyone.

So the outcome is a fairly anonymous list. Interestingly, the model adds a note asking whether we want to push the semantic filtering further.

In the second test, the one about keeping content, with the same list, the answer is:

  1. Maria Thompson | 45 | Type 2 Diabetes | School administrator | Gardening, cooking, daily walks, healthier eating
  2. James Carter | 62 | Hypertension | Retired factory supervisor | Fishing, woodworking, reading, gardening
  3. Robert Miller | 54 | COPD | Home repair business | Classic movies, fixing radios, grandchildren
  4. David Wilson | 71 | Alzheimer’s Disease | Retired history teacher | Classical music, photo albums, family
  5. Michael Brown | 48 | Coronary Artery Disease | Logistics company owner | Coaching soccer, road trips, restoring cars
  6. Omar Hassan | 58 | Chronic Kidney Disease (Stage 3) | Grocery store owner | Morning walks, historical novels, family time

So the result is a list of the patients who, according to the LLM, have critical diseases.

Claude - Haiku 4.5

Claude Haiku 4.5, with the same list and the first filter, answers:

  1. Subject P001 – Age Group: 40s – Type 2 Diabetes Administrative role. Engages in outdoor activities and cooking. Attempting lifestyle modifications through regular exercise and dietary changes.
  2. Subject P002 – Age Group: 60s – Hypertension Retired from supervisory position. Leisure activities include fishing and woodworking. Living with spouse. Enjoys reading and gardening.
  3. Subject P003 – Age Group: 20s – Asthma Design professional working remotely. Creative hobbies include digital art and photography. Urban dwelling. Physical activities include yoga and journaling.
  4. Subject P004 – Age Group: 50s – Chronic Obstructive Pulmonary Disease (COPD) Former construction worker. Currently self-employed in repair services. Hobbies include classic films and electronics restoration. Has grandchildren.
  5. Subject P005 – Age Group: 30s – Rheumatoid Arthritis Works in library services. Organizes community programs. Engages in artistic pursuits, outdoor activities, and volunteer work.
  6. Subject P006 – Age Group: 70s – Alzheimer’s Disease Retired educator. Previously traveled internationally. Currently lives with adult family member. Enjoys classical music and reminiscence activities.
  7. Subject P007 – Age Group: 20s – Iron Deficiency Anemia Student in biomedical field. Participates in volunteer work and student organizations. Enjoys social activities and exploration of local venues.
  8. Subject P008 – Age Group: 40s – Coronary Artery Disease Self-employed in transportation/logistics sector. Engages in youth coaching activities. Leisure interests include travel and vehicle restoration.
  9. Subject P009 – Age Group: 30s – Hypothyroidism Language services professional. Hobbies include language learning, writing, and artistic pursuits.
  10. Subject P010 – Age Group: 50s – Chronic Kidney Disease (Stage 3) Business owner in retail/grocery sector. Morning exercise routine. Enjoys reading and family time.

It is interesting that Claude also reports which specific fields have been semantically filtered.

Filtered:

  • All names (replaced with Subject IDs: P001-P010)
  • Specific ages (converted to age groups like “40s,” “60s”)
  • Specific job titles and company details (generalized to role types)
  • Geographic identifiers (suburban neighborhood, city, campus, community references)
  • Family relationships and specific details (daughter, wife, grandchildren)
  • Hobbies that could narrow identification (e.g., “school administrator” → “Administrative role”)
  • Location-specific activities (reading newspaper, café names)

Preserved:

  • Medical diagnoses (essential for healthcare purposes)
  • General lifestyle and activity levels
  • Functional information relevant to health management
  • Age ranges (useful for epidemiological analysis)
  • Broad occupational categories
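The age generalization listed under “Filtered” is deterministic, so it can be reproduced and verified outside the LLM. A minimal sketch, where the function name `age_group` is an assumption:

```python
def age_group(age: int) -> str:
    """Map an exact age to its decade label, e.g. 45 -> '40s'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return f"{(age // 10) * 10}s"


# Ages from the test list, in order (patients 1 to 10).
ages = [45, 62, 29, 54, 37, 71, 24, 48, 33, 58]
print([age_group(a) for a in ages])
```

The labels produced for the ten patients match the age groups in Claude's answer above.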

In the second test, the answer is the following:

Based on medical severity and urgency, here are the critical diseases from the dataset:

  1. David Wilson (Age 71) – Alzheimer’s Disease - Neurodegenerative disease with progressive cognitive decline and high mortality risk
  2. Robert Miller (Age 54) – Chronic Obstructive Pulmonary Disease (COPD) - Serious respiratory condition with significant mortality risk and quality of life impact
  3. Michael Brown (Age 48) – Coronary Artery Disease - Life-threatening cardiovascular condition with high risk of heart attack and sudden death
  4. Omar Hassan (Age 58) – Chronic Kidney Disease (Stage 3) - Progressive condition that can lead to kidney failure and require dialysis or transplantation

Interestingly, in this case the result is slightly different from ChatGPT's: Claude does not treat Type 2 Diabetes and Hypertension as critical.

Mistral

The same data and the same questions were given to Mistral, which returns the following table for the first filter:

Anonymized Subject Data

| Subject ID | Age | Condition | Occupation Type | Hobbies/Interests (Generalized) | Living Situation (Generalized) |
| --- | --- | --- | --- | --- | --- |
| 001 | 45 | Type 2 Diabetes | Administrative | Outdoor activities, cooking | Suburban area |
| 002 | 62 | Hypertension | Retired (supervisory) | Crafts, reading, gardening | With spouse |
| 003 | 29 | Asthma | Creative/Remote Work | Art, wellness activities, social outings | Urban apartment |
| 004 | 54 | Chronic Obstructive Pulmonary Disease (COPD) | Skilled Trade/Small Business | Classic entertainment, hands-on hobbies, family time | Home-based |
| 005 | 37 | Rheumatoid Arthritis | Public Service | Creative arts, outdoor activities, community involvement | Urban area |
| 006 | 71 | Alzheimer’s Disease | Retired (Education) | Music, reminiscing | With family |
| 007 | 24 | Iron Deficiency Anemia | Student | Volunteering, social activities, exploring local venues | Campus area |
| 008 | 48 | Coronary Artery Disease | Business Owner | Sports coaching, automotive hobbies, travel | Suburban area |
| 009 | 33 | Hypothyroidism | Freelance/Linguistic | Language arts, writing, creative hobbies | Urban area |
| 010 | 58 | Chronic Kidney Disease (Stage 3) | Retail/Small Business | Walking, reading, family time | Community-focused |

Mistral provides a table with the filtered and anonymized data.

For the second filter:

Critical Diseases

| Name | Age | Critical Disease |
| --- | --- | --- |
| Robert Miller | 54 | Chronic Obstructive Pulmonary Disease (COPD) |
| David Wilson | 71 | Alzheimer’s Disease |
| Michael Brown | 48 | Coronary Artery Disease |
| Omar Hassan | 58 | Chronic Kidney Disease (Stage 3) |

Mistral's answer regarding the critical diseases is the same as Claude's.
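The differences between the three answers to the second rule can be made explicit with set operations. The sketch below copies the patient names from the answers reported above; the variable names are assumptions.

```python
# Patients each model kept for the rule "Keep only the critical diseases".
kept = {
    "ChatGPT": {"Maria Thompson", "James Carter", "Robert Miller",
                "David Wilson", "Michael Brown", "Omar Hassan"},
    "Claude": {"David Wilson", "Robert Miller", "Michael Brown", "Omar Hassan"},
    "Mistral": {"Robert Miller", "David Wilson", "Michael Brown", "Omar Hassan"},
}

# Patients that every model considered critical.
agreed = set.intersection(*kept.values())

# Patients kept by at least one model but not by all of them.
disputed = set.union(*kept.values()) - agreed

print("agreed:", sorted(agreed))
print("disputed:", sorted(disputed))
for model, names in kept.items():
    print(model, "extra:", sorted(names - agreed))
```

Only the four patients in `agreed` were selected by all three models, which shows how loosely “critical” can be interpreted when the rule does not define it.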

Conclusions

The semantic filter pattern aims to identify, keep, alter or remove some information from data according to a well-defined rule in the filter itself. While removing information seems to be the easier filter, a keep rule must be really well defined: in the tests above, the three models did not fully agree on which diseases are “critical”. In any case the result must be checked, as the LLM may not capture the exact semantic meaning intended.
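One way to check a removal filter is to search the output for values that should have disappeared. The sketch below is a simple case-insensitive substring check, which is an assumption about how the verification could be done and will not catch paraphrased or indirect identifiers; the function name `find_leaks` is hypothetical.

```python
from typing import Iterable, List


def find_leaks(filtered_text: str, forbidden: Iterable[str]) -> List[str]:
    """Return the forbidden values that still appear in the filtered text."""
    lowered = filtered_text.lower()
    return [value for value in forbidden if value.lower() in lowered]


# Names taken from the (fake) patient list used in the tests.
names = ["Maria Thompson", "James Carter", "Aisha Khan"]

# First sentence of an anonymized entry and of the original entry.
anonymized = "Subject 1 (Age 45) – Type 2 Diabetes The individual works in an administrative role."
not_anonymized = "Maria Thompson (Age 45) – Type 2 Diabetes Maria works as a school administrator."

print(find_leaks(anonymized, names))
print(find_leaks(not_anonymized, names))
```

An empty list only means that none of the listed values survived verbatim, so a manual review of the output is still needed.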