Semantic segmentation is crucial for accurately identifying anatomical structures and pathological anomalies in medical images, and it plays a vital role in diagnosis, treatment planning, and the monitoring of disease progression. Despite significant advances, developing flexible and generalizable algorithms that adapt to the diverse shapes, sizes, and textures of different anatomical regions remains challenging. In this work, we introduce the Dynamic ENTity Segmentation (DiENTeS) model, which leverages local-global transformers for 3D medical image segmentation. Our model uses a transformer-based backbone to extract localized features and propagate them into a comprehensive global representation. Additionally, we incorporate language features to guide the segmentation process, generating a specialized convolutional kernel for each category. This approach allows DiENTeS to treat semantic segmentation as a class-agnostic entity segmentation problem. We validate our method on the ToothFairy2 Challenge, demonstrating its effectiveness in segmenting multiple structures in the maxillofacial region. We will make our code and models publicly available.
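The idea of generating a per-category convolutional kernel from language features can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the DiENTeS implementation: the hypothetical `controller` is a single linear layer mapping a class's text embedding to the weights and bias of a 1x1x1 convolution, the backbone features are flattened into a list of voxels, and each class receives an independent sigmoid mask (rather than a softmax over classes), which is what makes the decoding class-agnostic. The class names and embedding values are made up.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def linear(weight: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product; `weight` has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def generate_kernel(controller: List[Vector], text_emb: Vector) -> Tuple[Vector, float]:
    """Hypothetical controller: maps one text embedding to the weights and bias
    of a 1x1x1 convolution (the last output is the bias)."""
    params = linear(controller, text_emb)
    return params[:-1], params[-1]


def segment(
    features: List[Vector],          # one D-dim feature vector per voxel (flattened volume)
    text_embs: Dict[str, Vector],    # one text embedding per category (assumed given)
    controller: List[Vector],        # shape (D + 1, T), assumed learned
    threshold: float = 0.5,
) -> Dict[str, List[bool]]:
    """Apply a dynamically generated 1x1x1 kernel per category and return
    an independent binary mask for each category."""
    masks: Dict[str, List[bool]] = {}
    for name, emb in text_embs.items():
        kernel, bias = generate_kernel(controller, emb)
        mask = []
        for voxel in features:
            # A 1x1x1 convolution is a dot product over channels at each voxel.
            logit = sum(k * f for k, f in zip(kernel, voxel)) + bias
            prob = 1.0 / (1.0 + math.exp(-logit))  # per-class sigmoid, no softmax
            mask.append(prob > threshold)
        masks[name] = mask
    return masks


if __name__ == "__main__":
    # Toy setup: identity controller (kernel = text embedding, bias = 0).
    controller = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    text_embs = {"mandible": [1.0, -1.0], "canal": [-1.0, 1.0]}
    features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(segment(features, text_embs, controller))
```

Because each category only contributes a kernel, adding a new structure in this sketch means adding one more text embedding rather than retraining a fixed-size output layer; a real model would likely use a deeper controller and several dynamic convolution layers.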