Proteomic techniques now measure thousands of proteins circulating in blood at population scale, but successful translation into clinically useful protein biomarkers is hampered by our limited understanding of their origins. Here, we use machine learning to systematically identify a median of 20 factors (range: 1-37) out of >1800 participant and sample characteristics that jointly explained an average of 19.4% (max. 100.0%) of the variance in plasma levels of ~3000 protein targets among 43,240 individuals. Proteins segregated into distinct clusters according to their explanatory factors, with modifiable characteristics explaining more variance than genetic variation (median: 10.0% vs 3.9%), and factors being largely consistent across the sexes and ancestral groups. We establish a knowledge graph that integrates our findings with genetic studies and drug characteristics to guide identification of potential drug target engagement markers. We demonstrate the value of our resource by identifying disease-specific biomarkers, such as matrix metalloproteinase 12 for abdominal aortic aneurysm, and by developing a widely applicable framework for phenotype enrichment (R package: https://github.com/comp-med/r-prodente). All results are explorable via an interactive web portal (https://omicscience.org/apps/prot_foundation).