UK Biobank Pharma Proteomics Project
Data Source
ID: ukb_ppp
URL: https://doi.org/10.7303/syn51364943
License: CC BY
Citation:
Datasets
Ancestries
- AFR = African
- AMR = American
- ALL = All ancestries
- CSA = Central-South Asian
- EAS = East Asian
- EUR = European
- MID = Middle Eastern
Harmonization
| is_palindromic | in_dbsnp | reported_strand | was_flipped | Outcome |
|---|---|---|---|---|
False | True | 'forward' | False | Non-palindromic, effect_allele == alt_allele and other_allele == ref_allele -> no changes. |
False | True | 'forward' | True | Non-palindromic, effect_allele == ref_allele and other_allele == alt_allele -> swap alleles, flip beta and effect_allele_frequency. |
False | True | 'reverse' | False | Non-palindromic, reverse_complement(effect_allele) == alt_allele and reverse_complement(other_allele) == ref_allele -> take reverse complement of alleles, beta and effect_allele_frequency unchanged. |
False | True | 'reverse' | True | Non-palindromic, reverse_complement(effect_allele) == ref_allele and reverse_complement(other_allele) == alt_allele -> swap and take reverse complement of alleles, flip beta and effect_allele_frequency. |
False | False | NULL | NULL | Non-palindromic, no variant match in reference -> unharmonized. Alleles, beta, and effect_allele_frequency unchanged. Proceed with caution. |
True | True | 'forward' (inferred from non-palindromic consensus) | False | Palindromic, effect_allele == alt_allele and other_allele == ref_allele -> no changes. Strand is inferred rather than known definitively, proceed with slight caution. |
True | True | 'forward' (inferred from non-palindromic consensus) | True | Palindromic, effect_allele == ref_allele and other_allele == alt_allele -> swap alleles, flip beta and effect_allele_frequency. Strand is inferred rather than known definitively, proceed with slight caution. |
True | True | 'reverse' (inferred from non-palindromic consensus) | False | Palindromic, reverse_complement(effect_allele) == alt_allele and reverse_complement(other_allele) == ref_allele -> take reverse complement of alleles, beta and effect_allele_frequency unchanged. Strand is inferred rather than known definitively, proceed with slight caution. |
True | True | 'reverse' (inferred from non-palindromic consensus) | True | Palindromic, reverse_complement(effect_allele) == ref_allele and reverse_complement(other_allele) == alt_allele -> swap and take reverse complement of alleles, flip beta and effect_allele_frequency. Strand is inferred rather than known definitively, proceed with slight caution. |
True | False | NULL | NULL | Palindromic, no variant match in reference -> unharmonized. Alleles, beta, and effect_allele_frequency unchanged. Proceed with caution. |
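
The decision table above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the `Harmonized` record, and the `consensus_strand` parameter are hypothetical, and "flip" is assumed to mean negating beta and replacing effect_allele_frequency with 1 minus its value.

```python
from typing import NamedTuple, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele: str) -> str:
    """Reverse-complement an allele string, e.g. 'AG' -> 'CT'."""
    return allele.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(allele_a: str, allele_b: str) -> bool:
    """True for A/T or C/G allele pairs, as described for is_palindromic."""
    return {allele_a.upper(), allele_b.upper()} in ({"A", "T"}, {"C", "G"})


class Harmonized(NamedTuple):  # hypothetical record, for illustration only
    effect_allele: str
    other_allele: str
    beta: float
    effect_allele_frequency: float
    in_dbsnp: bool
    reported_strand: Optional[str]
    was_flipped: Optional[bool]


def harmonize(effect_allele, other_allele, beta, eaf,
              ref_allele=None, alt_allele=None, consensus_strand="forward"):
    """Apply the table's rules to one variant.

    consensus_strand (assumed input) is the strand inferred from the
    non-palindromic variants; it is only used for palindromic variants,
    whose strand cannot be read off the alleles themselves.
    """
    unharmonized = Harmonized(effect_allele, other_allele, beta, eaf,
                              False, None, None)
    if ref_allele is None or alt_allele is None:
        return unharmonized  # no variant match in reference

    if is_palindromic(effect_allele, other_allele):
        strands = [consensus_strand]
    else:
        strands = ["forward", "reverse"]

    for strand in strands:
        ea, oa = effect_allele, other_allele
        if strand == "reverse":
            ea, oa = reverse_complement(ea), reverse_complement(oa)
        if (ea, oa) == (alt_allele, ref_allele):
            # Already aligned: beta and frequency unchanged.
            return Harmonized(ea, oa, beta, eaf, True, strand, False)
        if (ea, oa) == (ref_allele, alt_allele):
            # Swap alleles, flip beta and effect_allele_frequency.
            return Harmonized(oa, ea, -beta, 1.0 - eaf, True, strand, True)
    return unharmonized
```

For example, `harmonize("T", "C", 0.5, 0.2, ref_allele="A", alt_allele="G")` falls into the 'reverse' / was_flipped = True row: the alleles are reverse-complemented and swapped, and the sign of beta is inverted.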
assays
| Column | Type | Description |
|---|---|---|
partition | INT | Dummy partition column to enable compatibility with Cloudflare R2 SQL API. Ignore. |
ukb_ppp_id | TEXT | Unique assay identifier defined by UKB-PPP. |
olink_id | TEXT | Olink platform assay identifier (e.g. OID30809). |
gene_symbol | TEXT | HGNC symbol of the gene encoding the target protein (e.g. GLP1R, APOE, TNF). |
panel | TEXT | Olink panel containing the assay. |
panel_lot | TEXT | Olink lot identifier of the panel. |
dilution_factor | INT | Dilution factor of the assay. |
block | TEXT | Identifier of the 96-well block within the 384-plex Olink panel. |
in_expansion_set | BOOLEAN | Whether the assay was part of the original set (false) or added in the expansion set (true). |
protein_id | TEXT | UniProt accession (e.g. P43220, P02649), or multiple accessions joined by underscore for multi-chain protein complexes. |
filename | TEXT | Name of the tar archive file containing pQTL results for the assay. |

| Column | Type | Description |
|---|---|---|
partition | INT | Dummy partition column to enable compatibility with Cloudflare R2 SQL API. Ignore. |
gene_id | TEXT | Ensembl gene identifier (e.g. ENSG00000112164). |
gene_symbol | TEXT | HGNC gene symbol (e.g. GLP1R, APOE). |
chromosome | TEXT | Chromosome on which the gene is located. |
start_position | INT | Gene start position in assembly GRCh38 coordinates. |
end_position | INT | Gene end position in assembly GRCh38 coordinates. |
strand | TEXT | Strand of the gene. |

| Column | Type | Description |
|---|---|---|
ancestry | TEXT | Population ancestry code. Partition column. |
protein_id | TEXT | UniProt accession. Partition column. |
panel | TEXT | Olink panel containing the assay. Partition column. |
chromosome | TEXT | Chromosome on which the variant is located. |
position | INT | Variant position in assembly GRCh38 coordinates. |
effect_allele | TEXT | Effect allele. Harmonized to equal the dbSNP alt_allele. |
other_allele | TEXT | Non-effect allele. Harmonized to equal the dbSNP ref_allele. |
beta | FLOAT | Effect size estimate. |
standard_error | FLOAT | Standard error of the effect size estimate. |
effect_allele_frequency | FLOAT | Frequency of the effect allele in the ancestry group. |
neg_log_10_p_value | FLOAT | -log10(p-value); higher values are more significant. The conventional genome-wide significance threshold (p < 5e-8) corresponds to ~7.3. |
variant_id | TEXT | Unique UKB-PPP variant identifier (e.g. 4:180574458:A:G:imp:v1). |
info | FLOAT | Imputation quality score (0-1, higher is better). |
n | INT | Sample size. |
chi_squared | FLOAT | Chi-squared test statistic. |
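
Because significance is stored as neg_log_10_p_value rather than as a raw p-value, a small conversion helper can be handy on the client side. The sketch below is illustrative only; the function names and the default threshold constant are assumptions, not part of the dataset or its API.

```python
import math

# Conventional genome-wide significance threshold (assumed default).
GENOME_WIDE_P = 5e-8


def p_value(neg_log_10_p_value: float) -> float:
    """Convert -log10(p) back to a p-value.

    Very large inputs can underflow to 0.0, which is one reason to
    compare on the -log10 scale instead of converting first.
    """
    return 10.0 ** (-neg_log_10_p_value)


def is_genome_wide_significant(neg_log_10_p_value: float,
                               threshold_p: float = GENOME_WIDE_P) -> bool:
    """Compare on the -log10 scale; -log10(5e-8) is roughly 7.3."""
    return neg_log_10_p_value > -math.log10(threshold_p)
```

Filtering rows with `is_genome_wide_significant(row_value)` is equivalent to requiring neg_log_10_p_value above roughly 7.3, matching the threshold quoted in the column description.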
proteins
| Column | Type | Description |
|---|---|---|
partition | INT | Dummy partition column to enable compatibility with Cloudflare R2 SQL API. Ignore. |
protein_id | TEXT | UniProt accession (e.g. P43220, P02649), or multiple accessions joined by underscore for multi-chain protein complexes. |
protein_name | TEXT | Full protein name (e.g. Glucagon-like peptide 1 receptor). NOT a gene symbol — do not search for gene names here. |
uniprot_accessions | TEXT[] | List of constituent UniProt accessions. A single-element list, or a multi-element list for multi-chain protein complexes. Cannot be filtered in queries (array type). |
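
Since uniprot_accessions cannot be filtered in queries, one possible client-side workaround is to recover the constituent accessions from protein_id. The helper below is a hypothetical sketch that assumes underscores appear in protein_id only as separators between accessions; the complex used in the usage note is an illustrative input, not a value confirmed to be in the table.

```python
def split_protein_id(protein_id: str) -> list[str]:
    """Split a protein_id into its constituent UniProt accessions.

    Assumes '_' is used only to join accessions of multi-chain complexes.
    """
    return [part for part in protein_id.split("_") if part]


def contains_accession(protein_id: str, accession: str) -> bool:
    """True if the accession is one of the constituents of protein_id."""
    return accession in split_protein_id(protein_id)
```

For a single-chain protein such as `P43220` the helper returns a one-element list; for an illustrative two-chain identifier such as `P29459_P29460` it returns both accessions.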
variants
| Column | Type | Description |
|---|---|---|
variant_id | TEXT | Unique UKB-PPP variant identifier (e.g. 4:180574458:A:G:imp:v1). |
rsid | TEXT, nullable | dbSNP rsID (e.g. rs123456). May be null for novel variants. |
chromosome | TEXT | Chromosome on which the variant is located. Partition column. |
position_grch37 | INT | Variant position in assembly GRCh37 coordinates. |
position_grch38 | INT | Variant position in assembly GRCh38 coordinates. |
effect_allele | TEXT | Effect allele of pQTL results. Harmonized to equal the dbSNP alt_allele. |
other_allele | TEXT | Non-effect allele of pQTL results. Harmonized to equal the dbSNP ref_allele. |
strand | TEXT, nullable | Strand of the variant. |
is_palindromic | BOOLEAN | Variant alleles are A/T or C/G. These require specific treatment during harmonization. |
dbsnp_build | TEXT | Version of dbSNP used to harmonize variant alleles (e.g. b156). |
in_dbsnp | BOOLEAN | Variant has a match in dbSNP (allowing for swapping of alleles). |
was_flipped | BOOLEAN, nullable | Variant was flipped relative to source UKB-PPP pQTL files during harmonization. |
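
The variant_id example above (4:180574458:A:G:imp:v1) looks like a colon-delimited identifier. The following sketch splits it into parts under that assumption. The field names are hypothetical: the documentation does not state which allele is the effect allele in the identifier or which genome build the embedded position refers to, so prefer the dedicated columns for those.

```python
from typing import NamedTuple


class VariantIdParts(NamedTuple):  # hypothetical field names
    chromosome: str
    position: int
    first_allele: str
    second_allele: str
    suffix: str  # remaining fields, e.g. 'imp:v1'


def parse_variant_id(variant_id: str) -> VariantIdParts:
    """Split a variant_id of the assumed form chrom:pos:allele:allele[:...]."""
    fields = variant_id.split(":")
    if len(fields) < 4:
        raise ValueError(f"unexpected variant_id format: {variant_id!r}")
    chromosome, position, first, second, *rest = fields
    return VariantIdParts(chromosome, int(position), first, second,
                          ":".join(rest))
```

For instance, `parse_variant_id("4:180574458:A:G:imp:v1")` yields chromosome `"4"`, position `180574458`, alleles `"A"` and `"G"`, and suffix `"imp:v1"`.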