grg_pheno_sim API
grg_pheno_sim is a phenotype simulator for Genotype Representation Graphs (GRGs).
Phenotype Simulation
This module simulates phenotypes end to end by combining the incremental stages of simulation on GRGs.
- grg_pheno_sim.phenotype.add_covariates(grg, covariates, cov_effects, **sim_kwargs)
- Wrapper around sim_phenotypes that adds covariate effects:
Y = genetic_value + covariate_value + environmental_noise
- Parameters:
grg (pygrgl.GRG) – The GRG used for phenotype simulation.
covariates (Union[pandas.DataFrame, numpy.ndarray]) –
Covariate matrix C.
- If DataFrame:
Must have one row per individual.
If it includes ‘individual_id’, merge is done by ID.
Otherwise, row order must match sim_phenotypes output.
- If ndarray:
Shape (n_individuals, n_covariates), row order matches phenotypes.
cov_effects (numpy.typing.ArrayLike) – Coefficient vector α (length must equal number of covariates).
sim_kwargs – Keyword arguments passed directly to sim_phenotypes (heritability, num_causal, normalize_phenotype, etc.).
- Returns:
Same as sim_phenotypes output with two new columns:
covariate_value
phenotype (updated)
- Return type:
pandas.DataFrame
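The formula above can be illustrated with plain arithmetic. This is a minimal sketch, not the library's implementation: the helper covariate_values and all of the data are hypothetical, and it only shows how covariate_value = C·α is added to the genetic value and the noise.

```python
# Sketch only: plain-Python arithmetic for
#   Y = genetic_value + covariate_value + environmental_noise
# The helper name and the data below are hypothetical, not part of grg_pheno_sim.

def covariate_values(covariates, cov_effects):
    """Return C @ alpha as a list with one value per individual."""
    for row in covariates:
        if len(row) != len(cov_effects):
            raise ValueError("each covariate row must have len(cov_effects) entries")
    return [sum(c * a for c, a in zip(row, cov_effects)) for row in covariates]

# Three individuals, two covariates (for example age and sex).
C = [[30.0, 0.0], [45.0, 1.0], [60.0, 1.0]]
alpha = [0.5, 2.0]
genetic_value = [1.0, -1.0, 0.0]
environmental_noise = [0.0, 0.5, -0.5]

covariate_value = covariate_values(C, alpha)
phenotype = [
    g + c + e
    for g, c, e in zip(genetic_value, covariate_value, environmental_noise)
]
print(covariate_value)  # [15.0, 24.5, 32.0]
print(phenotype)        # [16.0, 24.0, 31.5]
```

The length check mirrors the documented requirement that the coefficient vector has one entry per covariate.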
- grg_pheno_sim.phenotype.convert_to_phen(phenotypes_df, path, include_header=False)
This function converts the phenotypes dataframe to a CSV file.
- Parameters:
phenotypes_df – The input pandas dataframe containing the phenotypes.
path – The path at which the CSV file will be saved.
include_header – A boolean parameter that indicates whether headers have to be included. Default: False.
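The effect of include_header can be sketched with the standard csv module. This is an illustration under stated assumptions, not the library's code: the function write_phen, the choice of columns, and the comma delimiter are all hypothetical, since the exact layout of the output file is not described here.

```python
import csv
import io

# Sketch only: write_phen and the column selection are assumptions,
# not the actual behaviour of grg_pheno_sim.phenotype.convert_to_phen.
def write_phen(rows, columns, handle, include_header=False):
    """Write one CSV line per individual, optionally preceded by a header."""
    writer = csv.writer(handle, lineterminator="\n")
    if include_header:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])

columns = ["individual_id", "phenotype"]  # assumed columns
rows = [
    {"individual_id": 0, "phenotype": 1.25},
    {"individual_id": 1, "phenotype": -0.5},
]

no_header = io.StringIO()
write_phen(rows, columns, no_header)

with_header = io.StringIO()
write_phen(rows, columns, with_header, include_header=True)

print(no_header.getvalue())
print(with_header.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; a real call would pass a file path instead.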
- grg_pheno_sim.phenotype.sim_phenotypes(grg, model=<grg_pheno_sim.model.GRGCausalMutationModelNormal object>, num_causal=None, random_seed=42, normalize_phenotype=False, normalize_genetic_values_before_noise=False, heritability=None, user_mean=None, user_cov=None, normalize_genetic_values_after=False, save_effect_output=False, effect_path=None, standardized_output=False, path=None, header=False, standardized=False)
Function to simulate phenotypes in one go by combining all intermediate stages.
- Parameters:
grg (pygrgl.GRG) – The GRG on which phenotypes will be simulated.
model – The distribution model from which effect sizes are drawn. Depends on the user’s discretion. Default model used is the standard Gaussian.
num_causal – Number of causal sites to simulate. Default: None, in which case num_mutations is used.
random_seed – The random seed used for causal mutation simulation. Default: 42.
normalize_phenotype – Checks whether to normalize the phenotypes. Default: False.
normalize_genetic_values_before_noise – Checks whether to normalize the genetic values prior to simulating environmental noise (True if yes). Depends on the user’s discretion. Default: False.
heritability – The heritability (h2) used to simulate environmental noise. Set to None to use user-defined noise instead, or to 1 for zero noise. Default: None.
user_mean – Mean parameter used for simulating user-defined environmental noise. Default: None.
user_cov – Covariance parameter used for simulating user-defined environmental noise. Default: None.
normalize_genetic_values_after – In the case where the h2 feature is not used, this checks whether the user wants genetic values normalized at the end (True if yes). Default: False.
save_effect_output – This boolean parameter decides whether the effect sizes will be saved to a .par file using the standard output format. Default: False.
effect_path – This parameter contains the path at which the .par output file will be saved. Default: None.
standardized_output – This boolean parameter decides whether the phenotypes will be saved to a .phen file using the standard output format. Default: False.
path – This parameter contains the path at which the .phen output file will be saved. Default: None.
header – This boolean parameter decides whether the .phen output file contains column headers or not. Default: False.
standardized – This boolean parameter decides whether the simulation uses standardized genotypes. Default: False.
- Returns:
Pandas dataframe with resultant phenotypes. The dataframe contains the following:
causal_mutation_id
individual_id
genetic_value
environmental_noise
phenotype
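The role of heritability in the noise stage can be sketched as follows. This assumes the common convention h2 = V_g / (V_g + V_e), which gives V_e = V_g (1 - h2) / h2; whether the library scales noise exactly this way is an assumption, and the helpers noise_variance and add_noise are hypothetical. Under this convention h2 = 1 yields zero noise, which matches the parameter description above.

```python
import random
import statistics

# Sketch only: assumes h2 = V_g / (V_g + V_e). The helper names are
# hypothetical and are not part of grg_pheno_sim.
def noise_variance(genetic_variance, heritability):
    """Noise variance V_e such that V_g / (V_g + V_e) equals heritability."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return genetic_variance * (1.0 - heritability) / heritability

def add_noise(genetic_values, heritability, seed=42):
    """Draw Gaussian noise with the variance implied by heritability."""
    rng = random.Random(seed)
    sd = noise_variance(statistics.pvariance(genetic_values), heritability) ** 0.5
    noise = [rng.gauss(0.0, sd) for _ in genetic_values]
    phenotypes = [g + e for g, e in zip(genetic_values, noise)]
    return noise, phenotypes

genetic_values = [0.2, -1.3, 0.7, 1.1, -0.4]

# With heritability 1 the noise standard deviation is 0, so the
# phenotypes equal the genetic values.
noise, phenotypes = add_noise(genetic_values, heritability=1.0)
print(noise)
print(phenotypes)
```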
Binary Phenotype Simulation
This module simulates binary phenotypes on GRGs by running the usual simulation stages and then converting the continuous phenotypes to binary phenotypes.
- grg_pheno_sim.binary_phenotype.sim_binary_phenotypes(grg, population_prevalence, model=<grg_pheno_sim.model.GRGCausalMutationModelNormal object>, num_causal=1000, random_seed=42, normalize_genetic_values_before_noise=False, heritability=None, user_mean=None, user_cov=None, normalize_genetic_values_after=False, save_effect_output=False, effect_path=None, standardized_output=False, path=None, header=False, standardized=False)
Function to simulate phenotypes in one go by combining all intermediate stages. Since the function simulates binary phenotypes, a Gaussian threshold check is applied at the very end to convert continuous values to binary values.
- Parameters:
grg (pygrgl.GRG) – The GRG on which phenotypes will be simulated.
model – The distribution model from which effect sizes are drawn. Depends on the user’s discretion.
num_causal – Number of causal sites simulated.
population_prevalence – The prevalence of the condition in the general population. 0.1 means 1 in 10 individuals have the condition.
random_seed – The random seed used for causal mutation effect simulation.
normalize_genetic_values_before_noise – Checks whether to normalize the genetic values prior to simulating environmental noise (True if yes). Depends on the user’s discretion. Default: False.
heritability – The heritability (h2) used to simulate environmental noise. Set to None to use user-defined noise instead, or to 1 for zero noise. Default: None.
user_mean – Mean parameter used for simulating environmental noise taken in from the user.
user_cov – Covariance parameter used for simulating environmental noise taken in from the user.
normalize_genetic_values_after – In the case where the h2 feature is not used, this checks whether the user wants genetic values normalized at the end (True if yes). Default: False.
save_effect_output – This boolean parameter decides whether the effect sizes will be saved to a .par file using the standard output format. Default value is False.
effect_path – This parameter contains the path at which the .par output file will be saved. Default: None.
standardized_output – This boolean parameter decides whether the phenotypes will be saved to a .phen file using the standard output format. Default value is False.
path – This parameter contains the path at which the .phen output file will be saved. Default: None.
header – This boolean parameter decides whether the .phen output file contains column headers or not. Default: False.
standardized – This boolean parameter decides whether the simulation uses standardized genotypes. Default: False.
- Returns:
Pandas dataframe with resultant binary phenotypes. The dataframe contains the following:
causal_mutation_id
individual_id
genetic_value
environmental_noise
phenotype
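The Gaussian threshold check can be sketched with the standard library. This is an illustration, not the library's implementation: it assumes the continuous phenotype is standardized and compared against the standard normal quantile at 1 - population_prevalence, and the helper to_binary is hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev

# Sketch only: to_binary is a hypothetical helper showing a liability
# threshold; the library's exact procedure may differ.
def to_binary(phenotypes, population_prevalence):
    """Label an individual 1 when its standardized value exceeds the threshold."""
    if not 0.0 < population_prevalence < 1.0:
        raise ValueError("population_prevalence must be in (0, 1)")
    mu = mean(phenotypes)
    sd = pstdev(phenotypes)
    if sd == 0:
        raise ValueError("phenotypes have zero variance")
    # Standard normal quantile that leaves `population_prevalence` in the upper tail.
    threshold = NormalDist().inv_cdf(1.0 - population_prevalence)
    return [1 if (y - mu) / sd > threshold else 0 for y in phenotypes]

# Hypothetical continuous phenotypes drawn from a standard normal.
rng = random.Random(42)
continuous = [rng.gauss(0.0, 1.0) for _ in range(10000)]

binary = to_binary(continuous, population_prevalence=0.1)
case_fraction = sum(binary) / len(binary)
print(case_fraction)  # expected to be close to 0.1
```

With a prevalence of 0.1, roughly 1 in 10 simulated individuals end up labelled as cases, in line with the parameter description.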
- grg_pheno_sim.binary_phenotype.sim_binary_phenotypes_custom(grg, input_effects, population_prevalence, random_seed=42, normalize_genetic_values_before_noise=False, heritability=None, user_mean=None, user_cov=None, normalize_genetic_values_after=False, save_effect_output=False, effect_path=None, standardized_output=False, path=None, header=False, standardized=False)
Function to simulate phenotypes in one go by combining all intermediate stages. This function accepts custom effect sizes instead of simulating them using the causal mutation models. Since the function simulates binary phenotypes, a Gaussian threshold check is applied at the very end to convert continuous values to binary values.
- Parameters:
grg (pygrgl.GRG) – The GRG on which phenotypes will be simulated.
input_effects – The custom effect sizes dataset.
population_prevalence – The prevalence of the condition in the general population. 0.1 means 1 in 10 individuals have the condition.
normalize_genetic_values_before_noise – Checks whether to normalize the genetic values prior to simulating environmental noise (True if yes). Depends on the user’s discretion. Default: False.
heritability – The heritability (h2) used to simulate environmental noise. Set to None to use user-defined noise instead, or to 1 for zero noise. Default: None.
user_mean – Mean parameter used for simulating user-defined environmental noise. Default: None.
user_cov – Covariance parameter used for simulating user-defined environmental noise. Default: None.
normalize_genetic_values_after – In the case where the h2 feature is not used, this checks whether the user wants genetic values normalized at the end (True if yes). Default: False.
save_effect_output – This boolean parameter decides whether the effect sizes will be saved to a .par file using the standard output format. Default: False.
effect_path – This parameter contains the path at which the .par output file will be saved. Default: None.
standardized_output – This boolean parameter decides whether the phenotypes will be saved to a .phen file using the standard output format. Default: False.
path – This parameter contains the path at which the .phen output file will be saved. Default: None.
header – This boolean parameter decides whether the .phen output file contains column headers or not. Default: False.
- Returns:
Pandas dataframe with resultant binary phenotypes. The dataframe contains the following:
causal_mutation_id
individual_id
genetic_value
environmental_noise
phenotype
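How custom effect sizes turn into genetic values can be sketched as a weighted sum. The required format of input_effects is not specified here, so the dictionary layout below (mutation ID mapped to effect size) and the per-individual dosage dictionaries are purely hypothetical; the sketch only shows the additive model in which each individual's genetic value is the sum of effect size times the number of copies carried.

```python
# Sketch only: the data layout is an assumption and does not describe
# the input_effects format that grg_pheno_sim expects.
def genetic_values(dosages, effects):
    """Sum effect * copies over the mutations each individual carries."""
    return [
        sum((effects.get(mut, 0.0) * copies for mut, copies in ind.items()), 0.0)
        for ind in dosages
    ]

# Hypothetical custom effects: mutation ID -> effect size.
effects = {3: 0.5, 7: -1.0}

# Hypothetical dosages: one dict per individual, mutation ID -> copies carried.
dosages = [
    {3: 2},        # two copies of mutation 3
    {3: 1, 7: 1},  # one copy each of mutations 3 and 7
    {},            # carries no listed mutation
    {5: 2},        # mutation 5 has no custom effect, so it contributes 0
]

print(genetic_values(dosages, effects))  # [1.0, -0.5, 0.0, 0.0]
```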
Simulation with Multiple GRG Files
This module simulates phenotypes on multiple GRGs by using the usual simulation methods.
- grg_pheno_sim.multi_grg_phenotype.sim_phenotypes_multi_grg_ram(grg_files, model, num_causal_per_file, random_seed, normalize_phenotype, normalize_genetic_values_before_noise, population_prev, heritability, user_mean, user_cov, normalize_genetic_values_after, save_effect_output, effect_path_list, standardized_output, path, header)
Simulate phenotypes by loading all GRGs into RAM simultaneously.
- Parameters:
grg_files (List[str]) – List of paths to GRG files to be processed.
model – The distribution model from which effect sizes are drawn. Depends on the user’s discretion.
num_causal_per_file – Number of causal sites simulated for each file (same for each GRG).
random_seed – The random seed used for causal mutation simulation.
normalize_phenotype – Checks whether to normalize the phenotypes. Default: False.
normalize_genetic_values_before_noise – Checks whether to normalize the genetic values prior to simulating environmental noise (True if yes). Depends on the user’s discretion. Default: False.
heritability – The heritability (h2) used to simulate environmental noise. Set to None to use user-defined noise instead, or to 1 for zero noise.
user_mean – Mean parameter used for simulating user-defined environmental noise.
user_cov – Covariance parameter used for simulating user-defined environmental noise.
normalize_genetic_values_after – In the case where the h2 feature is not used, this checks whether the user wants genetic values normalized at the end (True if yes). Default: False.
save_effect_output – This boolean parameter decides whether the effect sizes will be saved to a .par file using the standard output format. Default: False.
effect_path_list – The paths at which the .par output files will be saved. Default: None.
standardized_output – This boolean parameter decides whether the phenotypes will be saved to a .phen file using the standard output format. Default: False.
path – This parameter contains the path at which the .phen output file will be saved. Default: None.
header – This boolean parameter decides whether the .phen output file contains column headers or not. Default value is False.
- grg_pheno_sim.multi_grg_phenotype.sim_phenotypes_multi_grg_sequential(grg_files, model, num_causal_per_file, random_seed, normalize_phenotype, normalize_genetic_values_before_noise, population_prev, heritability, user_mean, user_cov, normalize_genetic_values_after, save_effect_output, effect_path_list, standardized_output, path, header)
Simulate phenotypes by processing GRGs sequentially to reduce memory usage.
- Parameters:
grg_files (List[str]) – List of paths to GRG files to be processed.
model – The distribution model from which effect sizes are drawn. Depends on the user’s discretion.
num_causal_per_file – Number of causal sites simulated for each file (same for each GRG).
random_seed – The random seed used for causal mutation simulation.
normalize_phenotype – Checks whether to normalize the phenotypes. Default: False.
normalize_genetic_values_before_noise – Checks whether to normalize the genetic values prior to simulating environmental noise (True if yes). Depends on the user’s discretion. Default: False.
heritability – The heritability (h2) used to simulate environmental noise. Set to None to use user-defined noise instead, or to 1 for zero noise.
user_mean – Mean parameter used for simulating user-defined environmental noise.
user_cov – Covariance parameter used for simulating user-defined environmental noise.
normalize_genetic_values_after – In the case where the h2 feature is not used, this checks whether the user wants genetic values normalized at the end (True if yes). Default: False.
save_effect_output – This boolean parameter decides whether the effect sizes will be saved to a .par file using the standard output format. Default value is False.
effect_path_list – The paths at which the .par output files will be saved. Default: None.
standardized_output – This boolean parameter decides whether the phenotypes will be saved to a .phen file using the standard output format. Default: False.
path – This parameter contains the path at which the .phen output file will be saved. Default: None.
header – This boolean parameter decides whether the .phen output file contains column headers or not. Default: False.
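The difference between the RAM and sequential variants can be sketched as two ways of accumulating per-file contributions. Everything here is hypothetical: load_contribution stands in for loading a GRG and computing its genetic values, the file names are made up, and the assumption that per-file genetic values are summed per individual is an illustration rather than a statement about the library's internals.

```python
# Sketch only: FAKE_FILES and load_contribution are hypothetical stand-ins
# for loading a GRG file and computing its per-individual genetic values.
FAKE_FILES = {
    "chr1.grg": [0.5, 1.0, -0.25],
    "chr2.grg": [0.25, -1.0, 0.25],
}

def load_contribution(path):
    return FAKE_FILES[path]

def combine_in_ram(paths):
    """Hold every file's contribution in memory, then sum per individual."""
    loaded = [load_contribution(p) for p in paths]
    return [sum(values) for values in zip(*loaded)]

def combine_sequentially(paths):
    """Keep only a running total, so one file is resident at a time."""
    total = None
    for p in paths:
        contribution = load_contribution(p)
        if total is None:
            total = list(contribution)
        else:
            total = [t + c for t, c in zip(total, contribution)]
    return total

paths = ["chr1.grg", "chr2.grg"]
print(combine_in_ram(paths))        # [0.75, 0.0, 0.0]
print(combine_sequentially(paths))  # [0.75, 0.0, 0.0]
```

Both strategies give the same totals in this sketch; the sequential form trades peak memory for having to process the files one after another.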