LiftOver plink

Function Definition

Convert plink files between various assemblies.
This function will take a set of text plink files (ped + map) and apply liftover to it.

Things to consider

You need a set of text files. Use plink --bfile binary_file --recode --out text_file to do so. You need your ped and map files to be named the same.
You need to install liftover and plink either from the official websites here and here or through conda (with the bioconda channel): use conda install ucsc-liftover plink
You need the appropriate chain file:
The new files are located in the same folder as the old files and with the suffix “liftedover”

How does it work

This script works by:

Creating a bed file with all the position from the MAP file
LiftOver the bed file created in step 1 and create a new lifted bed file.
Generate a new set of plink files. (And remove SNPs that failed to liftover)
Update the map file generated in step 3 with the position from the bed files generated in step 2
Remove temporary files if asked

import pandas as pd 
from subprocess import run
from os import remove

def liftover_plink(MAP_FILE, CHAIN_FILE, remove_tmp = True):
    # READ OLD MAP FILE
    old_map_df = pd.read_csv(MAP_FILE, sep="\t", header=None)

    #CREATE THE NEW BED FILE TO BE LIFTED OVER
    new_bed_file = old_map_df[[0, 3, 3, 1]]
    new_bed_file.columns = ["#chr", "start", "stop", "rsid"]
    new_bed_file["stop"] = new_bed_file["stop"]+1 # NEED A RANGE OR IT DOESNT WORK

    number_of_initial_probes = new_bed_file.shape[0]

    # CHECK CHROMOSOME FORMAT 
    # UCSC NEEDS THE chr PREFIX
    if isinstance(new_bed_file["#chr"].loc[0], str): # IF THE NAME IS A STRING
        if "chr" in new_bed_file["#chr"].loc[0]:
            pass
        else:
            new_bed_file["#chr"] = "chr" + new_bed_file["#chr"].astype(str)
            print("Added chr to chromosome/seq/contig names")
    else: 
        new_bed_file["#chr"] = "chr" + new_bed_file["#chr"].astype(str)
        print("Added chr to chromosome/seq/contig names")

    new_bed_file.to_csv("tmp_before_bed.bed", index = None, sep = "\t")

    # RUN LIFTOVER FIRST
    run(["liftover", "tmp_before_bed.bed", CHAIN_FILE, "tmp_after_bed.bed", "unmapped.txt"], capture_output=False)

    # GET THE FAILED ONES 
    failed_positions = []
    with open("unmapped.txt","r") as infile:
        with open("failed_probes.txt", "w") as outfile:
            for line in infile.readlines():
                line = line.strip()
                if not line.startswith("#"):
                    failed_positions.append(line.split("\t")[-1])
                    outfile.write(line.split("\t")[-1]+"\n")
    print("{} out of {} ({:.2f}%) position failed to liftover".format(len(failed_positions), number_of_initial_probes, (len(failed_positions)/ number_of_initial_probes) * 100))

    # Remove Failed probes using plink
    print("Removing failed probes with plink an generating new files")
    run(["plink", "--file", MAP_FILE.replace(".map", ""), "--exclude", "failed_probes.txt" ,"--recode", "--out", MAP_FILE.replace(".map", ".liftedover")], capture_output=True)

    # READ LIFTED BED 
    lifted_bed_df = pd.read_csv("tmp_after_bed.bed", sep="\t", header = None)
    lifted_dict = dict(zip(lifted_bed_df[3], lifted_bed_df[1]))

    # UPDATE FINAL MAP FILE 
    print("Updating MAP file")
    final_map_df = pd.read_csv(MAP_FILE.replace(".map", ".liftedover.map"), sep="\t", header = None)
    final_map_df[3] = final_map_df[1].map(lifted_dict) 
    final_map_df.to_csv(MAP_FILE.replace(".map", ".liftedover.map"), sep="\t", header = None, index = None)

    # REMOVING TMP FILES
    if remove_tmp:
        print("Removing temporary files")
        remove("tmp_before_bed.bed")
        remove("tmp_after_bed.bed")
        remove("unmapped.txt")
        remove("failed_probes.txt")

Example Usage

Use these as examples:

liftover_plink(CHAIN_FILE = "hg18ToHg19.over.chain.gz", # The chain file between assemblies
               MAP_FILE = "example_file.hg18.map", # the map file WITH MAP EXTENSION
               remove_tmp = True) # Removing or not the temporary files