Skip to content

Adding support for Spacewalk files in tadbit model #393

@iagomaceda

Description

@iagomaceda

Hi,

It would be nice to add support for the Spacewalk binary files as an output for 'tadbit model'.

I already coded the function needed using the ensembl of models as a base:

def write_spacewalk(model_ensemble : StructuralModels, sw_filename : str):
    """
    This function writes a spacewalk file from a model ensemble.

    Parameters
    ----------
    model_ensemble : StructuralModels
        A model ensemble object.
    sw_filename : str
        The name of the spacewalk file to be written.
    """

    region_chr = model_ensemble.description["chromosome"]
    region_start = model_ensemble.description["start"]
    region_reso = model_ensemble.description["resolution"]

    metadata = {"format" : "sw1",
                'version' : '1.0.0', 
                'author' : 'tadbit',
                'date' : str(datetime.now()),
                "name" : f"{model_ensemble.description['project']}_{model_ensemble.description['identifier']}",
                "genome" : str(model_ensemble.description['assembly'])}

    with h5py.File(sw_filename, 'w') as swbf:

        header_group = swbf.create_group('Header')
        header_group.attrs.update(metadata)
        header_group.attrs['point_type'] = 'single_point'

        root = swbf.create_group(metadata['name'])
        genomic_position_group = root.create_group('genomic_position')
        spatial_position_group = root.create_group('spatial_position')

        region_list = [[region_chr, str(region_start + (i - 1) * region_reso), str(region_start + i * region_reso)] for i in range(0, model_ensemble.nloci + 1)]

        genomic_position_group.create_dataset('regions', data=region_list)

        live_contact_map_vertices = np.empty((0, 3), dtype=float)
        
        for model_index, model in enumerate(model_ensemble, start=0):
            
            xyz = np.array([model['x'], model['y'], model['z']], dtype=float).T
            live_contact_map_vertices = np.vstack((live_contact_map_vertices, xyz))

            dataset_name = f't_{model_index}'
            spatial_position_group.create_dataset(dataset_name, data=xyz)

        root.create_dataset('live_contact_map_vertices', data=live_contact_map_vertices)

If you could add it (and the argument to get it) it would be very nice. Feel free to change variable names to fit the rest of the code.

Thanks,

Iago

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions