Bomshteyn Consulting

Extracting Hyperlink URL and TEXT into separate columns in excel

Imagine you're working with an Excel cell containing a hyperlink. This link might display inviting text like "Click Here," yet it directs you to a website, for instance, "https://www.bomshteyn.com", upon clicking.

Suppose you're interested in extracting this URL and anchor text as separate columns using an Excel formula. It's important to note that, as far as I'm aware, Excel doesn't offer a native function specifically for this task.

But this can be achieved using this small snippet of Python:

# Load the Excel file
new_file_path = '/mnt/data/test.xlsx'
new_wb = load_workbook(new_file_path)
new_sheet = new_wb.active

# Reinitialize lists to store publication names and URLs
new_publication_names = []
new_publication_urls = []

# Iterating through the cells in column B (excluding the header row) of the new file
for row in new_sheet.iter_rows(min_row=2, max_col=2):
    cell = row[0]  # Only the first column in the row
    new_publication_names.append(cell.value)
    new_publication_urls.append(cell.hyperlink.target if cell.hyperlink else None)

# Creating a new DataFrame with the extracted data
new_extracted_df = pd.DataFrame({
    'PUBLICATION_NAME': new_publication_names,
    'PUBLICATION_URL': new_publication_urls
})

new_extracted_df.head()

# Saving the extracted data to a new Excel file
output_file_path = '/mnt/data/extracted_publications.xlsx'
new_extracted_df.to_excel(output_file_path, index=False)