Exporting Twitter Bookmarks via Network Tab & PHP to Markdown
by Craig McCreath
My wife came to me this morning with an issue. She had hundreds of Twitter bookmarks and no clear way of getting them out. She's into Obsidian like me, and wanted all her bookmarked tweets out of her profile and into something she could keep in her vault.
Unfortunately, I'm hearing that it's taking days to get an export from Twitter right now, so I had to find other solutions to get her data out... and fast.
The repo lives here: https://github.com/fusedreality/twitter-bookmark-export. It's dead hacky, but hopefully some will find it useful.
Finding a solution
I found a Gist by divyajyotiuk which took a list of JSON files from Twitter and converted them into Markdown. It works on data from Twitter's Bookmarks page directly, using the GraphQL response for each page. However, the suggested workflow was slow and potentially error-prone, as you had to save each response individually.
By using another Gist by duncangh to endlessly scroll Twitter, you can auto-scroll down the Bookmarks page until you have all your content. On testing I've found that Twitter is experiencing some issues loading just now, but everything still works even if you click 'Retry' on the Bookmarks page, since it's not refreshing the page.
Most browsers now support a 'HAR export'. Think of this as a JSON file that records every network request made during your visit to a webpage. This includes all request URLs, and ultimately also their content. By using both together, you have a single file with all your bookmarks!
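Since a HAR file is just JSON, it's easy to poke at. Here's a quick Python sketch of its shape (the entry fields follow the HAR 1.2 layout, but the sample URLs and body here are made up for illustration):

```python
import json

# Minimal HAR-shaped sample: a real export has many more fields, but
# the important bits live under log.entries, where each entry carries
# a request (URL, method) and a response (status, content).
har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://twitter.com/i/api/graphql/XXXX/Bookmarks"},
                "response": {
                    "status": 200,
                    "content": {"text": json.dumps({"data": {"bookmark_timeline_v2": {}}})},
                },
            },
            {
                "request": {"url": "https://abs.twimg.com/some/asset.js"},
                "response": {"status": 200, "content": {}},
            },
        ]
    }
}

# Every request the browser made during the session is listed in order
urls = [e["request"]["url"] for e in har["log"]["entries"]]
```

The Bookmarks GraphQL calls sit in that same list alongside scripts, images, and everything else the page loaded, which is why the next step is filtering.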
HAR conversion to JSON
The original Python script can't use the HAR file directly; however, it's straightforward to convert it back into individual JSON files. I set up a standalone 'har.php' file that would read the HAR file and strip out only the Bookmarks responses, and only those whose requests were successful.
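Since the original conversion script was Python, here's a Python sketch of that same filtering step (my har.php does this in PHP; the URL substring and field paths below reflect how the bookmark GraphQL calls appeared in my HAR dump, so treat them as assumptions):

```python
import json

def extract_bookmark_pages(har: dict) -> list:
    """Pull the decoded Bookmarks GraphQL responses out of a HAR dump,
    keeping only successful requests, in the order they were made."""
    pages = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        response = entry["response"]
        # The bookmarks timeline comes from a GraphQL endpoint with
        # "Bookmarks" in its path; everything else is page assets.
        if "Bookmarks" not in url or response["status"] != 200:
            continue
        body = response.get("content", {}).get("text")
        if body:
            pages.append(json.loads(body))
    return pages
```

Writing each returned page out as its own numbered file (e.g. `bookmarks_001.json`) reproduces the per-page JSON files the original script expects, already in scroll order.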
For somebody with over 90 pages of bookmarks, this saved a fair bit of time and manual work, and ensured they were all correctly ordered.
Lastly, to JSON
The original Python script worked fairly well, but I ended up rewriting it in PHP, adding a few custom elements:
- I don't trust that https://t.co will exist for much longer, so I've pulled the expanded URLs in from the legacy entity data.
- I've added the timestamps of when the tweets were created, so you can understand each tweet in context.
- I want the whole bookmark text, so I modified my script not to truncate it.
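The three tweaks above boil down to one small transform per tweet. Here's a hedged Python sketch (my version is PHP; the field names follow the 'legacy' tweet shape Twitter's GraphQL responses embed, which matches the old v1.1 tweet object, so verify them against your own dump):

```python
from datetime import datetime

def tweet_to_markdown(legacy: dict) -> str:
    """Render one tweet's 'legacy' payload as a Markdown bullet:
    full untruncated text, t.co links swapped for their expanded
    URLs, plus the original creation timestamp for context."""
    text = legacy["full_text"]
    # Swap each shortened t.co link for its real destination URL,
    # taken from the tweet's URL entities
    for u in legacy.get("entities", {}).get("urls", []):
        text = text.replace(u["url"], u["expanded_url"])
    # created_at looks like "Wed Oct 10 20:19:24 +0000 2018"
    created = datetime.strptime(legacy["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return f"- {text} ({created:%Y-%m-%d %H:%M})"
```

Run over every tweet in every extracted page, this produces a Markdown list ready to drop straight into an Obsidian vault.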
I'm happy I managed to find a solution that worked well, and it only took around an hour to sort out. Other people's pre-existing Gists provided a powerful starting point for getting exactly what was needed.
There's other export tools available for Twitter, but this gave me the data in its rawest form, without needing to use Chrome or install extensions. I also still have the HAR and JSON files saved, so if I need to tweak things or change the output - I have that control.