Bluesky RSS announcer
A Bluesky announcement script
that I wrote in python
:
Consuming and parsing an RSS web feed. Embed and post to Bluesky using AT Protocol also known Authenticated Transfer Protocol or ATProto.
So a blog, podcast provider or similar may set up a
cron
job (or workflow, action or similar)
that routinely tells their followers if there are news to consume!
A reasonable effort is made to avoid spamming;
will not post anything recently posted,
will not post old RSS entries,
will not post more posts than configured,
will only post if --no-dry-run
is configured.
Quick and dirty hack, but useful to us!
Index:
- Usage
- Secrets Management
- RSS
- Bluesky login
- Bluesky list recent posts
- Bluesky list embeds in recent posts
- Bluesky post new announcements
- Dependencies
Usage
./bsky-rss-bot.py -h
usage: bsky-rss-bot.py [-h] --url URL --handle HANDLE --secret SECRET --secret-type {arg,env,file} [--dry-run | --no-dry-run] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--days DAYS] [--posts POSTS]
bluesky bot
options:
-h, --help show this help message and exit
--url URL URL to lib-syn RSS feed, e.g. https://sakerhetspodcasten.se/index.xml
--handle HANDLE bluesky handle
--secret SECRET bluesky secret
--secret-type {arg,env,file}
bluesky secret type
--dry-run, --no-dry-run
dry-run inhibits posting (default: True)
--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--days DAYS Maximum days back in RSS history to announce
--posts POSTS Maximum posts to emit, avoid spamming
Hope this help was helpful! :-)
Example:
./bsky-rss-bot.py \
--url https://sakerhetspodcasten.se/index.xml \
--handle blaufish.bsky.social \
--days 60 \
--dry-run \
--secret .bluesky2.secret \
--secret-type file
Example output:
2025-02-04 10:53:03,238 INFO Request feed from https://sakerhetspodcasten.se/index.xml
2025-02-04 10:53:03,277 INFO RSS candidate: Säkerhetspodcasten #275 - Ostukturerat V.6
2025-02-04 10:53:03,277 INFO RSS candidate: Säkerhetspodcasten #274 - Fyra fantastiska frågor
2025-02-04 10:53:03,277 INFO RSS candidate: Säkerhetspodcasten #273 - Ostrukturerat V.50
2025-02-04 10:53:03,607 INFO Bluesky lookup: blaufish.bsky.social=did:plc:y25e3xvbgsjuqcxjdybktovi
2025-02-04 10:53:04,886 INFO Disregard already published: https://sakerhetspodcasten.se/posts/sakerhetspodcasten_275_ostukturerat_v_6/
2025-02-04 10:53:04,886 INFO Disregard already published: https://sakerhetspodcasten.se/posts/sakerhetspodcasten_274_fyra_fantastiska_fragor/
2025-02-04 10:53:04,886 INFO Disregard already published: https://sakerhetspodcasten.se/posts/sakerhetspodcasten_273_ostrukturerat_v_50/
2025-02-04 10:53:04,886 INFO Terminating normally. Thanks for All the Fish!
Secrets Management
The tool requires a single secret: the Bluesky app login password.
I did not want to hard code how this secret is obtained. What is most convenient may depend on user taste and deployment environment.
The command line usage help includes;
--secret SECRET bluesky secret
--secret-type {arg,env,file}
bluesky secret type
Adding options to the python command line argument parser is easy:
parser.add_argument('--secret-type',
dest = 'secret_type',
required = True,
choices = ['arg', 'env', 'file'],
help = 'bluesky secret type')
Implementing multiple different possible sources of secrets was also easy:
secret = None
match args.secret_type:
case "arg":
content = args.secret
secret = content.strip()
case "env":
content = os.environ[args.secret]
secret = content.strip()
case "file":
with open(args.secret, "r") as f:
content = f.read();
secret = content.strip()
case "_":
logger.error(f"TODO implement!")
return
RSS
The first step is to identify RSS items that are candidates for Bluesky announcement.
We establish a threshold
, i.e. candidates posted args.days
ago or later;
threshold = None
def main():
global threshold
#...
threshold = datetime.now() - timedelta(days=args.days)
candidates = process_rss(args.url)
if len(candidates) < 1:
logger.info("No candiates, exiting")
return
process_rss(args.url)
will generate a list of RSS items that meet the threshold:
def process_rss(url):
candidates = []
logger.info(f"Request feed from {url}")
rss = feedparser.parse(url)
entries = rss['entries'];
for entry in entries:
candidate = process_entry(entry)
if (candidate):
candidates.append(entry)
return candidates
Each RSS entry will be evaluated if it meets the threshold using process_entry(entry)
:
def process_entry(e):
link = e['link']
published_parsed = e['published_parsed']
published = e['published']
title = e['title']
ts = time.mktime( published_parsed )
dt = datetime.fromtimestamp(ts)
if dt > threshold:
logger.info(f"RSS candidate: {title}")
return True
else:
logger.debug(f"RSS skipping old entry: {title}")
return False
The published_parsed
is a python-friendly version of RSS item.pubDate
,
that reasonably easy can be converted to a datetime
object and compared to threshold
.
So, the rest of the code work with recent RSS entries only, nothing old will be processed later on.
Bluesky login
Login is pretty easy; you need to username/handle args.handle
,
and the secret
application password:
client = atproto_client.client.client.Client()
client.login(args.handle, secret)
Bluesky list recent posts
We want to avoid re-posting / spamming the same entries again and again. Therefor we will look up the recent posts from our announcement user:
did = bsky_lookup(args.handle)
logger.info(f"Bluesky lookup: {args.handle}={did}")
posts = bsky_posts(client, did)
The user lookup is just a wafer-thin wrapper around HandleResolver
:
def bsky_lookup(_id):
resolver = atproto_identity.handle.resolver.HandleResolver()
did = resolver.resolve(_id)
return did
Similarly, the post lookup is just a wafer-thin wrapper around client.get_author_feed()
:
def bsky_posts(client, did):
responses = client.get_author_feed(
actor=did,
filter='posts_and_author_threads',
limit=30
)
return responses
Bluesky list embeds in recent posts
Now we create a simple list of all URIs that are included
in a recent app.bsky.embed.external#view
embedding:
tweeted = []
for entry in posts.feed:
post = entry.post
record = post.record
embed = post.embed
if embed is None:
continue
if embed.py_type != 'app.bsky.embed.external#view':
continue
external = embed.external
uri = external.uri
tweeted.append(uri)
logger.debug(f"Bluesky embeded: {uri}")
Bluesky post new announcements
Now, it is time to loop through all candidates
that met the threshold.
We will skip old candidate.link
entries that are on the already tweeted/posted list.
posts = 0
for candidate in candidates:
if posts >= args.posts:
logger.info(f"Stopping posting after reaching post limit: {posts}")
break
announce = True
for old in tweeted:
if candidate.link == old:
logger.info(f"Disregard already published: {old}")
announce = False
break
if announce:
logger.debug(f"Prepare post: {candidate.link}")
bsky_post(client, candidate, args.dryrun)
posts = posts + 1
logger.info("Terminating normally. Thanks for All the Fish!")
Posting the announcement is rather easy using client.send_post()
,
AppBskyEmbedExternal.Main
and AppBskyEmbedExternal.External
.
NOTE: I did run into some complications with reading an outdated documentation somewhere, so I had a few
__pydantic_validator__
issues until I got the code correctly aligned with the current SDK)
def bsky_post(client, candidate, dryrun):
c_title = candidate.title
c_description = candidate.description
c_uri = candidate.link
#...specific to our podcast, ignore...
c_desc = re.sub(r"Lyssna mp3, längd: ", "", c_description)
c_desc = re.sub(r" Innehåll ", " ", c_desc)
logger.debug(f"c_title: {c_title}")
logger.debug(f"c_description: {c_description}")
logger.debug(f"c_desc: {c_desc}")
logger.debug(f"c_uri: {c_uri}")
embed_external = models.AppBskyEmbedExternal.Main(
external = models.AppBskyEmbedExternal.External(
title = c_title,
description = c_desc,
uri = c_uri,
)
)
text = "📣 " + c_title + " 📣 " + c_desc
text300 = truncate( text, 300 )
if dryrun:
logger.info(f"Dry-run post: {c_uri}")
else:
logger.info(f"Post: {c_uri}")
post = client.send_post( text=text300, embed=embed_external )
logger.info(f"post.uri: {post.uri}")
logger.info(f"post.cid: {post.cid}")
Truncating the text to the 300-limit is done as follows:
def truncate( text, maxlen ):
if len(text) < maxlen:
return text
idx = text.rfind(" ", 0, maxlen-3)
return text[:idx] + "..."
An observant code reviewer can easily come up with cases where the truncation will break on evil input… So room for future improvements!
Dependencies
feedparser
.
Consumes RSS using feedparser.
This part was fairly easy as I have a bunch of earlier RSS processing projects,
parts of the code was just copy-paste similar code.
atproto
.
Reads and Posts to Bluesky using The AT Proto SDK for Python.