SiteMinder URL encoding is easily confused with other kinds of URL encoding.
Many internet components and technologies today can perform URL encoding.
Some handle UTF-8; others do not.
This article explains the logic behind SiteMinder URL encoding so that its algorithm can be told apart from the others.
The same information is also available on the support site.
Description:
What algorithm does the Web Agent use to encode and decode a URL?
Solution:
Web Agent encodes a URL:
==================
If the URL contains any of the following characters, the Web Agent encodes it:
' ' (space), '&', '+', '?', '%', or '$'.
First, the URL is prepended with '$SM$'.
Next, the following rules are applied in order:
' ' is replaced with '%20'
'&' is replaced with '%26'
'+' is replaced with '%2b'
'?' is replaced with '%3f'
'@' is replaced with '%40'
'"' is replaced with '"' (no changes/encoding)
'=' is replaced with '%3d'
'%' is replaced with '$%' or '-%', depending on which delimiter the agent uses
Handling of '$' and '-':
'$' is the delimiter for traditional agents / legacy encoding = yes
'$' is replaced with '$$'
'$$' is replaced with '$$$$'
'$$$' is replaced with '$$$$$$'
'-' is replaced with '-' (no changes/encoding)
When decoding, the agent reverses this logic: it removes one '$' for every '$' it added.
'-' is the delimiter for framework agents / legacy encoding = No
'-' is replaced with '--'
'--' is replaced with '----'
'---' is replaced with '------'
'$' is replaced with '%24'
When decoding, the agent reverses this logic: it removes one hyphen for every hyphen it added.
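The rules above can be sketched as a single pass over the input, one character at a time. A single pass matters: if the replacements were run one after another as whole-string substitutions, the '%' produced by an earlier rule (for example '%26' for '&') would itself be escaped by the '%' rule. The Python below is only an illustration under stated assumptions. The names needs_encoding, sm_escape, TRIGGER_CHARS and COMMON are hypothetical, the prefix is not added, and only the characters in the rule list are handled. The worked scenarios in this article also show ':', '/' and '.' percent-encoded, and the legacy scenario leaves '=' unchanged; the sketch does not try to reproduce those differences.

```python
# Illustrative sketch only; not the Web Agent's actual code.

TRIGGER_CHARS = set(" &+?%$")  # any of these causes the URL to be encoded

# Percent-encodings shared by both modes, taken from the rule list.
COMMON = {
    " ": "%20",
    "&": "%26",
    "+": "%2b",
    "?": "%3f",
    "@": "%40",
    "=": "%3d",
}


def needs_encoding(url: str) -> bool:
    """Return True if the URL contains a character that triggers encoding."""
    return any(ch in TRIGGER_CHARS for ch in url)


def sm_escape(text: str, legacy: bool = True) -> str:
    """Apply the per-character rules in one pass (prefix not included)."""
    delimiter = "$" if legacy else "-"
    out = []
    for ch in text:
        if ch in COMMON:
            out.append(COMMON[ch])
        elif ch == "%":
            out.append(delimiter + "%")   # literal '%', marked by the delimiter
        elif ch == delimiter:
            out.append(delimiter * 2)     # the active delimiter is doubled
        elif ch == "$":
            out.append("%24")             # reached only when '-' is the delimiter
        else:
            out.append(ch)                # '"' and the inactive '-' pass through
    return "".join(out)


if __name__ == "__main__":
    print(sm_escape("A+B&Space%20Here"))            # legacy encoding = yes
    print(sm_escape("5%6$7@8-9", legacy=False))     # legacy encoding = No
```

Because each input character is mapped exactly once, a run such as '$$' or '--' doubles in length naturally, which matches the '$$' to '$$$$' and '--' to '----' rows in the rule list.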
Scenario with traditional agents / legacy encoding = yes
URL being encoded is:
http://server.domain.com/resource?P1=A+B&P2=Space%20Here
SM-Encoded, it becomes:
$SM$http%3a%2f%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
Web Agent decodes a URL:
==================
If the URL starts with '$SM$', strip that prefix and scan the rest of the string from the beginning:
If the current character is '$', skip it and return the next character as-is.
If the current character is '%', read the next TWO characters and return the urldecoded value.
Otherwise, return the current character.
The algorithm will not urldecode a sequence such as '$%20', because the '$' case has already consumed the '%' as a literal character.
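The scan described above can be sketched in Python as follows. This is a minimal illustration, not the agent's implementation. The function name sm_decode and its prefix and escape parameters are hypothetical. Each '%xx' is decoded as a single byte value, so multi-byte UTF-8 sequences are not reassembled. Passing a malformed '%' through unchanged is an assumption, not documented behavior. The 'SM-' prefix used for the framework case is inferred from the framework scenario at the end of this article.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def sm_decode(value: str, prefix: str = "$SM$", escape: str = "$") -> str:
    """Decode an SM-encoded string; return it unchanged if the prefix is absent."""
    if not value.startswith(prefix):
        return value
    body = value[len(prefix):]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        pair = body[i + 1:i + 3]
        if ch == escape and i + 1 < len(body):
            out.append(body[i + 1])           # escaped character is kept literally
            i += 2
        elif ch == "%" and len(pair) == 2 and set(pair) <= HEX_DIGITS:
            out.append(chr(int(pair, 16)))    # urldecode '%xx' as one byte value
            i += 3
        else:
            out.append(ch)                    # ordinary (or malformed) character
            i += 1
    return "".join(out)


if __name__ == "__main__":
    encoded = ("$SM$http%3a%2f%2fserver%2edomain%2ecom%2fresource"
               "%3fP1=A%2bB%26P2=Space$%20Here")
    print(sm_decode(encoded))
```

Calling sm_decode(value, prefix="SM-", escape="-") applies the same scan with the hyphen as the delimiter, which is how the framework-agent form would be reversed under the assumptions above.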
So, if the URL being decoded is:
$SM$http%3a%2f%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
first strip off the $SM$:
http%3a%2f%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
Then scan the string until we find a '$' or a '%':
http%3a%2f%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
At this point, we see a %. So, we urldecode the % and the next two characters and then continue:
http:%2f%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
Again, we see a %. Repeat:
http:/%2fserver%2edomain%2ecom%2fresource%3fP1=A%2bB%26P2=Space$%20Here
Repeat for each '%' sequence up to the '$':
http://server.domain.com/resource?P1=A+B&P2=Space$%20Here
Now we see a '$' character, which means we return the next character ('%') as-is and continue scanning:
http://server.domain.com/resource?P1=A+B&P2=Space%20Here
And now we've reached the end of the string. This is the SM-Decoded value.
Scenario with framework agents / legacy encoding = No
URL being encoded is:
http://server.domain.com/protected/HeaderDumper.asp?1%202&3+4?5%6$7@8"9=10-11--12---13
SM-Encoded, it becomes:
SM-HTTP%3a%2f%2fserver%2edomain%2ecom%2fprotected%2fHeaderDumper%2easp%3f1-%202%263%2b4%3f5-%6%247%408"9%3d10--11----12------13