We solved 3 challenges in this CTF.

rsa-buffet

There are 10 public keys and 5 ciphertexts using them. If you can decode any three you can get a flag.
rsa-buffet

This challenge involves two cryptographic systems: RSA and Shamir’s Secret Sharing. One must first recover the private key of enough of the RSA public keys to successfully decrypt 3 of the 5 given ciphertexts. There is a convenient little library on github that can help you do exactly this, by sequentially executing well-known attacks on RSA (Wiener’s, Hastad’s, Fermat’s, factordb, etc). Using my fork of this library, I ran each of the public keys through the program, and successfully recovered 3 private keys.

$ python RsaCtfToolcli.py --publickey ../rsa-buffet/key-1.pem --private
$ python RsaCtfToolcli.py --publickey ../rsa-buffet/key-2.pem --private
$ python RsaCtfToolcli.py --publickey ../rsa-buffet/key-3.pem --private

This writes the recovered private keys to a file. Now that we have 3 private keys, we can start trying to decrypt some ciphertexts. I edited the main section of the provided encrypt.py file to perform the decryption.

if __name__ == "__main__":
    k_text = open(sys.argv[1], 'r').read()
    c = open(sys.argv[2], 'rb').read()
    k = RSA.importKey(k_text)
    print(decrypt(k, c))

After some trial-and-error, I found the following key-ciphertext decryption pairs: key1 + ciphertext5, key2 + ciphertext1, key3 + ciphertext4.

$ python encrypt.py key1 ciphertext-5.bin
Congratulations, you decrypted a ciphertext!  One down, two to go :)
5-7d29041c468b680fcff93c16011a2869f17de75b929b787503b412becde0321ec72fe1e499f2150a1dacb9a5f701c0b37470049dd560cef5163543469817971f50782f763f0b05ab7088f7ae
5-a7a1e271cf263279cece532b540545fa539b0f3650e2929163b02ee5459debdc53c1e07149eb2153015bb5c88e6270e8
5-149480c5c75cbe320564adfa432ac8ea241e048ed39c8bc6be14ca80c392487f43a7882075d785d62cb314ea6c89a6b5f28adfa56ec481e124567b88241de2a6cabcc7ec9de3acac8be5375b
5-7285289084282d559573f68eef10191091d76d6670014202670651f867cd2bc8640a86eef1c1e482affc7ae801fa446956c2186972fb6b7bac88c91d050c9d3cca
$ python encrypt.py key2 ciphertext-1.bin
Congratulations, you decrypted a ciphertext!  One down, two to go :)
1-32a1cd9f414f14cff6685879444acbe41e5dba6574a072cace6e8d0eb338ad64910897369b7589e6a408c861c8e708f60fbbbe91953d4a73bcf1df11e1ecaa2885bed1e5a772bfed42d776a9
1-e0c113fa1ebea9318dd413bf28308707fd660a5d1417fbc7da72416c8baaa5bf628f11c660dcee518134353e6ff8d37c
1-1b8b6c4e3145a96b1b0031f63521c8df58713c4d6d737039b0f1c0750e16e1579340cfc5dadef4e96d6b95ecf89f52b8136ae657c9c32e96bf4384e18bd8190546ff5102cd006be5e1580053
1-c332b8b93a914532a2dab045ea52b86d4d3950a990b5fc5e041dce9be1fd3912f9978cad009320e18f4383ca71d9d79114c9816b5f950305a6dd19c9f458695d52
$ python encrypt.py key3 ciphertext-4.bin
Congratulations, you decrypted a ciphertext!  One down, two to go :)
4-4a87367d053c533fd995032ed1e651487cb5dc1e0b1cb70a7662b152c73650f039a60f391a52f2413f43bd54eb7b12c41b42f31ac557edd4bfe46a396a8cdbe19dc9d8121924f43be51c976d
4-abbbcee71f140198ff8c50f51069465075979c31d32b052e7ae82ec7f6783aef7b41a597f9504d3340967b8d70cbe5a3
4-35fbbe40058e20463547b363d1f164c0bbbb97cfd9ffe7619bce31a59392f0e9625a2cd035276e09c4df3c0932f22bd322f16e375c7c7fd88da0f972832707eb549ff1e776db37649019ebce
4-12b466934911986bda845d8d26710a12250d210546f46716c78d7a17b1f2c893b95b934c8c7beafcf81a3123eb2ea05ca89101b23349e455794a8d56608c8ee49dd

Now with enough secrets from the secret-share, I was able to recover the flag.

from secretsharing import PlaintextToHexSecretSharer as SS

files = ["ciphertext5.txt", "ciphertext1.txt", "ciphertext4.txt"]
secrets = []
for fn in files:
    a = []
    with open(fn, "r") as f:
        a = f.readlines()

    a = a[1:] # strip the "Congratulation" line
    for i in range(0,len(a)):
        a[i] = a[i].strip("\n")

    secrets.append(a)

for i in range(0, 2):
    a = SS.recover_secret([secrets[0][i], secrets[1][i], secrets[2][i]])
    print(a)

Running this little script produces some out, and the flag.

And another one's down, and another one's down, and another one bites the dust!

Three's the magic number!  FLAG{ndQzjRpnSP60NgWET6jX}.

sanity-check

ruby -e 'puts ("bkp{flaglol}").gsub("flaglol", "welcome_to_2017")'

Run the code in in the terminal to get the flag: bkp{welcome_to_2017}.

Prudentialv2

This challenge presents us with a webpage whose only contents is a typical “Login” form. The form has two fields, one marked “Name” and the other “Password” just as you might expect to see anywhere else on the Web. Presumably, we need to enter the correct values into the fields to acquire the flag.

Rather than spending time making random guesses, which would almost certainly be a waste, the very first thing we should do is get some more information about how the page is constructed. This is most easily accomplished with any decent Web browser’s “View source” features. Reading the source of the page shows a typical HTML document, with a linked stylesheet and the form we saw:

<html>
<head>
    <title>level1</title>
    <link rel='stylesheet' href='style.css' type='text/css'>
</head>
<body>


<section class="login">
    <div class="title">
        <a href="./index.txt">Level 1</a>
    </div>

    <form method="get">
        <input type="text" required name="name" placeholder="Name"/><br/>
        <input type="text" required name="password" placeholder="Password" /><br/>
        <input type="submit"/>
    </form>
</section>
</body>
</html>

One line in particular stands out: there is a hyperlink (created with the <a> HTML element) to a file called index.txt. We can simply load that page in our Web browser to have a look at it.

This file is almost identical to the source code we just viewed. Since its name is index.txt, and its contents is almost exactly the same as the first web page we loaded, a reasonable guess is that this is in fact the same source code as the other page. The fact that we can see the source itself, rather than its evaluated output, is almost certainly caused by the fact that the file extension has been changed to .txt. This tells the server to treat the file as “plain text” rather than executable code, and so it gives us the full file contents itself.

There is one one difference between the contents of this page and the previous one: there is an additional block of code near the middle of the file right after the opening <body> tag. Its code looks like this:

<?php
require 'flag.php';

if (isset($_GET['name']) and isset($_GET['password'])) {
    $name = (string)$_GET['name'];
    $password = (string)$_GET['password'];

    if ($name == $password) {
        print 'Your password can not be your name.';
    } else if (sha1($name) === sha1($password)) {
      die('Flag: '.$flag);
    } else {
        print '<p class="alert">Invalid password.</p>';
    }
}
?>

This is an embedded PHP code snippet and is presumably the code being evaluated and executed on the first page. You need some familiarity with PHP to understand what this code is doing, but there isn’t much of it, so even if you don’t have PHP experience it’s not too hard to look up each and every command in the PHP documentation. Let’s examine each one briefly by manually reading the code and referring to the PHP documentation to learn what each line is doing.

The first line of code is simply the PHP opening tag itself (<?php), and the last line is the PHP closing tag (?>), “which tell PHP to start and stop interpreting the code between them.”

The very next line uses the require statement, whose documentation says it “is identical to include”. Following the link to include, we read that it “includes and evaluates the specified file.” In this case, the “specified file” is flag.php. Since the file we are examining is index.txt, and we have assumed that this is the same code running at the website’s homepage (because most servers are configured to use files named either index.html or index.php as a default), we can surmise that flag.php here refers to the file accessible at http://54.202.82.13/flag.php. Unfortunately, loading that page reveals a blank screen; it wasn’t a “Not Found” error, so the file is probably actually there, but it doesn’t output anything useful either.

Moving on, the next line is an if condition using the isset() function twice, once on the $_GET['name'] variable and the second time on the $_GET['password'] variable. These variable names (name and password) match the login form’s two fields. The $_GET part is a PHP “superglobal”, which “are built-in variables that are always available in all scopes.” Even more specifically, the $_GET superglobal in particular is “an associative array of variables passed to the current script via the URL parameters.” So this line simply checks if both a name and a password parameter was included in the query string portion of the URL used to access this page.

We can easily test this assumption by typing arbitrary values into the login form. Sure enough, if we submit the form with the values testname and testpassword, our browser loads the URL http://54.202.82.13/?name=testname&password=testpassword. Interestingly, we are also greeted with the message “Invalid password.” This message exists in the source code, which we’ll examine more closely in just a second.

Returning to the PHP code we’re reading, we can see the next two lines are setting two PHP variables, one called $name and one called $password, both taken from the URL via the PHP superglobal variable:

    $name = (string)$_GET['name'];
    $password = (string)$_GET['password'];

The important thing to note about these lines is that both variables are type cast to strings. This simply forces the values of the URL to be interpreted by PHP as string values. In other words, if we entered 1 as the “name” in the form field, this type cast ensures that the name is treated as the PHP literal value '1' (notice the single quotes), rather than as the numeric integer value 1. Same with the “password” field. The purpose of doing this is to prevent type confusion (or “type juggling”) attacks; seeing this code alerts us to the fact that this avenue of attack is (probably) already sealed off.

Next, the code checks whether the $name variable, set earlier, “is equal to” the $password variable, also set earlier:

    if ($name == $password) {
        print 'Your password can not be your name.';
    }

If the variables are determined to be equal, the code prints “Your password can not be your name.” Again, we can test our understanding of this easily enough: just enter the same value in the login form’s “name” field as the value for its “password” field. Using test for both values and submitting the form confirms this result.

The next conditional is of primary interest because it is clearly dealing with the flag:

    } else if (sha1($name) === sha1($password)) {
      die('Flag: '.$flag);
    }

Here we see another PHP function used twice: the sha1() function. The documentation says this function is used to “calculate the sha1 hash of a string.” If the result of this function applied to the $name variable is identical (because of the triple-equals, ===, meaning not merely “equal” but identical) to the result of this same function applied to the $password variable, then the code will die(), which exits the script while outputing the given message. In this case, the message is the flag.

So we now know that our challenge is to provide input through the login form that meets the following criteria:

  • the “name” and “password” fields must not be equal, and
  • the two different values entered must result in an identical value after having the sha1() function applied to them.

The PHP manual simply says that this function “calculates the sha1 hash of a string.” This code therefore tests whether the SHA-1 hash of the $name and $password variable are identical and only outputs the flag if they are. (This is not really close to how normal “login” mechanisms work, so this isn’t actually a login form. It’s just a cryptography riddle that requires some knowledge of PHP-backed Web pages to attempt.)

:beginner: If you’ve never encountered a “hash” in this context before, this challenge will seem incredibly complex. Searching the Internet for “hash” probably reveals more pages about cannabis than computer security at first, but adding “computer” to your searches will get you many, many, many more relevant results. Another tip for beginners, especially if you’ve found the regular Wikipedia article impenetrable, is to check if there is a “simple” version of the Wikipedia page. In this case, you’re in luck: the Simple Wikipedia article for “Cryptographic hash function” is relatively straightforward by comparison.

In the end, a “hash” in the context of computer security is simply a value that (“cryptographically”) represents some other value. The idea, in theory, is that two different original values will never be represented by the same two ultimate values after they have been “hashed.” In practice, however, weaknesses in cryptographic hash functions (whether by design due to flaws in their algorithm, or by mistakes introduced through their actual implementation) sometimes result in two different values being hashed to the same value. When this happens, the hash function is said to be “cryptographically broken,” because if you can hash two different values and get the same resulting value, then there is little point to the hash from a security perspective in the first place.

It’s now clear that the point of this challenge is to provide two different inputs whose SHA1 hash values are identical. But since the whole point of a hash function is to produce different outputs for different inputs, how are we to do this?

If the programmer of this code were a little less careful, it might have been possible to trick their code into hashing two identical inputs (which would, of course, produce two identical outputs). However, as we read, they were in fact careful about their variable types (by explicitly type casting the input variables to be strings) and about their equality checks (by using the === operator to check for exact sameness, not mere equivalence). The only remaining option is to use a collision attack. That is, we must provide two different inputs that nevertheless produce the same hash value as their output. This, of course, is exactly what cryptographic hash algorithms are designed to make difficult.

To understand exactly how difficult, though, it’s worth taking some time to further understand just a little bit of what a hash algorithm actually does. (I am not a cryptographer and even if I was, a full dissection of cryptographic hash functions is beyond the scope of this document, so I’m going to focus merely on the perspective of an end-user.) The SHA-1 algorithm always produces 160 bits of output, regardles of how much input it was given. You can see this yourself by using the shasum(1) command available on most UNIX-like systems, or with the File Checksum Integrity Verifier application provided by Microsoft for Windows systems:

$ for input in "test" "othertest"; do echo -n "$input" | shasum --algorithm 1; done;
a94a8fe5ccb19ba61c4c0873d391e987982fbbd3  -
3fcab50babaf08fc09223d11ff2b6bf3a4ee5c27  -

In the above example, two inputs of different length are provided. The first input (test) is only four characters long, and produces hexadecimal output that is forty characters long. The second input (othertest) is nine characters long, but also produces hexadecimal output that is forty characters long. A single hexadecimal character can be one of sixteen possible values (0 through 9, which is ten possibilities, or the values a through f, which is an additional six, totalling sixteen). This means that each character position shown above offers sixteen bits (mathematically, 161 = 16). Stringing forty sixteen-bit characters one after the other produces a number that is equal to 1640. That number is 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976, which is the total number of possible outputs the SHA-1 algorithm can produce. I don’t even know how to pronounce that number verbally.

Taken at face value, this is a daunting task. At the time of this writing, contemporary consumer-grade computers are simply not fast enough to search that many possibilities for two inputs whose outputs are both the same resulting value when run through the SHA-1 algorithm. A cluster of supercomputers could probably do it in some amount of time, but even they might need much more time than the duration of the Capture The Flag game (which only lasts a mere 48 hours), meaning that a brute-force search is impractical, at least with respect to acquiring the flag from this challenge. We have to use some much faster way to find a hash collision in the SHA-1 algorithm.

The fastest way to do this, of course, is to ask the Internet if anyone else has already done this. A simple Internet search for SHA-1 hash collision turns up numerous hits. Indeed, a SHA-1 hash collision has been found and published. On February 23rd, 2017, Google published a post stating:

we are releasing two PDFs that have identical SHA-1 hashes but different content.

Since the challenge’s PHP code does not check that its input matches a specific hash, only that the result of two hashing operations are the same as each other, these PDFs are exactly what we need: two different inputs that have identical SHA-1 hashes. In this case, the inputs are PDF files, shattered-1.pdf and shattered-2.pdf. Both produce the SHA-1 hash value 38762cf7f55934b34d179ae6a4c80cadccbb7f0a. We can use one PDF for the form field’s “name” parameter so that the code’s call to sha1($name) produces the aforementioned hash, and the other PDF for the form field’s “password” parameter, which will make the call to sha1($password) return an identical result.

Let’s first verify that this does, in fact, actually work. We can download the PDFs and run shasum over them as we did with our test input using Bash brace expansion to save some typing:

$ shasum --algorithm 1 shattered-{1,2}.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

As a sanity check, we can use a different algorithm to make sure that these files are in fact not the same file, or more simply use the diff command to check for differences:

$ shasum --algorithm 256 shattered-{1,2}.pdf
2bb787a73e37352f92383abe7e2902936d1059ad9f1ba6daaa9c1e58ee6970d0  shattered-1.pdf
d4488775d29bdef7993367d541064dbdda50d383f89f0aa13a6ff2e0894ba5ff  shattered-2.pdf
$ diff shattered-{1,2}.pdf
Binary files shattered-1.pdf and shattered-2.pdf differ

Great, we have two identical SHA-1 hash values despite having two files whose contents are different, as evidenced by the SHA-256 hash values and as reported by diff.

The next step is to take these PDF files and provide them as input to the challenge’s PHP code. A Web browser’s user interface doesn’t actually allow us to place a file into a textual form field, so this isn’t going to be as simple as filling in the form. Of course, we don’t actually care about the form’s specifics at all, since it’s the result of submitting the form field that we actually care about. That result is simply accessing a URL with a query string. Of course, a Web browser doesn’t exactly let us put a PDF file into the address bar, either, so we’ll need some other way of taking the file contents in the PDFs and translating them into a URL.

A further complication is the fact that the PDF files are, well, files. In contrast, a URL is text. We somehow need a way to represent a file’s raw content in the form of a URL. The answer to this conundrum is a special URL syntax called URL-encoding, or sometimes called percent-encoding, because each encoded sequence begins with a percent (%) character. Following the % character is a pair of hexadecimal digits signifying the value of an 8-bit byte. In other words, putting %41 into any part of a URL is interpreted to mean “a byte value of 0x41, in hex.” (Hexadecimal numbers are often distinguished from other base digits with the 0x prefix.) Using the ASCII and UTF-8 character encoding standards, this maps to the capital letter A. Similarly, inserting %20 into any part of a URL ultimately maps to a literal SP (“single space”) in these same character sets. Of course, nothing is limiting us to entering values that only map to characters we can type with a standard keyboard layout. The percent-encoding scheme allows us to represent arbitrary binary data in the same way.

Let’s take a closer look at the PDF files to understand how we might do this.

If we opened a PDF file in a PDF viewer, we would see a graphical representation of the binary data in the file after it is interpreted by the PDF viewer application itself. That’s not what we want, because we have no way of translating that back into individual byte sequences for use in a URL. Instead, we want to see the binary data in the PDF files before it is rendered “as a PDF” to our screen.

The tools for doing this are called “hex editors” or “binary file editors.” As the name implies, these are file editors that let us inspect a file’s raw data, without any special program interpretation. A similar kind of program is a “hex dump viewer,” which simply displays the contents of certain data (like files) in the way a hex editor would (usually referred to as a “hexdump”), but doesn’t offer editing capabilities. These simpler programs will do for us, since we just want to inspect the contents of the PDFs, not change them.

A popular hexdump viewer available by default on most UNIX-like systems is xxd(1). (Windows users have multiple options as well, but they are often not included by default. For example, Microsoft provides a program called Binary Editor in their Visual Studio development suite.) In the simplest form of xxd’s operation, we simply provide a file for it to inspect, or feed it some data from a previous command. For example, here’s how we can prove to ourselves that the hexadecimal value 41 is, in fact, interpreted as a capital letter A:

$ echo -n 'A' | xxd
0000000: 41                                       A

As you can see, we supply the single input character A to the echo command and use a pipe to feed that character to the xxd command. The xxd program then inspects its input and shows it to us in raw hexadecimal format. The numbers to the left of the colon are “line” numbers (counting as byte offsets, represented in hexadecimal themselves, and starting from 0). Then we are shown the inspected contents itself, which in this example is the single byte 41 (which, remember, is in hex, not decimal). Finally, way over on the right is an ASCII representation of the raw data. In this case, since we supplied ASCII text as input to the xxd program to begin with, we see the same ASCII on the right.

Next, let’s use xxd to inspect the raw data in the PDF files. Since the files are pretty big (about 413KB each), let’s limit xxd’s output to the first one-hundred fifty bytes of the shattered-1.pdf file:

$ xxd -l 150 shattered-1.pdf 
0000000: 2550 4446 2d31 2e33 0a25 e2e3 cfd3 0a0a  %PDF-1.3.%......
0000010: 0a31 2030 206f 626a 0a3c 3c2f 5769 6474  .1 0 obj.<</Widt
0000020: 6820 3220 3020 522f 4865 6967 6874 2033  h 2 0 R/Height 3
0000030: 2030 2052 2f54 7970 6520 3420 3020 522f   0 R/Type 4 0 R/
0000040: 5375 6274 7970 6520 3520 3020 522f 4669  Subtype 5 0 R/Fi
0000050: 6c74 6572 2036 2030 2052 2f43 6f6c 6f72  lter 6 0 R/Color
0000060: 5370 6163 6520 3720 3020 522f 4c65 6e67  Space 7 0 R/Leng
0000070: 7468 2038 2030 2052 2f42 6974 7350 6572  th 8 0 R/BitsPer
0000080: 436f 6d70 6f6e 656e 7420 383e 3e0a 7374  Component 8>>.st
0000090: 7265 616d 0aff                           ream..

Here we can see the binary values of the PDF data itself. It begins, as all PDF files do, with a header that declares the version number of the PDF file format the rest of the document conforms to. In this case, it’s %PDF-1.3, which we can see in the far right column of xxd’s output. Notice that as before, this right-most column simply represnts the same data as the middle set of columns, but converted into ASCII. We can again prove this to ourselves by taking the very first character of the PDF header (which in this case is an ASCII %) and running it through xxd as before:

$ echo -n '%' | xxd
0000000: 25                                       %

Sure enough, the hexadecimal value 0x25 maps to an ASCII %. We can do the same thing with the next character, the P in “PDF” to see what an ASCII-encoded P’s value in hex is:

$ echo -n 'P' | xxd
0000000: 50                                       P

So an ASCII % is hexadecimal 25 and an ASCII P is hexadecimal 50. These two bytes are the first bytes in every PDF file. Converting this hexdump into a URL’s percent-encoded string is now relatively straightforward: we simply insert a % character between every two hex numbers in the middle set of columns and remove the spaces. That is to say, the first group of digits, which is hexadecimal 2550, becomes %25%50 in a URL, and so on down the line.

Recall that the target code is available at http://54.202.82.13/ and expects to receive a URL with two query string parameters, name and password. That means we need to construct a URL that looks like http://54.202.82.13/?name=<URL-ENCODED-CONTENT-OF-FIRST-PDF>&password=<URL-ENCODED-CONTENT-OF-SECOND-PDF>, with the two <URL-ENCODED-CONTENT…> parts replaced with the actual percent-encoded sequences of the raw binary data from their respective PDF files.

Recall also that the PDF files are pretty large by human standards. They are 416 kilobytes each. That’s four hundred sixteen thousand (“kilo”) bytes, way more than anyone can be expected to encode “by hand.” Instead of manually URL-encoding the hexdump, copying its output, and pasting it into a Web browser’s address bar (which might not even work anyway; can your clipboard handle that much text?), let’s just ditch the Web browser. After all, a Web browser is just one way to make HTTP network requests, and that’s all we’re trying to do.

Another way to make HTTP requests is to use the curl(1) program, also often available by default on most UNIX-like systems. (It can also be installed on Windows or any other system by downloading the appropriate file for your operating system.) This program simply provides a command-line interface to make the same kind of network requests that your Web browser does but without the graphical interface provided by the browser.

In our case, we want to use it to automatically perform the URL-encoding of the PDF files, placing those URL-encoded byte sequences into an appropriate URL, and then making the network request itself. If you’ve never used curl before, you might struggle with its command line options and syntax. A careful reading of its manual page will usually clear up any confusion, but there are a lot of options, so there’s no substitute for hands-on experimentation. In our case, we want to use its --header, --get, and --data-urlencode options.

:beginner: One great way to learn how to use curl is to take advantage of certain Web browser’s developer tools’s “Copy as cURL” feature. This super-handy shortcut makes it easy to construct a curl command line invocation that mimics what your Web browser just did. You can use this to learn the semantics of the command line program more gently, without needing to spend hours reading the manual.

The curl command we’ll use is:

curl 'http://54.202.82.13/' --header 'Host: 54.202.82.13' --get --data-urlencode name@shattered-1.pdf --data-urlencode password@shattered-2.pdf

Breaking this down:

  • The first word (curl) invokes the command.
  • The second word, i.e., the first command line argument ('http://54.202.82.13/'), is the base URL to which we are going to send a network request. In this case, that’s simply the target URL we’ve been examining throughout the challenge.
  • The third word (--header) is a long option indicating that the next argument will be an HTTP Header.
  • That argument is the Host HTTP Request Header and its actual value, which in this case is the same as the address of the target server.
  • The next word (--get) is a long option that tells curl to use the GET HTTP Request Method, and to treat any of the various --data options as query string parameters.
  • Following that is another long option (--data-urlencode) which instructs curl to URL-encode the data referred to by the very next word.
  • That very next word is name@shattered-1.pdf. This is curl-specific syntax that sets the parameter called name to be the value of the data read from the file (indicated by the @) called shattered-1.pdf.
  • Finally, we do the same with the second parameter: --data-urlencode password@shattered-2.pdf.

Executing this command, however, does not give us what we were hoping for:

$ curl 'http://54.202.82.13/' --header 'Host: 54.202.82.13' --get --data-urlencode name@shattered-1.pdf --data-urlencode password@shattered-2.pdf
<html>
<head><title>414 Request-URI Too Large</title></head>
<body bgcolor="white">
<center><h1>414 Request-URI Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

Thankfully, this error is extremely clear: the URL was too long (a “URI” is a superset of a “URL”), and the server refused to process all of it. After all, the URL was larger than 416KB * 2 bytes. That’s a very long URL. However, we are now presented with a new problem: if we cannot use the full contents of the two PDFs, will we still have two inputs whose SHA-1 hashes are the same? Fortunately for us, the answer is still yes. To see why, we need to refer to the actual academic paper published by the researchers who provided the two PDF files we are using.

In §2 of their paper, headlined “Our contributions,” the authors describe their work as “an identical-prefix collision attack, where a given prefix P is extended with two distinct near-collision block pairs such that they collide for any suffix S.” An “identical-prefix collision attack” simply means two input values that start with identical data. (It is a form of chosen-prefix collision attack.) “Two distinct near-collision block pairs” simply means that two different values are then appended to this identical prefix. Finally, “such that they collide for any suffix” means that any follow-on data appended to these now different byte sequences will produce the same hash, regardless of the rest of this appended data. In other words, we don’t need to use the full PDF files. We just need some amount of data from the start of both files. Specifically, we need the identical prefix and the near-collision block pairs. We can discard the suffix because it will have zero effect on the hashed output.

How much data from the start of the PDFs do we need? This, too, is answered by their paper. Page 4 of their paper prominently features a hexdump called “Table 2” and is helpfully labelled “Identical prefix of our collision.” The hexdump they show is the first 192 bytes of each PDF. We can inspect the first 192 bytes ourselves to confirm that they are indeed identitcal across the two files:

xxd -l 192 shattered-1.pdf > 1-pdf.192 # save a hexdump from the first PDF to the file 1-pdf.192
xxd -l 192 shattered-2.pdf > 2-pdf.192 # and do the same for the second
diff {1,2}-pdf.192                     # and then show any differences between the two

There will be no output from diff because the two prefixes are identical, as promised. This also means these bytes alone are not enough to acquire the flag. Since these two byte streams are identical, we should expect an identical SHA-1 hash. Moreover, the code in the challenge rejects identical input values for the $name and $password variables. So, next, we need to append the “near-collision block pairs.” These are also helpfully displayed under “Table 1,” which also displays (a pair of) hexdumps (one for the first PDF and one for the second). It is on Page 3 and is also helpfully labelled “Colliding message blocks for SHA-1.” We can simply count the number of bytes in one of the pairs, which are helpfully grouped as hexadecimal digits in a table: 16 across and 4 down. Two groups of 16 * 4 = 64 * 2 = 128. So the near-collision block pairs are each 128 bytes long.

Putting this together with the identical prefix of 192 bytes gives us a total of 320 bytes (because a 192 byte prefix plus two 64 byte collision block pairs gives a total of 320 bytes). This means we only need the first 320 bytes from each PDF to produce a collision. Naturally, let’s test this!

We’ll need to create two new files, let’s call them name.dat and password.dat. The name.dat file should be the first 320 bytes of the shattered-1.pdf, and the password.dat file will be the first 320 bytes of the shattered-2.pdf file. We know we can see a hexdump of raw file contents using xxd, but another useful feature of xxd is the ability to do the reverse. That is, it can take a hexdump and convert it into a raw binary file. This means we can take the output of xxd and pipe that output back into a second invocation of xxd to truncate a file at an exact byte offset.

The -r option to xxd is what “reverses” its operation:

# Put the first 320 bytes of shattered-1.pdf into name.dat
xxd -l 320 shattered-1.pdf | xxd -r > name.dat
# Do the same with the second PDF, but put those bytes into password.dat
xxd -l 320 shattered-2.pdf | xxd -r > password.dat

Let’s confirm that these two files have different contents:

$ diff name.dat password.dat
Binary files name.dat and password.dat differ

And now, for the moment of truth, let’s compute the SHA-1 hash for these two different 320 byte streams:

$ shasum --algorithm 1 {name,password}.dat
f92d74e3874587aaf443d1db961d4e26dde13e9c  name.dat
f92d74e3874587aaf443d1db961d4e26dde13e9c  password.dat

Presto! We have created the shortest possible byte streams with this prefix and these collision blocks whose contents hash to the same SHA-1 output.

:bulb: If you want to prove this to yourself you could repeat the procedure a second time, but this time take one byte less from each file, and hash those. The hashes will be different:

$ xxd -l 319 shattered-1.pdf | xxd -r > name.dat
$ xxd -l 319 shattered-2.pdf | xxd -r > password.dat
$ diff {name,password}.dat
Binary files name.dat and password.dat differ
$ shasum --algorithm 1 {name,password}.dat
aed0141331c04c118a5eb3823cd567f7f10c5a59  name.dat
2e872ce1efb97b5641dbb37b7b88777602418f20  password.dat

So as you can see, we need at least 320 bytes from each PDF.

Now that we have what are effectively our hash collision payload files, we simply need to rerun the curl command from earlier with the new, shorter files in place of the full PDFs:

$ curl 'http://54.202.82.13/' --header 'Host: 54.202.82.13' --get --data-urlencode name@name.dat --data-urlencode password@password.dat
<html>
<head>
    <title>level1</title>
    <link rel='stylesheet' href='style.css' type='text/css'>
</head>
<body>

Flag: FLAG{AfterThursdayWeHadToReduceThePointValue}

For our hard work, we are rewarded with the flag! :D

[Back to Top]