Posted April 10

XSS Vulnerabilities

1 Definition and Principles

XSS (cross-site scripting) is an attack in which the browser treats user-supplied content as script and executes it, carrying out malicious actions. The attack targets the user's browser rather than the server.

There are three main types:

Reflected

Stored

DOM-based

XSS hazards:

Stealing cookies

Stealing accounts

Malware downloads

Keylogging

Driving advertising traffic

2 Reflected XSS

2.1 Principle

The application or API includes unvalidated and unescaped user input directly in its HTML output. A successful attack lets the attacker execute arbitrary HTML and JavaScript in the victim's browser.

Characteristics: non-persistent; the attack fires only when the user opens a link carrying the crafted parameters.

Scope of impact: only the users who execute the script.
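
The usual countermeasure for the pattern described above is to escape user input before it is placed in HTML output. The sketch below is a minimal illustration in plain JavaScript. `escapeHtml` and `renderSearchPage` are hypothetical helpers, not part of any framework, and a real application would normally rely on the escaping built into its template engine.

```javascript
// Hypothetical helper: replace the characters that are significant in HTML
// with character references. '&' is handled first so that the references
// produced by the later replacements are not escaped a second time.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical page template that reflects a query parameter back to the user.
function renderSearchPage(query, escape) {
  const shown = escape ? escapeHtml(query) : query;
  return '<p>Results for: ' + shown + '</p>';
}

const payload = '<script>alert(1)</script>';
const vulnerable = renderSearchPage(payload, false); // payload reaches the page as markup
const safe = renderSearchPage(payload, true);        // payload reaches the page as text
console.log(vulnerable);
console.log(safe);
```

In the unescaped variant the payload arrives in the page as a real script element; in the escaped variant the browser only displays it as text.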

3 Stored XSS

3.1 Principle

Stored XSS occurs when the application receives untrusted data through a web request and stores it in the database without checking whether it contains XSS code. Each time that data is later read back from the database and rendered, the XSS code executes again, so stored XSS keeps attacking users persistently.

Stored XSS commonly appears in:

Message boards

Comment sections

User avatars

Personal signatures

Blogs
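
The persistence described above can be sketched with an in-memory array standing in for the database. All names here (`comments`, `saveComment`, `renderComments`, `escapeHtml`) are hypothetical. The only point is that a payload stored once is served again on every later page view unless it is escaped at output time.

```javascript
// In-memory stand-in for a database table (an assumption for illustration).
const comments = [];

function saveComment(author, text) {
  // The raw text is stored as-is; nothing is filtered on the way in.
  comments.push({ author, text });
}

function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Every page view re-reads the stored rows, so a stored payload is
// delivered again on each request.
function renderComments(escape) {
  return comments
    .map((c) => {
      const author = escape ? escapeHtml(c.author) : c.author;
      const text = escape ? escapeHtml(c.text) : c.text;
      return '<li><b>' + author + '</b>: ' + text + '</li>';
    })
    .join('\n');
}

saveComment('mallory', '<img src=x onerror=alert(1)>');
saveComment('alice', 'nice post');

const firstView = renderComments(false);
const secondView = renderComments(false);
const escapedView = renderComments(true);
console.log(escapedView);
```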

4 DOM-based XSS

4.1 Principle

4.1.1 DOM

The DOM uses a logical tree to represent a document. Each branch of the tree ends in a node, and each node contains objects. DOM methods let you manipulate this tree programmatically, changing the document's structure, style, or content.

[Image: 20210110100040.png]

4.1.2 DOM XSS

DOM-based XSS is in effect a special kind of reflected XSS. It is a vulnerability rooted in the Document Object Model: JavaScript manipulates the DOM tree to write data into the page dynamically, without depending on data being submitted to the server.

    <html>
    <body>
    <script>
    document.write('<script>alert(1)<\/script>')
    </script>
    </body>
    </html>

4.1.3 Example

Consider a DOM XSS whose root cause is JS code that dynamically concatenates markup along these lines:

    $('head').append('<meta>' + text + '</meta>')

Take the following POC as an example:

[Image: 20210110135714.png]

The code inside the div is HTML-entity encoded, yet the final result still triggers the alert popup.

[Image: 20210110135839.png]

The key point is that script code assigned through innerHTML is not executed.

For example, the following code inserts a DOM node dynamically:

    <!DOCTYPE html>
    <html lang='en'>
    <head>
        <meta charset='UTF-8'>
        <meta http-equiv='X-UA-Compatible' content='IE=edge'>
        <meta name='viewport' content='width=device-width, initial-scale=1.0'>
        <title>DOM XSS POC</title>
    </head>
    <body>
        <div id='demo'>&lt;script&gt;alert`1`&lt;/script&gt;</div>
        <script src='https://libs.baidu.com/jquery/2.1.1/jquery.min.js'></script>
        <div id='test'></div>
        <script>
            document.getElementById('test').innerHTML = document.getElementById('demo').innerHTML + '';
        </script>
    </body>
    </html>

With this code, the script placed into the div with id=test is not executed. A framework such as jQuery, however, evaluates script tags in a node when it inserts the node, so the script does run. The append() method is designed to let inserted elements execute, because that is a common requirement.

4.1.4 Similarities to and Differences from Reflected XSS, and Harm

Similarity: in both cases input is not properly controlled, and JavaScript supplied as input is inserted into the HTML page as output.

Difference: reflected XSS passes through the back-end language and takes effect when the page includes the back-end output.

DOM XSS is inserted into the page after JS directly manipulates the DOM tree.

Harm: with separated front and back ends, the payload does not pass through WAF inspection.

5 Pseudo-Protocols and Encoding Bypasses

5.1 Pseudo-Protocols

Pseudo-protocols differ from the protocols widely used on the Internet, such as http://, https://, and ftp://. They are used in URLs to perform specific functions.

Data pseudo-protocol:

    data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4=

JavaScript pseudo-protocol:

    javascript:alert('1')

[Image: 20210110102339.png]
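
To see what a base64 data: URL actually carries, it can be decoded offline. The sketch below assumes Node.js, since it uses `Buffer`. `decodeDataUrl` is a hypothetical helper that handles only the simple `data:<mime>;base64,<payload>` shape, and the sample URL is the base64 encoding of `<script>alert(1);</script>`.

```javascript
// Hypothetical helper: split a base64 data: URL into MIME type and body.
// Only the "data:<mime>;base64,<payload>" form is handled here.
function decodeDataUrl(url) {
  const match = /^data:([^;,]*);base64,(.*)$/.exec(url);
  if (!match) {
    throw new Error('not a base64 data: URL');
  }
  const body = Buffer.from(match[2].trim(), 'base64').toString('utf8');
  return { mime: match[1], body };
}

const decoded = decodeDataUrl(
  'data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4='
);
console.log(decoded.mime); // text/html
console.log(decoded.body); // <script>alert(1);</script>
```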

5.2 Encoding Bypasses

5.2.1 Unicode Encoding

The ISO (International Organization for Standardization) defined an encoding intended to cover every letter and symbol of every culture on earth. Its basic form (UCS-2) uses two bytes to represent a character.

Unicode is only a character set: it assigns each symbol a code, but does not specify how that code is stored. Storage is defined by concrete encodings such as UTF-8 and UTF-16.

[Image: 20210110102858.png]
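
The difference between a character's code point and its stored bytes can be checked directly. This sketch assumes Node.js, whose `Buffer.byteLength` reports the encoded size. `describeChar` is a hypothetical helper and the sample characters are arbitrary.

```javascript
// Return the code point of a single character and its size in two encodings.
function describeChar(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return {
    codePoint: 'U+' + hex,
    utf8Bytes: Buffer.byteLength(ch, 'utf8'),
    utf16Bytes: Buffer.byteLength(ch, 'utf16le'),
  };
}

const latinA = describeChar('A');
const cjk = describeChar('\u4E2D');      // the character 中
const emoji = describeChar('\u{1F600}'); // outside the two-byte range
console.log(latinA, cjk, emoji);
```

The same code point occupies a different number of bytes depending on the encoding, and characters beyond U+FFFF need four bytes even in UTF-16.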

5.2.2 Browser Decoding

Three main processes are involved when a browser parses an HTML document:

HTML parsing (which builds the DOM tree), URL parsing, and JavaScript parsing. Each parser decodes and parses its own part of the HTML document, and the order in which they run varies.

5.2.3 HTML Parsing

5.2.3.1 The Parsing Process

HTML has five kinds of elements:

Void elements, including area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr, etc.

Raw text elements: script and style

RCDATA elements: textarea and title

Foreign elements, such as elements from the MathML namespace or the SVG namespace

Normal elements, that is, all elements other than the four kinds above

The five kinds differ as follows:

Void elements cannot contain anything (they have no end tag, so no content can be placed between a start tag and an end tag).

Raw text elements can contain text.

RCDATA elements can contain text and character references.

Foreign elements can contain text, character references, CDATA sections, other elements, and comments.

Normal elements can contain text, character references, other elements, and comments.
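
The five categories above can be expressed as a small lookup table. This is a simplified sketch: `elementKind`, `CONTENT_MODEL`, and `decodesCharacterReferences` are hypothetical names, the void list is the one given above, and foreign elements are approximated by the root tag names svg and math rather than by real namespace tracking.

```javascript
const VOID = new Set(['area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img',
  'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr']);
const RAW_TEXT = new Set(['script', 'style']);
const RCDATA = new Set(['textarea', 'title']);
const FOREIGN_ROOTS = new Set(['svg', 'math']); // simplification, see lead-in

function elementKind(tagName) {
  const name = tagName.toLowerCase();
  if (VOID.has(name)) return 'void';
  if (RAW_TEXT.has(name)) return 'raw text';
  if (RCDATA.has(name)) return 'RCDATA';
  if (FOREIGN_ROOTS.has(name)) return 'foreign';
  return 'normal';
}

// What each kind may contain, following the list above.
const CONTENT_MODEL = {
  'void': [],
  'raw text': ['text'],
  'RCDATA': ['text', 'character references'],
  'foreign': ['text', 'character references', 'CDATA sections', 'elements', 'comments'],
  'normal': ['text', 'character references', 'elements', 'comments'],
};

// Character references are decoded only where the content model allows them.
function decodesCharacterReferences(tagName) {
  return CONTENT_MODEL[elementKind(tagName)].includes('character references');
}

console.log(elementKind('textarea'), decodesCharacterReferences('textarea'));
console.log(elementKind('script'), decodesCharacterReferences('script'));
```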

The HTML parser runs as a state machine: it consumes characters from the document input stream and transitions between states according to its transition rules.

[Image: 20210110212335.png]

Take the following code as an example:

    <html>
    <body>
    This is Geekby's blog
    </body>
    </html>

The initial state is the "Data" state. When a < character is encountered, the state changes to "Tag open". Reading an a-z character creates a start tag token, and the state changes to "Tag name". The parser stays in this state until it reads >, appending each character to the token's name. In the example, this creates an html token.

When > is read, the current token is complete and the state returns to "Data". The body tag goes through the same process, at which point both the html and body tags have been recognized. Back in the "Data" state, each character of "This is Geekby's blog" is read and emitted as a character token.

This continues until the < of </body> is encountered. The parser returns to "Tag open", reads the next character /, enters "Close tag open", and creates an end tag token. The state then moves to "Tag name" and stays there until > is read. The new tag token is then emitted and the state returns to "Data". The final closing tag is processed the same way.

Information

When the HTML parser is in the Data State, the RCDATA State, or the Attribute Value State, character entities are decoded into the corresponding characters.

Example

    <div>&#60;img src=x onerror=alert(4)&#62;</div>

Here < and > are encoded as character entities.

When the HTML parser finishes parsing <div>, it emits the tag token and enters the Data State.

When it then reaches the entity &#60;, the entity is decoded to <.

The later &#62; is decoded to > in the same way.

Question

After decoding, will img be parsed as an HTML tag and cause JS to execute?

No. The parser does not switch to the Tag Open State after consuming a character reference, and without entering the Tag Open State nothing is emitted as an HTML tag. No new HTML tag is created; the decoded characters are treated purely as data.
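
The behaviour described above can be reproduced with a toy tokenizer. This is a heavily simplified sketch, not the real HTML tokenization algorithm: `tokenize` is a hypothetical function with only three states, it handles only decimal character references, and it treats everything between < and > as the tag name.

```javascript
// Toy tokenizer with three states: data, tagOpen, tagName.
function tokenize(html) {
  const tokens = [];
  let state = 'data';
  let text = '';  // pending character data
  let tag = null; // tag token being built
  let i = 0;

  const flushText = () => {
    if (text) tokens.push({ type: 'Character', data: text });
    text = '';
  };

  while (i < html.length) {
    const ch = html[i];
    if (state === 'data') {
      const ref = ch === '&' ? /^&#(\d+);/.exec(html.slice(i)) : null;
      if (ch === '<') {
        flushText();
        state = 'tagOpen';
        i += 1;
      } else if (ref) {
        // Decode the character reference and keep it as data. The state
        // stays 'data', so a decoded '<' never opens a tag.
        text += String.fromCodePoint(Number(ref[1]));
        i += ref[0].length;
      } else {
        text += ch;
        i += 1;
      }
    } else if (state === 'tagOpen') {
      if (ch === '/') {
        tag = { type: 'EndTag', name: '' };
        i += 1;
      } else {
        tag = { type: 'StartTag', name: '' };
      }
      state = 'tagName';
    } else {
      // tagName: collect characters until '>' completes the token.
      if (ch === '>') {
        tokens.push(tag);
        tag = null;
        state = 'data';
      } else {
        tag.name += ch.toLowerCase();
      }
      i += 1;
    }
  }
  flushText();
  return tokens;
}

const result = tokenize('<div>&#60;img src=x onerror=alert(4)&#62;</div>');
console.log(result);
```

The output contains a start tag and an end tag for div, with the decoded img markup in between as a single character token rather than a tag.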

5.2.3.2 Special Cases

Raw text elements

In HTML, two tags are raw text elements: script and style. Everything inside a raw text element belongs to that tag as its content.

Character entities inside a raw text element are not HTML-decoded. When the HTML parser reaches the content (data) of a script or style tag, it enters the Script Data State (the RAWTEXT State for style), which is not one of the three states mentioned earlier that decode character entities.

Therefore, in <script>&#97;&#108;&#101;&#114;&#116;...</script> (an entity-encoded alert call), the character entities are not decoded and the JS is not executed.

RCDATA elements

In HTML, two tags are RCDATA elements: textarea and title.

RCDATA elements can contain text content and character entities.

When the parser reaches the data inside a textarea or title tag, it enters the RCDATA State.

As noted earlier, character entities are decoded by the parser while it is in the RCDATA State.

Example

    <textarea>&#60;script&#62;alert(5)&#60;/script&#62;</textarea>

The parser decodes the entities when it parses this content.

However, the JS inside is not executed: after decoding the character entities, the state machine does not enter the Tag Open State, so the script inside is never parsed as an HTML tag.

5.2.4 JavaScript Parsing

Whether a Unicode escape sequence such as \uXXXX (or a hex escape) is decoded depends on where it appears.

Unicode escape sequences can appear in three places in JavaScript:

In strings

A Unicode escape sequence inside a string is interpreted only as an ordinary character and cannot break out of the string context.

For example, <script>alert('\u0031\u0030');</script>

The escaped part is 10, which sits inside a string. It is decoded normally and the JS code executes.

In identifiers

A Unicode escape sequence in an identifier, that is, a name such as a variable or function name, is decoded.

For example, <script>\u0061\u006c\u0065\u0072\u0074(10);</script>

The escaped part is alert. It is a function name and therefore an identifier, so it is decoded normally and the JS code executes.

Control characters

A Unicode escape sequence that stands in for a control character is decoded, but the result is not interpreted as a control character; it is treated as part of an identifier or string.

Control characters here are ', ", (, ), and so on.

For example, in <script>alert\u0028'xss');</script> the ( is Unicode-escaped, so after decoding it no longer acts as a control character but is treated as part of the identifier alert( .

Consequently, control characters such as a function call's parentheses no longer work once they are Unicode-escaped.

Example

    <script>\u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0031\u0029</script>

The escaped part is alert(11). The JS in this example is not executed because the control characters are escaped.

    <script>\u0061\u006c\u0065\u0072\u0074(\u0031\u0032)</script>

The escaped parts are alert and the 12 inside the parentheses. The JS is not executed because the escaped content inside the parentheses cannot be interpreted as a number. Either write the digits in plain ASCII, or wrap them in "" or '' to make a string, in which case they are treated only as ordinary characters.

    <script>alert('13\u0027)</script>

\u0027 is the escaped form of '. The JS in this example is not executed: because the control character is escaped, the decoded ' becomes part of the string and is not interpreted as a string delimiter. The string is therefore never terminated.

    <script>alert('14\u000a')</script>

The JS in this example is executed, because the escaped part is inside a string, is interpreted only as an ordinary character, and cannot break out of the string context.
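
The cases above can be checked without a browser by asking the JavaScript engine whether each snippet even parses. The sketch below uses `new Function`, which compiles the source without running it, so `alert` never needs to exist. `parses` and the entries in `cases` are hypothetical names, and "parses" is used here as a stand-in for "would be executed".

```javascript
// Report whether a piece of source text parses as a function body.
// new Function only compiles the code; nothing is executed here.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Double backslashes keep the \uXXXX sequences literal in the tested source.
const cases = {
  inString: "alert('\\u0031\\u0030');",
  inIdentifier: '\\u0061\\u006c\\u0065\\u0072\\u0074(10);',
  escapedParens: '\\u0061\\u006c\\u0065\\u0072\\u0074\\u0028\\u0031\\u0031\\u0029',
  escapedDigits: '\\u0061\\u006c\\u0065\\u0072\\u0074(\\u0031\\u0032)',
  escapedQuote: "alert('13\\u0027)",
  escapedNewline: "alert('14\\u000a')",
};

const results = {};
for (const [name, source] of Object.entries(cases)) {
  results[name] = parses(source);
}
console.log(results);

// Escapes inside strings and identifiers decode to the plain characters.
const decodedString = new Function("return '\\u0031\\u0030';")();
const decodedIdentifier = new Function('const value = 7; return v\\u0061lue;')();
console.log(decodedString, decodedIdentifier);
```

Only the snippets whose escapes sit inside a string or form a valid identifier compile; an escaped parenthesis, digit, or quote produces a SyntaxError instead of acting as syntax.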

5.2.5 URL Parsing

The URL parser is also modeled as a state machine; characters from the document input stream drive it into different states.

First, note that the scheme part of a URL must consist of ASCII characters and cannot be encoded in any way; otherwise the URL parser's state machine enters the No Scheme state.

Example

    <a href='%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29'></a>

The URL-encoded content is javascript:alert(1). The JS is not executed: the javascript string serving as the scheme is encoded, so the URL parser's state machine enters the No Scheme state.

The : in the URL also cannot be encoded in any way; otherwise the URL parser's state machine likewise enters the No Scheme state.

Example

    <a href='javascript%3aalert(3)'></a>

Because : is URL-encoded as %3a, the URL state machine enters the No Scheme state and the JS code cannot execute.

Example

    <a href='&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;:%61%6c%65%72%74%28%32%29'></a>

Here the string javascript is entity-encoded, the : is not encoded, and alert(2) is URL-encoded. The JS executes successfully.

First, the HTML parser decodes character entities when its state machine is in the Attribute Value State. The value sits in an href attribute, so the entity-encoded javascript string is decoded.

Second, HTML parsing happens before URL parsing, so by the time the URL is parsed the javascript string in the scheme part has already been decoded and is no longer entity-encoded.
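
The scheme rules above can be observed with the WHATWG `URL` class, which is available in Node.js and in browsers. `schemeOf` is a hypothetical helper, and `https://example.test/page` is a made-up base address used only to show how a scheme-less value is resolved.

```javascript
// Return the scheme the URL parser sees, or null when the input has no
// scheme of its own and so cannot be parsed without a base URL.
function schemeOf(input) {
  try {
    return new URL(input).protocol;
  } catch (err) {
    if (err instanceof TypeError) return null;
    throw err;
  }
}

const encodedScheme = schemeOf('%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29');
const encodedColon = schemeOf('javascript%3aalert(3)');
const plainScheme = schemeOf('javascript:%61%6c%65%72%74%28%32%29');
console.log(encodedScheme, encodedColon, plainScheme);

// With a base URL, a scheme-less value is resolved as a relative path
// instead of being treated as a javascript: URL.
const resolved = new URL('javascript%3aalert(3)', 'https://example.test/page');
console.log(resolved.protocol, resolved.pathname);
```

Only the input with a literal `javascript:` prefix is recognised as having that scheme; the two encoded variants are either rejected or resolved against the page's own address.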

5.2.6 Parsing Order

First, when the browser receives an HTML document, the HTML parser is triggered to tokenize it. This step performs HTML decoding and builds the DOM tree.

Next, the JavaScript parser steps in to parse inline scripts, which performs JS decoding.

When the browser encounters a context that requires a URL, the URL parser also steps in to perform URL decoding. Where the URL parser runs in the sequence depends on where the URL appears: it may run before or after the JavaScript parser. HTML parsing is always the first step.

The relative order of URL parsing and JavaScript parsing depends on the situation.

Example

    <a href='UserInput'></a>

In this example, the HTML parser first decodes character entities in the UserInput part.

The URL parser then decodes UserInput. If the scheme part of the URL is javascript, the JavaScript parser then decodes UserInput again. The parsing order is therefore: HTML parsing, URL parsing, JavaScript parsing.

Example

    <a href=# onclick="window.open('UserInput')"></a>

In this example, the HTML parser first decodes character entities in the UserInput part.

The JavaScript parser then parses the JS in the onclick handler and executes it.

When the JS runs, the argument of window.open('UserInput') is passed in as a URL, so the URL parser then decodes the UserInput part.

The parsing order is therefore: HTML parsing, JavaScript parsing, URL parsing.
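
The layered decoding for the href case (HTML, then URL, then JavaScript) can be mimicked by chaining three small decoders. Everything here is a hypothetical simplification: `decodeHtmlEntities` handles only numeric character references, `decodeJavascriptUrl` only percent-decodes the part after a literal `javascript:` prefix, and `decodeJsEscapes` resolves `\uXXXX` blindly without the control-character restrictions a real JavaScript parser applies.

```javascript
// Step 1 (HTML parser): decode numeric character references such as
// &#x6a; or &#106; inside the attribute value.
function decodeHtmlEntities(s) {
  return s.replace(/&#(x[0-9a-f]+|\d+);/gi, (_, ref) =>
    String.fromCodePoint(
      ref[0].toLowerCase() === 'x' ? parseInt(ref.slice(1), 16) : parseInt(ref, 10)
    )
  );
}

// Step 2 (URL parser): the scheme must already be literal at this point;
// only the part after "javascript:" is percent-decoded.
function decodeJavascriptUrl(url) {
  const match = /^javascript:(.*)$/i.exec(url);
  return match ? decodeURIComponent(match[1]) : null;
}

// Step 3 (JavaScript parser): resolve \uXXXX escapes (simplified).
function decodeJsEscapes(code) {
  return code.replace(/\\u([0-9a-f]{4})/gi, (_, hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
}

// Sample attribute value: entity-encoded scheme, literal colon, and a
// percent-encoded body that itself contains a JS Unicode escape.
const href =
  '&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;:%5cu0061lert%28%32%29';

const afterHtml = decodeHtmlEntities(href);
const afterUrl = decodeJavascriptUrl(afterHtml);
const afterJs = decodeJsEscapes(afterUrl);
console.log(afterHtml);
console.log(afterUrl);
console.log(afterJs);
```

Each layer only becomes visible after the previous one has been decoded, which is why the encodings must be applied in the reverse of the parsing order when a payload is built.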
