Basics
Unlike the combination of HTML and CSS, XSL-FO is a unified presentational language. It has no semantic markup in the sense that HTML does. And unlike CSS, which modifies the default presentation of an external XML or HTML document, an XSL-FO document stores all of the document's data within itself.
The general idea behind XSL-FO's use is that the user writes a document, not in FO, but in an XML language. XHTML, DocBook, and TEI are all possible examples. Then, the user obtains an XSLT transform, either by writing one themselves or by finding one for the document type in question. This XSLT transform converts the XML into XSL-FO.
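The shape of this step can be illustrated with a minimal sketch. It is written in Python with the standard library rather than in XSLT, so it only stands in for a real stylesheet. The source vocabulary (`article`, `title`, `para`), the function name `article_to_fo`, and the page dimensions are assumptions made for the example; the formatting objects it emits (`fo:root`, `fo:layout-master-set`, `fo:simple-page-master`, `fo:region-body`, `fo:page-sequence`, `fo:flow`, `fo:block`) are standard XSL-FO elements.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def article_to_fo(source_xml):
    """Convert a hypothetical <article> document into an XSL-FO tree."""
    article = ET.fromstring(source_xml)
    root = ET.Element(fo("root"))

    # Page geometry: a single A4 page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-height": "297mm",
        "page-width": "210mm",
        "margin-top": "20mm",
        "margin-bottom": "20mm",
        "margin-left": "20mm",
        "margin-right": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    # Content: one page sequence whose flow fills the body region.
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    # Map each source element to a presentational block.
    for child in article:
        if child.tag == "title":
            attrs = {"font-size": "18pt", "font-weight": "bold"}
        elif child.tag == "para":
            attrs = {"space-before": "6pt"}
        else:
            continue  # elements this sketch does not know are skipped
        block = ET.SubElement(flow, fo("block"), attrs)
        block.text = child.text
    return root


SOURCE = ("<article><title>Hello</title>"
          "<para>First.</para><para>Second.</para></article>")

if __name__ == "__main__":
    print(ET.tostring(article_to_fo(SOURCE), encoding="unicode"))
```

Note that the semantic names of the source (`title`, `para`) do not survive the conversion: the result contains only blocks with presentational properties.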
Once the XSL-FO document is generated, it is passed to an application called an FO processor. FO processors convert the XSL-FO document into something that is readable, printable, or both. The most common outputs are PDF and PostScript (PS) files, but some FO processors can also produce other formats such as RTF, or simply display the sequence of pages and their contents in a window of the user's GUI.
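As one concrete illustration, the sketch below builds a command line for Apache FOP, one FO processor among several. It assumes FOP's command-line launcher is installed under the name `fop`; the helper name `fop_command` and the file names are hypothetical. The function only assembles the argument list and does not run anything.

```python
def fop_command(fo_path, out_path, out_format="pdf"):
    """Build an argument list for Apache FOP's command-line launcher.

    Assumes the launcher is available as `fop`; nothing is executed here.
    """
    # Output switches of the FOP command line used by this sketch.
    switches = {"pdf": "-pdf", "ps": "-ps", "rtf": "-rtf"}
    if out_format not in switches:
        raise ValueError(f"unsupported output format: {out_format!r}")
    return ["fop", "-fo", fo_path, switches[out_format], out_path]


if __name__ == "__main__":
    print(" ".join(fop_command("document.fo", "document.pdf")))
```

The returned list could be handed to `subprocess.run(..., check=True)` on a machine where FOP is installed. Other processors have their own interfaces.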
The XSLT language itself was originally conceived only for this purpose; it is now in widespread use for more general XML transformations. This transformation step is taken so much for granted in XSL-FO that it is not uncommon for people to refer to the XSLT that turns XML into XSL-FO as the XSL-FO document itself. Even tutorials on XSL-FO tend to be written with XSLT commands wrapped around the FO markup.
The XSLT transformation step is powerful. It allows a table of contents, linked references, an index, and similar derived material to be generated automatically from the source document.
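A table of contents is a good example of such generated material. The sketch below, again in Python rather than XSLT, derives contents entries from a list of section titles. The function name `toc_and_headings` and the `sec-N` id scheme are assumptions; `fo:basic-link` (with `internal-destination`) and `fo:page-number-citation` (with `ref-id`) are standard formatting objects. The page numbers themselves are not known at this stage: the FO processor resolves each citation during layout.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def toc_and_headings(titles):
    """Return (toc_block, heading_blocks) for a list of section titles."""
    toc = ET.Element(fo("block"))
    headings = []
    for number, title in enumerate(titles, start=1):
        target = f"sec-{number}"  # hypothetical id scheme

        # Contents entry: a link to the heading followed by its page number.
        entry = ET.SubElement(toc, fo("block"))
        link = ET.SubElement(entry, fo("basic-link"),
                             {"internal-destination": target})
        link.text = title + " "
        ET.SubElement(link, fo("page-number-citation"), {"ref-id": target})

        # The heading carries the id that the entry points at.
        heading = ET.Element(fo("block"),
                             {"id": target, "font-weight": "bold"})
        heading.text = title
        headings.append(heading)
    return toc, headings


if __name__ == "__main__":
    toc, _ = toc_and_headings(["Introduction", "Usage"])
    print(ET.tostring(toc, encoding="unicode"))
```

Because the entries are derived from the same titles as the headings, the contents list stays in step with the document whenever the source changes.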
An XSL-FO document is not like a PDF or a PostScript document. It does not definitively describe the layout of the text on various pages. Instead, it describes what the pages look like and where the various contents go. From there, an FO processor determines how to position the text within the boundaries described by the FO document. The XSL-FO specification even allows different FO processors to have varying responses with regard to the resultant generated pages.
For example, some FO processors can hyphenate words to save space when breaking a line, while others do not. Different processors may also use different hyphenation algorithms, ranging from very simple ones to more complex ones that take into account whether the previous or next line is also hyphenated. In some borderline cases these differences change the layout of the pages quite substantially. There are other cases where the XSL-FO specification explicitly allows FO processors some degree of choice with regard to layout.
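The effect of hyphenation on layout can be seen in a toy model. The sketch below is a simple greedy line breaker that measures width in characters. It is not the algorithm of any particular FO processor, and the syllable table passed to it is a hypothetical stand-in for a real hyphenation dictionary.

```python
def split_to_fit(syllables, room):
    """Return (head, tail) such that head plus a hyphen fits in `room`.

    Returns (None, whole_word) when no hyphenation point fits.
    """
    for cut in range(len(syllables) - 1, 0, -1):
        head = "".join(syllables[:cut])
        if len(head) + 1 <= room:  # +1 for the trailing hyphen
            return head, "".join(syllables[cut:])
    return None, "".join(syllables)


def break_lines(words, width, syllables=None):
    """Greedily break `words` into lines of at most `width` characters.

    `syllables` maps a word to its syllable list; omit it to model a
    processor that does not hyphenate. Words longer than `width` overflow.
    """
    syllables = syllables or {}
    lines, current = [], ""
    for word in words:
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width:
            current = candidate
            continue
        # The word does not fit: try to keep a hyphenated prefix on this line.
        head, tail = None, word
        if current and word in syllables:
            room = width - len(current) - 1  # space left after a blank
            head, tail = split_to_fit(syllables[word], room)
        if head:
            lines.append(f"{current} {head}-")
        elif current:
            lines.append(current)
        current = tail
    if current:
        lines.append(current)
    return lines


WORDS = "the formatter positions typographical content".split()
SYLLABLES = {"positions": ["po", "si", "tions"]}  # hypothetical dictionary

if __name__ == "__main__":
    print(break_lines(WORDS, 20))
    print(break_lines(WORDS, 20, SYLLABLES))
```

With a width of 20 characters, the non-hyphenating run needs four lines for this text, while the hyphenating run fits it into three. Across a long document, differences of this kind can move page breaks.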
These differences between FO processors, and the inconsistent results they produce, are often not a concern. This is because the general purpose of XSL-FO is to generate paged, printed media. XSL-FO documents themselves are usually used as intermediaries, mostly to generate either PDF files or a printed document as the final form to be distributed. This contrasts with HTML, which is generated and distributed directly to the user as the final form. Distributing the final PDF rather than the formatting language input (whether HTML/CSS or XSL-FO) means, on the one hand, that recipients are not affected by the unpredictability that results from differences among formatting language interpreters. On the other hand, it means that the document cannot easily adapt to different recipient needs, such as a different page size or preferred font size, or tailoring for on-screen, on-paper, or audio presentation.